Statmaster User Manual

STATMASTER

A Quick-And-Dirty Data Graphing Application

User Manual

Version 2.34, 2007/04/18

http://www.wrotniak.net/works/statmaster/index.html

Contents

What is Statmaster?
How to run Statmaster
Data files
- Time format in table data
  - The date offset
Setup file
User interface
- The Context Menu
- Identifying a data point
Examples
Distribution and support
Update history

1. What is Statmaster?

Statmaster is a program for quick-and-dirty graphing of data presented as columns of a text table. It does not attempt to replace any of the more sophisticated data analysis tools; it just gets the job done with the least amount of hassle, and it can be mastered in twenty minutes.

The program is free for personal and educational use.

2. How to run Statmaster

The program is best run from the command line (like the Windows CMD window or the excellent CLI utility, Take Command from JP Software). To execute it, enter the program name with two obligatory arguments, like, for example,

statmaster mysetup.stm.ini mydata.dat

The first argument specifies the file defining how Statmaster will graph the data; the second — the data file itself.

It is recommended (although not required) that the Statmaster inifile has a double extension: *.stm.ini, so that it is easy to identify.

The optional /i parameter can be used on the command line to disable any graph limits defined in the inifile (xlo, xhi, ylo, yhi, see Section 4.6). The command line will then look like

statmaster mysetup.stm.ini mydata.dat /i

This can be useful during the debugging of the process generating your data, when some outliers may be expected, as Statmaster will then choose the limits so that all data points fit into the graph area.

If you use Statmaster together with a text-only program generating some tabular results (a situation common in science and engineering), you may find it convenient to automate the process by writing a small (two-line) batch file, which first executes your program, and then runs Statmaster to present the results. This way you just run the batch, and your results pop up as if your program had direct graphic output; quite handy.

Actually I originally wrote Statmaster with such use in mind — as a back end to some text-only, number-crunching applications.
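
The same automation can also be sketched in Python instead of a batch file. This is only an illustration, not part of Statmaster: mycruncher.exe is a hypothetical name standing for your own number-crunching program, and the sketch assumes that statmaster can be found through the PATH.

```python
import subprocess


def statmaster_command(setup_file, data_file, ignore_limits=False):
    """Build the Statmaster command line described in this section."""
    command = ["statmaster", setup_file, data_file]
    if ignore_limits:
        command.append("/i")  # ignore xlo, xhi, ylo, yhi from the inifile
    return command


def crunch_and_graph(cruncher_command, setup_file, data_file):
    """Run the data-generating program first, then graph its output."""
    subprocess.run(cruncher_command, check=True)  # stop here if it fails
    subprocess.run(statmaster_command(setup_file, data_file), check=True)


# Hypothetical usage (mycruncher.exe is an assumed program name):
# crunch_and_graph(["mycruncher.exe"], "mysetup.stm.ini", "mydata.dat")
```

A batch file remains the shortest way to do this; a script like the one above is worth it only if you need error checking between the two steps.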

3. Data files

Statmaster accepts data in text format (plain ASCII, not from a word processor!), arranged in a tabular fashion. Spaces (single or multiple), tabs, and/or commas are recognized as separators.

Note: multiple, consecutive tabs and commas are treated as one. This means you cannot include an "empty" item into a row.

Every text line consists of a number of individual items. An item is usually a numeric value, although text ones are also allowed, as long as Statmaster does not attempt to graph them.

The text items cannot contain commas, tabs, or spaces; see Line 4 in the example below.

A missing value should be shown as NAN (recommended for many reasons), or as any text starting with N, n, or ?. Obviously, these will not be drawn in the graphs.

A line starting with a colon is treated as a comment, i.e., ignored.

This is an example of a data file

: Statmaster example data file
123 55.7 -12.6 Beta 3.5
12 89.2 NAN Alpha 12.1
87 12.7 -3.12 Beta YYY 0 127

The last line in the example above has two peculiarities. First of all, it contains two text fields: Beta and YYY. This means that "0" in that line is recognized as the sixth, not the fifth element, and we cannot ask Statmaster to process the fifth data column. In other words, the file will work just fine as long as we access no more than the first three elements of each line.

Secondly, the last line contains one more item than the others. The file is still OK, as long as we limit ourselves to using only as many elements of every line as the shortest line contains.
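
The rules of this section can be summarized with a short Python sketch. It is not Statmaster's own code, only an illustration of the documented behavior; the function names are made up for this example, and the handling of blank lines and of leading spaces before the colon is an assumption.

```python
import re

# Spaces, tabs, and commas all separate items; a run of them counts as one.
_SEPARATORS = re.compile(r"[ \t,]+")


def split_row(line):
    """Return the items of one data line, or None for a comment/blank line."""
    stripped = line.strip()
    if not stripped or stripped.startswith(":"):
        return None  # comment (or blank line, which is an assumption)
    return [item for item in _SEPARATORS.split(stripped) if item]


def is_missing(item):
    """A missing value is NAN or any text starting with N, n, or ?."""
    return item[:1] in ("N", "n", "?")


rows = [
    split_row(": Statmaster example data file"),
    split_row("123 55.7 -12.6 Beta 3.5"),
    split_row("87 12.7 -3.12 Beta YYY 0 127"),
]
```

Here rows[0] is None (the comment), while the other two entries are lists of five and seven items, which shows why only the leading columns of such a file can be used safely.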

3.1. Time format in table data

Skip this section (and any subsections) if you are not planning to use Statmaster for data containing time values formatted as hh:mm:ss.

Any of the numeric fields can, optionally, use the time format: hh:mm:ss or hh:mm. This will be converted to hours with a fraction. For example, 18:30:00 or just 18:30 will be read as 18.5.

To accommodate time spans longer than 24 hours, the time value can be preceded with one of the following characters:

- means "yesterday"; 24 will be subtracted from the value;
+ means "tomorrow", 24 will be added;
= means "day before yesterday", 48 subtracted;
# means "day after tomorrow", 48 added.

(This is easy to remember, as '=' is a double '-', and '#' — a double '+').

For example, -18:30 will be translated as -5.5 (as it is 5.5 hours before the current midnight), and #12:00 as 60 (12 hours and two days).
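
The conversion can be written down as a few lines of Python. This is a sketch of the rules given above, not the program's actual code; the function name time_to_hours is invented for the example.

```python
# Day-shift prefixes and the number of hours each one adds.
_DAY_SHIFT = {"-": -24.0, "+": 24.0, "=": -48.0, "#": 48.0}


def time_to_hours(field):
    """Convert hh:mm or hh:mm:ss, with an optional prefix, to hours."""
    shift = 0.0
    if field[:1] in _DAY_SHIFT:
        shift = _DAY_SHIFT[field[0]]
        field = field[1:]
    parts = [float(part) for part in field.split(":")]
    if len(parts) not in (2, 3):
        raise ValueError("expected hh:mm or hh:mm:ss")
    hours, minutes = parts[0], parts[1]
    seconds = parts[2] if len(parts) == 3 else 0.0
    return hours + minutes / 60.0 + seconds / 3600.0 + shift
```

With this function, 18:30 gives 18.5, -18:30 gives 18.5 - 24 = -5.5, and #12:00 gives 12 + 48 = 60.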

3.1.1. The date offset

If the DateCol item in the [Statmaster] section of the inifile is specified (see Section 4.1), then the column indicated must contain a date, shown as yyyy/mm/dd or yy/mm/dd. In such case all time values in the given row are offset from that date, and converted to floating-point numbers in hours, measured from the starting midnight of the date specified for the first data point.

Sounds complicated? You can avoid it by writing time values as floating-point numbers in hours from any offset you wish, and then you will not have to go through this. If, however, you insist on using time as hh:mm:ss, and on having data spanning multiple days, the date has to be specified somehow; nothing comes free.

In the following example we will assume that the first column specifies the date:

2006/12/01 13:30:00
2006/12/02 -13:30:00
2006/11/29 #13:30:00

If so, then Statmaster will see all three time fields in Column 2 as being the same, and assign them the same value of 13.5. If we put the last line on the top, the values will still be equal, but to a different value, 61.5 (which is 2×24+13.5).
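
The arithmetic behind this example can be checked with the Python sketch below. It is an illustration under stated assumptions, not Statmaster's code: the function names are invented, and a two-digit year is assumed to mean 20yy, which the manual does not specify.

```python
from datetime import date

_DAY_SHIFT = {"-": -24.0, "+": 24.0, "=": -48.0, "#": 48.0}


def time_to_hours(field):
    """hh:mm[:ss] with an optional day-shift prefix, converted to hours."""
    shift = _DAY_SHIFT.get(field[:1], 0.0)
    if field[:1] in _DAY_SHIFT:
        field = field[1:]
    parts = [float(part) for part in field.split(":")] + [0.0]
    return parts[0] + parts[1] / 60.0 + parts[2] / 3600.0 + shift


def parse_date(field):
    """Parse yyyy/mm/dd or yy/mm/dd (yy read as 20yy: an assumption)."""
    year, month, day = (int(part) for part in field.split("/"))
    if year < 100:
        year += 2000
    return date(year, month, day)


def hours_from_first_midnight(rows):
    """rows: (date, time) text pairs; hours are counted from the first date."""
    first_day = parse_date(rows[0][0])
    hours = []
    for day_field, time_field in rows:
        days = (parse_date(day_field) - first_day).days
        hours.append(24.0 * days + time_to_hours(time_field))
    return hours


rows = [("2006/12/01", "13:30:00"),
        ("2006/12/02", "-13:30:00"),
        ("2006/11/29", "#13:30:00")]
same = hours_from_first_midnight(rows)               # all three are 13.5
moved = hours_from_first_midnight(rows[2:] + rows[:2])  # all three are 61.5
```

The second call puts the 2006/11/29 row first, so the reference midnight moves two days back and every value grows by 48.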

4. Setup file

This is a file which instructs Statmaster how to present the data. This text file follows the standard Windows inifile format, containing items grouped in sections (these are denoted with square brackets).

Note that the ordering of lines within a section is irrelevant.

4.1. The [Statmaster] section

Any items listed in this section are entirely optional, and so is the section itself. The parameters included here define some aspects of how the program behaves. Here is the full list:

Editor defines the text editor invoked by Statmaster when one of the Edit File options from the pop-up menu is activated. If not listed, the standard Windows Notepad will be used. The executable has to reside in one of the directories Windows checks by default, or it has to be specified including the full path; see the example below.
Tabs specifies how the page tabs are located on the screen: Bottom (default), Top, Left, or Right.
TabsPerRow — an integer value specifying how many tabs will be created in one row before another one is created. This is important only for setups with a large number of pages. The default is six.
PixOff (an integer with a default of 2) — pixel tolerance for data point identification by clicking.
DateCol — the data column index (integer, default is 0), specifying date for time values formatted as hh:mm:ss (see Section 3.1.1)

An example:

[Statmaster]
Tabs = Right
TabsPerRow = 4
PixOff = 3
DateCol =
Editor = c:\Program Files\Textpad\textpad.exe

4.2. The [DataFile] section

The first section, [DataFile], defines the data items from the data file which will be graphed, and, optionally, text fields by which the items can be selected for plotting. It may look, for example, like this

[DataFile]
VOLT = 1 1.5 0.001 Applied Voltage
TEMP = 3 2.2 1 Temperature at base
DELT = 3-2 3.2 1 Delta Temperature
@TYPE = 4

Every line in this section defines one data item which Statmaster will be able to access, or a text selector, with a name starting with the at sign, '@'. Graphs can be drawn with a requirement that the text selector for a particular data point has a given value; see below.

For item definitions, the internal name of the item (e.g. VOLT) is followed by the equal sign, and then four pieces of information:

Column in the data table where this item is found. This is either a single number from 1 up, or two such numbers with an operator, denoting that the data item will be a single-operator combination of both. The operators can be: +, -, *, and /.
For example, TEMP is defined as Column 3 from the table, while DELT — as Column 3 minus Column 2.
Digits before and after decimal point, used in formatting the statistical parameter values and/or axis descriptions for this item. If the number of digits before the decimal point is not sufficient to display a value, it will be modified by the program.
In our example, VOLT will be shown as 1.12345, but higher values may add digits before the decimal point, like in 123.12345; just common sense.
Scale factor: the value will be multiplied by this before being used in a graph.
For example, if your VOLT data item is given in millivolts in the data file (column 1), it will be graphed in volts (multiplied by 0.001).
Description of the item, to be used in graph captions.
Note that anything following the scale factor will be treated as description, therefore it may contain spaces or commas. For example, the x-axis of the TEMP histogram will be described as Temperature at base.

The data item ID (before '=') is just an identifier for your own use in the inifile. It does not have to be in uppercase. I'm using uppercase here, because it makes it easier to distinguish. Spaces, commas, and tabs are not allowed in the ID, and the leading '@' is reserved to denote a text selector, see below.

In our example, the third line defines a data item internally named DELT, computed as a difference between Columns 3 and 2 from the original table, described as Delta Temperature, with 3 digits before the decimal point and 2 digits after it used for displaying the values (this does not define how the data is shown in the file, where any floating point format will be OK). The item will be graphed without rescaling (scale factor of 1).
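
To make the four fields concrete, here is a Python sketch that reads one item definition and applies it to a row of numbers. It only mirrors the description above and is not taken from Statmaster; the class and function names are invented, and what the program does on division by zero is not documented, so the sketch simply lets Python raise an error.

```python
import operator
import re
from dataclasses import dataclass
from typing import Optional

_OPERATORS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "/": operator.truediv}
_COLUMN_SPEC = re.compile(r"(\d+)(?:([-+*/])(\d+))?$")


@dataclass
class DataItem:
    name: str
    first_col: int
    op: Optional[str]
    second_col: Optional[int]
    digits_before: int
    digits_after: int
    scale: float
    description: str


def parse_item(line):
    """Parse e.g. 'DELT = 3-2 3.2 1 Delta Temperature'."""
    name, _, rest = line.partition("=")
    # Everything after the scale factor is the description.
    columns, digits, scale, description = rest.split(None, 3)
    match = _COLUMN_SPEC.match(columns)
    if match is None:
        raise ValueError("bad column specification: " + columns)
    first, op, second = match.groups()
    before, after = digits.split(".")
    return DataItem(name.strip(), int(first), op,
                    int(second) if second else None,
                    int(before), int(after), float(scale),
                    description.strip())


def item_value(item, row):
    """row holds the numeric columns of one line, Column 1 first."""
    value = row[item.first_col - 1]
    if item.op is not None:
        value = _OPERATORS[item.op](value, row[item.second_col - 1])
    return value * item.scale  # the scale factor is applied last


def format_value(item, value):
    """Use the requested number of digits after the decimal point."""
    return f"{value:.{item.digits_after}f}"


delt = parse_item("DELT = 3-2 3.2 1 Delta Temperature")
text = format_value(delt, item_value(delt, [123.0, 55.7, -12.6]))
```

For the first data row of the example file in Section 3 this gives Column 3 minus Column 2, that is -12.6 - 55.7, shown with two digits after the decimal point.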

4.2.1. Selector items

A text selector definition consists only of the selector name (with the obligatory '@' as the first character), then the equal sign and column number. Up to 32 text selectors can be defined. Selectors are used for two purposes:

To select which data points (lines from the input table) to graph (see Section 4.4);
To provide the text shown when a data point is clicked upon (scatter plots only, see Section 5.2).

The last line in our example defines a text selector named "@TYPE", based on the fourth column of the data table.

4.3. The [Filters] section

This section is optional. It may be used to filter outliers by rejecting data values outside of a given interval.

Each line in this section consists of an item name as defined in [DataFile], followed by an equals sign and two values (minimum and maximum), specifying the acceptance range for that item. Values outside this range will be replaced with NAN (not-a-number), so that they will not be plotted or used in statistical parameter calculations.

Here is an example of that section:

[Filters]
VOLT = 0.0 2.5
DELT = -1.0 1.0

The NAN-replacement metaphor is strict; for example, if the filtered data item is used to select points for drawing with one of the Key1...Key3 parameters, the data point will be rejected if the key value is out of range.

Note that filtering is applied to the data item as defined in the [DataFile] section, not to the column of the original table in the input file. This means that if a data item is defined as a combination of two columns, its filter will be independent of any filters defined for other data items (if any) based on these columns.

To apply the filtering to some graphs but not to others, define two data items based on the same column; one with a filter and one without; then use one of the two items as needed.
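
The NAN-replacement idea is easy to express in Python. The sketch below is only an illustration with invented function names; in particular, whether the range limits themselves are accepted is not stated in this manual, so treating them as inside the range is an assumption.

```python
import math


def parse_filter(line):
    """Parse a [Filters] line such as 'VOLT = 0.0 2.5'."""
    name, _, limits = line.partition("=")
    minimum, maximum = (float(text) for text in limits.split())
    return name.strip(), minimum, maximum


def apply_filter(value, minimum, maximum):
    """Replace a value outside [minimum, maximum] with NAN."""
    if math.isnan(value) or value < minimum or value > maximum:
        return math.nan  # rejected: neither plotted nor used in statistics
    return value


name, lo, hi = parse_filter("DELT = -1.0 1.0")
filtered = [apply_filter(v, lo, hi) for v in (0.25, -3.5, 1.0)]
```

In this run the second value becomes NAN, and from then on it behaves like any other missing value.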

4.4. The [Pages] section

This section has only one obligatory entry, defining how many graph pages will be shown by the program (each accessible as a tab on a tabbed notebook), and assigning these pages IDs for use in other sections. For example:

[Pages]
Pages = BASIC EXTRA

The Pages line informs Statmaster that two graph pages will be drawn, identified as BASIC and EXTRA. Once again, you may use any names here, not necessarily in uppercase, but they cannot contain any separators.

Up to sixteen pages can be defined in a single run of the program.

In addition to that line, the [Pages] section may define defaults for page sections which do not specify some of their own parameters, or even for graphs within those pages. This will be discussed further on.

4.5. Individual page sections

Each page defined with an identifier on the Pages line in the [Pages] section has to have its own section, with the section name being the page identifier. In the above example, two sections, named [BASIC] and [EXTRA], will be expected.

Each of these sections defines the graphs shown in the corresponding page, as well as the page attributes. Here is an annotated example for one of the pages:

[BASIC]
Rows = 1
Columns = 2
Graphs = VOLTHIST VOLTTEMP
Title = Basic Characteristics, Moderate Load
Name = Basic
FontColor = Lime
Font = Verdana
TitleFontSize = 20
FontSize = 12

The first three items define the graphs themselves and their layout:

Rows — how many rows of graphs the page will have, 1 to 8.
Columns — how many columns, 1 to 8.
Graphs — lists the IDs of these graphs; up to 64 are allowed. Again, these IDs are up to the user. The number of graphs listed here cannot exceed Rows multiplied by Columns.

For simplicity, all graphs within one page share the same width and height.

In our example, we will have just two graphs side by side (one row by two columns). On my 1600x1200 monitor even an 8x8 grid looks OK.

The remaining parameters define general page attributes:

Title — this text will be shown at the top of the page. If not specified, the title will not be shown at all, saving you some space.
TitleFontSize — font size (pixels, not points!) used for page title, if shown. The font face is as defined by Font below.
Name — this will be shown on the page tab. It should be short enough to fit into the tab width, otherwise it will be truncated.
FontColor — defines the font color used for text (captions, axis values) in that page; see the remark on colors below.
Font — defines the font face name; Verdana is a good choice. If the requested font is not currently installed, Windows will do a substitution, for better or worse.
FontSize — font size (in pixels) used for all text, except title, within the page (e.g., graph axis descriptions, parameter); note that this does not include UI elements (tabs, menus), as these are drawn by Windows.

In addition to these parameters, the section for a given page may also contain ones expected in a definition of a particular graph (see below); this will then become a default for all graphs on that page. Indeed, these can be also specified in the [Pages] section, therefore becoming global defaults!

4.5.1. Note on colors

Any lines specifying colors in the Statmaster inifile can do it in two ways:

By hexadecimal RGB value, rrggbb, with rr being a two-digit number for Red, gg for Green, and bb for Blue. Each digit can be 0..9,A..F, with 00 resulting in the value of 0, and FF in 255 (maximum).
For example, FF0000 is bright red, while A00000 — darker red. Yellow is a mix of red and green, therefore FFFF00 is yellow.
By one of the English names recognized by Windows. This may change from one Windows version to another; in Windows XP the recognized color names are: Aqua, Black, Blue, Cream, DkGray, Fuchsia, Green, Lime, LtGray, Maroon, MedGray, MoneyGreen, Navy, Olive, Purple, Red, Silver, SkyBlue, Teal, White, and Yellow.
I am not sure how this works in non-English versions of Windows. Try it at your own risk, if you want; nothing really wrong will happen. If in doubt, use RGB values.
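
If you generate inifiles from a script, a hexadecimal color can be checked with a few lines of Python. This sketch assumes the red, green, blue order implied by the examples above (FFFF00 is yellow, that is red plus green); the function name is invented and nothing here comes from Statmaster itself.

```python
import string


def parse_rgb(text):
    """Split a six-digit hexadecimal color into (red, green, blue)."""
    if len(text) != 6 or any(ch not in string.hexdigits for ch in text):
        raise ValueError("expected six hexadecimal digits: " + repr(text))
    value = int(text, 16)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


bright_red = parse_rgb("FF0000")
darker_red = parse_rgb("A00000")
yellow = parse_rgb("FFFF00")
```

The three results are (255, 0, 0), (160, 0, 0), and (255, 255, 0), matching the colors named in the text.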

4.6. Individual graph sections

These are sections where the properties and data sources for individual graphs are defined.

Each of the identifiers listed on the Graphs line of a particular page section needs its own section, named with the page identifier and graph identifier, separated with a slash. For example, the graphs identified above need sections [BASIC/VOLTHIST] and [BASIC/VOLTTEMP].

Again, the contents of an individual graph section can be best explained by example. We will show two of those, one after the other.

[BASIC/VOLTHIST]
Kind = HIST
x = VOLT
xlo = -20
xhi = 80
dx = 2
RoundX = NO
Required = TEMP
Params = YES
Fill = 1
Grid = NO
Color = Aqua
BgColor = Black
GridColor = Gray
FrameColor = Silver
Key1 = @TYPE Beta

[BASIC/VOLTTEMP]
Kind = SCAT
x = VOLT
y = TEMP
xlo = 0
xhi = 12.5
ylo = -20.0
yhi = 80
Marker = BOX1
RoundX = YES
RoundY = YES
LogX = NO
LogY = YES
Required =
Banned =
Grid = YES
Color = FF44FF
BgColor = Navy
GridColor = Black
FrameColor = Black
Hint = @TYPE

There are quite a few parameters here. Not all of them have, however, to be specified in graph sections; those which may be the same in a number of graphs can be defined at the page section level, or even in [Pages] as common to all graphs (or graph layers) within one or all pages, unless overridden at the graph (or graph layer) level.

Note: Starting from Version 2.0 there are no limitations regarding which of the parameters can be defined at higher levels; any parameters can be defined there.

While some of these are self-explanatory, some may be not. Here is the complete list. Parameters specific only to one kind will be listed in separate subsections.

Kind — HIST or SCAT (actually, only the first character is checked), for histogram or scatter-plot, respectively.
Color defines the color used to draw the data. See the note on colors in Section 4.5.1.
BgColor — background color. Defining it individually for every graph usually looks too garish; therefore this is the most obvious candidate to be moved to the page (or all pages) level. Still, you decide.
GridColor — color in which the grid within the graph area will be drawn (if at all).
FrameColor — color for the graph frame (limiting the data point area).
Grid — YES or NO, whether to draw the grid.
xlo, xhi — the X-range of the graph. If not given (and not inherited from one of the higher-level sections), the range will be computed and, optionally, rounded.
RoundX — if the X-range of the graph is not explicitly given (therefore being computed by the program), setting this to YES will result in reasonable (whatever that means) rounding of these limits.
x — the ID of the data item drawn along the X-axis.
Required — ID or multiple IDs (separated with spaces) of data items which all must be present (i.e., not NAN) in a given row in order for the point to be drawn.
In this example, the VOLT value will be included into the histogram only if the corresponding TEMP value is defined (not a NAN). This is another way to exclude some points from being graphed.
If this line is defined at a higher level and you want no conditions imposed on a particular graph, use an empty Required line to override the inherited one. The line will be inherited only if it is not defined at all in the graph section. Obviously, the data items specified as x or y do not have to be included here: if a point does not exist, it cannot be drawn.
Banned — again, an ID or multiple IDs of data items, none of which may be present (i.e., all must be NAN) in a given row; if any of them is not NAN, the point will not be drawn.
As in Required, an empty parameter can be used to disable Banned defined at a higher level.
If the data item specified in Banned is the same as in x or (for scatter plots) y, then, obviously, no points will be drawn at all.
Key1, Key2, and Key3 — each optionally defines a text selector and its value; only those data points for which the given selector has the given value will be plotted.
The VOLT item, plotted in the [BASIC/VOLTHIST] histogram, will be included only for data points for which the text selector named @TYPE is "Beta".
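
The combined effect of Required, Banned, and Key1..Key3 on a single data row can be sketched in Python as follows. This is an illustration of the rules listed above, not the program's code; the function name is invented, and the exact (case-sensitive) comparison of selector text is an assumption.

```python
import math


def point_is_drawn(values, selectors, required=(), banned=(), keys=()):
    """Decide whether one data row passes all selection conditions.

    values    -- data item ID -> numeric value (NAN if missing)
    selectors -- selector name (with '@') -> text from the data row
    keys      -- (selector name, wanted text) pairs from Key1..Key3
    """
    if any(math.isnan(values[name]) for name in required):
        return False  # a Required item is missing
    if any(not math.isnan(values[name]) for name in banned):
        return False  # a Banned item is present
    return all(selectors.get(name) == wanted for name, wanted in keys)


row_values = {"VOLT": 0.087, "TEMP": -3.12}
row_selectors = {"@TYPE": "Beta"}
drawn = point_is_drawn(row_values, row_selectors,
                       required=("TEMP",), keys=(("@TYPE", "Beta"),))
```

With the settings of the histogram example (Required = TEMP, Key1 = @TYPE Beta) this row is accepted; it would be rejected if TEMP were NAN or if @TYPE were Alpha.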

4.6.1. Histograms

Some of the parameters are specific to histogram graphs only. If given in a section describing a scatter plot, they will be ignored.

Params — if YES, basic population parameters will be drawn below the histogram: population size, mean, standard deviation, minimum, and maximum.
Bins defines how many bins the X-range will be divided into; optional. Default is 0 (which means undefined).
dx — used only if Bins is not defined or 0; defines the bin width.
LeftIn — if YES, the left end of each bin counts as inside the bin and the right end as outside. If NO (default), vice versa.
This matters when you are histogramming integer (discrete) values, and is irrelevant for continuous data. For example, with your histogram having 10 bins from 0 to 10, the value of 1 can be counted as belonging to the first (NO) or the second (YES) bin.
Fill defines the fill, if any, used for histogram bins:
- 0 — no fill (hollow);
- 1 — solid fill of the same color in which the histogram is drawn;
- 2..4 — three fills most suitable on dark backgrounds;
- 5..7 — fills looking good on light ones.

If neither Bins nor dx is defined, Statmaster will make its own (usually quite reasonable) decision on the subject.

It is usually recommended to run the program first (on a given data set, that is) without xlo, xhi, Bins, and/or dx and dy defined at all, just to have a look at the data, and only then to decide if these parameters need to be explicitly defined.
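
The effect of LeftIn on values falling exactly on a bin boundary can be shown with a short Python sketch. It is not Statmaster's code: the function name is invented, bins are numbered from 0 here, and dropping values that land outside all bins is an assumption.

```python
import math


def bin_index(value, xlo, xhi, bins, left_in=False):
    """Return the 0-based bin number for value, or None if outside."""
    width = (xhi - xlo) / bins
    position = (value - xlo) / width
    if left_in:
        index = math.floor(position)     # each bin is [left, right)
    else:
        index = math.ceil(position) - 1  # each bin is (left, right]
    return index if 0 <= index < bins else None


first = bin_index(1, 0, 10, 10, left_in=False)   # 0: the first bin
second = bin_index(1, 0, 10, 10, left_in=True)   # 1: the second bin
```

A value strictly inside a bin, such as 1.5, lands in the same bin with either setting.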

4.6.2. Scatter plots

Defining a graph as a scatter plot requires or allows some other parameters.

y — defines the data item corresponding to the Y axis.
ylo, yhi — the Y range of the graph.
Marker — the marker used for data points:
- POINT — a single pixel;
- BOX1 — a 3×3 pixel square
- BOX2 — a 5×5 pixel square
- EX1 — a 3×3 pixel '×'
- EX2 — a 5×5 pixel '×'
- PLUS1 — a 3×3 pixel '+'
- PLUS2 — a 5×5 pixel '+'
- LINE is somewhat special: the sequence of XY points will be drawn as a line connecting them in the order as specified in the data file, so that effectively we get a line graph, not a scatter plot.
RoundY — see RoundX above, but for the Y range.
LogX, LogY — YES or NO, define if the plot uses the logarithmic scale along the given axis. Default is NO in either case.
Hint — this line may contain up to three selectors (see Section 4.2.1) separated with ; their text strings will be combined and shown when a data point (scatter plots only) is clicked upon.

4.6.3. Multi-layer scatter plots

A number (up to 32) of scatter plots can be overlaid one on top of another (this is most useful for, but not limited to, line graphs, which Statmaster treats as scatter plots, just connecting the points).

This is done by specifying a number of layers in the relevant graph section, and then defining each layer in its own, separate subsection, as shown in the example below.

[EXTRA/DOUBLE]
Kind = SCAT
Layers = TEMP CORRTEMP

[EXTRA/DOUBLE/TEMP]
x = VOLT
y = TEMP
Color = Yellow
Marker = BOX1

[EXTRA/DOUBLE/CORRTEMP]
x = VOLT
y = CORRTEMP
Color = Lime
Marker = CROSS1

Layer section names are created by appending a slash and layer identifier to the graph section name.

Note that the x-variables in this example are the same for both data sets. Therefore we could replace both x lines with just one, placed in the [EXTRA/DOUBLE] section. In a manner similar to the one discussed previously, if a parameter is not found in a layer section, the program will try to find it in the graph section, and then, if needed, in the page and [Pages] ones. In addition to the Color, Marker, x, and y parameters, a layer section can also contain the Required and Key1..Key3 lines described in Section 4.6.
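
The search order for a parameter (layer, then graph, then page, then [Pages]) can be pictured with the Python sketch below. It is only a model of the documented behavior: the setup is held in plain dictionaries, the function name is invented, and section and parameter names are matched exactly, which may differ from how Windows treats inifile names.

```python
def lookup(setup, page, graph, key, layer=None, default=None):
    """Find a parameter, falling back from the most specific section."""
    sections = []
    if layer is not None:
        sections.append(f"{page}/{graph}/{layer}")
    sections += [f"{page}/{graph}", page, "Pages"]
    for name in sections:
        section = setup.get(name, {})
        if key in section:       # an empty value still counts as defined
            return section[key]
    return default


setup = {
    "Pages": {"Pages": "BASIC EXTRA", "BgColor": "Black", "Required": "TEMP"},
    "EXTRA": {"BgColor": "Navy"},
    "EXTRA/DOUBLE": {"Kind": "SCAT", "x": "VOLT", "Required": ""},
    "EXTRA/DOUBLE/TEMP": {"y": "TEMP", "Color": "Yellow"},
}
x_item = lookup(setup, "EXTRA", "DOUBLE", "x", layer="TEMP")
background = lookup(setup, "EXTRA", "DOUBLE", "BgColor", layer="TEMP")
required = lookup(setup, "EXTRA", "DOUBLE", "Required", layer="TEMP")
```

Here the layer inherits x from its graph section and BgColor from the page section, while the empty Required line in the graph section hides the one given in [Pages].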

5. User interface

The user interface of Statmaster is really minimal.

The program window has the usual Windows controls, allowing you to resize, minimize, restore, or close it (terminating the program). If you resize the window, use Redraw All (see below) to rebuild all graphs.

5.1. The Context Menu

A right click anywhere in the Statmaster window will bring a context menu, with the following items:

Redraw All and Redraw All (Ignore Limits) — all graphs will be redrawn from scratch, including reading the setup and data files. This is useful when we are tweaking the setup in the text editor: just save and redraw, without having to exit and re-run the program.
The difference between these two options is that the first one always uses any graph limits (xlo, xhi, ylo, yhi) defined in the inifile, while the second never does, always computing them as needed from the data points. Both behave this way regardless of the /i command line switch.
Screen Shot — screenshots of...
- This Page — the current page
- All Pages — all graph pages
will be written as *.bmp image files to the scrn subdirectory inside the directory in which statmaster.exe resides. File names are composed of the inifile name, a two-digit page number, and the bmp extension, separated with dots; for example, mysetup.02.bmp.
Edit File is a submenu opening the selected file in the text editor. It offers two options:
- Setup — the current setup (.stm.ini) file;
- Data — the current data file.
If, after being modified, either file is saved, the effect can be seen by using the Redraw All option described above.
The text editor used to open files is specified in the [Statmaster] section of the setup file. It defaults to the Notepad included with Windows; be aware that this program may have problems opening large data files. Any serious work with text files requires a more serious application like TextPad or UltraEdit.
Help — will show a local copy of this document in your default Web browser.
Statmaster Web page — a link to wrotniak.net.
About Statmaster — credits and version information.
Exit — terminate the program.

5.2. Identifying a data point

If a given scatter plot (or one of its layers) has a Hint parameter defined in the appropriate section of the inifile, then left-clicking on a data point within that plot (or layer) will display hint fields (selectors) of that point in the Statmaster window's Title Bar. The text will be shown only as long as the mouse button is depressed.

The pixel tolerance used in matching mouse coordinates to the data point is defined via the PixOff parameter in the [Statmaster] section of the inifile.

If more than one point matches the mouse position, the first one found will be used for the display.
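
The matching rule can be illustrated with the Python sketch below. It is a guess at the general idea rather than the program's real code: the function name is invented, and comparing the horizontal and vertical distances separately against PixOff (a square area around the point) is an assumption.

```python
def find_clicked_point(points, mouse_x, mouse_y, pix_off=2):
    """Return the index of the first point near the mouse, or None.

    points  -- (x, y) screen positions in pixels, in data file order
    pix_off -- pixel tolerance, like PixOff in the [Statmaster] section
    """
    for index, (px, py) in enumerate(points):
        if abs(px - mouse_x) <= pix_off and abs(py - mouse_y) <= pix_off:
            return index  # first match wins
    return None


screen_points = [(100, 200), (101, 201), (300, 50)]
hit = find_clicked_point(screen_points, 102, 202)       # index 0
miss = find_clicked_point(screen_points, 150, 150)      # None
```

Both of the first two points are within two pixels of the click at (102, 202), and the earlier one is reported.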

6. Examples

The distribution archive contains an example data file, example.dat, placed in a subfolder named data, as well as three corresponding setup files, example-1.stm.ini, example-2.stm.ini, and example-3.stm.ini, demonstrating most of the Statmaster features.

Run Statmaster by entering, for example

statmaster example-1.stm.ini data\example.dat

and then open the setup file in a text editor to see how the graphs were generated. The setup files illustrate most of the information given in this manual. Feel free to modify a setup file; upon saving it you may use the right-click menu to refresh Statmaster's display without leaving and re-entering the program.

The setup files, all using the same data file, start from the simplest features and then add less trivial ones:

example-1.stm.ini shows how to get started fast; it draws three distributions and one scatter plot of the data in selected columns of example.dat, leaving all settings at their default values. This is Statmaster at its simplest.
example-2.stm.ini changes some graph and page attributes; it also adds a second page of graphs. By using selectors in hint definitions, it will show information on a data point clicked upon.
example-3.stm.ini illustrates how to use selectors to draw graphs for selected subsets of data, and how to use a number of layers for these subsets, showing them in different colors within one graph.

These files can also be used as a starting point for generating your own Statmaster setups.

7. Distribution and support

Statmaster is free for personal and educational use (this includes research at schools and universities). Business and government users should contact me (via the link at wrotniak.net) about licensing conditions.

Shareware and freeware vendors are free to include Statmaster in their collections as long as the whole original archive, including this document, is in the package.

Updates (new features, bug fixes) are available at

http://www.wrotniak.net/works/statmaster/index.html,

where you can also find my email address.

Although the program is supplied on the "as is" basis, I will be glad to receive feedback: problem reports and enhancement suggestions. On the other hand, I cannot offer you any help in data reduction or statistics.

The Fine Print: Although the author has taken more than reasonable care to assure that the program works exactly as documented above and does not cause any damage, he assumes no responsibility, express or implied, for any results of use, misuse, or inability to use this software.

Document last updated 2007/04/18