STATMASTER
A Quick-And-Dirty Data Graphing Application
User Manual
Version 2.34, 2007/04/18
Statmaster is a program for quick-and-dirty graphing of data presented as columns of a text table. It does not attempt to replace any of the more sophisticated data analysis tools; it just gets the job done with the least amount of hassle, and it can be mastered in twenty minutes. The program is free for personal and educational use.

The program is best run from the command line (such as the Windows CMD window or the excellent CLI utility, Take Command from JP Software). To execute it, enter the program name with two obligatory arguments, for example:

    statmaster mysetup.stm.ini mydata.dat

The first argument specifies the file defining how Statmaster will graph the data; the second is the data file itself. It is recommended (although not required) that the Statmaster inifile has a double extension, *.stm.ini, so that it is easy to identify.

The optional /i parameter can be used on the command line to disable any graph limits defined in the inifile (xlo, xhi, ylo, yhi, see Section 4.6). The command line will then look like this:

    statmaster mysetup.stm.ini mydata.dat /i

This can be useful while debugging the process that generates your data, when some outliers may be expected, as Statmaster will then choose the limits so that all data points fit into the graph area.

If you use Statmaster together with a text-only program generating some tabular results (a situation common in science and engineering), you may find it convenient to automate the process by writing a small (two-line) batch file, which first executes your program, and then runs Statmaster to present the results. This way you just run the batch file, and your results pop up as if your program had direct graphic output; quite handy. Actually, I originally wrote Statmaster with such use in mind: as a back end to some text-only, number-crunching applications.

Statmaster accepts data in text format (plain ASCII, not from a word processor!), arranged in a tabular fashion.
Spaces (single or multiple), tabs, and/or commas are recognized as separators. Note: multiple, consecutive tabs and commas are treated as one. This means you cannot include an "empty" item in a row.

Every text line consists of a number of individual items. An item is usually a numeric value, although text items are also allowed, as long as Statmaster does not attempt to graph them. Text items cannot contain commas, tabs, or spaces; see Line 4 in the example below.

A missing value should be shown as NAN (recommended for many reasons), or as any text starting with N, n, or ?. Obviously, these will not be drawn in the graphs.

A line starting with a colon is treated as a comment, i.e., ignored.

This is an example of a data file:
: Statmaster example data file

The last line in the example above has two peculiarities. First of all, it contains two text fields: Beta and YYY. This means that "0" in that line is recognized as the sixth, not the fifth, element, and we cannot ask Statmaster to process the fifth data column. In other words, the file will work just fine as long as we access no more than the first three elements of each line.

Secondly, the last line contains one more item than the others. The file is still OK, as long as we limit ourselves to using only as many elements of every line as the shortest line contains.

3.1. Time format in table data

Skip this section (and any subsections) if you are not planning to use Statmaster for data containing time values formatted as hh:mm:ss.

Any of the numeric fields can, optionally, use the time format: hh:mm:ss or hh:mm. This will be converted to hours with a fraction. For example, 18:30:00 or just 18:30 will be read as 18.5.

To accommodate time spans longer than 24 hours, the time value can be preceded with one of the following characters: '-', '=', '+', or '#'.
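As a rough illustration of the conversion described in this section, here is a minimal Python sketch. It is not Statmaster's own code. The function name time_to_hours and the DAY_SHIFT table are made up for this example, and the sketch assumes that '-' and '=' shift the time back by one and two days, while '+' and '#' shift it forward by one and two days.

```python
# Assumed day shifts for the prefix characters (an assumption of this
# sketch, inferred from the description in this section).
DAY_SHIFT = {"-": -1, "=": -2, "+": 1, "#": 2}


def time_to_hours(field):
    """Convert 'hh:mm' or 'hh:mm:ss', with an optional prefix, to hours."""
    days = 0
    if field and field[0] in DAY_SHIFT:
        days = DAY_SHIFT[field[0]]
        field = field[1:]
    parts = [float(p) for p in field.split(":")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"not a time value: {field!r}")
    hours, minutes = parts[0], parts[1]
    seconds = parts[2] if len(parts) == 3 else 0.0
    # Hours with a fraction, shifted by whole days.
    return 24 * days + hours + minutes / 60 + seconds / 3600
```

With these assumptions, time_to_hours("18:30") gives 18.5 and time_to_hours("#12:00") gives 60.0 (12 hours and two days).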
(This is easy to remember, as '=' is a double '-', and '#' is a double '+'.) For example, -18:30 will be translated as -5.5 (as it is 5.5 hours before the current midnight), and #12:00 as 60 (12 hours and two days).

If the DateCol item in the [Statmaster] section of the inifile is specified (see Section 4.1), then the column indicated must contain a date, shown as yyyy/mm/dd or yy/mm/dd. In such a case all time values in the given row are offset from that date, and converted to floating-point numbers in hours, measured from the starting midnight of the date specified for the first data point.

Sounds complicated? You can avoid it by writing time values as floating-point numbers in hours from any offset you wish, and then you will not have to go through this. If, however, you insist on using time as hh:mm:ss, and on having data spanning multiple days, the date has to be specified somehow; nothing comes free.

In the following example we will assume that the first column specifies the date:
2006/12/01 13:30:00

If so, then Statmaster will see all three time fields in Column 2 as being the same, and assign them the same value of 13.5. If we put the last line on the top, the values will still be equal to one another, but now to a different value, 61.5 (which is 2×24+13.5).

The inifile instructs Statmaster how to present the data. This text file follows the standard Windows inifile format, containing items grouped in sections (section names are enclosed in square brackets). Note that the ordering of lines within a section is irrelevant.

Any items listed in the [Statmaster] section are entirely optional, and so is the section itself. The parameters included here define some aspects of how the program behaves. Here is the full list:
An example:
[Statmaster]

The first section, [DataFile], defines the data items from the data file which will be graphed, and, optionally, text fields by which the items can be selected for plotting. It may look, for example, like this:
[DataFile]

Every line in this section defines one data item which Statmaster will be able to access, or a text selector, with a name starting with the at sign, '@'. Graphs can be drawn with a requirement that the text selector for a particular data point has a given value; see below.

For item definitions, the internal name of the item (e.g. VOLT) is followed by the equals sign, and then four pieces of information:
The data item ID (before '=') is just an identifier for your own use in the inifile. It does not have to be in uppercase; I'm using uppercase here because it makes the IDs easier to distinguish. Spaces, commas, and tabs are not allowed in the ID, and the leading '@' is reserved to denote a text selector, see below.

In our example, the third line defines a data item internally named DELT, computed as the difference between Columns 3 and 2 of the original table, described as Delta Temperature, with 3 digits before the decimal point and 2 digits after it used for displaying the values (this does not define how the data is shown in the file, where any floating-point format will be OK). The item will be graphed without rescaling (scale factor of 1).

A text selector definition consists only of the selector name (with the obligatory '@' as the first character), then the equals sign and the column number. Up to 32 text selectors can be defined. Selectors are used for two purposes:
The last line in our example defines a text selector named "@TYPE", based on the fourth column of the data table.

The [Filters] section is optional. It may be used to filter outliers by rejecting data values outside of a given interval. Each line in this section consists of an item name as defined in [DataFile], followed by an equals sign and two values (minimum and maximum), specifying the acceptance range for that item. Values outside this range will be replaced with NAN (not-a-number), so that they will not be plotted or used in statistical parameter calculations. Here is an example of that section:
[Filters]

The NAN-replacement metaphor is strict; for example, if the filtered data item is used to select points for drawing with one of the Key1...Key3 parameters, the data point will be rejected if the key value is out of range.

Note that filtering is applied to the data item as defined in the [DataFile] section, not to the column of the original table in the input file. This means that if a data item is defined as a combination of two columns, its filter will be independent of any filters defined for other data items based on these columns.

To apply the filtering to some graphs but not to others, define two data items based on the same column, one with a filter and one without; then use one of the two items as needed.

The [Pages] section has only one obligatory entry, defining how many graph pages will be shown by the program (each accessible as a tab on a tabbed notebook), and assigning these pages IDs for use in other sections. For example:
[Pages]

The Pages line informs Statmaster that two graph pages will be drawn, identified as BASIC and EXTRAS. Once again, you may use any names here, not necessarily in uppercase, but they cannot contain any separators. Up to sixteen pages can be defined in a single run of the program.

In addition to that line, the [Pages] section may define defaults for page sections which do not specify some of their own parameters, or even for graphs within those pages. This will be discussed further on.

Each page defined with an identifier on the Pages line in the [Pages] section has to have its own section, with the section name being the page identifier. In the above example, two sections, named [BASIC] and [EXTRAS], will be expected. Each of these sections defines the graphs shown in the corresponding page, as well as the page attributes. Here is an annotated example for one of the pages:
[BASIC]

The first three items define the graphs themselves and their layout:
For simplicity, all graphs within one page share the same width and height. In our example, we will have just two graphs side by side (one row by two columns). On my 1600x1200 monitor even an 8x8 grid looks OK.

The remaining parameters define general page attributes:
In addition to these parameters, the section for a given page may also contain ones expected in the definition of a particular graph (see below); these will then become defaults for all graphs on that page. Indeed, they can also be specified in the [Pages] section, thereby becoming global defaults!

Any lines specifying colors in the Statmaster inifile can do it in two ways:
4.6. Individual graph sections

These are sections where the properties and data sources for individual graphs are defined.
Each of the identifiers listed on the Graphs line of a particular page section needs its own section, named with the page identifier and graph identifier, separated with a slash. For example, the graphs identified above need sections [BASIC/VOLT] and [BASIC/VOLTTEMP].

Again, the contents of an individual graph section can best be explained by example. We will be showing two of those, side by side.
[BASIC/VOLT]
[BASIC/VOLTTEMP]
There are quite a few parameters here. Not all of them, however, have to be specified in graph sections; those which may be the same in a number of graphs can be defined at the page section level, or even in [Pages], as common to all graphs (or graph layers) within one or all pages, unless overridden at the graph (or graph layer) level.

Note: Starting from Version 2.0 there are no limitations regarding which of the parameters can be defined at higher levels; any parameter can be defined there.

While some of these are self-explanatory, some may not be. Here is the complete list. Parameters specific only to one kind of graph will be listed in separate subsections.
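The way a parameter is resolved across levels can be pictured with a short Python sketch. This is only an illustration of the lookup order described in this manual (layer, then graph, then page, then [Pages]), not Statmaster's actual code. It uses Python's standard configparser as a stand-in inifile reader; the function name lookup, the layer identifier A, and all parameter values in SAMPLE are hypothetical placeholders. Note also that configparser ignores the case of parameter names, which may or may not match Statmaster's behavior.

```python
import configparser


def lookup(cfg, page, graph, key, layer=None):
    """Return the value from the most specific section that defines key."""
    sections = []
    if layer is not None:
        sections.append(f"{page}/{graph}/{layer}")  # layer section
    sections += [f"{page}/{graph}", page, "Pages"]  # graph, page, global
    for name in sections:
        if cfg.has_option(name, key):
            return cfg.get(name, key)
    return None  # not defined at any level


# Placeholder inifile fragment; the values are invented for this sketch.
SAMPLE = """
[Pages]
Color = black

[EXTRA]
Marker = 2

[EXTRA/DOUBLE]
Color = red

[EXTRA/DOUBLE/A]
Marker = 5
"""

cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE)
```

Here lookup(cfg, "EXTRA", "DOUBLE", "Color", layer="A") finds no Color in the layer section and falls back to the graph section, while a graph without its own Color line would pick up the one from [Pages].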
Some of the parameters are specific to histogram graphs only. If given in a section describing a scatter plot, they will be ignored.
If neither Bins nor dx is defined, Statmaster will make its own (usually quite reasonable) decision on the subject. It is usually recommended to run the program first (on a given data set, that is) without xlo, xhi, Bins, and/or dx and dy defined at all, just to have a look at the data, and only then to decide if these parameters need to be explicitly defined.

Defining a graph as a scatter plot requires or allows some other parameters.
4.6.3. Multi-layer scatter plots

A number (up to 32) of scatter plots can be overlaid one on top of another (this is most useful for, but not limited to, line graphs, which Statmaster treats as scatter plots, just connecting the points). This is done by specifying a number of layers in the relevant graph section, and then defining each layer in its own, separate subsection, as shown in the example below.
[EXTRA/DOUBLE]

Layer section names are created by appending a slash and the layer identifier to the graph section name. Note that the x-variables in this example are the same for both data sets. Therefore we could replace both X lines with just one, placed in the [EXTRA/DOUBLE] section. In a manner similar to the one discussed previously, if a parameter is not found in a layer section, the program will try to find it in the graph section, and then, if needed, in the page and [Pages] ones.

In addition to the Color, Marker, x, and y parameters, a layer section can also contain Required and Key1..Key3 lines described in 4.5.

The user interface of Statmaster is really minimal. The program window has the usual Windows controls, allowing you to resize, minimize, restore, or close it (closing terminates the program). If you resize the window, use Redraw (see below) to rebuild all graphs.

A right click anywhere in the Statmaster window will bring up a context menu, with the following items:
If a given scatter plot (or one of its layers) has a Hint parameter defined in the appropriate section of the inifile, then left-clicking on a data point within that plot (or layer) will display the hint fields (selectors) of that point in the Statmaster window's Title Bar. The text will be shown only as long as the mouse button is held down. The pixel tolerance used in matching mouse coordinates to the data point is defined via the PixOff parameter in the [Statmaster] section of the inifile. If more than one point matches the mouse position, the first one found will be used for the display.
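The matching rule above (first point within the pixel tolerance wins) can be sketched in a few lines of Python. This is an illustration only, not Statmaster's code: the function name first_hit is made up, and the sketch assumes the tolerance is applied separately to the horizontal and vertical offsets; the manual does not say whether the program uses this or a straight-line distance.

```python
def first_hit(points, mouse_x, mouse_y, pix_off):
    """Return the index of the first point within pix_off pixels, or None.

    points is a sequence of (x, y) screen coordinates in drawing order.
    """
    for index, (px, py) in enumerate(points):
        # Assumed rule: both offsets must be within the tolerance.
        if abs(px - mouse_x) <= pix_off and abs(py - mouse_y) <= pix_off:
            return index  # the first match is used, later ones are ignored
    return None
```

For two overlapping points, such as (10, 10) and (12, 11) with a click at (11, 11) and a tolerance of 2, the sketch reports the first one, mirroring the "first one found" behavior described above.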
The distribution archive contains an example data file, example.dat, placed in a subfolder named data, as well as three corresponding setup files. Run Statmaster by entering, for example,

    statmaster example-1.stm.ini data\example.dat

and then open the setup file in a text editor to see how the graphs were generated. The setup files illustrate most of the information given in this manual. Feel free to modify a setup file; upon saving it you may use the right-click menu to refresh Statmaster's display without leaving and re-entering the program.

The setup files, all using the same data file, start from the simplest features and then add less trivial ones:
These files can also be used as a starting point for generating your own Statmaster setups.

Statmaster is free for personal and educational use (this includes research at schools and universities). Business and government users should contact me (via the link at wrotniak.net) about licensing conditions. Shareware and freeware vendors are free to include Statmaster in their collections as long as the whole original archive, including this document, is in the package.

Updates (new features, bug fixes) are available at http://www.wrotniak.net/works/statmaster/index.html, where you can also find my email address.

Although the program is supplied on an "as is" basis, I will be glad to receive feedback: problem reports and enhancement suggestions. On the other hand, I cannot offer you any help in data reduction or statistics.
The Fine Print: Although the author has taken more than reasonable care to assure that the program works exactly as documented above and does not cause any damage, he assumes no responsibility, express or implied, for any results of use, misuse, or inability to use this software.
Document last updated 2007/04/18
Copyright © 1998-2007 by J. Andrzej Wrotniak