Synapse and Cranium use various All Data dialogs to record data values having non-active status values. Often, the designation of a datum as passive or rejected is based on comparing its value with those of other data. The Statistics Dialog provides you with several summary statistics that often help you decide upon the status of a new datum.
The Statistics Dialog is activated by using the Statistics button found on All Data dialogs such as the Datum-Reference All Data Dialog and the Datum-Units-Reference All Data Dialog.
Pressing the All Data dialog's Statistics button activates the Statistics dialog. The dialog collects the active and passive data values, calculates summary statistics, and then displays these values.
Item | Description |
---|---|
1 | Data Type Control: displays a general description of the type of data being analyzed. |
2 | Count Control: displays the number of data values being analyzed. The dialog automatically collects all active and passive data values. |
3 | Minimum Control: displays the minimum value of the compiled data. |
4 | Average Control: displays the unweighted average of the compiled data. |
5 | Maximum Control: displays the maximum value of the compiled data. |
6 | Std Dev Control: displays the standard deviation of the compiled data. |
7 | Range Control: displays the range, i.e., the maximum value minus the minimum value, of the compiled data. |
8 | Level Control: displays the confidence level used to calculate the lower and upper confidence limits. |
9 | Lower Control: displays the lower confidence limit. |
10 | Upper Control: displays the upper confidence limit. |
11 | Outliers Control: displays the indices of those data whose value is below the lower confidence limit or above the upper confidence limit. |
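The summary values listed above can be reproduced outside the dialog with a few lines of Python. This is only a sketch: the function name `summarize` and the sample values are hypothetical, and it assumes the dialog reports the sample (n - 1) standard deviation, which this page does not confirm.

```python
import statistics


def summarize(values):
    """Return summary statistics like those the dialog displays (illustrative only)."""
    return {
        "count": len(values),
        "minimum": min(values),
        "average": statistics.mean(values),   # unweighted average
        "maximum": max(values),
        "std_dev": statistics.stdev(values),  # assumption: sample (n - 1) form
        "range": max(values) - min(values),   # maximum minus minimum
    }


# Hypothetical data values, for illustration only
stats = summarize([10.0, 12.0, 11.0, 13.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```

If the dialog actually uses the population form of the standard deviation, `statistics.pstdev` would be the corresponding call.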
The Statistics Dialog uses a t-distribution to calculate the confidence limits around the mean of the compiled data. These limits are calculated according to the following equation:
limit = avg ± factor * stdDev / sqrt(nobs)
In the previous equation, limit is either the lower or upper confidence limit, avg is the average of the compiled values, factor is the t-distribution value, stdDev is the data's standard deviation, and nobs is the number of data values.
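As a rough illustration of this equation, the sketch below computes both limits and lists the 1-based indices of values that fall outside them. The function names and data are hypothetical, the t-distribution value is taken from a standard t table (two-sided 95% level, `nobs - 1` degrees of freedom, both assumptions), and the sample standard deviation is assumed.

```python
import math
import statistics


def confidence_limits(values, factor):
    """Apply limit = avg +/- factor * stdDev / sqrt(nobs)."""
    nobs = len(values)
    avg = statistics.mean(values)
    std_dev = statistics.stdev(values)  # assumption: sample (n - 1) form
    half_width = factor * std_dev / math.sqrt(nobs)
    return avg - half_width, avg + half_width


def outlier_indices(values, lower, upper):
    """Return 1-based indices of values below the lower or above the upper limit."""
    return [i for i, v in enumerate(values, start=1) if v < lower or v > upper]


# Hypothetical data; 4.303 is the two-sided 95% t value for 2 degrees of freedom.
values = [1.0, 2.0, 3.0]
T_FACTOR = 4.303
lower, upper = confidence_limits(values, T_FACTOR)
outliers = outlier_indices(values, lower, upper)
print(lower, upper, outliers)
```

With only three values the t factor is large, so the limits fall outside the minimum and maximum and the outlier list is empty.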
Note that the dialog uses the mean value and the confidence level to calculate a lower and an upper confidence limit. For the data shown, these limits fall outside the minimum and maximum values of the compiled data, so no data values are marked as outliers.
Using these new limits, the dialog determined that datum number 5, the new value we just entered, is smaller than the lower limit. It thus marked this value as an outlier.
Although the dialog used a statistical analysis to identify an outlier, you should consider additional factors, such as the difficulty of measuring the current property for the current chemical, before you decide to reject a data value.
It is also recommended that you reject one datum at a time. Once you reject a datum, the data set's count, average, and standard deviation will change, which in turn changes the confidence limits.
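The effect of rejecting a single datum can be sketched as follows. The data, the helper names, and the small t table (two-sided 95% values indexed by degrees of freedom, taken as `nobs - 1`) are all assumptions made for illustration, not the dialog's documented internals.

```python
import math
import statistics

# Two-sided 95% t values by degrees of freedom, from a standard t table.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def limits(values):
    """Confidence limits around the mean for the current data set."""
    nobs = len(values)
    avg = statistics.mean(values)
    half_width = T_95[nobs - 1] * statistics.stdev(values) / math.sqrt(nobs)
    return avg - half_width, avg + half_width


def flag(values):
    """Return 1-based indices of values outside the confidence limits."""
    lower, upper = limits(values)
    return [i for i, v in enumerate(values, start=1) if v < lower or v > upper]


# Hypothetical data: four close values plus a low fifth value.
data = [10.0, 10.2, 9.9, 10.1, 7.0]
first_pass = flag(data)

# Reject only the first flagged datum, then recompute everything.
rejected = first_pass[0] if first_pass else None
remaining = [v for i, v in enumerate(data, start=1) if i != rejected]
second_pass = flag(remaining)

print("limits before:", limits(data), "flagged:", first_pass)
print("limits after: ", limits(remaining), "flagged:", second_pass)
```

In this sketch the limits narrow considerably once the fifth value is removed, which is why flags from the first pass should not be carried over to the reduced data set without recalculating.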
Topic | Description |
---|---|
Getting Started using Synapse | provides a quick tour of Synapse's capabilities including examples of chemical product design. |
Getting Started using Cranium | provides a quick tour of Cranium's capabilities including a discussion of structure editing. |
Estimating Chemical Properties | a short video demonstrating how to estimate the physical properties of chemicals using either Synapse or Cranium. |
Estimating Mixture Properties | a short video demonstrating how to estimate the physical properties of mixtures using either Synapse or Cranium. |