Introduction to data handling
Data Domains and
Introduction to Statistics
Chemistry 243
Instrumental methods and
what they measure
Electromagnetic
methods
Electrical methods
Instruments are translators
Convert physical or chemical properties that
we cannot directly observe into information that
we can interpret.
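Transmittance: $T = \frac{P}{P_0}$
Absorbance: $A = \varepsilon b c = -\log T = -\log \frac{P}{P_0}$
(b: path length, c: concentration)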
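The translation from measured radiant power to concentration can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function names and the numerical values (incident power, transmitted power, molar absorptivity) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def transmittance(p, p0):
    """Fraction of the incident radiant power P0 that reaches the detector."""
    return p / p0

def absorbance(p, p0):
    """A = -log10(T) = log10(P0 / P)."""
    return -math.log10(transmittance(p, p0))

def concentration(a, epsilon, b):
    """Invert A = epsilon * b * c to recover the concentration c."""
    return a / (epsilon * b)

# Hypothetical values: 10% of the light is transmitted through a 1 cm cell.
A = absorbance(p=10.0, p0=100.0)
c = concentration(A, epsilon=5000.0, b=1.0)  # epsilon is an assumed molar absorptivity
print(f"A = {A:.3f}, c = {c:.2e} mol/L")
```

The instrument observes only P and P0; every later quantity (T, A, c) is a further translation of that measurement.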
Sometimes multiple
translations are needed
Thermometer
Bimetallic coil converts temperature to
physical displacement
Scale converts angle of the pointer to an
observable value of meaning
adapted from C.G. Enke, The Art and
Science of Chemical Analysis, 2001.
Thermostat: Displacement
used to activate switch
http://upload.wikimedia.org/wikipedia/commons/d/d2/Bimetaal.jpg
http://upload.wikimedia.org/wikipedia/commons/2/26/Bimetal_coil_reacts_to_lighter.gif
http://static.howstuffworks.com/gif/home-thermostat-thermometer.jpg
Components in translation
Data domains
Information is
encoded and
transferred
between domains
Non-electrical
domains
Beginning and end of
a measurement
Electrical domains
Intermediate data
collection and
processing
Data domains
Initial
conversion
device
Intermediate
conversion
device
Readout
conversion
device
Often viewed on a GUI
(graphical user interface)
PMT
Resistor
Digital
voltmeter
Electrical domains
Analog signals
Magnitude of voltage, current, charge, or power
Continuous in both amplitude and time
Time-domain signals
Time relationship of signal fluctuations
(not amplitudes)
Frequency, pulse width, phase
Digital information
Data encoded in only two discrete levels
A simplification for transmission and storage of
information which can be re-combined with great
accuracy and precision
The heart of modern electronics
Digital and analog signals
Analog signals
Magnitude of voltage, current, charge, or power
Continuous in both amplitude and time
Digital information
Data encoded in only discrete levels
Analog-to-digital conversion
Limited by bit resolution of ADC
4-bit card has 2^4 = 16 discrete binary levels
8-bit card has 2^8 = 256 discrete binary levels
32-bit card has 2^32 = 4,294,967,296 discrete binary levels
Common today
Maximum resolution comes from full use of ADC
voltage range.
Trade-offs
More bits is usually slower
More expensive
K.A. Rubinson, J.F. Rubinson, Contemporary Instrumental Analysis, 2000.
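The effect of bit depth and voltage range on resolution can be worked through numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes the step size is the input range divided by 2^bits (some references divide by 2^bits - 1 instead).

```python
def adc_levels(bits):
    """Number of discrete levels an ADC with the given bit depth can report."""
    return 2 ** bits

def adc_step(bits, v_min, v_max):
    """Smallest voltage difference the ADC can distinguish (assumed range / 2**bits)."""
    return (v_max - v_min) / adc_levels(bits)

def levels_used(bits, v_min, v_max, signal_span):
    """How many levels a signal of the given span actually occupies."""
    return int(signal_span / adc_step(bits, v_min, v_max))

# Hypothetical 8-bit card with a 0-10 V input range.
step = adc_step(8, 0.0, 10.0)
small = levels_used(8, 0.0, 10.0, signal_span=1.0)   # signal fills 10% of the range
full = levels_used(8, 0.0, 10.0, signal_span=10.0)   # signal fills the whole range
print(step, small, full)
```

A signal that spans only a tenth of the input range is digitized with roughly a tenth of the available levels, which is why matching the signal to the full ADC range gives the best resolution.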
Byte prefixes
Kilo (K): about 1000
Mega (M): about a million
Giga (G): about a billion
Serial and parallel binary
encoding
Slow – not digital; outdated
(serial)
Fast – between instruments
“serial-coded binary” data
Binary Parallel:
Very Fast – within an instrument
“parallel digital” data
Introductory statistics
Statistical handling of data is essential
because it gives the data significance.
The ability or inability to definitively state that
two values are statistically different has
profound ramifications in data interpretation.
Measurements are not absolute, so robust
methods for establishing run-to-run
reproducibility and instrument-to-instrument
variability are essential.
Introductory statistics:
Mean, median, and mode
Population mean (μ): average value of replicate data
$\mu = \lim_{N \to \infty} \frac{\sum_{i=1}^{N} x_i}{N} = \lim_{N \to \infty} \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$
Median (μ½): ½ of the observations are greater; ½ are less
Mode (μ_md): most probable value
For a symmetrical distribution:
$\mu_{1/2} = \mu_{md} = \mu$
Real distributions are rarely perfectly symmetrical
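A short Python sketch shows how the three measures separate when a distribution is not symmetrical. The replicate readings are hypothetical, with one high value included to skew the set; the standard-library `statistics` module does the work.

```python
import statistics

# Hypothetical replicate readings; the 2.9 skews the distribution upward.
data = [2.1, 2.3, 2.3, 2.4, 2.3, 2.2, 2.9]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(f"mean = {mean:.3f}, median = {median}, mode = {mode}")
```

Here the median and mode coincide, while the single high reading pulls the mean above both.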
Statistical distribution
Often follows a Gaussian functional form
Introductory statistics:
Standard deviation and variance
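Standard deviation (σ):
$\sigma = \sqrt{\lim_{N \to \infty} \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Variance (σ²):
$\sigma^2 = \lim_{N \to \infty} \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$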
Gaussian distribution
Common distribution with well-defined stats
$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
68.3% of data is within 1σ of mean
95.5% at 2σ
99.7% at 3σ
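The quoted percentages follow from integrating the Gaussian curve, which can be checked with the error function: the fraction of a Gaussian population within k standard deviations of the mean is erf(k/√2). The sketch below uses only the standard library; the function names are illustrative.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fraction_within(k):
    """Fraction of a Gaussian population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
```

The three fractions come out near 0.6827, 0.9545, and 0.9973.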
Statistical distribution
50 Abs measurements of an identical sample
Let’s go to Excel
Table a1-1,
Skoog
But no one has
an infinite data set …
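Sample mean:
$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$
Sample variance:
$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$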
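For a finite data set, the sample formulas divide by N - 1 rather than N. The following sketch implements them directly and compares the result with Python's `statistics.stdev` (sample, N - 1) and `statistics.pstdev` (population, N). The absorbance readings are hypothetical.

```python
import math
import statistics

def sample_mean(xs):
    """x-bar: sum of the observations divided by N."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2: squared deviations from x-bar, divided by N - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical replicate absorbance readings.
readings = [0.501, 0.498, 0.503, 0.499, 0.504]

print(sample_mean(readings), sample_std(readings))
print(statistics.stdev(readings), statistics.pstdev(readings))
```

Dividing by N - 1 always gives a slightly larger value than dividing by N, and the difference shrinks as N grows.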
Standard deviation and
variance, continued
s is a measure of precision (magnitude of
indeterminate error)
$s_{total}^2 = s_1^2 + s_2^2 + s_3^2 + \dots + s_n^2$
Other useful definitions:
Standard error of mean
$s_m = \frac{s}{\sqrt{N}}$
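Both definitions reduce to one-line calculations. The sketch below assumes the individual error sources are independent, which is the condition under which variances add; the function names and the numbers are illustrative.

```python
import math

def total_std(*component_stds):
    """Combine independent standard deviations: variances add, then take the root."""
    return math.sqrt(sum(s ** 2 for s in component_stds))

def standard_error(s, n):
    """Standard error of the mean for n replicate measurements."""
    return s / math.sqrt(n)

# Hypothetical example: two independent noise sources of 3.0 and 4.0 units.
combined = total_std(3.0, 4.0)

# Hypothetical example: s = 2.0 from 16 replicates.
sem = standard_error(2.0, 16)

print(combined, sem)
```

The combined value is 5.0 rather than 7.0 because standard deviations add in quadrature, and averaging 16 replicates reduces the uncertainty of the mean by a factor of 4.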
Confidence intervals
In most situations μ cannot be determined
Would require infinite number of measurements
Statistically we can establish a confidence interval
around x̄ in which μ is expected to lie with a
certain level of probability.
Calculating confidence
intervals
We cannot determine σ exactly, so when s is
not a good estimate of σ (small number of
samples) use:
$\text{CI for } \mu = \bar{x} \pm \frac{t s}{\sqrt{N}}$
Note that t approaches z as
N increases.
2-sided t values
Example of confidence interval
determination for smaller number of
samples
Given the following values for
serum carcinoembryonic antigen
(CEA) measurements,
determine the 95% confidence
interval:
16.9 ng/mL, 12.7 ng/mL,
15.3 ng/mL, 17.2 ng/mL
Sample mean = 15.525 ng/mL
s = 2.059733 ng/mL
t = 3.18 (95%, N − 1 = 3 degrees of freedom)
Answer: 15.525 ± 3.277 ng/mL, but when you consider sig figs
you get: 16 ± 3 ng/mL
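The CEA example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the two-sided 95% t values are copied from a standard t table and indexed by degrees of freedom (N - 1), and the helper name is hypothetical. With four measurements there are 3 degrees of freedom, so t = 3.182, which gives a half-width near 3.28 ng/mL and rounds to the same 16 ± 3 ng/mL.

```python
import math
import statistics

# Two-sided 95% Student's t values, keyed by degrees of freedom (N - 1).
T_95_TWO_SIDED = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def confidence_interval_95(xs):
    """Return (sample mean, half-width) of the 95% confidence interval for the mean."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)          # sample standard deviation (N - 1)
    t = T_95_TWO_SIDED[n - 1]
    return xbar, t * s / math.sqrt(n)

cea = [16.9, 12.7, 15.3, 17.2]        # ng/mL, values from the example
xbar, half_width = confidence_interval_95(cea)
print(f"{xbar:.3f} +/- {half_width:.3f} ng/mL")
```

The lookup table covers only small data sets; for larger N a statistics library or a fuller t table would be needed.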
Propagation of errors
How do errors at each
step contribute to the
final result?
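$x = f(p, q, r, \dots)$
$dx_i = f(dp_i, dq_i, dr_i, \dots)$
$dx = \left(\frac{\partial x}{\partial p}\right)_v dp + \left(\frac{\partial x}{\partial q}\right)_v dq + \left(\frac{\partial x}{\partial r}\right)_v dr + \dots$
$s_x^2 = \left(\frac{\partial x}{\partial p}\right)^2 s_p^2 + \left(\frac{\partial x}{\partial q}\right)^2 s_q^2 + \left(\frac{\partial x}{\partial r}\right)^2 s_r^2 + \dots$
(subscript v: the other variables are held constant)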
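The variance formula can be applied numerically when the partial derivatives are awkward to write out by hand. The sketch below is one possible approach, not a method from the slides: it estimates each partial derivative by a central finite difference and assumes the errors in p, q, and r are independent. The function `propagate`, the example relationship x = pq/r, and all numerical values are hypothetical.

```python
import math

def propagate(f, values, stds, rel_step=1e-6):
    """Estimate s_x for x = f(*values), given independent standard deviations."""
    variance = 0.0
    for i, (v, s) in enumerate(zip(values, stds)):
        h = abs(v) * rel_step or rel_step   # fall back to rel_step when v is zero
        up = list(values)
        down = list(values)
        up[i] = v + h
        down[i] = v - h
        partial = (f(*up) - f(*down)) / (2 * h)   # central-difference dx/dv
        variance += (partial * s) ** 2
    return math.sqrt(variance)

def example(p, q, r):
    """Hypothetical relationship x = p * q / r."""
    return p * q / r

values = (2.0, 3.0, 4.0)
stds = (0.02, 0.03, 0.04)     # each input carries a 1% relative uncertainty
s_x = propagate(example, values, stds)
print(example(*values), s_x)
```

For a pure product or quotient the relative variances add, so three inputs with 1% uncertainty each give a result with about √3 × 1% ≈ 1.7% uncertainty, which matches the numerical estimate.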