Transcript PPT

Part I, Chapters 4 & 5
Data Tables and Data Analysis
Statistics and Figures
Descriptive Statistics 1
Are data points clumped?
(order variable / exp. variable)
• Concentrated around one value?
• Concentrated in several areas?
Do data point pairs show a pattern?
(exp. variable #1 / exp. variable #2)
• Straight line?
• Parabola?
• Sine function?
Scatter Graphs
A scatterplot is a useful summary of a set of bivariate
data (two variables), usually drawn before working
out a linear correlation coefficient or fitting a
regression line. Each unit contributes one point to the
scatterplot, on which points are plotted but not
joined. The resulting pattern indicates the type and
strength of the relationship between the two
variables.
• Gives a good visual picture of the relationship
between the variables
• Aids the interpretation of the correlation coefficient
or regression model
http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html
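A minimal Python sketch of the two quantities a scatterplot usually precedes: the linear correlation coefficient and the least-squares regression line. The function names and the sample data are illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the regression line y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical bivariate data: one (x, y) pair per unit.
concentration = [1.0, 2.0, 3.0, 4.0, 5.0]
signal = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(concentration, signal)
slope, intercept = least_squares_line(concentration, signal)
print(f"r = {r:.4f}, y = {slope:.3f} x + {intercept:.3f}")
```

Plotting the points first remains the better check: a curved pattern can still give a large r.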
Bar and Column Graphs
A bar / column graph is a way of summarising a set of
categorical data. It is often used in exploratory data
analysis to illustrate the major features of the
distribution of the data in a convenient form. It
displays the data using a number of rectangles, of the
same width, each of which represents a particular
category. The length (and hence area) of each
rectangle is proportional to the number of cases in
the category it represents, for example, age group,
religious affiliation.
• Summarize nominal or ordinal data
• Displayed horizontally (bars) or vertically (columns)
• Drawn with a gap between the bars (rectangles)
http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html
Frequency Analysis: Histograms
A histogram is a way of summarizing data that are
measured on an interval scale (either discrete or
continuous). It is often used in exploratory data
analysis to illustrate the major features of the
distribution of the data in a convenient form. It
divides up the range of possible values in a data set
into classes or groups. For each group, a rectangle is
constructed with a base length equal to the range of
values in that specific group, and an area proportional
to the number of observations falling into that group.
This means that the rectangles might be drawn of
non-uniform height.
• Variables are numerical
• Variables are measured on an interval scale
• Used with large data sets (>100 observations)
• Detect unusual observations (outliers, gaps)
http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html
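The class-counting step behind a histogram can be sketched in a few lines of Python. This assumes equal-width classes, in which case the rectangle height is proportional to the count; the function name and the test scores are made up for illustration.

```python
def histogram_counts(data, low, high, n_bins):
    """Count observations in n_bins equal-width classes covering [low, high]."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in data:
        if low <= value <= high:
            # A value exactly on the upper limit goes into the last class.
            index = min(int((value - low) / width), n_bins - 1)
            counts[index] += 1
    return counts

# Hypothetical test scores, binned into five classes of width 10.
scores = [55, 62, 68, 71, 74, 77, 79, 83, 88, 100]
counts = histogram_counts(scores, 50, 100, 5)
for i, count in enumerate(counts):
    left = 50 + 10 * i
    print(f"{left:3d}-{left + 10:3d} | " + "#" * count)
```

Empty classes and isolated classes far from the bulk of the data are how gaps and outliers show up in the counts.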
Descriptive Statistics 2
Size – Total number of data points, N (Does the study matter?)
-- Number of reactions performed
-- Number of points measured (e.g., for a spectrum)
Range – Distance between smallest and largest data value (Does the parameter matter?)
-- Min., Max., and Range = Max. – Min.
-- Average ± Difference/2
Middle – There are many types of averages (How does the parameter matter?)
-- Average = Mean = Arithmetic Mean: Sum(Data) / N
-- Geometric Mean: [Product(Data)]^(1/N)
-- Mode: Most frequent data value
-- Median: Half of the data lie below and half lie above.
Spread – Frequency of significant deviation (How frequent are deviations?)
-- Standard Deviation
-- Central 50%
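The quantities on this slide can be computed with Python's standard `statistics` module. The sketch below is one possible way to do it; the `describe` helper and its dictionary keys are assumptions for illustration, and the geometric mean requires positive data.

```python
import statistics

def describe(data):
    """Size, range, middle, and spread of a list of positive numbers."""
    ordered = sorted(data)
    q1, _, q3 = statistics.quantiles(ordered, n=4)  # quartile cut points
    return {
        "N": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "mean": statistics.mean(ordered),
        "geometric_mean": statistics.geometric_mean(ordered),
        "mode": statistics.mode(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),   # sample standard deviation
        "central_50": (q1, q3),               # limits of the central 50%
    }

# Hypothetical data set.
for name, value in describe([2, 4, 4, 8]).items():
    print(f"{name:15s} {value}")
```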
Standard Deviation
Real World Sample: Data Histogram
Mathematical World: Inferential statistics maps the data by a
formula that describes the population.
Normal or Gaussian Distribution
Bell Curve
A normal distribution in a variate x
with mean a and variance σ² is a
statistical distribution with probability
density function
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-a)^2}{2\sigma^2}\right] $$
If Gaussian, then: Mean = Median = Mode

Mean and Standard Deviation determine the distribution
http://mathworld.wolfram.com/NormalDistribution.html
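As a small check of the density function, the Python sketch below evaluates f(x) directly and compares it with the standard library's `statistics.NormalDist`. It also computes the fraction of a normal population within k standard deviations of the mean. The function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, a, sigma):
    """Normal probability density with mean a and standard deviation sigma."""
    return math.exp(-((x - a) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

print(normal_pdf(1.3, 2.0, 0.5), NormalDist(2.0, 0.5).pdf(1.3))
for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
```

Only the mean and the standard deviation enter the formula, which is why those two numbers determine the whole distribution.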
Significance Test & p-Value
The p-value is compared with the alpha level (significance level).
For significance tests there is a hypothesized condition (called the null
hypothesis, H0) that one tests to see whether the data are consistent with it.
For a test of fit the hypothesized condition is that the selected distribution
(e.g., a normal distribution) generated the data.
The p-value is the probability of obtaining data at least as extreme as those
observed if the hypothesized condition is true. A p-value of 0.05 indicates
that such data would arise from random variation alone only 1 time in 20. This
is evidence that the data were not generated under the hypothesized condition.
The hypothesized condition is rejected if the p-value is at or below the chosen
alpha level, commonly 0.05.
Rejecting at alpha = 0.05 corresponds to a confidence level of 95%, calculated
as 100*(1 – alpha). This is not the probability that the hypothesized condition
is false (e.g., that the distribution is not normal).
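To make the p-value concrete, here is a hedged Python sketch of an exact two-sided test for a much simpler null hypothesis than a test of fit: H0 says a coin is fair. The p-value is the total probability, under H0, of every outcome that is no more likely than the observed one. The example (16 heads in 20 flips) and the function name are assumptions chosen for illustration.

```python
from math import comb

def binomial_p_value(k, n, p0=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: p = p0."""
    pmf = [comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Sum the probabilities of all outcomes at least as extreme (as unlikely)
    # as the observed one; the small factor absorbs floating-point ties.
    return sum(q for q in pmf if q <= observed * (1 + 1e-12))

alpha = 0.05
p = binomial_p_value(16, 20)
print(f"p-value = {p:.4f}; reject H0 at alpha = {alpha}: {p <= alpha}")
```

A test of fit works the same way in outline: a test statistic is computed from the data and converted to a p-value under H0.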
Cramér-von Mises Test
Anderson-Darling Test
Both are "quadratic EDF statistics"
EDF = empirical distribution function
$$ n \int_{-\infty}^{\infty} \left[F_n(x) - F(x)\right]^2 w(x)\, dF(x) $$
CvM: $w(x) = 1$
AD: $w(x) = \frac{1}{F(x) - F(x)^2} = \frac{1}{F(x)\,[1 - F(x)]}$
AD value –{Statistics}-> p-value
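For sorted data the two integrals reduce to well-known sums. The Python sketch below computes both statistics for a normal distribution whose mean and standard deviation are estimated from the data. It stops at the statistic: converting the AD value to a p-value needs tabulated or approximated critical values, which are not shown here. The function name is an assumption.

```python
import math
from statistics import NormalDist, mean, stdev

def edf_statistics(data):
    """Cramér-von Mises W^2 and Anderson-Darling A^2 for a fitted normal."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    F = [fitted.cdf(x) for x in xs]  # hypothesized CDF at each sorted point
    # W^2 = 1/(12n) + sum_i [F(x_i) - (2i - 1)/(2n)]^2, with i = 1..n
    w2 = 1 / (12 * n) + sum((F[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))
    # A^2 = -n - (1/n) sum_i (2i - 1) [ln F(x_i) + ln(1 - F(x_{n+1-i}))]
    a2 = -n - sum(
        (2 * i + 1) * (math.log(F[i]) + math.log(1 - F[n - 1 - i]))
        for i in range(n)
    ) / n
    return w2, a2

# Deterministic illustration: evenly spaced normal quantiles versus a
# skewed (log-normal-like) transform of the same points.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [math.exp(z) for z in normal_like]
print("normal-like:", edf_statistics(normal_like))
print("skewed:     ", edf_statistics(skewed))
```

Larger values mean a worse fit. The AD weight emphasizes the tails, where F(x)[1 - F(x)] is small.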
Normal Distribution, 1D-Gaussian
Scatter Graph In Electronics Research
Gaussian Dist. In Polymer Research
Gaussian Dist. In Food Chemistry
Histogram In Food Chemistry
A log-normal distribution: variable whose
logarithm is normally distributed.
Histogram In Nano Chemistry
PRD = Particle Radius Distribution
A log-normal distribution: variable
whose logarithm is normally
distributed.
Histogram In Polymer Research
Histogram In Solid State Research
How to Compute Descriptive Statistics,
Present Scatter Graphs & Histograms, and
Plot Functions using Excel
Example in Lecture: Test Scores
Assign. #3: Handout & online.
Create a Bar Graph 1
Create a Bar Graph 2
Create a Bar Graph 3
Create a Column Graph 1
Create a Column Graph 2
Create a Scatter Graph 1
Create a Scatter Graph 2
Stats 1
Stats 2
Histogram: Load Analysis 1
Histogram: Load Analysis 2
Histogram: Load Analysis 3
Histogram: Load Analysis 4
Histogram: Load Analysis 5
Histogram: Load Analysis 6
Distribution Functions
Gaussian, Normal Distribution
Poisson Distribution
Binomial Distribution
Pascal Distribution (Neg. Binom. Dist.)
Normal or Gaussian Distribution
Bell Curve
A normal distribution in a variate x
with mean a and variance σ² is a
statistical distribution with probability
density function
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-a)^2}{2\sigma^2}\right] $$
If Gaussian, then: Mean = Median = Mode

Mean and Standard Deviation determine the distribution
http://mathworld.wolfram.com/NormalDistribution.html
Poisson Distribution
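[Figure: Frequency vs. Outcome Bins]
$$ P_p(n|N) = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} $$
$$ P_\nu(n) = \lim_{N\to\infty} P_p(n|N) = \frac{\nu^n e^{-\nu}}{n!} \quad \text{with } \nu = Np \text{ held fixed} $$
http://mathworld.wolfram.com/PoissonDistribution.html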
Poisson Distribution
http://demonstrations.wolfram.com/PoissonDistribution/
Poisson Dist. in Single Photon Detection
The Poisson distribution can be used to describe how rare, random events are distributed in time or space and
applies to measurements in which the number of observations is large but the number of events is very small.
In chemical measurements Poisson statistics apply to the extraordinarily small signals commonly found in
ultrasensitive analytical instrumentation: for example, single-photon counting fluorescence detection.
Harbron, E. J.; Barbara, P. F. J. Chem. Educ. 2002, 79, 211-213.
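A short Python sketch of the Poisson probabilities for a weak counting signal. The mean of 0.5 detected photons per counting interval is a made-up value for illustration and is not taken from the cited paper.

```python
import math

def poisson_pmf(n, nu):
    """Probability of exactly n events when the mean number of events is nu."""
    return nu ** n * math.exp(-nu) / math.factorial(n)

nu = 0.5  # hypothetical mean photon count per interval
for n in range(5):
    print(f"P({n} photons) = {poisson_pmf(n, nu):.4f}")

# For a Poisson distribution the mean and the variance both equal nu.
ns = range(60)
mean = sum(n * poisson_pmf(n, nu) for n in ns)
variance = sum((n - mean) ** 2 * poisson_pmf(n, nu) for n in ns)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Because the variance equals the mean, the relative noise of a count with mean ν scales as 1/sqrt(ν), which is why very small signals are so noisy.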
Binomial Distribution
[Figure: Frequency vs. Outcome Bins]
$$ P_p(n|N) = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} $$
The binomial distribution gives the
discrete probability distribution P(n|N) of
obtaining exactly n successes out of N
Bernoulli trials where the result of each
Bernoulli trial is true with probability p
and false with probability q = 1 – p.
http://mathworld.wolfram.com/BinomialDistribution.html
Binomial Dist.: N Coin Flips
p = q = 0.5
$$ P_{0.5}(n|N) = \frac{N!}{n!\,(N-n)!}\, 0.5^N $$
N = 10, 20, and 40.
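The coin-flip distributions for N = 10, 20, and 40 can be tabulated with a few lines of Python. This is a sketch with an assumed function name. It prints the probability of the most likely outcome (exactly N/2 heads) and the standard deviation sqrt(N p q).

```python
from math import comb, sqrt

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

for N in (10, 20, 40):
    peak = binomial_pmf(N // 2, N, 0.5)
    sigma = sqrt(N * 0.5 * 0.5)
    print(f"N = {N:2d}: P(n = N/2) = {peak:.4f}, sigma = {sigma:.2f}, sigma/N = {sigma / N:.3f}")
```

The peak probability drops from about 0.246 to 0.176 to 0.125 as N grows, while the width relative to N shrinks.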

Binom. Dist.: N People
There are N people at Shakespeare’s, and on average 75% are adults. We
sit outside and watch the door as everyone exits. What is the probability
that exactly n of the N people are adolescents on any given occasion? We
watch the door many, many times and then construct a histogram.
p = 0.25 for not being an adult
$$ P_{0.25}(n|N) = \frac{N!}{n!\,(N-n)!}\, 0.25^n\, 0.75^{N-n} $$
N = 10, 20, and 40.
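"Watching the door many, many times" can be imitated with a small simulation. The Python sketch below draws N people per occasion, counts the non-adults, and compares the resulting histogram with the binomial formula. The number of repeats and the seed are arbitrary assumptions.

```python
import random
from math import comb

def simulate_counts(N, p, repeats, seed=0):
    """Histogram of the number of 'successes' among N people over many occasions."""
    rng = random.Random(seed)
    counts = [0] * (N + 1)
    for _ in range(repeats):
        n = sum(1 for _ in range(N) if rng.random() < p)
        counts[n] += 1
    return counts

N, p, repeats = 10, 0.25, 20000
counts = simulate_counts(N, p, repeats)
for n, count in enumerate(counts):
    expected = comb(N, n) * p ** n * (1 - p) ** (N - n)
    print(f"n = {n:2d}: observed {count / repeats:.4f}, binomial {expected:.4f}")
```

With more repeats the observed frequencies settle closer to the binomial probabilities.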
Binom. Dist. In Nanochemistry
Au25(SCH2CH2Ph)18 + (HSC6H5 or HSPh) → Au25(SCH2CH2Ph)18-m(SR)m + HSCH2CH2Ph
Large excess of added thiol.
Probabilities for ligand exchange are different for each type of thiol.
For each thiol, the probability for exchange does not depend on environment in cluster.
Use mass spectrometry to analyze the exchange reaction.
Binomial Dist. In Nanochemistry
Poisson Distribution
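Binomial Distribution for Large N
[Figure: Frequency vs. Outcome Bins]
Binomial:
$$ P_p(n|N) = \frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} $$
Poisson:
$$ P_\nu(n) = \lim_{N\to\infty} P_p(n|N) = \frac{\nu^n e^{-\nu}}{n!} \quad \text{with } \nu = Np \text{ held fixed} $$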
The N is gone!!
We can determine N based
on the measured distribution.
http://mathworld.wolfram.com/PoissonDistribution.html
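The limit can be checked numerically. The Python sketch below holds the mean ν = Np fixed, increases N, and reports the largest difference between the binomial and Poisson probabilities. The function names and ν = 2 are assumptions for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

def poisson_pmf(n, nu):
    """Probability of exactly n events when the mean number of events is nu."""
    return nu ** n * exp(-nu) / factorial(n)

def max_difference(N, nu, n_max=20):
    """Largest |binomial - Poisson| over n = 0..n_max, with p = nu / N."""
    p = nu / N
    return max(
        abs(binomial_pmf(n, N, p) - poisson_pmf(n, nu))
        for n in range(min(N, n_max) + 1)
    )

for N in (10, 100, 1000):
    print(f"N = {N:4d}: max difference = {max_difference(N, 2.0):.5f}")
```

The difference shrinks as N grows, so for large N and small p only the product ν = Np matters.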
Negative Binomial Distribution
Pascal Distribution
The Pascal distribution gives the discrete
probability distribution P(x) of success in
the (r+x)th trial after having experienced
r-1 successes and x failures.
$$ P_{r,p}(x) = \binom{x+r-1}{r-1}\, p^r (1-p)^x $$
http://mathworld.wolfram.com/NegativeBinomialDistribution.html
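A short Python sketch of the Pascal probabilities, with an assumed function name and made-up parameters (r = 3 successes, p = 0.4). It also checks the normalization and the mean number of failures, r(1 - p)/p.

```python
from math import comb

def pascal_pmf(x, r, p):
    """Probability that the r-th success occurs on trial r + x (x failures before it)."""
    return comb(x + r - 1, r - 1) * p ** r * (1 - p) ** x

r, p = 3, 0.4  # hypothetical parameters
for x in range(6):
    print(f"P(x = {x}) = {pascal_pmf(x, r, p):.4f}")

# The sums are truncated at x = 400, where the remaining tail is negligible.
total = sum(pascal_pmf(x, r, p) for x in range(401))
mean_failures = sum(x * pascal_pmf(x, r, p) for x in range(401))
print(f"sum = {total:.6f}, mean failures = {mean_failures:.4f}")
```

For r = 1 the formula reduces to the geometric distribution, p(1 - p)^x.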