Introduction to Statistics - The Catholic University of America

ENGR 104: Lecture 2
Statistical Analysis Using Matlab
Lecturers:
Dr. Binh Tran
© 2003-09 The Catholic University of America
Dept of Biomedical Engineering
Definitions

Statistics: Science that deals with collection,
tabulation, analysis, and interpretation of data
(qualitative or quantitative) in order to make
objective decisions and solve problems.
Statistical Measures of Data



Average/(Arithmetic) Mean: The average value
of all observations
Median: Middle observation
Mode: Value where highest number of observations
occurs

Range: Difference between max and min values (rough
measure of data dispersion)

Standard Deviation: Special form of average
deviation from the Mean
Average/(Arithmetic) Mean
Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

Advantage: Easy to
compute
Disadvantage: Distorted
by extreme values
(outliers)
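
As a minimal Python sketch of the mean (the readings below are made-up values for illustration, not data from the lecture), the following shows how a single outlier distorts the result:

```python
def arithmetic_mean(values):
    """Return sum(X_i) / n for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

# Hypothetical readings (illustrative only)
readings = [4, 5, 6, 5, 5]
with_outlier = readings + [50]   # one extreme value added

mean_clean = arithmetic_mean(readings)
mean_outlier = arithmetic_mean(with_outlier)
print(mean_clean, mean_outlier)
```

Five of the six values lie between 4 and 6, yet the single value of 50 moves the mean well outside that interval.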
Median: Middle Observation

Definition: Median value is
middle item when items are
arranged according to size

Advantage: Not distorted by
outliers
Disadvantage: Data must first be
sorted according to size
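
A minimal Python sketch of the median follows. The sample values are hypothetical, and averaging the two middle items for an even count is a common convention assumed here, not something the slide states:

```python
def median(values):
    """Return the middle item of the data after sorting by size."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)          # the sorting step named as the disadvantage
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: single middle item
    # Even count: assumed convention, average the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical readings; 50 is an outlier
median_odd = median([4, 5, 6, 5, 50])
median_even = median([3, 1, 4, 2])
print(median_odd, median_even)
```

The outlier of 50 leaves the median at a typical value, which illustrates the advantage listed above.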

Mode & Range

Mode: Most common value occurring in set of data



Advantage: Most typical value and independent of the
extreme items
Disadvantage: If values are not repeated and amount of
data is small, then the significance of the mode is limited
Range: Difference between min/max values in series


Advantage: Easy to compute & simplest measure of
dispersion
Disadvantage: No info regarding distribution of data
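
The two measures can be sketched in Python as below. The data are hypothetical, and returning every tied value as a mode is a design choice made here for illustration:

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def data_range(values):
    """Return the difference between the max and min values."""
    return max(values) - min(values)

# Hypothetical data (illustrative only)
data = [3, 5, 5, 7, 10, 5, 7]
mode_values = modes(data)
spread = data_range(data)
no_repeats = modes([1, 2, 3])   # every value ties, so the mode says little
print(mode_values, spread, no_repeats)
```

When no value repeats, every item is returned as a mode, which is one way to see why the mode has limited significance for small data sets.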
Standard Deviation

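Definition: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$

For normally distributed data:
±1σ = 68.3% of observations
±2σ = 95.5% of observations
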
Advantage: Shows the
degree of dispersion and
variability
Disadvantage: Not trivial
to compute
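
A minimal Python sketch of the computation follows, assuming the divide-by-n (population) form of the standard deviation; dividing by n - 1 instead gives the sample form. The data set is hypothetical and chosen so the arithmetic is easy to check by hand:

```python
import math

def std_dev(values):
    """Population standard deviation: sqrt(sum((X_i - mean)^2) / n)."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation of empty data is undefined")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / n)

# Hypothetical data: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = std_dev(data)
print(sigma)
```

Python's standard library offers `statistics.pstdev` (divide by n) and `statistics.stdev` (divide by n - 1) for the same purpose.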
Presentation of Data


Frequency Plot: Histogram of # of occurrences.
Curve Fitting: Polynomial fitting of experimental
data

Time Series Analysis or Trend Plots: Analysis of
trends in data
Data Presentation:
Frequency Plot or Histogram

Definition: Graphic
representation of
frequency distribution

Advantage: Quick
visualization of data
Disadvantage: Difficult to
analyze data, unless data is
grouped systematically
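
Grouping data systematically before plotting can be sketched in Python as below. The values and the bin width of 5 are assumptions chosen for illustration, and the text bars stand in for a real plot:

```python
from collections import Counter

def frequency_table(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical observations (illustrative only)
data = [3, 5, 7, 10, 12, 14, 15, 17]
bin_width = 5
table = frequency_table(data, bin_width)

# Crude text histogram: one '#' per occurrence
for start, count in table.items():
    print(f"[{start}, {start + bin_width}): {'#' * count}")
```

Changing `bin_width` changes the shape of the histogram, so the grouping should be chosen deliberately.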

Data Presentation:
Polynomial Curve Fitting


Best fit curve for data
Polynomial Equation:
$y = a_0 x^m + a_1 x^{m-1} + \cdots + a_{m-1} x + a_m$


Advantage: Large set of data
can be represented by a known
equation
Disadvantage: For m > 2, the process
becomes very laborious
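
To show what the fitting process involves, here is a minimal least-squares sketch in Python using only the standard library. It is one possible approach (normal equations solved by Gaussian elimination), not necessarily the method used in the lecture, and the test data are hypothetical points generated from a known quadratic:

```python
def polyfit_least_squares(xs, ys, degree):
    """Return [a_0, ..., a_m] for y = a_0*x^m + ... + a_m (least squares)."""
    size = degree + 1
    # Each row holds x^m, x^(m-1), ..., x^0 for one data point
    powers = [[x ** (degree - k) for k in range(size)] for x in xs]
    # Normal equations: (V^T V) a = V^T y
    A = [[sum(r[i] * r[j] for r in powers) for j in range(size)]
         for i in range(size)]
    b = [sum(r[i] * y for r, y in zip(powers, ys)) for i in range(size)]
    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("not enough distinct x values for this degree")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, size):
            factor = A[r][col] / A[col][col]
            for c in range(col, size):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, size))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs

# Hypothetical points lying exactly on y = 2x^2 - 3x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 0, 3, 10, 21]
fitted = polyfit_least_squares(xs, ys, 2)
print([round(c, 6) for c in fitted])
```

The amount of bookkeeping grows with the degree m, which is why library routines such as `numpy.polyfit` are normally used instead.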
Data Presentation:
Ex: Polynomial Curve Fitting

Example:
$y = a_0 x^2 + a_1 x + a_2$

Where,
$a_0 = 0.0155$
$a_1 = 2.1411$
$a_2 = 58.4165$
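
Once the coefficients are known, the fitted curve can be evaluated at any x. The Python sketch below uses the three coefficients from this example as printed; the choice of x = 10 is arbitrary, since the slide does not say what x and y represent:

```python
def polyval(coeffs, x):
    """Evaluate a_0*x^m + ... + a_m by Horner's rule (highest power first)."""
    result = 0.0
    for a in coeffs:
        result = result * x + a
    return result

# Coefficients a_0, a_1, a_2 from the example
coeffs = [0.0155, 2.1411, 58.4165]

y_at_0 = polyval(coeffs, 0)     # equals the constant term a_2
y_at_10 = polyval(coeffs, 10)   # arbitrary illustrative x
print(y_at_0, round(y_at_10, 4))
```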
Data Presentation:
Time Series (Trend) Analysis
Definition: Graphic
representation consisting of the
description and measurement of
changes or movements in
data over a period of time.
Types of trend measurement:
• Semi-average
• Moving average
Data Presentation:
Semi-Average

Definition: Split the data set
into two equal parts; take the
average of each part; draw a straight
line through the two average points

Advantage: Very simple to
calculate
Disadvantage: Only gross
representation of data trends
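
The semi-average method can be sketched in Python as follows. Two assumptions are made here that the slide does not specify: x positions are 0-based item indices, and the middle item is dropped when the count is odd so both halves are equal. The sample values are illustrative:

```python
def semi_average_line(values):
    """Return (slope, intercept) of the line through the two half-averages."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    first = values[:half]
    second = values[n - half:]            # skips the middle item if n is odd
    # Each half-average is plotted at the centre index of its half
    x1 = (half - 1) / 2
    x2 = (n - half) + (half - 1) / 2
    y1 = sum(first) / half
    y2 = sum(second) / half
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# Illustrative series of eight values
values = [3, 5, 7, 10, 12, 14, 15, 17]
slope, intercept = semi_average_line(values)
print(slope, intercept)
```

Only two summary points determine the line, which is why the method gives just a gross picture of the trend.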

Data Presentation:
Moving Average

Definition: A series of
successive group averages

Advantage: Simple to calculate;
more accurate representation of
local changes

Disadvantage: Cannot be
brought up to date (the first and last
items have no centered average)
Data Presentation:
Ex: Three-Item Moving Average
Values   Total   Moving Average
   3
   5      15         5.00
   7      22         7.33
  10      29         9.67
  12      36        12.00
  14      41        13.67
  15      46        15.33
  17
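
The three-item moving average in this example can be reproduced with a short Python sketch. The function name and the `window` parameter are illustrative choices; the values are the eight items from the example:

```python
def moving_average(values, window=3):
    """Return the averages of each run of `window` successive items."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

values = [3, 5, 7, 10, 12, 14, 15, 17]
averages = moving_average(values)
rounded = [round(a, 2) for a in averages]
print(rounded)
```

Eight values yield six averages because the first and last items are never the centre of a complete three-item group.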
Questions?
Lab #2: Telemedicine Analysis


Lab Report Due: 9/29
Download Telemedicine data for 6
study subjects (txt files)
– http://faculty.cua.edu/tran/engr104/Datafiles.htm


Using Matlab, statistically analyze the
data and report your observations
See handout
LAB QUESTIONS:





Is there a noticeable trend/pattern in the data? Across the
datasets?
Is there a correlation between the blood glucose and high blood
pressure measure over time?
Examine this using a time-series analysis (30-day epochs).
Explain your findings.
Use curve fitting techniques to estimate the regression line best
fitting the data for each subject.
Is there a difference between the effects of tele-monitoring on
diabetics vs. hypertensives (i.e. those with high blood pressure)?
Explain.
– Is there any useful information in the histogram?