
BIOSTATISTICS (BST 211)
Sumukh Deshpande
Lecturer
College of Applied Medical Sciences
Lecture 1
Statistics = Skills for life.
Course Overview
A wide range of statistical topics, without too much detail on the maths. You should become a GOOD user of statistical tables and techniques.
You will learn how to use calculators and spreadsheets to perform statistical analysis.
Course Contents in Brief & Outcomes
• Data Handling: types of data, displaying data, descriptive statistics (average & spread), common distributions
• Sampling: how to choose a sample? Confidence intervals
• Hypothesis Testing: null and alternative hypotheses, errors in hypothesis testing
• Basic Techniques: numerical & categorical data, regression & correlation
• Practice: assignments and practical applications
Textbook
Please BUY YOUR PERSONAL COPY. Only one copy is available at the Medical Library.
Medical Statistics at a Glance, 3rd Edition
Aviva Petrie (University of London Eastman Dental Institute and London School of Hygiene and Tropical Medicine), Caroline Sabin (University College London Medical School)
ISBN: 978-1-4051-8051-1
Paperback, 180 pages
July 2009, ©2009, Wiley-Blackwell
Additional Textbook
Practical Statistics for Medical Research (Chapman & Hall/CRC)
Douglas G. Altman
ISBN-10: 0412276305 | ISBN-13: 978-0412276309
Calculator
Please BUY YOUR PERSONAL CALCULATOR: CASIO fx ms 85
How Can You Help Yourself?
Statistics is PRACTICE. You must take notes and ASK QUESTIONS!
• Always bring your calculator and repeat the class practice.
• Try further exercises and see me to discuss your work.
• http://www.medstatsaag.com/

Important: We will NOT email course material or give out handouts. These, and additional learning material, will be posted on the course SkyDrive site. You must visit the site regularly. You are also encouraged to make your own notes and refer to textbooks.
Check your uoh email
You must check that your university email is working. Your uoh email is:
[email protected] where nnnnnnnnn is your student number
Go to http://uohapp.uoh.edu.sa/eserv/e-portal/e-portal.html to check that your email is working.
Then go to SkyDrive (skydrive.live.com) to log in and/or register.
Log in using your full uoh email address and password.
UOH ITC Screen
http://uohapp.uoh.edu.sa/eserv/e-portal/e-portal.html
Click on Student Login and follow the instructions.
For help, contact ITC or call the User Services Unit (USU) on 065358352 or 06-5358351.

SkyDrive Screen
Log in using your uoh email and password. If you have any problem, try changing your password and logging in again.
For help, contact ITC or USU.
BIOSTATISTICS
LECTURE 1
Types of Data & Descriptive Statistics
Two branches of STATISTICS
• Descriptive statistics: data known; direct measure; small groups
• Inferential statistics: data unknown; indirect ESTIMATE; large populations; prediction & decision
Keywords
• Population
• Sample
• Variable
• Data
• Frequency
Categorical data:
• Nominal: blood group, city, marital status, ...
• Ordinal: strong/moderate/mild
Numerical data:
• Interval: temperature
• Ratio: height, weight

Nominal data 1
• Nominal or categorical data comprise categories that cannot be rank ordered: values fall into unordered categories or classes.
• The categories cannot be placed in any order, and no judgement can be made about the relative size of, or distance between, categories.
• What does this mean? Arithmetic operations (adding, subtracting, averaging) are meaningless for these values; we can only count how many observations fall into each category.
• Therefore, nominal data reflect qualitative rather than quantitative differences.
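Because only counting is meaningful for nominal values, a frequency table is the natural summary. A minimal Python sketch using the standard-library `collections.Counter`; the blood-group observations are made up for illustration and are not lecture data:

```python
from collections import Counter

# Hypothetical blood-group observations (illustrative only, not from the lecture).
blood_groups = ["O", "A", "O", "B", "AB", "O", "A"]

# Counting is the only meaningful operation: tally each category.
frequencies = Counter(blood_groups)

# The mode (most frequent category) is the only "average" that makes sense here.
mode_category, mode_count = frequencies.most_common(1)[0]

print(frequencies)
print(mode_category, mode_count)
```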
Nominal data 2
Examples:
What is your gender? (please tick)
• Male
• Female
Marital status? (please tick)
• Married
• Unmarried
Ordinal data 1
• Ordinal data comprise categories that can be rank ordered.
• As with nominal data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
• What does this mean? We can make order-based statistical judgements (for example, "more severe than") and perform limited maths such as finding the median category, but differences between categories are not meaningful.
Ordinal data 2
Examples:
Level of injury? (please tick)
• Fatal injury
• Severe injury
• Moderate injury
• Minor injury
Classification of patient performance status? (please tick)
• Patient fully active
• Patient restricted in physically strenuous activity
• Patient ambulatory and capable of self-care
• Patient capable of only limited self-care
• Patient disabled
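A minimal sketch of what "limited maths" can mean for ordinal data, assuming the injury levels above are coded as ranks 0 to 3 (an illustrative coding, not part of the lecture) and using hypothetical responses. Sorting and taking a median category are valid; the rank numbers themselves carry no distance information.

```python
from statistics import median_low

# Injury levels from the example, least to most severe.
# The integer ranks only encode order; the "distance" between ranks is not meaningful.
INJURY_LEVELS = ["Minor injury", "Moderate injury", "Severe injury", "Fatal injury"]
RANK = {level: i for i, level in enumerate(INJURY_LEVELS)}

# Hypothetical responses (illustrative only).
responses = ["Minor injury", "Severe injury", "Moderate injury",
             "Minor injury", "Fatal injury", "Moderate injury"]

# Ordering is meaningful, so we can sort and compare categories...
ranked = sorted(responses, key=RANK.__getitem__)

# ...and find a median category. median_low always returns an observed rank,
# so two categories are never "averaged" together.
median_rank = median_low(RANK[r] for r in responses)

print(ranked)
print(INJURY_LEVELS[median_rank])
```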
Interval and ratio data
• Both interval and ratio data are examples of scale data.
• Scale data:
  • are in numeric format (SR 50, 100 ml, 15.2 cm)
  • can be measured on a continuous scale
  • have distances between values that can be observed and measured
  • can be placed in rank order.
Interval data
• Interval data are measured on a continuous scale and have no true zero point.
• Examples:
  • Time: moves along a continuous measure of seconds, minutes and so on, without a true zero point of time.
  • Temperature (°C or °F): moves along a continuous measure of degrees without a true zero, so 0 °C does not mean "no temperature".
Ratio data
• Ratio data are measured on a continuous scale and do have a true zero point.
• Examples:
• Age
• Weight
• Height
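A short illustration of the difference between the two scales, using the Kelvin scale (not mentioned in the lecture) as an example of a temperature scale with a true zero. The temperatures and weights are hypothetical.

```python
def celsius_to_kelvin(c: float) -> float:
    # Kelvin starts at absolute zero (a true zero), so it behaves as a ratio scale.
    return c + 273.15

warm_c, cool_c = 20.0, 10.0   # hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale: 10 degrees apart in C and in K.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 20 C / 10 C = 2, but 20 C is not "twice as hot" as 10 C.
naive_ratio = warm_c / cool_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Weight has a true zero, so 80 kg really is twice 40 kg.
weight_ratio = 80.0 / 40.0

print(f"Difference: {diff_c:.1f} C vs {diff_k:.1f} K")
print(f"Celsius ratio (misleading): {naive_ratio:.3f}")
print(f"Kelvin ratio:               {true_ratio:.3f}")
print(f"Weight ratio:               {weight_ratio:.1f}")
```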
Types of Data
Variables
• Categorical
  • Nominal
  • Ordinal
• Numerical (interval & ratio)
  • Discrete (counting)
  • Continuous (measuring)
Exercise: give two examples of each type of data.
Descriptive Statistics
Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data.
Two types of statistics are used to describe data:
• Measures of central tendency
• Measures of spread

Descriptive Statistics
Measures of Central Tendency (Centrality):
• Mean
• Median
• Mode
• Percentiles
Measures of Spread (Variability):
• Variance
• Standard Deviation
• Standard Error
• Range
Centrality
• Mean: the sum of the observed values divided by the number of observations.
  • The most common measure of centrality.
  • Most informative when the data follow a normal distribution (bell-shaped curve).
• Median: the “middle” value; half of all observed values are smaller and half are larger.
  • The best centrality measure when the data are skewed.
• Mode: the most frequently observed value.
Centrality – Median & Mode
• Mode: if there is only one mode, the data are UNIMODAL; if there is more than one mode, the data are MULTIMODAL.
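One way to check modality is Python's `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. The helper name `modality` and the second data set are hypothetical; the first data set is the Group 1 data from Practice 1.

```python
from statistics import multimode

def modality(values):
    """Return the modes and label the data UNIMODAL or MULTIMODAL.

    Note: if every value occurs exactly once, multimode returns all of them;
    such data are usually described as having no mode.
    """
    modes = multimode(values)  # every value tied for the highest frequency
    label = "UNIMODAL" if len(modes) == 1 else "MULTIMODAL"
    return modes, label

print(modality([1, 1, 1, 2, 3, 3, 5, 8, 20]))  # Group 1 data from Practice 1
print(modality([2, 2, 4, 4, 7]))              # hypothetical data with two modes
```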
Centrality – Mean
• The mean is also known as the AVERAGE.
Practice 1 on Centrality
Group 1 data: 1, 1, 1, 2, 3, 3, 5, 8, 20
Work out the mean, median and mode.
Mean: 44/9 ≈ 4.9
Mode: 1
Median: 3

Group 2 data: 1, 1, 1, 2, 3, 3, 5, 8, 10
Work out the mean, median and mode.
Mean: 34/9 ≈ 3.8
Mode: 1
Median: 3
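The Practice 1 answers can be checked with Python's standard `statistics` module; this is a verification sketch, not the method expected with a calculator.

```python
from statistics import mean, median, mode

group_1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group_2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]

for name, data in [("Group 1", group_1), ("Group 2", group_2)]:
    # mean() returns 44/9 and 34/9 as floats; median() picks the 5th of 9 sorted values.
    print(f"{name}: mean={mean(data):.1f}, median={median(data)}, mode={mode(data)}")
```

Replacing the extreme value 20 with 10 lowers the mean from about 4.9 to about 3.8 but leaves the median and mode unchanged, which illustrates why the median is preferred for skewed data.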
Variability – Range
The range is defined simply as the difference between the maximum and minimum observations:
Range = Max – Min
Compute the range for the values in grams: 423, 367, 320, 471, 480
Range = 480 – 320 = 160 grams
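The same calculation as a short Python check (the variable name `value_range` is arbitrary; it avoids shadowing Python's built-in `range`):

```python
weights_g = [423, 367, 320, 471, 480]

# Range = Max - Min
value_range = max(weights_g) - min(weights_g)
print(f"Range = {max(weights_g)} - {min(weights_g)} = {value_range} grams")
```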
Percentiles
What are percentiles? Suppose we arrange the data from smallest to largest. The value of x that has 1% of the observations lying below it and 99% lying at or above it is called the first percentile.
For the values 1, 2, 3, …, 100, the value 2 is the first percentile, 3 is the second percentile, and so on.
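A sketch of the percentile definition above, counting the share of observations strictly below a value; `percent_below` is a hypothetical helper, not a library function:

```python
def percent_below(data, x):
    """Percentage of observations lying strictly below x."""
    return 100 * sum(1 for v in data if v < x) / len(data)

values = list(range(1, 101))  # the values 1, 2, 3, ..., 100

print(percent_below(values, 2))  # 1% of the observations lie below 2
print(percent_below(values, 3))  # 2% of the observations lie below 3
```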
Quartiles
The values of x that divide the ordered set into 4 equally sized groups, i.e. the 25th, 50th and 75th percentiles, are called quartiles.
Example: 5, 8, 4, 4, 6, 3, 8
Sort them in order: 3, 4, 4, 5, 6, 8, 8
Cut the list into quarters:
Quartile 1 (25th) = 4
Quartile 2 (median or 50th) = 5
Quartile 3 (75th) = 8
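Python's `statistics.quantiles` (Python 3.8+) reproduces these quartiles. Its default `'exclusive'` method uses positions k(n + 1)/4 in the sorted data, which land exactly on the 2nd, 4th and 6th values for n = 7, so no interpolation is needed in this example.

```python
from statistics import quantiles

data = [5, 8, 4, 4, 6, 3, 8]

# quantiles() sorts the data itself; n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)
```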
Variability – Interquartile Range (IQR)
The interquartile range is defined simply as the difference between the upper quartile and the lower quartile:
IQR = Upper Quartile – Lower Quartile
Upper Quartile = the 3(n+1)/4th value of the ordered data
Lower Quartile = the (n+1)/4th value of the ordered data

Interquartile Range (IQR) practice example
Compute the IQR for the following set of data: 4, 7, 10, 20, 1, 6, 8, 11
Order the data: 1, 4, 6, 7, 8, 10, 11, 20
Upper Quartile = 3(8+1)/4 = 6.75, rounded to the 7th value = 11
Lower Quartile = (8+1)/4 = 2.25, rounded to the 2nd value = 4
IQR = 11 – 4 = 7
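The worked example rounds the position (n + 1)/4 = 2.25 down to 2 and 3(n + 1)/4 = 6.75 up to 7, i.e. it rounds to the nearest whole position. The sketch below reproduces that rule with a hypothetical helper, `quartile_by_rounded_position`, and for comparison uses the interpolating `statistics.quantiles`. Textbooks and software differ on this convention, so small differences in the IQR are expected.

```python
from statistics import quantiles

def quartile_by_rounded_position(values, k):
    """k-th quartile (k = 1, 2, 3) as in the worked example: take position
    k*(n+1)/4 in the sorted data and round it to the nearest whole position."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4
    index = int(position + 0.5)       # round half up: 2.25 -> 2, 6.75 -> 7
    index = min(max(index, 1), n)     # keep the position inside 1..n
    return ordered[index - 1]         # convert 1-based position to 0-based index

data = [4, 7, 10, 20, 1, 6, 8, 11]
lower = quartile_by_rounded_position(data, 1)
upper = quartile_by_rounded_position(data, 3)
print("IQR (rounded positions):", upper - lower)

# Interpolating between neighbouring values instead of rounding the position
# gives Q1 = 4.5 and Q3 = 10.75 for the same data.
q1, _, q3 = quantiles(data, n=4)
print("IQR (interpolated):", q3 - q1)
```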
Variability – SD and Variance
• The most commonly used measure to describe variability is the standard deviation (SD).
• The SD is a function of the squared differences of each observation from the mean.
• The SD is the square root of the variance.
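For reference, the sample variance s² and SD s can be written as follows, where the second form applies to a frequency table in which value x_j occurs f_j times. The n − 1 divisor matches the one used in the Group 1 practice (n = 9, divisor 8).

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    = \frac{\sum_{j} f_j (x_j - \bar{x})^2}{n - 1},
\qquad n = \sum_{j} f_j,
\qquad s = \sqrt{s^2}
```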
Variability – Coefficient of Variation
The coefficient of variation relates the standard deviation of a set of values to its mean: COV = (SD / mean) × 100%. It is most useful for evaluating the relative variation between two or more sets of observations.

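A sketch comparing the two practice groups with the coefficient of variation, using Python's `statistics.stdev` (sample SD, n − 1 divisor) and `statistics.mean`; the function name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """COV = (sample SD / mean) * 100%."""
    return stdev(values) / mean(values) * 100

group_1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group_2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]

for name, data in [("Group 1", group_1), ("Group 2", group_2)]:
    print(f"{name}: SD={stdev(data):.2f}, COV={coefficient_of_variation(data):.1f}%")
```

Group 1 shows the larger COV because the single extreme value 20 widens the spread relative to the mean.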
Practice on Group 1 Data
Group 1 data: 1, 1, 1, 2, 3, 3, 5, 8, 20
Mean: 44/9 ≈ 4.89
What is the SD and COV?

x     f    x - xbar    (x - xbar)^2    f(x - xbar)^2
1     3    -3.89       15.13           45.40
2     1    -2.89       8.35            8.35
3     2    -1.89       3.57            7.14
5     1    0.11        0.01            0.01
8     1    3.11        9.67            9.67
20    1    15.11       228.31          228.31
Total 9                                ≈ 298.89 (entries rounded to 2 d.p.)

Variance = 298.89 / (n - 1) = 298.89 / 8 = 37.36
SD = √37.36 = 6.11
COV = (6.11 / 4.89) × 100% ≈ 125%
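The frequency-table calculation for Group 1 can be reproduced as follows: a verification sketch that mirrors the columns x, f and f(x - xbar)^2 (the dictionary layout is an assumption of this sketch, not part of the lecture).

```python
from math import sqrt

# Group 1 data as a frequency table: value x -> frequency f.
freq_table = {1: 3, 2: 1, 3: 2, 5: 1, 8: 1, 20: 1}

n = sum(freq_table.values())                                      # 9 observations
xbar = sum(x * f for x, f in freq_table.items()) / n              # 44 / 9
sum_sq = sum(f * (x - xbar) ** 2 for x, f in freq_table.items())  # total of f(x - xbar)^2
variance = sum_sq / (n - 1)                                       # divide by n - 1 = 8
sd = sqrt(variance)
cov = sd / xbar * 100                                             # coefficient of variation, %

print(f"mean={xbar:.2f}, total={sum_sq:.2f}, variance={variance:.2f}, "
      f"SD={sd:.2f}, COV={cov:.1f}%")
```

Because the exact mean 44/9 is used rather than a rounded value, the total is exactly 2690/9 ≈ 298.89.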
Summary
Types of Data: NOIR (Nominal, Ordinal, Interval, Ratio)
Centrality: Mean, Median, Mode
Variability: SD, Variance, Range, IQR, Coefficient of Variation
Descriptive stats: patterns in data, outliers, choosing the right test
Inferential stats: probability that a conclusion, drawn from a sample, is true,