Transcript Itec 3220

ITEC6310
Research Methods in Information
Technology
Instructor: Prof. Z. Yang
Course Website:
http://people.math.yorku.ca/~zyang/itec6310.htm
Office: Tel 3049
Organizing Data
• Graph
– Helps to make sense of your data by representing
them visually
– A basic graph represents the data in two-dimensional space.
– Represent levels of your independent variable
along the x-axis and values of the dependent
variable along the y-axis.
– The importance of graphing data
• Showing relationships clearly
• Choosing appropriate statistics
• Table
2
Graphing Your Data
• Bar Graph
– Presents data as bars extending from the axis
representing the independent variable
– Length of each bar determined by value of the
dependent variable
– Width of each bar has no meaning
– Can be used to represent data from single-factor
and two-factor designs
– Best if independent variable is categorical
3
Example Bar Graph
4
Line Graph
• Data represented by a series of points connected by a
line
• Most appropriate for quantitative independent variables
• Used to display functional relationships
• Line graphs can show different shapes
– Positively accelerated: Curve starts flat and becomes
progressively steeper as it moves along x-axis
– Negatively accelerated: Curve is steep at first and then
“levels off” as it moves along x-axis
• Once the curve levels off it is said to be asymptotic
• A line graph can vary in complexity
• A monotonic function represents a uniformly increasing or
decreasing function
• A nonmonotonic function has reversals in direction
5
Example Line Graph
6
Scatterplot
• Used to represent data from two
dependent variables
• The value of one dependent variable is
represented on the x-axis and the value of
the other on the y-axis
7
Example Scatterplot
8
Pie Graph
• Used to represent proportions or percentages
• Two types
– Standard pie graph
– Exploded pie graph
9
Example
• A researcher observes driving behaviour
on a roadway, noting the gender of the
drivers, the types of vehicle driven, and
the speed at which they are traveling.
The researcher wants to organize the
data in graphs. Which type of graph
should be used to describe each
variable?
10
The Frequency Distribution
• Represents a set of mutually exclusive categories
into which actual values are classified
• Can take the form of a table or a graph
• Graphically, a frequency distribution is shown as
a histogram
– A bar graph on which the bars touch
– The y-axis represents a frequency count of the number
of observations falling into a category
– Categories represented on the x-axis
11
Example of a Histogram
12
Example
The following data represent a distribution
of speed at which individuals were traveling
on a highway.
64, 80, 64, 70, 76, 79, 67, 72,
65, 73, 68, 65, 67, 65, 70, 62, 67, 68, 65, 54
Organize the above data into a frequency
distribution table and draw the histogram
for these data.
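One way to work through this exercise is sketched below in Python. The language choice and the variable names are assumptions, not part of the course material. The sketch tallies how often each speed occurs and prints a rough text histogram; a grouped table with class intervals would work just as well.

```python
from collections import Counter

# Speeds copied from the example above
speeds = [64, 80, 64, 70, 76, 79, 67, 72,
          65, 73, 68, 65, 67, 65, 70, 62, 67, 68, 65, 54]

# Frequency distribution: each distinct speed mapped to its count
freq = Counter(speeds)

# Print the table in ascending order, with one asterisk per observation
print("Speed  Frequency")
for speed in sorted(freq):
    print(f"{speed:>5}  {freq[speed]:>9}  {'*' * freq[speed]}")
```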
13
Shapes of Histograms
• You should examine your frequency
distribution to determine its shape.
– Normal distribution: Most scores centered around
the mean
– Positive skew: Most scores at the lower end of the
measurement scale
– Negative skew: Most scores at the higher end of
the measurement scale
– Bimodal distribution: Two modes
14
Histogram Showing a Positive Skew
15
Histogram Showing a Negative Skew
16
A Bimodal Distribution
17
Descriptive Statistics
• Measures of Center
– Gives you a single score that represents the
general magnitude of scores in a
distribution.
– Three measures
• Mode, median, and mean
• Measures of Spread
– Measure of variability
18
Measures of Center
• Mode
– Most frequent score in a distribution
– Simplest measure of center
– Scores other than the most frequent not considered
– Limited application and value
• Median
– Central score in an ordered distribution
– More information taken into account than with the mode
– Relatively insensitive to outliers
– Used primarily when the mean cannot be used
19
Measures of Center
• Mean
– Average of all scores in a distribution
– Value dependent on each score in a distribution
– Most widely used and informative measure of
center
20
Choosing a Measure of Center
• Mode
– Used if data are measured along a nominal scale
• Median
– Used if data are measured along an ordinal scale
– Used if interval data do not meet requirements for
using the mean
• Mean
– Used if data are measured along an interval or
ratio scale
– Most sensitive measure of center
– Used if scores are normally distributed
21
Example
• In the example on Slide 10, a researcher
collected data on drivers’ gender, type of vehicle,
and speed of travel. What is an appropriate
measure of central tendency to calculate for
each type of data?
• In the example on Slide 13, if one driver was
traveling at 100 mph (25 mph faster than anyone
else), which measure of central tendency would
you recommend against using?
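The effect of a single extreme score can be checked with a short Python sketch. It assumes the 100 mph driver is an extra observation added to the Slide 13 data (the slide does not say whether it is added or replaces a score), and the variable names are illustrative.

```python
from statistics import mean, median

# Speeds from the Slide 13 example
speeds = [64, 80, 64, 70, 76, 79, 67, 72,
          65, 73, 68, 65, 67, 65, 70, 62, 67, 68, 65, 54]

# Assumption: the 100 mph driver is one additional observation
with_outlier = speeds + [100]

# Compare each measure before and after adding the extreme score
print("mean:  ", mean(speeds), "->", round(mean(with_outlier), 2))
print("median:", median(speeds), "->", median(with_outlier))
```

The mean is pulled toward the extreme score while the median does not move, which is the sense in which the median is relatively insensitive to outliers.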
22
Example
Calculate the mean, median, and mode for
the data set on Slide 13. Is the distribution
normal or skewed? If it is skewed, what
type of skew is it? Which measure of
central tendency is most appropriate for
this distribution and why?
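A minimal Python sketch for this exercise follows. The variable names are assumptions, and the comparison of mean and median is only a rough rule of thumb for skew, not a formal test; the histogram should still be inspected.

```python
from statistics import mean, median, multimode

# Speeds from the Slide 13 example
speeds = [64, 80, 64, 70, 76, 79, 67, 72,
          65, 73, 68, 65, 67, 65, 70, 62, 67, 68, 65, 54]

m = mean(speeds)
md = median(speeds)
modes = multimode(speeds)  # a list, in case several scores tie for most frequent

# Rule of thumb: the mean is pulled toward the longer tail
if m > md:
    hint = "possible positive skew"
elif m < md:
    hint = "possible negative skew"
else:
    hint = "roughly symmetric"

print(f"mean = {m}, median = {md}, mode(s) = {modes}: {hint}")
```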
23
Measures of Spread
• Range
– Subtract the lowest from the highest score in a
distribution of scores
– Simplest and least informative measure of spread
– Scores between extremes are not taken into account
– Very sensitive to extreme scores
• Interquartile Range
– Less sensitive than the range to extreme scores
– Used when you want a simple, rough estimate of
spread
24
Measures of Spread
• Variance
– Average squared deviation of scores from the mean
– Not expressed in same units as original numbers
• Standard Deviation
– Square root of the variance
– Expressed in the same units as original numbers
– Most widely used measure of spread
25
Measures of Spread: Applications
• The range and standard deviation are
sensitive to extreme scores
– In such cases, the interquartile range is best
• When your distribution of scores is
skewed, the standard deviation does not
provide a good index of spread
• With a skewed distribution, use the
interquartile range
26
Example
• For a distribution of scores, what information does a
measure of variation add that a measure of central
tendency does not convey?
• Today’s weather report included information on the
normal rainfall for this time of year. The amount of
rain that fell today was 1.5 inches above normal. To
decide whether this is an abnormally high amount of
rain, you need to know that the standard deviation for
rainfall is 0.75 of an inch. What would you conclude
about how normal the amount of rainfall was today?
Would your conclusion be different if the standard
deviation were 2 inches?
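The rainfall question comes down to expressing the excess in standard deviation units. A small Python sketch, with a hypothetical helper name, makes the comparison concrete:

```python
def deviations_in_sd(excess, sd):
    """Return how many standard deviations `excess` lies from normal."""
    return excess / sd

excess_rain = 1.5  # inches above normal, from the example

# Compare the two standard deviations mentioned in the question
for sd in (0.75, 2.0):
    z = deviations_in_sd(excess_rain, sd)
    print(f"SD = {sd} in: {z:.2f} standard deviations above normal")
```

With a standard deviation of 0.75 inch the excess is 2 standard deviations above normal; with a standard deviation of 2 inches it is only 0.75 of one, so the same amount of rain looks far less unusual.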
27
Example
Calculate the range, variance, and
standard deviation for the following three
distributions:
1, 2, 3, 4, 5, 6, 7, 8, 9
-4, -3, -2, -1, 0, 1, 2, 3, 4
10, 20, 30, 40, 50, 60, 70, 80, 90
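The three distributions can be compared with a short Python sketch. The labels A, B, C and the helper name are assumptions. Because textbooks differ on the divisor, the sketch reports both the population variance (divide by n) and the sample variance (divide by n - 1).

```python
from statistics import pvariance, stdev, variance

# The three distributions from the example; labels are illustrative
distributions = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [-4, -3, -2, -1, 0, 1, 2, 3, 4],
    "C": [10, 20, 30, 40, 50, 60, 70, 80, 90],
}

def spread(scores):
    """Collect the measures of spread for one list of scores."""
    return {
        "range": max(scores) - min(scores),
        "population variance": pvariance(scores),  # divides by n
        "sample variance": variance(scores),       # divides by n - 1
        "sample sd": stdev(scores),                # square root of sample variance
    }

for name, scores in distributions.items():
    print(name, spread(scores))
```

A and B differ only by a constant shift, so their spread is identical. C is A multiplied by 10, so its range and standard deviation are 10 times larger and its variance 100 times larger.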
28
The Five Number Summary
• Five Number Summary
– Convenient way to represent a distribution with a few
numbers
– Statistics included
• Minimum score
• The first quartile
• The median (second quartile)
• Third quartile
• Maximum score
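As an illustration, the five number summary for the highway speed data from the earlier example can be sketched in Python. Quartile conventions vary between textbooks and software, so the `method="inclusive"` setting below is an assumption, and the function name is hypothetical.

```python
from statistics import quantiles

# Highway speed data from the earlier example
speeds = [64, 80, 64, 70, 76, 79, 67, 72,
          65, 73, 68, 65, 67, 65, 70, 62, 67, 68, 65, 54]

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of scores."""
    # Assumption: the "inclusive" quartile convention; others give slightly
    # different Q1 and Q3 values
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, q2, q3, max(scores)

summary = five_number_summary(speeds)
labels = ("minimum", "first quartile", "median", "third quartile", "maximum")
for label, value in zip(labels, summary):
    print(f"{label:>15}: {value}")
```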
29
Box Plots
• Graphic representation of the five number
summary
• First and third quartiles define the ends of the box
• A line in the box represents the median
• Vertical “whiskers” extending above and below
the box represent the maximum and minimum
scores (respectively)
• Data from multiple treatments are represented by
side-by-side boxplots
30
Example of a Boxplot (left) and a
Side-By-Side Boxplot (right)
31
Using SPSS at home
• Go to the website below:
http://www.yorku.ca/computing/students/labs/webfas/
• Click “Login” button to login to WebFAS
with your Passport York login
• Choose “SPSS 21” from the software list
32