Transcript Module

Chapter 10: Planning Data Analysis and Completing the Design Matrix
模块10:规划数据分析,完成设计矩阵
[Figure: Intervention or Policy (干预或政策); Subevaluations (子评价); Qualitative vs. Quantitative (定性对定量)]
Introduction
导言
• Data Analysis Strategy
• 数据分析战略
• Analyzing Qualitative Data
• 分析定性数据
• Analyzing Quantitative Data
• 分析定量数据
• Linking Quantitative Data and Qualitative Data
• 将定量数据与定性数据联系起来
Data Collection and Analysis
数据收集与分析
[Figure: hours spent (花费的时间) on data collection (数据收集) and data analysis (数据分析), plotted against time (时间)]
Qualitative Analysis
定性分析
• Best used for in-depth understanding of the intervention
• 在深入理解干预时最适合使用
• Answers questions like:
• 回答诸如此类问题:
– What are some of the difficulties faced by staff?
– 员工面临的困难有哪些?
– Why do participants say they dropped out early?
– 为什么一些参与者很早就退出了?
– What is the experience like for participants?
– 参与者的感受是怎样的?
Quantitative Analysis
定量分析
• Can be used to answer questions like:
• 可用于回答以下问题:
– What are the mean scores for the different groups of participants?
– 不同参与组的平均分是多少?
– How do participants rate the relevance of the intervention on a scale of one to five?
– 如果按1至5分进行打分的话,参与者对干预的相关性会打多少分?
– How much variability is there in the responses to the item?
– 在对问题的回答中,变率有多少?
– Are the differences between the two groups statistically significant?
– 两组之间的差异是否具有统计显著性?
Qualitative Data
定性数据
• Description of program, process, and experiences
• 对计划、程序和经验的说明
• To understand the context of the situation
• 理解形势背景
• To understand perceptions
• 理解感受
• Research evolves as questions emerge
• 研究随着问题的出现而发展
• Flexible design
• 灵活的设计
Qualitative Data Analysis
定性数据分析
• Used for any non-numerical data collected as part of the evaluation
• 用于作为评价一部分收集的非数字数据
– unstructured observations
– 未组织的考察
– open-ended interviews
– 开放式的访谈
– analysis of written documents
– 书面文件的分析
– focus group transcripts
– 专题组讨论笔录
– diaries, observations
– 日志、观察
• Analysis is challenging
• 分析具有挑战性
• Take care to ensure accuracy (validity concern)
• 力求准确(对有效性的关注)
Making Good Notes
作记录
• Capture as much information as possible
• 捕捉尽可能多的信息
• Pay close attention to language
• 密切关注语言
• Write down observations
• 记录下观察
• Capture your immediate thoughts
• 捕捉你的直接思路
• Leave time to write up notes immediately
• 留出时间来立即做详细书面记录
Triangulation
三元法
• Can use three or more sources of information to verify and
substantiate your data
• 可以使用三个或更多的信息来源,以检验和证明你的数据
• Examples:
• 例子:
– interviews, focus groups, questionnaires
– 会谈,专题组讨论,调查问卷
– questionnaires, available data, expert panels
– 调查问卷,可利用的数据,专家小组
– observations, program records, interviews
– 考察,计划记录,会谈
– interviews, diaries, available data
– 会谈,日志,可利用的数据
Early Steps in Qualitative Analysis (1 of 3)
定性分析的早期步骤(第1页,共3页)
• While collecting data:
• 在收集数据的同时:
– keep good records
– 很好地坚持记录
– write up interviews, impressions, and notes from focus groups
– 详细书面记录下专题组讨论的会谈、印象和笔记
– make constant comparisons as you progress
– 在进展的同时不断进行比较
– meet with team regularly to compare notes and make adjustments
– 定期与小组会面,比较笔记并进行调整
Early Steps in Qualitative Analysis (2 of 3)
定性分析的早期步骤(第2页,共3页)
• Write contact summary report
• 撰写联系的总结报告
– one-page summary after each major interview or focus group
– 在每次重大会谈或专题组讨论之后写一页的概述
– main issues
– 主要问题
– major information obtained
– 获得的主要信息
– what was the most interesting, illuminating, or important?
– 最令人注意的、最有启发意义的或最重要的是什么?
– what new questions need to be explored?
– 需要探讨什么新问题?
Early Steps in Qualitative Analysis (3 of 3)
定性分析的早期步骤(第3页,共3页)
• Use tools to help you
• 利用工具帮助你
– create a subjectivity file with your own reactions
during the study, including your feelings, hunches,
and reactions
– 建立一份关于你自己在研究期间反应的主观性文件,
包括你的感受、预感和反应
– file your ideas that emerge as you proceed
– 把你在进展过程中出现的想法归档
– keep a file of quotations from the data collection
– 坚持对来自数据收集的引用进行归档
Maintain an Iterative Dialogue
坚持反复对话
• Share information early and often with key informants
• 早早地共享信息,并且经常是与关键的信息提供者
• Have others review early drafts with the intention of
eliciting information, questions, other ways of interpreting
data
• 让其他人审查初稿,旨在得出信息,问题,解释信息的其
它方法
Reading and Coding Data
阅读数据和对数据进行编码
• Read all of the data carefully
• 仔细阅读所有数据
• Come up with names or labels for topics
• 对各主题进行命名和标注
• Describe the topics
• 描述主题
• Using codes, classify all of the data
• 利用编码,对所有数据进行分类
IPDET
Coding
编码
• Manual coding
• 手工编码
– use colored pens, pencils, or papers
– 用彩色钢笔、铅笔或纸
– cut with scissors to manually sort
– 用剪刀裁剪,以手工分类
– use a number coding system
– 利用排号编码体系
• Computer program coding
• 电脑程序编码
– word processors
– 文字处理器
– presentation software
– 演示软件
– databases, spreadsheets, or qualitative data analysis software
– 数据库、电子表格或定性数据分析软件
Drawing-out Themes and Patterns
大致说明主题与模式
• As you review, begin to make notes
• 在你评价的同时,开始做笔记
• Goal is to summarize what you have seen or heard:
• 目标是总结你看到的或听到的:
– common words
– 共同词语
– phrases
– 短语
– themes
– 主题
– patterns
– 模式
• Also identify where they are so you can find them again if you need to verify
• 同样确认它们的位置,这样如果你需要进行验证,能够再次找到它们
• May want to use a spreadsheet
• 可能需要使用电子表格
Content Analysis
内容分析
• Identify certain words or concepts in text
or speech
• 在文本或话语中识别某些词或概念
• Conceptual analysis:
• 概念分析:
– look at word frequencies
– 注意词的频率
• Relational analysis:
• 关联分析:
– look at word frequencies
– 注意词的频率
– explore relationships among concepts
– 寻找概念间的关系
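The two kinds of content analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three responses and the list of concepts are invented for the example, and co-occurrence within one response is used here as a simple stand-in for a "relationship among concepts".

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical open-ended responses (invented for illustration).
responses = [
    "Staff said training was too short and the schedule was difficult.",
    "The schedule made it hard to attend training.",
    "Transport costs were the main reason I dropped out.",
]

# Hypothetical concepts the evaluator has chosen to track.
concepts = {"training", "schedule", "transport", "costs"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


# Conceptual analysis: how often does each concept appear?
word_counts = Counter(
    word for text in responses for word in tokenize(text) if word in concepts
)

# Relational analysis: which concepts appear together in the same response?
pair_counts = Counter()
for text in responses:
    present = sorted(concepts & set(tokenize(text)))
    pair_counts.update(combinations(present, 2))

print(word_counts)
print(pair_counts)
```

In practice the concept list would come out of the coding step, and a qualitative data analysis package would handle much larger volumes of text.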
Computer Help for Qualitative Data
Analysis
计算机有助于定性数据分析
• Software packages to help you organize data
• 软件包帮助你组织数据
• Search, organize, categorize, and annotate textual and
visual data
• 搜索、组织、分类和注释文本和视觉数据
• Help you visualize the relationships among data
• 帮助你将数据之间的关系直观化
Examples of QDA
定性数据分析的例子
• NUD*IST
• Ethnograph
• Qualpro
• Hyperqual
• Anthropax
• Atlas-ti
Controlling for Bias
控制偏见
• We tend to see what we want to see and may miss
things that do not conform to our expectations
• 我们倾向于看到我们希望看到的,可能会漏掉那些不符合
我们期望值的事物
• Use well trained recorders
• 很好地使用受过培训的记录员
• Evaluators review documents and code them in themes
• 评价者评价文件并把它们按主题编码
Concluding Thoughts on Qualitative
Data
定性数据的结论性思考
• Qualitative data collection is not the easy option
• 定性数据收集不是一个简单的选择方案
– labor intensive and time consuming
– 劳动密集型的和耗时的
– reliability among coders is essential, so use a coding scheme
– 编码员之间的可靠性至关重要,必须使用编码方案
• Can reveal some of the most valuable information
• 能够揭示出一些最宝贵的信息
Quantitative Data: Statistics
统计
• Quantitative data are analyzed with statistics
• 利用统计分析定量数据
– descriptive statistics: used with census or nonrandom sample data
– 描述统计:与统计数字或非随机样本数据一起使用
– inferential statistics: used with random sample data
– 推论统计:与随机样本数据一起使用
Descriptive Statistics
描述统计
• Describes the frequency and/or percentage distribution
of a single variable
• 说明一个单独变量的频率和/或百分比分布
• Tells how many and what percent
• 说明有多少以及占多大百分比
• Example:
• 例子:
– 33% of the respondents are male and 67% are
female (table on next slide)
– 33%的被调查人是男性,67%是女性(表格见下一张幻灯片)
Example of Descriptive Statistics in a Table
表格中的描述统计举例
How many men and women are in the program?
该计划有多少男性和女性参与?
Table 11.5: Distribution of Respondents by Gender
表11.5:被调查人的性别分布
Male 男性                      Female 女性                    Total 总计
Number 人数   Percent 百分比   Number 人数   Percent 百分比   Number 人数
100           33%              200           67%              300
Source: Fabricated Data
来源:虚构的数据
Write up: Of the 300 people in this program, 67% are women and 33% are men.
书面记录:在参与该计划的300人中,67%为女性,33%为男性。
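The frequency and percentage distribution in Table 11.5 can be reproduced with a short Python sketch. The list of respondents is rebuilt from the table's fabricated counts; the variable names are assumptions made for this example.

```python
from collections import Counter

# Rebuild the fabricated data behind Table 11.5: 100 men and 200 women.
respondents = ["male"] * 100 + ["female"] * 200

counts = Counter(respondents)   # how many in each category
total = sum(counts.values())    # number of respondents
percents = {group: round(100 * n / total) for group, n in counts.items()}

for group in counts:
    print(f"{group}: {counts[group]} ({percents[group]}%)")
print(f"total: {total}")
```

Reporting the total alongside the percentages gives the reader the anchor needed to judge how much weight the percentages can bear.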
Distributions
分布
• Measures of central tendency
• 集中趋势测量
– how similar are the data?
– 数据在多大程度上类似?
– example: How similar are the ages of this group of people?
– 例如:这一群人的年龄在多大程度上类似?
• Measures of dispersion
• 离中趋势测量
– how dissimilar are the data?
– 数据在多大程度上不一样?
– example: How much variation in the ages?
– 例如:年龄差异有多大?
Measures of Central Tendency
集中趋势测量
• The 3-M’s
• 3M
– mode: most frequent response
– 众数:出现最频繁的响应值
– median: mid-point of the distribution
– 中位数:分布的中点
– mean: arithmetic average
– 平均数:算术平均数
• Which to use depends on the type of data you have
• 使用哪一个取决于你的数据类型
– nominal, ordinal, interval/ratio
– 名目,序数,间隔/比率
Nominal Data
名目数据
• Data of names or categories
• 名称或类别数据
• Examples:
• 例如:
– gender (male, female)
– 性别(男性,女性)
– religion (Buddhist, Christian, Jewish, Muslim)
– 宗教(佛教,基督教,犹太教,穆斯林)
– country of origin (Burma, China, Ethiopia, Peru)
– 原产国(缅甸,中国,埃塞俄比亚,秘鲁)
• Use mode as a measure of central tendency
• 使用众数作为集中趋势的度量
Ordinal Data
序数数据
• Data that has an order to it but the “distance” between consecutive
responses is not necessarily the same
• 有一定顺序但是连续响应值之间的“距离”不一定相同的数据
• Lacks a zero point
• 缺少零点
• Examples:
• 例子
– opinion scales that go from “most important” to “least important” or “strongly
agree” to “strongly disagree”
– 从“最重要”到“最不重要”或者从“强烈同意”到“强烈不同意”的评价尺度
• Use mode or median as a measure of central tendency
• 使用众数或中位数作为集中趋势的度量
Interval/Ratio Data
间隔/比率数据
• Data of real numbers: numbers with a zero point that can be divided and compared as ratios
• 实数数据:数字有零点,并且可以相除、按比率进行比较
• Examples:
• 例子:
– age, income, weight, height
– 年龄,收入,体重,身高
• Use mode, median, or mean as a measure of central tendency; the choice depends on the distribution
• 使用众数、中位数或平均数作为集中趋势的度量;选择取决于分布
– for normally distributed data, mean is best
– 对于正态分布的数据来说,平均数最适合
– for data with a few very high or a few very low scores, median is best
– 对于只有少数很高或少数很低值的数据来说,中位数最适合
Calculating
计算
• Mode: the value that occurs most often
• 众数:出现最频繁的值
• Median: place the values in order, then count to the halfway point
• 中位数:按顺序放置,然后数到中间
• Mean: add up the values and divide by how many there are (what most people think of as the average)
• 平均数:将所有值相加,再除以值的个数(大多数人所说的平均值)
Example Data
数据例子
Table 11.7: Sample Data
表11.7:样本数据
Country 国家                              % Urban 城市百分比
Bolivia 玻利维亚                          65
Algeria 阿尔及利亚                        60
Central African Republic 中非共和国       41
Georgia 格鲁吉亚                          61
Panama 巴拿马                             58
Turkey 土耳其                             75
Source: Fabricated Data
资料来源:虚构的数据
Example Calculations for % Urban
Data
计算城市百分比数据的例子
• Mode: no mode, every value occurs only once
• 众数:没有众数,所有都只出现过一次
• Median: there are 6 entries; with the data in order, the two middle scores are 60 and 61, so (60 + 61) ÷ 2 = 60.5
• 中位数:共有6条,按顺序放置数据,中间两个值是60和61,(60 + 61) ÷ 2 = 60.5
• Mean: (65 + 60 + 41 + 61 + 58 + 75) ÷ 6 = 60
• 平均数:(65 + 60 + 41 + 61 + 58 + 75) ÷ 6 = 60
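The three calculations for the % urban data can be checked with a short Python sketch. It uses the fabricated values from Table 11.7; the "no mode" rule (report a mode only if some value occurs more than once) is an assumption chosen to match the slide's wording.

```python
import statistics
from collections import Counter

# Fabricated % urban values from Table 11.7.
urban = {
    "Bolivia": 65,
    "Algeria": 60,
    "Central African Republic": 41,
    "Georgia": 61,
    "Panama": 58,
    "Turkey": 75,
}
values = list(urban.values())

# Mode: only report one if some value occurs more than once.
counts = Counter(values)
highest = max(counts.values())
modes = [v for v, n in counts.items() if n == highest] if highest > 1 else []

# Median: with six values, the average of the two middle ones (60 and 61).
median = statistics.median(values)

# Mean: the sum (360) divided by the number of values (6).
mean = statistics.mean(values)

print("mode:", modes or "no mode")
print("median:", median)
print("mean:", mean)
```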
Measures of Dispersion
离中趋势测量
• Range
• 值域
– difference between the highest and lowest value
– 最高值与最低值之间的差
– simple to calculate, but not very valuable
– 计算简单,但是不太有价值
• Standard deviation
• 标准差
– measure of the spread of the scores around the mean
– 数值围绕平均数分布情况的度量
– superior measure, it allows every case to have an impact on its value
– 优秀的度量,它让每个案例都对它的值产生影响
Example Calculation for Range
计算值域的例子
• Range: high score – low score = range
• 值域:最高值—最低值=值域
range = 75 – 41
值域=75—41
range = 34
值域=34
Normal Curve (Bell)
正态曲线(钟形)
[Figure: bell-shaped normal curve; x-axis: Value (值), y-axis: Frequency (频率)]
Standard Deviation
标准差
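[Figure: normal curve centered on the mean (平均数), marking one, two, and three standard deviations on either side]
• Within one standard deviation of the mean: about 68% of values
• 与平均数相隔一个标准差以内:约68%的值
• Within two standard deviations of the mean: about 95% of values
• 与平均数相隔两个标准差以内:约95%的值
• Within three standard deviations of the mean: about 99.7% of values
• 与平均数相隔三个标准差以内:约99.7%的值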
Calculating Standard Deviation
计算标准差
• Calculating is time consuming
• 计算非常耗时
• Can use statistical programs:
• 可以使用统计程序:
– SPSS
– Excel or other spreadsheet program
– Excel或其它电子表格程序
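Besides SPSS or a spreadsheet, a few lines of Python can do the same job. The sketch below reuses the fabricated % urban values from Table 11.7 and computes both the range and the standard deviation. The slides do not say whether the sample or the population formula is meant, so both are shown; that choice is an assumption of this example.

```python
import statistics

# Fabricated % urban values from Table 11.7.
urban = [65, 60, 41, 61, 58, 75]

# Range: highest value minus lowest value (75 - 41).
value_range = max(urban) - min(urban)

# Standard deviation: spread of the values around the mean.
sample_sd = statistics.stdev(urban)        # divides by n - 1
population_sd = statistics.pstdev(urban)   # divides by n

print("range:", value_range)
print("sample standard deviation:", round(sample_sd, 2))
print("population standard deviation:", round(population_sd, 2))
```

The sample formula is the usual choice when the data are a sample from a larger population; the population formula applies when every case has been measured.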
Guidelines for Analyzing Quantitative
Survey Results
分析定量调查结果的指导原则
1 Choose a standard way to analyze the data and apply it consistently
选择一个分析数据的标准方法并始终如一地应用它
2 Do not combine the middle category with each side of the scale
不要把尺度各边与中间类别混合起来
3 Do not report an agree or disagree category without also reporting the strongly agree or strongly disagree category
不要在不报告强烈同意或强烈不同意类别的情况下报告同意或不同意类别
4 Analyze and report percentages (or numbers)
分析并报告百分比(或数字)
5 Provide the number of respondents as an anchor
提供被调查人数量,用作支撑点
6 If there is little difference in the data, raise the benchmark
如果数据几乎没有差异,提出基准点
7 Like any art or skill, it gets easier with training and practice
和任何艺术或技能一样,通过培训和练习,它会越来越容易
Common Descriptive Statistics
常用描述统计
• Frequencies
• 频率
• Percent
• 百分比
• Mean
• 平均数
• Median
• 中位数
• Mode
• 众数
• Money
• 金钱
• Percent
• 百分比
• Ratio
• 比率
• Comparisons
• 比较
Describing Two Variables at the
Same Time
同时描述两个变量
• Two variables at once
• 一次两个变量
• Example: What percent were boys and what percent
were girls in hands-on and traditional classes?
• 例如:在实践班和传统班里,男生和女生各占多少百
分比?
Example Two Variables at the
Same Time
同时描述两个变量的例子
                Hands-on 实践          Traditional 传统
                Number    Percent      Number    Percent
Boys 男孩       28        55%          34        45%
Girls 女孩      22        45%          41        55%
Total 总数      N=50      100%         N=75      100%
Total 125
总数 125
Source: Fabricated Data: 2004 Survey
资料来源:虚构数据:2004年调查
Two Variables with Crosstabs
交叉表的两个变量
• Cross tabulation (crosstab)
• 交叉表
– presented as a contingency table in a matrix format
– 在矩阵格式里作为列联表显示
– displays two or more variables simultaneously
– 同时显示两个或更多个变量
– each cell shows number of respondents
– 各个单元显示出被调查人的数量
Example Crosstabs
交叉表的例子
              Hands-on 实践    Traditional 传统    Total % 总计 %
Boys 男孩     45%              55%                 100%
Girls 女孩    35%              65%                 100%
N=125 Source: Fabricated Data
N=125 资料来源:虚构的数据
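A crosstab can be built directly from the cell counts. The Python sketch below uses the fabricated counts from the earlier two-variable example (28 and 34 boys, 22 and 41 girls) and computes percentages both ways: down each class type and across each gender. The dictionary layout and variable names are assumptions made for this illustration.

```python
# Fabricated cell counts: gender by class type (N = 125).
counts = {
    "Boys": {"Hands-on": 28, "Traditional": 34},
    "Girls": {"Hands-on": 22, "Traditional": 41},
}
class_types = ["Hands-on", "Traditional"]

column_totals = {c: sum(row[c] for row in counts.values()) for c in class_types}
row_totals = {g: sum(row.values()) for g, row in counts.items()}
grand_total = sum(row_totals.values())

# Column percentages: what share of each class type is boys or girls?
column_percent = {
    g: {c: round(100 * counts[g][c] / column_totals[c]) for c in class_types}
    for g in counts
}

# Row percentages: what share of boys (or girls) is in each class type?
row_percent = {
    g: {c: round(100 * counts[g][c] / row_totals[g]) for c in class_types}
    for g in counts
}

print("N =", grand_total)
print("column %:", column_percent)
print("row %:", row_percent)
```

The row percentages match the crosstab above (45% and 55% for boys, 35% and 65% for girls). Computed from the counts, the hands-on column comes out at 56% boys and 44% girls, slightly different from the 55% and 45% printed in the earlier table.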
Variables
变量
• Independent
• 自变量
– Variable which you believe explains a change in the dependent variable
– 你认为说明因变量变化的变量
– Program evaluation: the program
– 计划评价:计划
• Dependent
• 因变量
– Variable you want to explain
– 你希望说明的变量
– Program evaluation: the outcomes
– 计划评价:结果
Example: Comparison of Means
例子:平均数的比较
– dependent variable: annual income
– 因变量:年收入
– independent variable: gender
– 自变量:性别
              Mean Income 平均收入
Women 女性    27,800 SA Rand (27,800南非兰特)
Men 男性      32,400 SA Rand (32,400南非兰特)
Measure of Relationship
关系的度量
• How strongly variables are related; different measures report this differently
• 变量之间的关系有多强;不同的度量有不同的报告方式
• Measures of association
• 相联度量
– range from zero to 1
– 值域从0到1
• Measures of correlation
• 相关性度量
– range from –1 to +1
– 值域从-1到+1
Interpretation of Correlation
对相关性的解释
• Measures of correlation:
• 相关性的度量:
– perfect relationship: 1 or –1
– 完美关系:1或-1
• closer to 1 or –1: strong relationship
• 更接近于1或-1:强有力的关系
• .5: moderate/strong (maybe as good as it gets)
• .5:适中/强烈(可能已是最好的结果)
– closer to zero: little or no relationship
– 更接近0:关系很弱或没有关系
• .2: slight/weak relationship
• .2:轻微/微弱的关系
Direct Relationship
直接相关
• Plus sign +
• 加号+
– both variables change in the same direction
– 两个变量都向着同一个方向变化
– example:
– 例子
• as driving speed increases, death rate goes up
• 随着车速的增加,死亡率呈上升趋势
Inverse Relationship
逆相关
• Minus sign –
• 减号 –
– both variables change, but in opposite directions
– 两个变量向着相反的方向变化
– example:
– 例子:
• as age increases, health status decreases
• 随着年龄的增长,健康状况呈下降趋势
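Direct and inverse relationships show up as the sign of a correlation coefficient. The Python sketch below computes Pearson's r by hand for two small data sets. All of the numbers are invented for illustration, and Pearson's r is only one of several correlation measures; the slides do not name a specific one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; ranges from -1 to +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


# Invented data: driving speed (km/h) and deaths per 100 crashes.
speed = [40, 60, 80, 100, 120]
death_rate = [1.0, 1.8, 3.1, 4.2, 5.5]

# Invented data: age (years) and self-rated health on a 1-10 scale.
age = [20, 30, 40, 50, 60]
health = [9, 8, 8, 6, 5]

r_direct = pearson_r(speed, death_rate)   # positive: both rise together
r_inverse = pearson_r(age, health)        # negative: one rises, one falls

print(f"speed vs. death rate: r = {r_direct:+.2f}")
print(f"age vs. health:       r = {r_inverse:+.2f}")
```

A coefficient near +1 or –1 indicates a strong relationship; the sign only tells you the direction.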
Inferential Statistics
推论统计
• Used to analyze data from randomly selected samples
• 用来分析随机选择的样本数据
• Risk of error because your sample may be different from
the population as a whole
• 因为你的样本可能与总体人群不同,存在着误差风险
• To make an inference, you first need to estimate the
probability of that error
• 为得出推论,你首先需要估计这一误差的概率
Statistical Significance Tests
统计显著性检测
• Tools to estimate how likely the results are in error
• 估计结果出现误差可能性的工具
• Called tests of statistical significance
• 称为统计显著性检测
– to estimate how likely it is that you have gotten the results you see in your analysis by chance alone
– 估计你纯属偶然得出你的分析结果的可能性
Statistical Significance
统计显著性
• Benchmark of 5%
• 5%为基准点
– .05 Alpha level or P value
– .05 Alpha 水平或P值
• It means we are 95% certain that our sample results are not due to chance
• 它意味着我们95%地肯定我们的样本结果不是偶然得出的
– Or
– 或者
• The results are statistically significant at the .05 level
• 结果在.05水平上具有统计显著性
• Most reports do not go beyond .05
• 大多数报告不超过.05
Chi Square and t-Test
卡方和t检验
Chi Square
卡方
• One of the most popular statistics
• 最流行的统计法之一
– easy to calculate and interpret
– 易于计算和解释
• Used to compare two sets of nominal data (e.g., marital status and religious affiliation)
• 用来比较两组名目数据(例如,婚姻状况和宗教关系)
• Used to compare two ordinal variables or a combination of nominal and ordinal variables
• 用来比较两个序数变量或名目变量与序数变量的结合
t-Test
t检验
• Used to determine if one group of numerical scores is statistically higher or lower than another group of scores
• 用来决定一组数值是否在统计学上比另一组数值更高或更低
– two means
– 两个平均数
– post-test only, two-group randomized design
– 仅进行事后检验,两组随机设计
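To make the chi-square idea concrete, the Python sketch below computes the statistic by hand for two nominal variables, reusing the fabricated gender by class type counts from the crosstab example. The 3.841 cutoff is the standard chi-square critical value for one degree of freedom at the .05 level; applying the test to these particular counts is an assumption of this example, not something the slides do.

```python
# Fabricated observed counts: rows are Boys and Girls,
# columns are Hands-on and Traditional.
observed = [
    [28, 34],
    [22, 41],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count in each cell if gender and class type were unrelated.
expected = [
    [r * c / grand_total for c in column_totals]
    for r in row_totals
]

# Chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE_05 = 3.841  # chi-square cutoff for df = 1 at the .05 level
significant = chi_square > CRITICAL_VALUE_05

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("significant at .05:", significant)
```

With these counts the statistic is well below the cutoff, so the difference between boys and girls in class type would not be called statistically significant at the .05 level.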
Hypothesis Testing
假设检验
• Research hypothesis is your best guess as to the relationship
between variables
• 研究假设是你关于变量关系的最佳预测
– Example: there is a difference between the per capita incomes of
men and women in South Africa
– 例子:南非男性和女性之间在人均收入上存在差异
• Null hypothesis is always a statement that “there is no difference”
or “no impact” between our variables
• 零假设永远是变量之间“不存在差异”或“无影响”的陈述
– Example: there is no difference between the per capita incomes of men and women in South Africa
– 例子:南非男性和女性之间在人均收入上不存在差异
Testing for Significant Difference
显著性差异检验
• What is the probability of getting a R4,600 difference if we assume there is no difference in the population from which this sample was drawn?
• 如果我们假设抽样人群不存在任何差异,得出4,600兰特差异的概率是多少?
• If the probability of getting a R4,600 difference is “small”, then we reject the null hypothesis
• 如果得出4,600兰特差异的概率“小”,则我们拒绝零假设。
– Small is defined as less than .05
– 小被定义为少于.05
• If the probability is “high”, then we do not reject the null hypothesis
• 如果概率“高”,则不拒绝零假设。
Remember:
记住
• A significance test is nothing more than an estimate of the probability of getting the results by chance if there really is no difference in the population
• 显著性检验无非是如果人群之中实际上不存在差异,对偶
然得出结果的概率进行的估计。
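The idea of "getting the results by chance if there really is no difference" can be shown with a permutation test, used here in place of the t-test because it needs no statistical tables. This Python sketch rests on loud assumptions: the ten incomes are invented so that the group means match the earlier example (27,800 and 32,400 Rand), and the tiny sample size is for illustration only.

```python
from itertools import combinations
from statistics import mean

# Invented incomes (SA Rand); group means are 27,800 and 32,400.
women = [24000, 26500, 27800, 29100, 31600]
men = [28000, 30500, 32400, 34300, 36800]

observed = mean(men) - mean(women)  # the R4,600 difference

# If the null hypothesis is true, the group labels are arbitrary.
# So try every way of splitting the ten incomes into two groups of five
# and count how often the difference is at least as large as observed.
pooled = women + men
positions = range(len(pooled))
extreme = 0
total = 0
for chosen in combinations(positions, len(women)):
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in positions if i not in chosen]
    total += 1
    if abs(mean(group_b) - mean(group_a)) >= abs(observed):
        extreme += 1

p_value = extreme / total
reject_null = p_value < 0.05

print("observed difference:", observed)
print(f"p = {p_value:.3f} ({extreme} of {total} splits)")
print("reject the null hypothesis:", reject_null)
```

With these ten invented values the probability comes out at about .056, just above the .05 benchmark, so the null hypothesis would not be rejected. A larger sample with the same difference in means could well lead to the opposite conclusion.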
Linking?
联系?
• Should qualitative and quantitative data and associated
methods be linked during study design?
• 在研究设计中是否应该将定性和定量数据以及相关方法联
系起来?
– How?
– 如何?
– Why?
– 为什么?
Qualitative-Quantitative Linkages
定性-定量联系
• Confirmation or corroboration – triangulation
• 证实或确证:三元法
• Richer detail
• 更丰富的细节
• Initiate new lines of thinking
• 开启新的思路
• Expand the scope
• 扩大范围