Using the Social Network Data From Add Health
James Moody
Sunbelt Social Networks Conference
February 13, 2001
New Orleans
•Introduction: What and Why
•Background to Add Health
•Levels of Network Data
•Composition & Pattern
•Networks on both sides of the equation
•Network Data structures
•Adjacency Matrices
•Adjacency Lists
•Network Data in Add Health
•In School Friendship Nominations
•In Home Friendship Nominations
•Constructing Networks
•Total Networks
•Local Networks
•Peer Groups
•Analyses Using Networks
•Networks as dependent variables
•Networks as independent variables
History of the National Longitudinal Study of Adolescent Health*
better known as Add Health.
* a program project designed by J. Richard Udry and Peter S. Bearman, and funded by a grant HD31921 from the
National Institute of Child Health and Human Development to the Carolina Population Center, University of North
Carolina at Chapel Hill, with cooperative funding participation by the following agencies: The National Cancer
Institute; The National Institute of Alcohol Abuse and Alcoholism; the National Institute on Deafness and other
Communication Disorders; the National Institute on Drug Abuse; the National Institute of General Medical Sciences;
the National Institute of Mental Health; the Office of AIDS Research, NIH; the Office of Director, NIH; The National
Center for Health Statistics, Centers for Disease Control and Prevention, HHS; Office of Minority Health, Centers for
Disease Control and Prevention, HHS, Office of the Assistant Secretary for Planning and Evaluation, HHS; and the
National Science Foundation.
•Initially proposed as an adolescent version of the National Health
and Social Life Study (Laumann et al) known as the “Teen Sex”
study.
•Jesse Helms’ crew decided that asking teens about sexual
behavior was inappropriate, and the study had the dubious
distinction of being the only study ever explicitly outlawed.
•Fortunately, the same legislation stipulated that NIH fund a
national health survey, and from the ashes of Teen Sex, Add
Health was born.
•Funded at $24M for the first 4 years, Add Health was designed to
provide a comprehensive image of the state of adolescent health
and the behaviors that affect adolescent health.
The Add Health Design:
Adolescents in Social Context
[Figure: design diagram relating data sources to levels of social context. Labels in the diagram: Contextual Data Base, Community/Neighborhood, Community Characteristics, Health Service, School Context, In-School, School, Peers and Networks, Social Networks, Peer Groups, Sample Information, Genetic, High SES Black, Saturated Sample, Partners, Contextual Variables, Dyadic Relations, Relational Data, Family, Individual (Attributes, Attitudes, Behavior, Capacities, Health Status), In-Home Parent, Parenting, Family Data, Relations Between Family and Adolescent, Behavior Characteristics of Peers and Peer Group, Ideal Sequence]
Substantive Domains covered in the Add Health Design
Individual:
•Demographics
•Detailed, multiple
race /ethnic categories
•Immigrant status
•Socio-Economic
Status
•Health Status
•Nutrition
•STD & Sexual
Behavior
•Exposure
•Emotional
•Physical
•Insurance/ Access
•Daily Activities
•Exercise
•TV/Hobbies
•Academic Exposure
•Subjects taught
•Sexual knowledge
•Future Expectations
•Risk taking activity
•Delinquency
•Drugs
•Fighting and
Violence
•Motivation
•Personality
•Religion
•Neighborhood assessments
Family:
•Detailed Household Roster
•Family Structure
•Parental Interview
•Sibling relations
•Parental behaviors
•Multiple observations in
the same family
•Parent’s knowledge of
adol:
•Activities
•Friends
•Adolescent Assessment of
parents expectations and
rule behavior
•Twin Design
Relations, Peers & Nets:
•Population sample in
schools provides complete
network images
•Constructed network data
•Friendship nomination files
•Romantic relation
characteristics
•Real and Ideal
•Relationship timing and
duration
•Information from both
sides of the relation in many
cases
•Peer assessment of peer
activity: not just respondent
assessment
Community
•GIS links for spatial
analysis
•Contextual data at the Block
Group, City, County and
State level
•Topics include:
•Population
•Vital Statistics
•Group Quarters
•Households
•Income
•Poverty
•Education
•Labor Force
•Housing
•At Risk Children
•Health Care
•STD Levels
•Crime
•Religion
•Elections
•Social Welfare
•Gov’t Expenditure
•Abortion Access
•Tobacco
•Health Policy
Sampling Structure for Add Health
[Figure: sampling flow chart]
School Sampling Frame = QED
Schools: HS and Feeder schools, sampled in pairs
Sampling Frame of Adolescents and Parents N = 100,000+ (100 to 4,000 per pair of schools)
Main Sample: 200/Community
Ethnic Samples: High Educ Black, Puerto Rican, Chinese, Cuban
Saturation Samples from 16 Schools
Disabled Sample
Genetic Samples: Identical Twins, Fraternal Twins, Full Sibs, Half Sibs, Unrelated Pairs in Same HH
Add Health School Sampling Strategy
The National Longitudinal Study of Adolescent Health: Demographic Sub-Sample Sizes
Core Sample: 12,104
White: 8,467 (Male: 4,075; Female: 4,392)
Black: 2,384 (Male: 1,092; Female: 1,292)
Hispanic: 1,456 (Male: 708; Female: 748)
Counts by household type and grade (2P = Two Parents, 1P = One Parent):
              White Male    White Female  Black Male    Black Female  Hispanic Male  Hispanic Female
              2P     1P     2P     1P     2P     1P     2P     1P     2P     1P      2P     1P
All grades    3150   760    3325   842    496    466    569    593    477    184     505    189
7th           484    108    552    126    92     75     88     98     62     28      76     26
8th           522    115    526    137    93     93     93     102    82     18      78     33
9th           543    135    574    156    73     84     95     97     79     34      78     38
10th          536    141    540    151    80     80     99     99     94     37      92     23
11th          581    132    551    125    79     64     91     96     87     29      89     31
12th          445    108    524    117    67     50     87     84     58     30      80     30
Deductive Disclosure Risks:
Start with: 536 White, Male, 10th Graders in Two-parent Households
  Who Are Jewish: 10
  And Have No Siblings: 1
Start with: 484 White, Male, 7th Graders in Two-parent Households
  Who Have Ever Been Held Back a Grade in School: 87
  And Play Basketball: 5
  And Smoke: 1
Deductive Disclosure Risks:
Start with: 87 Black, Female, 12th Graders in Two-parent Households
  Who Have Never Been Held Back: 77
  And Smoke Regularly: 5
  And Have 2 Siblings: 1
  And Are Catholic: 1
Deductive Disclosure Risks:
Start with: 98 Black, Female, 7th Graders in One-parent Households
  Who Are Baptist: 41
    And Have No Siblings: 9
      And Play Basketball: 1
    And Have One Sibling: 13
      And Smoke: 1
    And Have > One Sibling: 19
      And Are Born in April: 1
Levels of Network Data
[Figure: ego's network shown at three levels: Best Friends, Local Network, Peer Group]
Measuring Network Context
Patterns
Pattern measures capture some feature of the distribution of relations
across nodes in the network. These include:
•Density: % of all possible ties actually made
•Reciprocity: likelihood that given a tie from i to j there will also be a tie
from j to i.
•Transitivity: extent to which friends of friends are also friends
•Hierarchy: Is there a status order to nominations? How is it patterned?
•Clustering: Are there significant groups? How so?
•Segregation: Do attributes (such as race) and nominations correspond?
•Distance: How many steps separate the average pair of persons in the
school? Is this larger or smaller than expected?
•Block models: What is the implied role structure underlying patterns of
relations?
These features (usually) require having nomination data from each person in the
network.
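As a rough illustration of the first three pattern measures, the sketch below computes density, reciprocity, and transitivity for a small directed network. The five-node arc list and the function names are illustrative assumptions, and the transitivity definition used here (share of two-step paths that are closed) is only one of several in use.

```python
# Arcs are (sender, receiver) pairs; this five-node network is illustrative.
ARCS = [(1, 2), (1, 3), (2, 4), (3, 2), (4, 1), (4, 2),
        (4, 3), (4, 5), (5, 1), (5, 3), (5, 4)]
NODES = [1, 2, 3, 4, 5]


def density(nodes, arcs):
    """Share of the n*(n-1) possible directed ties that are present."""
    n = len(nodes)
    return len(set(arcs)) / (n * (n - 1))


def reciprocity(arcs):
    """Share of ties i->j for which the return tie j->i also exists."""
    arc_set = set(arcs)
    returned = sum(1 for (i, j) in arc_set if (j, i) in arc_set)
    return returned / len(arc_set)


def transitivity(arcs):
    """Share of two-step paths i->j->k (k != i) closed by a tie i->k."""
    arc_set = set(arcs)
    out = {}
    for i, j in arc_set:
        out.setdefault(i, set()).add(j)
    paths = closed = 0
    for i, j in arc_set:
        for k in out.get(j, ()):
            if k != i:
                paths += 1
                closed += (i, k) in arc_set
    return closed / paths if paths else 0.0


print(density(NODES, ARCS), reciprocity(ARCS), transitivity(ARCS))
```

All three functions need the full set of arcs, which is why these measures usually require nominations from every person in the network.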
Measuring Network Context
Composition
Composition measures capture characteristics of the population of
people within a given network level. These include:
•Heterogeneity: How dispersed are actors with respect to a given attribute?
•Means: What is the mean GPA of ego’s friends? How likely is it that most
of ego’s friends will go to college?
•Dispersion: What is the age-range of people ego hangs out with?
These features can often be measured from the simple ego network.
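A minimal sketch of the three composition measures for a single ego network follows. The toy friends and attribute values are hypothetical, and the heterogeneity measure shown (one minus the sum of squared category shares) is an assumed choice, not necessarily the one used in the Add Health constructed variables.

```python
from statistics import mean

# Hypothetical toy data: ego's nominated friends and their attributes.
FRIENDS = {"ego": ["a", "b", "c", "d"]}
GPA = {"a": 3.5, "b": 2.5, "c": 3.0, "d": 4.0}
AGE = {"a": 15, "b": 16, "c": 15, "d": 18}
RACE = {"a": "white", "b": "white", "c": "black", "d": "hispanic"}


def friend_values(ego, friends, attribute):
    """Attribute values for ego's friends, skipping friends with no data."""
    return [attribute[f] for f in friends.get(ego, []) if f in attribute]


def heterogeneity(values):
    """1 - sum of squared category shares (0 means all friends alike)."""
    n = len(values)
    shares = [values.count(v) / n for v in set(values)]
    return 1 - sum(s * s for s in shares)


# Means: mean GPA of ego's friends.
mean_gpa = mean(friend_values("ego", FRIENDS, GPA))
# Dispersion: age range of ego's friends.
ages = friend_values("ego", FRIENDS, AGE)
age_range = max(ages) - min(ages)
# Heterogeneity: racial mix of ego's friends.
race_het = heterogeneity(friend_values("ego", FRIENDS, RACE))

print(mean_gpa, age_range, race_het)
```

Only ego's own nomination list and the friends' attribute records are needed here, which is why the simple ego network is often enough.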
Analysis with Social Network data
Networks as Dependent Variables
•Interest is in explaining the observed patterns of relations.
•Examples:
•Why are some schools segregated and others not?
•What accounts for differences in hierarchy across schools?
•What accounts for homophily in friendship choice?
•Tools:
•Descriptive tools to capture properties
•Standard analysis tools at the level of networks to explain the
measures
•p* and other specialized network statistical and simulation
models
Analysis with Social Network data
Networks as Independent Variables
•Interest is in explaining behavior with network context (Peer
influence/ context models)
•Examples:
•Is ego’s probability of smoking related to the smoking levels of
those he/she hangs out with? (compositional context)
•Is the transition to first intercourse affected by the peer context?
•Are isolated students more likely to carry weapons to school
than those in dense peer groups? (positional context)
•Tools:
•Depends on dependent variable
•Peer influence models
•Dyad models
•Contextual models, with network level as nested context
(students within peer groups)
Network Data Structures
[Figure: a five-node directed graph (nodes 1 to 5) shown as a Graph, an Adjacency Matrix, an Arc List, and a Node List. The arc list reads:]
Arc List
Send Recv
1    2
1    3
2    4
3    2
4    1
4    2
4    3
4    5
5    1
5    3
5    4
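The structures named on this slide hold the same information in different layouts. As a sketch, the code below converts a five-node arc list into an adjacency matrix and a node list; the function names are illustrative, and the node list is assumed to mean "each sender followed by the people they send ties to".

```python
# (send, recv) pairs for a five-node directed graph.
ARCS = [(1, 2), (1, 3), (2, 4), (3, 2), (4, 1), (4, 2),
        (4, 3), (4, 5), (5, 1), (5, 3), (5, 4)]


def to_adjacency_matrix(arcs, n):
    """n x n matrix with a 1 in row i, column j when i sends a tie to j."""
    matrix = [[0] * n for _ in range(n)]
    for send, recv in arcs:
        matrix[send - 1][recv - 1] = 1  # node labels are 1-based
    return matrix


def to_node_list(arcs):
    """Map each sender to the sorted list of nodes it sends ties to."""
    node_list = {}
    for send, recv in arcs:
        node_list.setdefault(send, []).append(recv)
    return {node: sorted(alters) for node, alters in sorted(node_list.items())}


MATRIX = to_adjacency_matrix(ARCS, 5)
NODE_LIST = to_node_list(ARCS)

for row in MATRIX:
    print(row)
print(NODE_LIST)
```

The matrix takes n*n cells regardless of how many ties exist, while the arc list and node list grow only with the number of ties, which matters for large school networks.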
Network Analysis Programs
1) UCINET
•General network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-output of data is a little clunky, but workable.
•Not optimal for large networks
•Available from:
Analytic Technologies
[email protected]
2) STRUCTURE
•“A General Purpose Network Analysis Program providing Sociometric
Indices, Cliques, Structural and Role Equivalence, Density Tables, Contagion,
Autonomy, Power and Equilibria In Multiple Network Systems.”
•DOS Interface w. somewhat awkward syntax
•Great for role and structural equivalence models
•Manual is a very nice, substantive, introduction to network methods
•Available from a link at the INSNA web site:
http://www.heinz.cmu.edu/project/INSNA/soft_inf.html
Network Analysis Programs
3) NEGOPY
•Program designed to identify cohesive sub-groups in a network, based on
the relative density of ties.
•DOS based program, need to have data in arc-list format
•Moving the results back into an analysis program is difficult.
•Available from:
William D. Richards
http://www.sfu.ca/~richards/Pages/negopy.htm
4) PAJEK
•Program for analyzing and plotting very large networks
•Intuitive windows interface
•Used for all of the real data plots in this presentation
•Mainly a graphics program, but is expanding the analytic capabilities
•Free
•Available from:
Network Analysis Programs
5) Cyram Netminer for Windows: A new exploratory tool for networks
6) SPAN - SAS Programs for Analyzing Networks (Moody, ongoing)
•is a collection of IML and Macro programs that allow one to:
a) create network data structures from the Add Health nominations
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
•All of the Add Health data are already in SAS
•Available by sending an email to:
[email protected]
Network Data Collected in Add Health
In-School Network Data
•Complete network data were collected in every school
•Each student was asked to name up to 5 male and 5 female friends
•These data provide the basic information needed to construct
network context measures.
•Because of response rates, network data were computed for 129 of the 144 total
schools.
•Variables are named MF<#>AID for male friends and FF<#>AID for
female friends.
[Slide: survey instrument]
Network Data Collected in Add Health
In-School Network Data
Nomination Categories:
•Matchable people inside ego’s school or sister school
  •People who were present that day: ID starts with 9 and is in the sample
  •People who were absent that day: ID starts with 9, but is not in the school sample
•People in ego’s school, but not on the directory: nomination appears as 99999999
•People in ego’s sister school, but not on the directory: nomination appears as 88888888
•People not in ego’s school or the sister school: nomination appears as 77777777
•Other special codes: nomination appears as 99959995
Nominator Categories:
•Matchable nominator: person who was on the roster; ID starts with 9
•Unmatchable nominator: person who was NOT on the roster; ID starts with 5 or 8
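The categories above can be applied mechanically when cleaning the nomination variables. The sketch below is one way to do it, assuming IDs are handled as 8-character strings; the function names, category labels, and example IDs are hypothetical. Note that the special codes must be checked before the "starts with 9" rule, because 99999999 and 99959995 also start with 9.

```python
SPECIAL_CODES = {
    "99999999": "in ego's school, not on the directory",
    "88888888": "in sister school, not on the directory",
    "77777777": "outside ego's school and sister school",
    "99959995": "other special code",
}


def classify_nomination(nominee_id, sampled_ids):
    """Assign a nominee ID to one of the nomination categories."""
    nominee_id = str(nominee_id)
    if nominee_id in SPECIAL_CODES:  # check special codes first
        return SPECIAL_CODES[nominee_id]
    if nominee_id.startswith("9"):
        if nominee_id in sampled_ids:
            return "matchable, present that day"
        return "matchable, absent that day"
    return "unrecognized"


def classify_nominator(respondent_id):
    """Matchable nominators were on the roster (ID starts with 9)."""
    first = str(respondent_id)[:1]
    if first == "9":
        return "matchable"
    if first in ("5", "8"):
        return "unmatchable"
    return "unrecognized"


SAMPLED = {"91000001"}  # hypothetical IDs of students who completed the survey
print(classify_nomination("91000001", SAMPLED))
print(classify_nomination("91000002", SAMPLED))
print(classify_nomination("77777777", SAMPLED))
print(classify_nominator("51000009"))
```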
Network Data Collected in Add Health
In-School Network Data
Example 1. Ego is a matchable person in the School
[Figure: True Network compared with Observed Network; nodes are labeled Ego, M, Un, and Out]
Network Data Collected in Add Health
In-School Network Data
Example 2. Ego is not on the school roster
[Figure: True Network compared with Observed Network; nodes are labeled M and Un]
Network Data Collected in Add Health
In-School Network Data
Characteristics of the Add Health School Sample
Sample Characteristics               All Schools    Schools w. network data
Number of schools                    144            129
Number of students                   90,118         75,871
School Type
  Public                             89.6%          89.9%
  Private                            10.4           10.1
Grade Range
  Junior High School                 40.6%          40.3%
  High School                        43.4           43.4
  7 - 12                             16.1           16.3
Region***
  West                               19.4%          15.5%
  Midwest                            22.9           24.0
  South                              40.9           42.6
  North East                         16.7           17.8
Demographic Characteristics
  % of schools >70% single race      52.7%          55%
  Family SES                         6.03           6.02
Behavioral Characteristics (Smoke Regularly; Sexually active; Expect to go to College; Active in school activities)
  Values as listed                   14.4%, 32.3, 76.2    14.7%, 32.9, 76.3
*p<.05, ** p<=.01, ***p<=.001.
Network Data Collected in Add Health
In-School Network Data
Local-Network Characteristics (Std. Dev. in parentheses)
                                            Same Sex                     Cross Sex
                             Total          Male          Female         Male:Female    Female:Male
In-school nominations        5.68 (3.45)    3.08 (1.98)   3.57 (1.74)    2.19 (2.08)    2.54 (1.95)
Out-of-school nominations    1.04 (1.87)    0.42 (0.98)   0.45 (0.93)    0.42 (1.09)    0.78 (1.28)
Local network density (a)    0.18 (0.19)    0.22 (0.24)   0.26 (0.26)    0.19 (0.25)    0.15 (0.23)
Reciprocity rate (b)         0.40 (0.30)    0.40 (0.35)   0.51 (0.34)    0.29 (0.35)    0.27 (0.34)
  7th - 8th grade            0.36 (0.29)    0.38 (0.35)   0.46 (0.33)    0.23 (0.33)    0.20 (0.30)
  9th - 10th grade           0.38 (0.30)    0.39 (0.35)   0.52 (0.34)    0.25 (0.33)    0.26 (0.34)
  11th - 12th grade          0.45 (0.31)    0.43 (0.36)   0.56 (0.34)    0.37 (0.37)    0.32 (0.36)
a) Includes nominations to people not sampled
b) Proportion of ego's nominations that are reciprocated
Network Data Collected in Add Health
In-Home Network Data
•Network data were collected in both the Wave 1 and Wave 2 surveys
•There were two procedures:
•Saturated Settings
•Attempted to survey every student from the In-School sample.
•2 large schools and 10 small schools.
•Was supposed to replicate the in-school design exactly.
•Unsaturated Settings
•Each person was only asked to name one other person
•In both cases, the design was not always carried out. Some students in the
saturated settings were allowed to name only one male and one female friend,
while some students in the unsaturated settings were asked to nominate a full slate of 5 and 5.
Network Data Collected in Add Health
In-Home Network Data
Data Usage Notes:
•Romantic Relation Overlap
For the W1 and W2 friendship data, any friendship that was also a romantic relation was
recoded to 55555555, to protect the romantic relation nominations.
•Bad Machine on Wave 2 Data
Data from one school in Wave 2 seem to be corrupted. We have no way to show this for
certain, but data from machines 200065 or 200106 appear to be incorrect. We suspect this
because almost everyone who used these two machines “nominated” the same person
multiple times, which gives one person an abnormally large in-degree.
•All nomination #s are now valid
Unlike the in-school data, IDs starting with something other than ‘9’ can be nominated.
•Same out-of-sample special codes
All other special codes for these data are the same as in the in-school data.
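One way to screen for the repeated-nomination problem described above is to flag respondents who list the same real ID more than once. The sketch below assumes string IDs and hypothetical toy records; it exempts the special codes (including the 55555555 romantic recode) because those can legitimately repeat within one respondent's list.

```python
from collections import Counter

# Codes that may legitimately appear more than once in a nomination list.
SPECIAL_CODES = {"55555555", "77777777", "88888888", "99999999", "99959995"}


def repeated_nominees(nominations):
    """Real IDs that a single respondent listed more than once."""
    counts = Counter(n for n in nominations if n not in SPECIAL_CODES)
    return sorted(n for n, c in counts.items() if c > 1)


def flag_suspect_respondents(records):
    """Map respondent ID to repeated nominees, for respondents with any."""
    flagged = {}
    for respondent, nominations in records.items():
        repeats = repeated_nominees(nominations)
        if repeats:
            flagged[respondent] = repeats
    return flagged


# Hypothetical records: respondent ID -> nominee IDs in the order given.
RECORDS = {
    "r1": ["90000001", "90000001", "90000001"],
    "r2": ["90000002", "77777777", "77777777"],
    "r3": ["90000001", "55555555", "55555555"],
}
FLAGGED = flag_suspect_respondents(RECORDS)
print(FLAGGED)
```

Flagged respondents can then be cross-checked against the interview machine they used before deciding whether to drop or de-duplicate their nominations.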
Network Data Collected in Add Health
In-Home Network Data
Descriptive Statistics for Saturated Settings
Constructing Network Measures
Total Network
To construct the social network from the nomination data, we need to integrate each person’s
nominations with every other nomination.
Methods:
1) Export the nomination data and construct the network in another program
Most of the other programs require a great deal of pre-processing
before they can read the data. As such, it is usually easier to create the files in SAS first, then
bring them into UCINET or a similar program.
2) Construct the network in SAS
The best way to do this is to combine IML and the MACRO language. SAS IML
lets you work with matrices in a (fairly) straightforward language, and the SAS MACRO language
makes it easy to work with all of the schools at once.
Programs already set up to do this are available in SPAN.
Constructing Network Measures
Adjacency Matrices
The key to analyzing / measuring the total network is constructing either an adjacency matrix
or an adjacency list. These data structures allow you to directly identify both the people ego
nominates and the people that nominate ego. Thus, the first step in any network analysis will
be to construct the adjacency matrix.
To do this you need to:
1) Identify the universe of possible people in the network. This is usually the same as the
set of people that you have sampled. However, if you want to include ties to non-sampled
people, you may make the universe include all people named by anyone.
2) Create a blank matrix with n rows and n columns.
3) Loop over all respondents, placing a value in the column that corresponds to each person they
nominate. This can be binary (named or not) or valued (number of activities they do with
alter).
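The three steps can be sketched as follows. This is a Python illustration under stated assumptions, not the SPAN/SAS IML code: the function name, the `include_unsampled` switch, and the toy IDs are hypothetical, the nomination lists are assumed to have special codes already removed, and the matrix is binary.

```python
def build_adjacency(nominations, include_unsampled=False):
    """Return (ids, matrix) for a dict of respondent -> nominee IDs."""
    # Step 1: identify the universe of people in the network.
    universe = set(nominations)
    if include_unsampled:
        for alters in nominations.values():
            universe.update(alters)
    ids = sorted(universe)
    index = {person: position for position, person in enumerate(ids)}

    # Step 2: create a blank n x n matrix.
    n = len(ids)
    matrix = [[0] * n for _ in range(n)]

    # Step 3: loop over respondents, marking the column of each nominee.
    for ego, alters in nominations.items():
        for alter in alters:
            if alter in index and alter != ego:
                matrix[index[ego]][index[alter]] = 1
    return ids, matrix


# Hypothetical nominations; "X" was named but is not a respondent.
NOMS = {"A": ["B", "C"], "B": ["A"], "C": ["B", "X"]}
IDS, MATRIX = build_adjacency(NOMS)
IDS_ALL, MATRIX_ALL = build_adjacency(NOMS, include_unsampled=True)
print(IDS, MATRIX)
print(IDS_ALL, MATRIX_ALL)
```

Reading across a row gives the people ego nominates; reading down a column gives the people who nominate ego. For a valued network, the assignment in step 3 would store a count (for example, number of shared activities) instead of 1.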
Constructing Network Measures
Local Networks.
•To create and calculate measures based only on the people ego nominates,
you can work directly from the nomination list (don’t need to construct the
adjacency matrix).
•To create and calculate measures based on the received or reciprocated
ties, you need to have a list of people who nominate ego, which is easiest to
get given the adjacency matrix.
•To calculate positional measures (density, reciprocity, etc.) all you need is
the nomination data.
•To calculate compositional data, you need both the nomination data and
matching attribute data.
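As an illustration of the difference between sent, received, and reciprocated ties, the sketch below first inverts the nomination lists (the equivalent of reading columns of the adjacency matrix) and then computes ego-level measures. The toy network and function names are assumptions; the reciprocity rate here is the proportion of ego's nominations that are returned.

```python
def invert(nominations):
    """Map each person to the set of people who nominate them."""
    nominated_by = {}
    for ego, alters in nominations.items():
        for alter in alters:
            if alter != ego:
                nominated_by.setdefault(alter, set()).add(ego)
    return nominated_by


def local_ties(ego, nominations, nominated_by):
    """Sent, received, and reciprocated ties for one ego."""
    sent = set(nominations.get(ego, [])) - {ego}
    received = nominated_by.get(ego, set())
    mutual = sent & received
    rate = len(mutual) / len(sent) if sent else None
    return {"sent": sent, "received": received,
            "reciprocated": mutual, "reciprocity_rate": rate}


# Illustrative five-person network: person -> people they nominate.
NOMS = {1: [2, 3], 2: [4], 3: [2], 4: [1, 2, 3, 5], 5: [1, 3, 4]}
NOMINATED_BY = invert(NOMS)
EGO4 = local_ties(4, NOMS, NOMINATED_BY)
EGO1 = local_ties(1, NOMS, NOMINATED_BY)
print(EGO4)
print(EGO1)
```

The sent ties come straight from ego's own record, while the received and reciprocated ties need every other respondent's nominations, which is the point made in the first two bullets above.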
Constructing Network Measures
Peer Groups.
Identifying cohesive peer groups requires first specifying what a cohesive peer group is.
Potential definitions could be:
a) all people within k steps of ego (extended ego-network)
b) a set of people who interact with each other often (relative density)
c) a set of people with a particular pattern of ties (a closed loop, for example)
UCINET, STRUCTURE, NEGOPY and SPAN all provide methods for identifying
cohesive groups. They differ in the underlying definition of what constitutes a group.
The FACTIONS algorithm in UCINET and NEGOPY’s algorithm use relative density. The
CROWD algorithm in SPAN uses a combination of relative density and pattern.
Once you have constructed the adjacency matrix, you can export to these other programs
fairly easily. However, most of them are QUITE time consuming to run (FACTIONS, for
example, is a bear), so be sure you have identified
exactly what you want before you start processing.
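Definition (a), the extended ego-network, is the simplest of the three to compute directly. The sketch below uses a breadth-first search over sent nominations; the function name and toy network are illustrative assumptions, and it is not the FACTIONS, NEGOPY, or CROWD procedure.

```python
from collections import deque


def within_k_steps(ego, nominations, k):
    """People reachable from ego in at most k steps, with their distance."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        person = queue.popleft()
        if distance[person] == k:
            continue  # do not expand beyond k steps
        for alter in nominations.get(person, []):
            if alter not in distance:
                distance[alter] = distance[person] + 1
                queue.append(alter)
    del distance[ego]  # ego is not a member of their own alter set
    return distance


# Illustrative five-person network: person -> people they nominate.
NOMS = {1: [2, 3], 2: [4], 3: [2], 4: [1, 2, 3, 5], 5: [1, 3, 4]}
ONE_STEP = within_k_steps(1, NOMS, 1)
TWO_STEP = within_k_steps(1, NOMS, 2)
THREE_STEP = within_k_steps(1, NOMS, 3)
print(ONE_STEP, TWO_STEP, THREE_STEP)
```

This version follows ties only in the direction they were sent. To treat a nomination in either direction as a connection, symmetrize the nomination lists before calling the function.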
Constructing Network Measures
Peer Groups Characteristics.
Identifying Cohesive Sub-Groups
• Cohesion: The group is difficult to separate; the connection
of the group does not depend on one relation or person.
• Groupness: Relative to the rest of the network, a cohesive
sub-group has high relational volume.
• Inclusion: Some people are not in groups while others
bridge groups.
Examples of Peer groups within Add Health High Schools
Crowds Algorithm
Observed Clustering within Adolescent Social Networks
Network Characteristics of Sub Groups
• On average, 65% of a school’s adolescents are in
cohesive sub-groups.
• 87% of all relations are within sub-groups.
• The average sub-group has 22 members.
• The average diameter for a sub-group is 3 steps.
• The mean segregation index is .96 (1=Complete,
0=Random)
Observed Clustering within Adolescent Social Networks
Distribution of Characteristic within groups, relative to school distribution
Grade: 34%
Race: 65%
College: 84%
GPA: 86%
Activities: 79%
Smoking: 74%
Constructing Network Data
School Level
[Figure: school-level network plot with labeled peer groups: Groups 23 & 24, Group 1, Group 15, Group 18]
Constructing Network Data
School Level
Inter-Group Relations
[Figure: directed graph of relations among numbered peer groups, with groups marked as Mostly Seniors, Mostly Juniors, Mostly Sophomores, Mostly Freshmen, or Mixed Grades; arrows show the direction of ties]
Analysis Using Network Data
Nets as Dependent Variable: Racial Segregation
[Figure: Same Race Friendship Preference (b1) by racial heterogeneity. Vertical axis: same race friendship preference (-.2 to 1.6). Horizontal axis: Racial Heterogeneity (.1 to .8). Labeled point: Countryside h.s.]
Analysis Using Network Data
Nets as Dependent Variable: Modeling the network
[Figure: Network Model Coefficients, In-School Networks. Vertical axis: -0.6 to 0.8]
Analysis Using Network Data
Nets as Independent Variable: Suicide
Relational Structures and Forms of Suicide
                      Regulation: Low    Regulation: High
Integration: High     Anomic             Altruistic
Integration: Low      Egotistic          Fatalistic
Analysis Using Network Data
Nets as Independent Variable: Suicide
Measuring Isolation and Anomie.
[Figure: diagrams of Isolation and Peer Anomie (Intransitivity), with nodes labeled Ego, Alter, Third, and School]
Analysis Using Network Data
Nets as Independent Variable: Suicide
Effect of Friendship Structure on Suicidal Thoughts
Net of demographic, family, school, religion and personal characteristics.
                            Males                     Females
                            OR     95% CI             OR     95% CI
Network
Isolation                   0.665  (0.307 - 1.445)    2.010  (1.073 - 3.765)
Intransitivity Index        0.747  (0.358 - 1.558)    2.198  (1.221 - 3.956)
Friend Attempted Suicide    2.725  (2.187 - 3.395)    2.374  (2.019 - 2.791)
Trouble with People         0.999  (0.912 - 1.095)    1.027  (0.953 - 1.106)
Analysis Using Network Data
Nets as Independent Variable: Weapons
[Figure: Probability of Carrying a Weapon by Race and Gender. Vertical axis: Probability of carrying a weapon (0 to 0.14). Horizontal axis: Race/Ethnicity (White, Black, Hispanic, Asian, Native American, Other). Series: Males, Females]
a) Figure represents predicted probabilities from model 6 of table 5, holding all other variables at the full sample mean.
Analysis Using Network Data
Nets as Independent Variable: Weapons
[Figure: Network Effects on Weapon Carrying. Vertical axis: Probability of carrying a weapon to school (0 to 0.18). Horizontal axis: character of peer context, with a Positive scale (0.08 to 0.85) and a Negative scale (0 to 7). Series: Peer Group Deviance, Social Outsiders, School Oriented Peer Group]
Analysis Using Network Data
Nets as Independent Variable: Sexual Debut
[Figure: The Effect of Peer Group Composition on Sexual Debut*. Vertical axis: Estimated Probability of Sexual Debut (0.00 to 0.40). Horizontal axis: Proportion of High-Risk Adolescents in Peer Group (0%, 1-25%, 26-50%, 51-75%, 76-100%). Ns as listed: 380, 1898, 2026, 660, 88]
*Probability of experiencing sexual debut during the 18 months following the in-school survey.
Controlling for age, socio-demographic characteristics, family and peer group characteristics (see table
A1, model 6). Bearman and Bruckner, 1999
Analysis Using Network Data
Nets as Independent Variable: Pregnancy
[Figure: The Effect of Close Friends' Risk Status on Pregnancy Risk*. Vertical axis: Estimated Probability of Pregnancy (0.00 to 0.20). Horizontal axis: Proportion of Low-Risk Male and Female Close Friends (no friends, 0%, 1-25%, 26-50%, 51-75%, 76-100%). Ns as listed: 308, 932, 100, 517, 550, 427]
*Probability of experiencing a pregnancy during the 18 months following the in-school survey.
Controlling for age, socio-demographic and individual characteristics, family characteristics, and
popularity (see table B1, model 3). Bearman and Bruckner, 1999.
Wave III Respondents
Wave II participants
• Main sample plus special samples
• Aged 18-25
Partners of original participants
• 2,000 couples
Add Health Wave III:
The Transition to Adulthood
• How is what happens in adolescence
related to what happens in young
adulthood?
• The influence of adolescent contexts
on young adult outcomes
Additional Content of Wave III
AHPVT
Social security number
Longitude and latitude
College context
Physical measurements
Biomarkers
Network Transitions
Special Features
CASI event history calendar
Preloaded data from Waves I and II
Re-interviews with STI-positive individuals
Binge-drinking sample
High school transcript data
Wave III Questionnaire Content
Family relationships
Relationships
Friends
Pregnancies and births
Education
Delinquency and violence
Work experience
Involvement with criminal justice system
General health
Tobacco, alcohol, drugs, suicide
Mental health
Mentoring
Illnesses, disabilities
Civic participation
Marriage/cohabitation
Religion and spirituality
Sexual experiences and STDs
Gambling