Department of Information Studies Research Methods Day


Department of Information Studies
Research Methods Day
Working with numbers
Oliver Duke-Williams
[email protected]
Working with numbers
• Scientific method
• Sampling
• Statistical measures
• Where to go next?
Theories, hypotheses and laws
• A hypothesis is an (educated) guess about
phenomena, based on observation and prior
knowledge
• A theory summarises a hypothesis or set of
hypotheses which have good experimental
support and seeks to explain observations
• A law is an inviolable description of observations
Falsifiability
• Approach popularised by Karl Popper
– Popper, K (1963) Conjectures and Refutations: The
growth of scientific knowledge, Routledge
• A statement is falsifiable if a reproducible
experiment can show it to be wrong
– “Water boils at 100°C”
• Scientific theories are never ‘proven to be correct’;
they have simply not yet been proven to be wrong
Constructing hypotheses
• In using mathematical methods to conduct
research it is therefore necessary to construct
falsifiable hypotheses that can be tested
• We generally start with a null hypothesis, that
there is no difference between things we are
interested in
A null hypothesis
• “The launch of a website will not make a
difference to the number of visitors to a museum”
– Identify a set of museums with pre- and post-website
visitor data
– Compare the two sets of visitor data: is there a
statistically significant difference?
– Yes
• We can reject the null hypothesis
– No
• We have failed to reject the null hypothesis
An alternative hypothesis
• Having rejected a null hypothesis, we can then
test an alternative hypothesis
– “The launch of a website will increase the number of
visitors to a museum”
Sampling
• How many museums would you need to observe?
– This is quite tricky, as we do not consider all museums
to be essentially the same
• Sampling is easier to understand with human
subjects
– “Men are taller than women”
– How many men and women should we measure?
Sampling approaches
• Census
• Random sampling
• Systematic sampling
• Stratified sampling
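The four approaches listed above can be sketched in a few lines of Python. This is an illustration only: the sampling frame of 20 museums, the "large"/"small" size bands and the sample sizes are all invented for the example and do not come from the slides.

```python
import random

# Hypothetical sampling frame: 20 museums, each tagged with a made-up size band.
frame = [(f"museum_{i:02d}", "large" if i < 5 else "small") for i in range(20)]

rng = random.Random(42)  # fixed seed so the draws are repeatable

# Census: observe every unit in the frame.
census = list(frame)

# Random sampling: every unit has the same chance of being selected.
simple_random = rng.sample(frame, k=8)

# Systematic sampling: pick a random start, then take every step-th unit.
step = len(frame) // 4          # 4 units wanted from 20, so step = 5
start = rng.randrange(step)
systematic = frame[start::step]


def stratified_sample(units, per_stratum, rng):
    """Split units into strata by size band, then sample within each stratum."""
    strata = {}
    for name, band in units:
        strata.setdefault(band, []).append((name, band))
    chosen = []
    for band in sorted(strata):
        chosen.extend(rng.sample(strata[band], k=per_stratum))
    return chosen


stratified = stratified_sample(frame, per_stratum=2, rng=rng)
```

Stratifying guarantees that both size bands appear in the sample, which a simple random draw of the same size does not.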
Statistical measures
• Summary statistics
– Means, medians, modes
– Variance and standard deviation
• Correlation
– Is there dependence between two variables?
• Student’s t-test
– Do the means of two sets of observations differ?
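As a rough illustration of the summary statistics, the sketch below uses Python's standard `statistics` module on the physical visit figures for the 17 institutions that appear in the correlation example of this deck. The short list used for the mode is made up, since every visit figure is distinct.

```python
import statistics

# Total physical visits 2010-2011 for the 17 institutions in the
# correlation example (order: as listed in that table).
physical_visits = [
    5869396, 104691, 584974, 2317639, 638347, 5084929,
    2450155, 2635993, 4093463, 1758488, 4812197, 462753,
    109604, 7450000, 2018233, 3049000, 357538,
]

mean_visits = statistics.mean(physical_visits)      # arithmetic mean
median_visits = statistics.median(physical_visits)  # middle value when sorted
variance = statistics.variance(physical_visits)     # sample variance (divides by n - 1)
std_dev = statistics.stdev(physical_visits)         # square root of the sample variance

# Hypothetical data for the mode: the most frequent value in a list.
example_mode = statistics.mode([1, 2, 2, 3])

print(f"mean={mean_visits:,.0f} median={median_visits:,} sd={std_dev:,.0f}")
```

The mean is pulled well above the median here because a few very large institutions dominate the total, which is one reason to report both.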
Statistical software
• SPSS, SAS, Stata
• Many tests can be done using Excel
– But there are known weaknesses
Correlation
• If there is a (strong) relationship between two
variables, they are correlated
• Consider the relationship between physical visits
to a museum, and website visits
Correlation
Institution                                     Total Physical Visits 2010-2011   Total Unique Web Visits 2010-2011
British Museum                                  5,869,396                         21,496,815
Geffrye Museum                                  104,691                           527,082
Horniman Museum                                 584,974                           252,867
Imperial War Museum                             2,317,639                         8,587,082
Museum of Science and Industry in Manchester    638,347                           330,000
National Gallery                                5,084,929                         4,500,000
National Maritime Museum                        2,450,155                         10,052,347
National Museums Liverpool                      2,635,993                         3,176,266
National Museum of Science and Industry         4,093,463                         15,020,206
National Portrait Gallery                       1,758,488                         13,724,626
Natural History Museum                          4,812,197                         7,397,821
Royal Armouries                                 462,753                           403,379
Sir John Soane's Museum                         109,604                           365,099
Tate Gallery                                    7,450,000                         19,427,000
Tyne and Wear Museums Service                   2,018,233                         1,006,250
Victoria and Albert Museum                      3,049,000                         24,976,400
Wallace Collection                              357,538                           305,609
Total Visits                                    43,797,400                        131,548,849
See: http://melissaterras.blogspot.co.uk/2012/03/physical-versus-website-visitors-to.html
Correlation coefficient (Pearson's r): 0.699646681
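A coefficient like this can be recomputed from the 17 pairs of visit figures. The sketch below is one plain-Python way to calculate Pearson's correlation coefficient; the pairing of physical and web figures follows the table as read from the slide (both columns sum to its "Total Visits" row), and the function name `pearson_r` is simply a label chosen for this example.

```python
import math

# Physical and unique web visits 2010-2011, one pair per institution,
# in the order the institutions are listed in the table.
physical = [
    5869396, 104691, 584974, 2317639, 638347, 5084929,
    2450155, 2635993, 4093463, 1758488, 4812197, 462753,
    109604, 7450000, 2018233, 3049000, 357538,
]
web = [
    21496815, 527082, 252867, 8587082, 330000, 4500000,
    10052347, 3176266, 15020206, 13724626, 7397821, 403379,
    365099, 19427000, 1006250, 24976400, 305609,
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable (denominator).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


r = pearson_r(physical, web)
print(round(r, 3))
```

In Excel the equivalent calculation is the CORREL (or PEARSON) function applied to the two columns.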
Is this significant?
[Figure: scatter plot of unique web visits 2010-11 (vertical axis, 0 to 30,000,000) against physical visits 2010-11 (horizontal axis, 0 to 8,000,000)]
Is this significant?
• The correlation coefficient varies between -1 and
+1
• At 0 there is no correlation at all
• At -1 there is a perfect (negative) correlation
• At +1 there is a perfect positive correlation
Is this significant?
• We normally test for significance at the 5% level
– If there were really no relationship, a result at least
as strong as the one observed would occur by chance
with a probability of no more than 0.05
• Tests can be one-tailed or two-tailed; a two-tailed
test allows for the difference being either positive
or negative
• We need to compare our calculated coefficient to
a table of critical values
• e.g. www.sussex.ac.uk/Users/grahamh/RM1web/Pearsonstable.pdf
Testing significance
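• Coefficient = 0.699
• Degrees of freedom
= pairs of observations – 2
= 17 – 2
= 15
• Critical value for 15 degrees of freedom (5% level, two-tailed) = .482
• 0.699 is greater than .482, so the correlation is significant at the 5% level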
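The table lookup can be written out as a short check. This sketch takes .482 as the tabulated critical value for 15 degrees of freedom at the 5% level (two-tailed); the helper name `is_significant` is made up for the example.

```python
r = 0.699                 # correlation coefficient from the slide
n_pairs = 17              # pairs of observations (institutions)

# Degrees of freedom for a correlation: pairs of observations minus 2.
degrees_of_freedom = n_pairs - 2

# Critical value read from a table of critical values for Pearson's r
# (15 degrees of freedom, 5% level, two-tailed).
critical_value = 0.482


def is_significant(coefficient, critical):
    """Two-tailed check: compare the size of r, ignoring its sign."""
    return abs(coefficient) >= critical


print(degrees_of_freedom, is_significant(r, critical_value))
```

Because the test is two-tailed, a coefficient of -0.699 would pass the same check.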
Student’s t-test
• Student’s t-test tells us whether the means of
two sets of data are (significantly) different
• As with correlations, a value (‘t’) is calculated, and
then compared to critical values
• Excel’s TTEST function returns the P-value (the
probability) directly
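To show what sits behind the value of ‘t’, here is a minimal sketch of one variant of the test: Welch's t-test for two independent samples with unequal variances, which corresponds to type 3 in Excel's TTEST. The slides do not say which variant was used, so treat the choice as an assumption. The function name `welch_t` and the percentages are invented for illustration.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # t is the difference in means divided by its standard error.
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical percentages, invented purely for illustration.
library_pct = [48.0, 52.5, 39.0, 44.5, 50.0, 41.0]
museum_pct = [55.0, 61.5, 47.0, 58.5, 66.0, 49.0]

t_stat, df = welch_t(library_pct, museum_pct)
print(f"t = {t_stat:.3f}, degrees of freedom = {df:.1f}")
```

The resulting ‘t’ is then compared with a table of critical values for the computed degrees of freedom. When the two sets of observations are matched (for example, two measurements per borough), a paired test is usually more appropriate.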
Is the geography of library-going the same as
the geography of museum-going?
• Data from:
– http://data.london.gov.uk/datafiles/art-culture/librarymuseum-art-participation-borough.xls
– Includes
• % (in sample) used a library in past 12 months
• % (in sample) visited a museum or gallery in past 12 months
P-value: 0.000129372
Where to go next?
• UCL Graduate School
– “Basic statistics for research”
– http://courses.grad.ucl.ac.uk/coursedetails.pht?course_ID=1813