SADA - digital libraries laboratory @ uct . cs


Data Integration at the
South African Data Archive (SADA)
NRF
Presented by Dr Daisy Selematsela
1st African Digital Curation Conference, 12–13 Feb 2008
CSIR Pretoria
Outline
• What is SADA
• Stakeholders
• Products and services
• Challenges
• Value adding of data sets
What is SADA?
• Established in 1993
• Oriented towards Social Sciences
• Brokerage service between data providers
• Provide central repository for quantitative data
SADA Stakeholders
• Statistics South Africa
• Human Sciences Research Council (HSRC)
• IDASA
• South African Police Service
• International Association of Social Science Information Service & Technology (IASSIST) – Europe
• Inter-university Consortium for Political and Social Research (ICPSR, Univ. of Michigan)
Who & where are SADA clients?
• North America (432)
• Canada (10)
• Brazil (3)
• Europe (206)
• Australia (36)
• New Zealand (4)
• Mauritius (2)
• Asia (61)
• Africa (West & East) (26)
• South Africa (400)
Products & Services: Datasets
• Census surveys
• General Household surveys
• Demographic studies
• Health studies
• Substance abuse and crime
• Income and poverty
• Inter-group relations
• Labour (workforce survey)
• Political perceptions and attitudes
• Education & training
• Omnibus and international studies
Services: Brokerage
• ICPSR
– Membership based org.
– National subscription
– Access to huge collection of Social Science data
– Benefit SA researchers, institutions, postgrads etc
• Directory of Data Producers in SA
– Produced by NRF
– Aimed at organisations that produce scientific data
– One entry point
– Online entry form for interested parties
Challenges
• Data integration
– Def: ways in which information from a variety of sources (census, survey, transactions, administrative systems), usually held in different databases, can be combined to create a powerful new resource to address major research issues
– Constraints:
• Lack of knowledge about the scope of integration
• Lack of skills to facilitate linking, and the awareness that, as data become more extensive, the possibility of inadvertent disclosure of the identity of individuals/organisations increases
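A minimal sketch of the integration idea defined above, assuming two hypothetical sources (a survey file and an administrative file) that share a `person_id` key. All field names and values are invented for illustration and are not SADA data. It links the records, then flags attribute combinations held by fewer than `k` people, a simple k-anonymity style check on the disclosure risk noted in the constraints.

```python
from collections import Counter

# Hypothetical records from two separately held sources (made-up values).
survey = [
    {"person_id": 1, "district": "A", "age_band": "30-39", "employed": True},
    {"person_id": 2, "district": "A", "age_band": "30-39", "employed": False},
    {"person_id": 3, "district": "B", "age_band": "60-69", "employed": True},
]
admin = [
    {"person_id": 1, "grant_recipient": False},
    {"person_id": 2, "grant_recipient": True},
    {"person_id": 3, "grant_recipient": False},
]


def link_records(left, right, key):
    """Combine records from two sources that share the same key value."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two dictionaries into one integrated record.
            linked.append({**row, **match})
    return linked


def risky_groups(records, quasi_identifiers, k):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


linked = link_records(survey, admin, "person_id")
flagged = risky_groups(linked, ["district", "age_band"], k=2)
print(flagged)
```

Any combination that appears in `flagged` describes so few people that releasing the linked file could identify them; in practice such cells would be suppressed or coarsened before the data are shared.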
Challenges
• Data stewardship/management
– Errors and biases in data representation
– Lack of interoperability
• Data discovery
– Loss of scientific and transformative power due to lack of knowledge about existing data opportunities
– Inadequate tools to find existing data
– Cost of documenting and storing data
Challenges
– Promoting best practice in data sharing (OECD Guidelines 2007): the assumption that publicly funded research data “are a public good, produced in the public interest”
– ‘Data re-use’ or ‘secondary data analysis’
• Legal impediments to use data for purposes other than those for which their collection was originally authorised
• Ethical considerations – need to inform data subjects & seek permission to reuse data
• Scientific culture associated with “first and privileged use” of data collected for a specific purpose
Helsinki School of Economics study
• “Sensation Seeking, Overconfidence & Trading Activity” by Mark Grinblatt (Univ. of California) & Matti Keloharju (Helsinki School of Economics), accepted by The Journal of Finance
• Version available at http://www.anderson.ucla.edu/documents/areas/fac/finance/06-06.pdf
• Source: International Herald Tribune, Saturday-Sunday, February 9-10, 2008: “Speeding and trading: It’s the same heady rush” (page 15)
Speeding study
If you get speeding tickets, watch out: the chances are good that you will also engage in possibly dangerous investing behaviour.
Study Data use!
• Mark & Matti were able to find a correlation between speeding tickets & trading frequency after they received access to several data sets from the Finnish government
• Databases contained details of:
– Speeding tickets issued to Helsinki residents between mid-1997 and 2001
– Portfolios and trading records of all Finnish households from 1995 to 2002
– Filing of tax returns
• Outcome: these rich data sets enabled the researchers to bracket other possible causes of trading activity and focus on the distinct influence of speeding tickets alone!
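The linking step behind this kind of result can be sketched as follows. This is an illustration only, not the authors' method: it assumes two hypothetical data sets keyed by a made-up person identifier, uses invented numbers, and computes a plain Pearson correlation, whereas the study also had to account for other possible causes of trading activity.

```python
from math import sqrt

# Hypothetical, made-up records keyed by an invented person identifier.
tickets = {"p1": 0, "p2": 1, "p3": 3, "p4": 5}          # speeding tickets per person
trades = {"p1": 2, "p2": 4, "p3": 9, "p4": 12, "p5": 7}  # trades per person


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Only people present in both data sets can be compared.
shared = sorted(tickets.keys() & trades.keys())
r = pearson([tickets[p] for p in shared], [trades[p] for p in shared])
print(f"{len(shared)} linked people, correlation = {r:.2f}")
```

The comparison is only possible for people who appear in both sources, which is why access to several linkable data sets mattered to the researchers.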