Transcript Rob_Procter

Text Mining for e-Social Science
Rob Procter
National Centre for e-Social Science
http://www.ncess.ac.uk
21st March, 2005
NaCTeM
1
Overview
– National Centre for e-Social Science
– Why do we need e-Social Science?
– Applications of text mining
NCeSS goals

Explore opportunities for take-up of e-Research within the social sciences:
– Substantive social science research problems
– Enhance existing research methods
– Facilitate new research methods

Social shaping of e-Research:
– Socio-technical factors in the design, uptake and use of e-Research infrastructures and tools
– Implications for research practice and knowledge production
– Policy and socio-economic impacts
NCeSS structure
– Co-ordinating hub at Manchester
– Four research ‘nodes’ already commissioned, more to be announced later this year
– ‘Small’ research projects
Why do we need e-Social Science?

– As social scientists become concerned with addressing complex, multi-faceted research problems, they require the creation and use of increasingly multi-level, multi-textured data resources
– e-Social Science offers improved methods and tools for data description, discovery and curation
– e-Social Science offers new ways to study complex social phenomena
Text mining for e-Social Science

e-Social Science is often associated with quantitative research methods:
– Statistical analysis, modelling and simulation

e-Social Science applications in qualitative research have received less attention:
– New tools are needed to address the challenges of analysing increasing volumes of qualitative data
Applications of text mining I

– Content analysis of textual data is an established method in qualitative social science research
– There are many variations on it, but its essence involves searches for keywords, their co-occurrences and their interpretation in context
– Simple computer-based tools (QDAS) have been used for some time to support content analysis
– New text mining techniques promise improvements in power and sophistication
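
The three basic operations named on this slide (keyword search, co-occurrence counting and viewing keywords in context) can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular QDAS package: the function names, the sample excerpts and the keyword list are all invented for the example.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_counts(docs, keywords):
    """Count how often each keyword occurs across all documents."""
    counts = Counter()
    for doc in docs:
        for tok in tokenize(doc):
            if tok in keywords:
                counts[tok] += 1
    return counts


def cooccurrences(docs, keywords):
    """Count the documents in which each pair of keywords appears together."""
    pairs = Counter()
    for doc in docs:
        present = sorted(set(tokenize(doc)) & set(keywords))
        pairs.update(combinations(present, 2))
    return pairs


def keyword_in_context(doc, keyword, window=3):
    """Return each occurrence of the keyword with `window` tokens either side."""
    toks = tokenize(doc)
    return [
        " ".join(toks[max(0, i - window): i + window + 1])
        for i, tok in enumerate(toks)
        if tok == keyword
    ]


# Hypothetical interview excerpts, invented purely for illustration.
docs = [
    "Trust in the police fell after the housing dispute.",
    "Housing costs dominate; trust in local government is low.",
    "The police were praised by residents.",
]
keywords = {"trust", "police", "housing"}

counts = keyword_counts(docs, keywords)
pairs = cooccurrences(docs, keywords)
contexts = keyword_in_context(docs[0], "police", window=2)
```

The context windows are what a researcher would read to interpret each hit; the counts alone say nothing about meaning.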
Content analysis example

– Systematic reviews of the scientific literature are an important tool for researchers and policy makers
– Finding relevant studies calls for searching very large literature databases
– Records are not well coded, so a search usually retrieves a very large number of titles and abstracts (often more than 10,000), which then need to be sorted manually
– Text mining techniques such as auto-summarising, annotating, etc. can help to make systematic reviews easier to conduct and improve their quality
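
One simple way text mining might reduce the manual sorting burden is to rank retrieved records by their similarity to the review question, so that reviewers read the most promising titles first. The sketch below uses TF-IDF weighting and cosine similarity for this; it is a swapped-in illustration rather than the auto-summarising or annotating techniques named above, and the record titles, the query and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (a dict of token -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    # Document frequency: number of texts containing each token.
    df = Counter(tok for toks in token_lists for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}
    return [
        {tok: count * idf[tok] for tok, count in Counter(toks).items()}
        for toks in token_lists
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_records(query, records):
    """Return (score, record) pairs, most similar to the query first."""
    vectors = tfidf_vectors([query] + records)
    query_vec, record_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, v), r) for v, r in zip(record_vecs, records)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical titles standing in for retrieved database records.
records = [
    "Effects of school meal programmes on pupil attainment",
    "Soil erosion rates in upland catchments",
    "Free school meals and educational attainment: a cohort study",
]
ranked = rank_records("school meals and attainment", records)
```

A ranking like this does not replace manual screening; it only changes the order in which records are read.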
Applications of text mining II

– The solution of complex, multi-faceted social science problems calls for new ways of doing research
– Quantitative and qualitative research methods reflect a fundamental divide in the social sciences
– Researchers could benefit from tools which would enable them to combine these different methodological traditions
‘Mixed methods’ example

– FINGrid is a demonstrator for analysing financial information in the form of quantitative data (time series) and qualitative data (financial/political news)
– FINGrid works by text mining financial news (Reuters news feed) for ‘market sentiment’ and then attempts to correlate this data with time series data of market price movements
– The ultimate aim is to understand better the relationship between price movements and market sentiment
Prof K Ahmad, University of Surrey
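
The general idea of scoring news text for sentiment and correlating the scores with price movements can be sketched as follows. This is not FINGrid's actual method, which the slides do not describe: the word lists, headlines and price changes are invented, and a simple lexicon count with a Pearson correlation is assumed purely for illustration.

```python
import math
import re

# Hypothetical sentiment lexicon, invented for this example.
POSITIVE = {"gain", "gains", "rally", "strong", "growth", "upbeat"}
NEGATIVE = {"loss", "losses", "slump", "weak", "fear", "fears"}


def sentiment_score(text):
    """Return (positive words - negative words) divided by total words."""
    toks = re.findall(r"[a-z]+", text.lower())
    if not toks:
        return 0.0
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    return (pos - neg) / len(toks)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


# Invented daily headlines and same-day percentage price changes.
headlines = [
    "Shares rally on strong growth figures",
    "Markets slump as fears of losses spread",
    "Upbeat traders see further gains",
    "Weak data prompts fear of a slump",
]
price_changes = [1.2, -0.9, 0.7, -1.1]

scores = [sentiment_score(h) for h in headlines]
r = pearson(scores, price_changes)
```

A correlation on its own says nothing about direction of influence; whether sentiment leads prices or follows them would need lagged comparisons on real data.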
Applications of text mining III

Data about social phenomena is being generated on an increasing scale as a by-product of the everyday activities of social actors:
– Bulletin boards, weblogs, news feeds, etc.

These resources could provide researchers with a much richer picture of social phenomena than is available via more conventional data gathering techniques

But researchers lack adequate tools to exploit them
A Virtual Observatory for the Social Sciences?

Bulletin boards, weblogs, news feeds, etc. raise different requirements for description and discovery from conventional social science datasets:
– They are dynamic, their content growing and their relevance for researchers changing continuously

Making such data sources useful for research would need text mining tools for:
– Filtering
– Annotating
– Summarising
– Information extraction such as sentiment analysis

Using text mining in these ways, we could create a ‘Virtual Observatory for the Social Sciences’
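
To make the four tool types concrete, the sketch below chains them into a toy pipeline over a handful of posts. Everything here is an assumption made for illustration: the `Post` structure, the topic and sentiment word lists, the sample posts and the deliberately crude implementation of each step. A real observatory would need far more capable components and would run continuously over changing sources.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Post:
    source: str
    text: str
    annotations: dict = field(default_factory=dict)


# Hypothetical word lists, invented for this example.
TOPIC_TERMS = {"housing", "rent", "landlord"}
POSITIVE = {"good", "fair", "improved"}
NEGATIVE = {"bad", "unfair", "worse"}


def tokens(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_posts(posts, terms):
    """Filtering: keep posts that mention at least one topic term."""
    return [p for p in posts if terms & set(tokens(p.text))]


def annotate(post, terms):
    """Annotating: record which topic terms the post mentions."""
    post.annotations["topics"] = sorted(terms & set(tokens(post.text)))
    return post


def summarise(post, max_words=8):
    """Summarising: crude extractive summary, the first sentence truncated."""
    first = re.split(r"(?<=[.!?])\s+", post.text.strip())[0]
    post.annotations["summary"] = " ".join(first.split()[:max_words])
    return post


def extract_sentiment(post):
    """Information extraction: label the post by counting lexicon words."""
    toks = tokens(post.text)
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    post.annotations["sentiment"] = label
    return post


def observe(posts):
    """Run the four steps in order over a batch of posts."""
    kept = filter_posts(posts, TOPIC_TERMS)
    return [extract_sentiment(summarise(annotate(p, TOPIC_TERMS))) for p in kept]


# Invented posts standing in for weblog, bulletin board and news feed items.
posts = [
    Post("weblog", "Rent went up again. The landlord is being unfair."),
    Post("bulletin board", "Great match last night!"),
    Post("news feed", "Housing conditions improved this year, residents say."),
]
results = observe(posts)
```

Because the sources are dynamic, a pipeline like this would be re-run as new posts arrive, with the topic terms revised as the researcher's interests change.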
Getting involved in NCeSS
– Small projects programme is open until July 31st
– First International Conference on e-Social Science, 22nd-24th June, Manchester

http://www.ncess.ac.uk/events/conference/index.shtml