And Big Awkward Data
Madeleine Thornton: [email protected]
Where we started 2 ½ years ago
Me:
• Background in social research
• Little bit of experience with public / open data
• Zero experience with administrative data
• Handy with a spreadsheet
Buttle UK:
• General idea that we are sitting on valuable data, but
• Little idea how to analyse and use it.
Reports from our
database
Geographic data
Started with the basics: a list of the number of
grants made in every local authority (LA).
Geographic data
Wanted a more nuanced understanding of
geographic spread of grants.
• Are they going to areas of greatest need?
• Are there places we are not reaching?
• Turned to the Indices of Multiple Deprivation,
the National Statistics Postcode Lookup and
Google Maps (plus trusty Excel).
Getting there. We know we are reaching areas of high
need.
[Chart: Percentage of Children in Need Grants awarded in 2011-12 and
2012-13 by level of area deprivation. Vertical axis: percentage of
grants awarded, 0% to 50%. Horizontal axis: deprivation group, 1 to 10.
Series: 2011-12 and 2012-13.]
UK areas are divided into 10 equal-sized groups according to level of deprivation. Number 1
represents the 10% most deprived areas in the UK. Number 10 represents the least deprived
10% of areas in the UK.
But still don’t know where we’re missing…
And now we can see where we want to target our
outreach
Geographic data
Steps were:
• Download data from our database with a
postcode for every grant.
• Match postcodes to data zones using the National
Statistics Postcode Lookup (available from ONS).
• Match data zones to their rank in the Index of Multiple
Deprivation (available from Scottish Government).
• Map it using Google Fusion Tables.
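The matching steps above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the Excel workflow actually used. The postcodes, zone codes, ranks, and the `total_zones` value are made-up placeholders, and real lookup files would be loaded from the ONS and Scottish Government downloads.

```python
from collections import Counter

# Hypothetical sample data, invented for illustration only.
grants = [
    {"grant_id": 1, "postcode": "zz1 1aa"},
    {"grant_id": 2, "postcode": "ZZ1 1AB"},
    {"grant_id": 3, "postcode": "ZZ2 2BB"},
    {"grant_id": 4, "postcode": "ZZ9 9ZZ"},  # deliberately missing from the lookup
]
postcode_to_zone = {"ZZ11AA": "ZONE_A", "ZZ11AB": "ZONE_A", "ZZ22BB": "ZONE_B"}
zone_to_rank = {"ZONE_A": 1, "ZONE_B": 8}  # rank 1 = most deprived
total_zones = 10  # assumed number of ranked zones in the index


def normalise_postcode(postcode):
    """Upper-case and remove spaces so formatting differences do not block a match."""
    return "".join(postcode.split()).upper()


def rank_to_decile(rank, total_zones):
    """Map a rank (1 = most deprived) to a group from 1 to 10."""
    return (rank - 1) * 10 // total_zones + 1


def decile_shares(grants, postcode_to_zone, zone_to_rank, total_zones):
    """Return ({decile: percentage of matched grants}, [unmatched grant ids])."""
    counts = Counter()
    unmatched = []
    for grant in grants:
        zone = postcode_to_zone.get(normalise_postcode(grant["postcode"]))
        rank = zone_to_rank.get(zone)
        if rank is None:
            # Keep track of grants that could not be matched, for manual checking.
            unmatched.append(grant["grant_id"])
            continue
        counts[rank_to_decile(rank, total_zones)] += 1
    matched = sum(counts.values())
    if matched == 0:
        return {}, unmatched
    shares = {d: 100 * n / matched for d, n in sorted(counts.items())}
    return shares, unmatched


shares, unmatched = decile_shares(grants, postcode_to_zone, zone_to_rank, total_zones)
print(shares)     # percentage of matched grants in each deprivation group
print(unmatched)  # grant ids whose postcode was not found
```

Listing the unmatched grants separately is a design choice: postcodes typed by hand often fail to match, and it helps to see how many grants drop out before trusting the percentages.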
Client experiences
Each case is coded with up to four categories
indicating “reasons” for giving a grant. We knew
what they were and which were top of the list,
but how do they interact?
• Started messing around in Excel
• Reached the end of my skills
• Turned to Datakind for help!
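One simple way to look at how the reason codes interact is to count how often each pair of categories appears on the same case. The sketch below is an assumption about how that could be done, not a description of what Datakind built. The category names and cases are placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cases, each with up to four reason codes (placeholder names).
cases = [
    ["reason_a", "reason_b"],
    ["reason_a", "reason_b", "reason_c"],
    ["reason_b", "reason_c"],
    ["reason_a"],
]


def pair_counts(cases):
    """Count how many cases carry each unordered pair of reason codes."""
    counts = Counter()
    for reasons in cases:
        # sorted(set(...)) removes repeats and gives each pair a fixed order.
        for pair in combinations(sorted(set(reasons)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(cases)
for pair, n in counts.most_common():
    print(pair, n)
```

Cases with a single reason contribute no pairs, so they need to be counted separately if the share of single-reason cases matters.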
Quick detour to say THANK YOU to
Datakind UK & volunteers
Started here
Datakind UK got us here
The hardest one: text data
Like many charities, vast amounts of our data are
in text format.
____ with her two sons fled her abusive husband
and came to our refuge in November. She has
engaged with her support plan here very well
became more independant. She is a wonderful
mother. Her boys are doing very well. Yesterday
she has been offered a property by our council
and will be moving to it on the 9th ____. When
she left her husband she had to leave everything
behind. She is still awaiting for a decission
for ____ benefit to be paid to her and not to
her ex husband. So her finances are limited.
There’s only so far you can get with traditional
qualitative analysis when you have hundreds of
thousands of these…
Got started with Datakind: Keyword counts
cross-tabulated with financial info
We had no idea this relationship existed… What
else is there to know?
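A keyword count cross-tabulated with a financial field can be sketched as follows. The records, the keywords, and the `income_band` field are all invented for illustration. The actual fields and keywords used with Datakind are not given in these slides.

```python
import re
from collections import Counter

# Hypothetical records: free text plus an assumed financial band.
records = [
    {"text": "Needs a bed and a cooker for the new flat.", "income_band": "low"},
    {"text": "Bed requested for the youngest child.", "income_band": "low"},
    {"text": "Cooker broke last month.", "income_band": "medium"},
]
keywords = ["bed", "cooker"]


def keyword_crosstab(records, keywords):
    """For each keyword, count the records mentioning it within each band."""
    table = {kw: Counter() for kw in keywords}
    for record in records:
        # Lower-case and split into words so "Bed" and "bed" count together.
        words = set(re.findall(r"[a-z']+", record["text"].lower()))
        for kw in keywords:
            if kw in words:
                table[kw][record["income_band"]] += 1
    return table


table = keyword_crosstab(records, keywords)
for kw, bands in table.items():
    print(kw, dict(bands))
```

This counts records rather than word occurrences, so a case that mentions "bed" three times still counts once.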
Now working on it ourselves
• What are we looking for?
• Not sure yet, just exploring!
• Tools I am using (or trying out):
–Python
–Lots and lots of online tutorials and
forums
A glimpse (still early days)
This is me trying out Python’s Natural Language Toolkit
(NLTK) to explore some qualitative text survey data I
have. We ask grant recipients: “Can you describe one change in
your family life since getting your grant from Buttle UK?”
What words commonly appear
together?
I type:
Python tells me:
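The typed input and Python's output are not preserved in this transcript. NLTK's `Text.collocations()` method addresses this kind of question. As a rough stand-in that needs no installation, the sketch below counts adjacent word pairs with the standard library. The responses and the stopword list are made up, and this simple count is not the same statistic NLTK uses.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; a real one would be much longer.
STOPWORDS = {"a", "the", "and", "is", "at", "my", "now", "there"}

# Hypothetical survey responses, invented for illustration.
responses = [
    "The children sleep better now.",
    "My children sleep better and there is less stress.",
    "Less stress at home.",
]


def common_pairs(responses, top_n=5):
    """Return the most frequent adjacent word pairs that contain no stopword."""
    counts = Counter()
    for response in responses:
        words = re.findall(r"[a-z']+", response.lower())
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[(first, second)] += 1
    return counts.most_common(top_n)


pairs = common_pairs(responses)
for pair, n in pairs:
    print(" ".join(pair), n)
```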
In what context are people using the
word “stress”?
I type:
Python tells me:
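Again, the original input and output are missing from the transcript. In NLTK this kind of question is usually asked with `Text.concordance()`. The sketch below is a simplified standard-library version of the same idea, showing a few words either side of the keyword. The responses are invented examples.

```python
import re

# Hypothetical survey responses, invented for illustration.
responses = [
    "There is much less stress at home now.",
    "The grant took away the stress of buying a bed.",
]


def concordance(responses, keyword, width=3):
    """Return each use of keyword with up to `width` words of context per side."""
    lines = []
    for response in responses:
        words = re.findall(r"[A-Za-z']+", response)
        for i, word in enumerate(words):
            if word.lower() == keyword.lower():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append(f"{left} [{word}] {right}".strip())
    return lines


lines = concordance(responses, "stress")
for line in lines:
    print(line)
```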
What next?
• So many more places to go with
Natural Language Processing.
• Want to explore other types of data
mining.
• But also need to focus on producing
usable tools (like with our geographic
data).
What is it for in future?
• Fundraising: We support a really broad range of families but
many funders have more niche interests. We can explore our data
quickly and systematically to see if we are likely to be a good
fit by looking for certain problems, experiences, etc.
• Targeting: What if we could
work out what people might
need before they even know
to ask for it?