Political Innovation Social Media Mining


SOCIAL MEDIA MINING: WHAT IS IT GOOD FOR? AND WHEN IS IT GOOD ENOUGH?
Nick Buckley
SoShall Consulting
asc
Funky Data
25th September 2012
1
The Plan
• What is Social Media Mining? [SMM]
• How do Market Researchers tend to think about it?
• Nuts & Bolts – practical outcomes
• Challenges and Constraints
• [How] Do these make Researchers re-think the ‘place’ of SMM?
• Where will it go from here?
BUT:
• Assumption of a vendor/researcher distinction – even if in house
• No naming or comparing of vendors/applications
• Difficult to judge where to pitch the basics – too familiar vs. too abstract
2
1. What are we talking about?
3
What exactly are we talking about?
Definition* of social media monitoring:
“Social Media Monitoring (SMM) means the identification, observation, and analysis of
user-generated social media content for the purpose of market research.”
What they say
* http://www.social-media-monitoring.org
4
What are we talking about? Social Media...
• Blogs/Microblogs
• Client sites
• Public Communities
• Review sites – Professional & Consumer
• News sites
• Video sites
• Forums
5
What’s in a word?
GfK NOP currently prefers “Mining”. User generated content in social media lays down a
rich seam of activity, opinion, thought and information… mess, echoes and ‘whimsy’.
For some time marketing and PR professionals have been
monitoring Social Media to capture headline ‘buzz’ in real
time, and to detect sudden changes requiring a response.
But collecting and counting this content is only the
beginning of a process which can add value via many
techniques… including integration with other sources
such as market research data.
6
2. What happens when Market Researchers get hold of it?
7
Sony brand damage was driven by PlayStation breach (2011)
[Charts: Sony buzz this year; Sony sentiment this year; Sony buzz in April; Sony sentiment in April; PlayStation buzz; PlayStation sentiment]
8
Market Researchers believe that SMM can also give clients a window on other dimensions of online conversations
SMM provides insights into:
• Category Dynamics
  ◦ Consumer needs
  ◦ Problems and issues consumers discuss
  ◦ Product usage discussions
  ◦ New product entries & trends in purchase intention
• Corporate
  ◦ Corporate mentions related to reputation
  ◦ Crises
  ◦ Social issues
• Brand/Product
  ◦ Brand/sub-brand mentions, brand “buzz”
  ◦ Number of positive vs. negative sentiments for each brand – including customer service
  ◦ Brand content analysis, what’s being said about brand
  ◦ Advertising noticed most and related discussion – launch tracking
  ◦ Source of mentions (specific sites) and the most influential sites
• Competition
  ◦ All the above for preference & competition
9
Market Researchers are fitting SMM into different places within method or process
• As a precursor to traditional Market Research
  ◦ Refining hypotheses for research design
  ◦ Prioritising criteria – identifying new ones
  ◦ Defining or qualifying the competitive set
  ◦ Identifying niche respondents for small-scale studies
• As a successor to traditional Market Research
  ◦ Tracking the impact of implemented findings
  ◦ Monitoring for events which may create discontinuities in this
  ◦ Low intensity/low detail follow-up
• As a companion to traditional Market Research
  ◦ Compare and contrast – e.g. unconditioned
  ◦ Add granularity to satisfaction drivers
  ◦ Complement reach
  ◦ Interpolate lengthy studies
So can SMM research stand alone?
Is there a hierarchy, within these hybrid uses, of ‘best fit’? Does the story change if you get longitudinal with a category?
To what extent do some of these uses assume that the data can be treated like conventional MR data?
In any case – should it be treated and analysed thus?
10
But inevitably they think about comparison with surveys…
• You can ‘ask a new question’ without having to issue a new questionnaire*
• Unconditioned by participant awareness of a research process, often more emotive than considered survey responses
• Low cost - under certain circumstances
• Spontaneously generated content unconstrained by research frame
• Offers insight into active social media users
• Potentially global
• Very immediate

• Not necessarily representative of the general population
• Difficult to weight back to general population, as demographic data is sparse
• Automated sentiment analysis only as good as the algorithms [and these vary greatly]
• Automated harvesting can capture a lot of ‘noise’ for certain words or brands
• No guarantee of sufficient data
• Costs rise when we use supplementary analysis to overcome some of these issues
*within certain technical limitations
11
Different approaches for different client needs
For example - Precision Extraction vs ‘Trawl & Filter’
[Diagram: four approaches placed by data volume and amount of post-processing]
Data volume axis: Lower data volumes (from targeted & compound search terms) to Higher data volumes (from simple search terms)
Processing axis: Accept raw data output from application, to More post-processing applied to data by GfK – to reduce noise and refine sentiment attribution
• Quantitative Brand tracking and integration with traditional research
• Exploratory Qual – more complex collection. Manually manageable volumes and ‘tuning’
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Crude mention & mood tracking
12
3. Too Abstract?
13
The raw material - Results from search terms
SMM applications extract results from wholesale supplies of data, conducting searches defined by “search terms”
• can be anything from a simple and distinctive brand or product name, to a complex expression configured to capture discussions about a category or concept
• search terms combine words or phrases
  ◦ via logical instructions such as AND, OR, NOT
  ◦ by employing functions such as WITHIN to detect words in a certain proximity to each other
  ◦ with brackets that can dictate sequence in which instructions are applied e.g. “word1” AND ( “word2” OR “word3” )
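
To make the mechanics concrete, here is a minimal sketch of how a search term of this kind might be evaluated against a single post. The helper names (`contains`, `within`, `matches`), the example term and the reading of WITHIN as a distance in words are all assumptions for illustration; each SMM application has its own query syntax.

```python
import re

def tokens(text):
    """Lower-case a post and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains(post, word):
    """True if the word appears as a whole token in the post."""
    return word.lower() in tokens(post)

def within(post, word_a, word_b, distance):
    """True if word_a and word_b occur at most `distance` tokens apart."""
    toks = tokens(post)
    pos_a = [i for i, t in enumerate(toks) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == word_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

def matches(post):
    """Hypothetical term: "playstation" AND ("breach" OR "hack") NOT "giveaway"."""
    return (
        contains(post, "playstation")
        and (contains(post, "breach") or contains(post, "hack"))
        and not contains(post, "giveaway")
    )

# Invented posts, for illustration only.
posts = [
    "The PlayStation breach has put me off the brand",
    "PlayStation giveaway! Hack your way to a free console",
    "Loving my new PlayStation",
]
hits = [p for p in posts if matches(p)]
```

Only the first post survives: the second is excluded by the NOT clause and the third never satisfies the bracketed OR.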
14
A typical SMM application offers a dashboard view of the data returned by these search terms – and the facility to export the underlying data
15
Analyses
Whatever the Search Terms define – here is what can be measured about the results returned… in combination or in isolation
• Volume – “how much is it talked about, and how is this changing over time”
• Channels – “where on the web is it being talked about… twitter, blogs, forums, comments?”
• Verbatims – drill-down to individual posts, in their own words – “what do people actually say?”
• People – “who is talking about it?” That may be by influence – according to various proprietary indices – or by demographics [to be used with caution]
• Location – “where in the world is it being talked about?”
• Themes – “what other words and phrases are most regularly associated with it?”
Sentiment: Across all of these variables is superimposed automatically generated “Sentiment” analysis – positive, negative or neutral language associated with the subject of the posts…
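
As a rough illustration of how these measures fall out of an exported data set, the sketch below counts volume by day and by channel, then overlays sentiment. The row layout and the figures are invented, and `net_sentiment` is just one crude way of summarising the balance; no particular application is assumed to work this way.

```python
from collections import Counter

# Hypothetical export rows: (date, channel, sentiment) per post.
posts = [
    ("2011-04-20", "twitter", "neutral"),
    ("2011-04-27", "twitter", "negative"),
    ("2011-04-27", "forums", "negative"),
    ("2011-04-27", "blogs", "neutral"),
    ("2011-04-28", "twitter", "negative"),
]

# Volume: how much is it talked about, and how does that change over time?
volume_by_day = Counter(date for date, _, _ in posts)

# Channels: where on the web is it being talked about?
volume_by_channel = Counter(channel for _, channel, _ in posts)

# Sentiment superimposed on another variable, here the channel.
sentiment_by_channel = Counter((channel, s) for _, channel, s in posts)

def net_sentiment(rows):
    """(positive - negative) / total, a crude balance between -1 and 1."""
    counts = Counter(s for _, _, s in rows)
    return (counts["positive"] - counts["negative"]) / len(rows)
```

Combining two of these counters is what produces stories of the "Volume + Channels" or "Themes + Sentiment" kind.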
16
Combinations of these basics tell different types of story
• Brand A’s new ad was mainly discussed on Forums when it was being shot by a famous pop star, but was mainly discussed on Twitter when it was being aired. [Volume + Channels]
• Automotive brand X is associated mainly with topics around performance, whereas brand Y is associated with comfort and style. Both enjoy roughly the same level of positive sentiment overall. [Themes + Sentiment]
• Beverage brand N enjoyed a bigger ‘spike’ in its mentions when news of a future big game at a sponsored venue was announced, than it got from a tournament sponsorship that was live at the time. [Volume vs. Offline Schedule]
• Some ‘general’ social Forum sites enjoy bigger concentrations of discussion of a particular topic than specialist Forums dedicated to that same topic! [Channels + Themes + People]
17
Examples of outcomes from SMM studies
• Consumers don’t always talk about the product features that you highlight.
• ‘The world’ can sometimes throw up more interesting stories about you than you could hope to generate for yourself… but not always with the connotations you would like.
• Differentiate ‘trade press’ buzz from real engagement.
• Focus on the right social media channels at the right time.
• Places where naturally occurring discussion of a category offers an opportunity for brands to ‘intercept’ rather than try to create competing social media conversations.
18
BUT!
19
There are many forces which erode this nice model…
Accuracy?
Reach?
Relevance?
Reach image from titletrack.com
20
Accuracy
Is the searched-for phrase even in the returned “snippet”?
Is it ‘real content’ – or is it
• Navigation?
• Ticker or title content?
• Ad Content?
• Various species of spam [overlaps with ‘Relevance’]?
Is meta-data about the poster
• Present?
• Reliable?
Understanding this, apart from making your own manual checks, is about understanding your 3rd party
vendor and, often, their ‘wholesale data suppliers’ in turn.
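
Some of these manual checks can be roughed out in code. The sketch below flags results where the phrase is missing from the snippet, where the text comes from a non-content part of the page, or where poster meta-data is absent. The field names (`snippet`, `page_region`, `author`) are hypothetical; a real export may not expose anything like `page_region`, and whether meta-data is reliable cannot be tested this way at all.

```python
NON_CONTENT_REGIONS = {"navigation", "ticker", "title", "advert"}

def audit(result, phrase):
    """Return a list of accuracy concerns for one returned result."""
    concerns = []
    if phrase.lower() not in result.get("snippet", "").lower():
        concerns.append("phrase not in snippet")
    if result.get("page_region") in NON_CONTENT_REGIONS:
        concerns.append("not real content")
    if not result.get("author"):
        concerns.append("poster meta-data missing")
    return concerns

# Invented results, for illustration only.
results = [
    {"snippet": "My Sony TV died after a year", "page_region": "body", "author": "user123"},
    {"snippet": "Home | TVs | Cameras | Contact", "page_region": "navigation", "author": ""},
]
report = [audit(r, "sony") for r in results]
```

The first result passes cleanly; the second raises all three concerns.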
21
Reach
[T]here are known knowns; there are things we know that we know.
There are known unknowns; that is to say there are things that we now know we don't know.
But there are also unknown unknowns – there are things we do not know we don't know.
Donald Rumsfeld
• Are these results from scrutiny of the entire [English speaking] social web? No
• Are they results from a very large, sometimes stated, number of social sources? Yes
• Could this range be skewed relative to the subject under scrutiny? Yes
• Where it’s Twitter data – is it from the whole of Twitter? Maybe
• Is historical data always the same basis as current data, or data gathered since the search was defined? Not always
• Do we always have a good idea of what the ‘Reach’ is? No
22
Relevance
Even when the application has collected exactly what we asked for, and it is legitimate
content, with some nice useful data about the poster… it might not be relevant
“Cats are great company.”
“#EMT Bolt one cool cat!”
“Also, the Cat is a great resort”
“I love my aunt Cat!”
“I think Cat Stark is worse than any Lanister.”
“I think this hurricane was a scam cooked up by the fat cats in Big Grocer.”
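
One common remedy is to require some supporting context before a post counts. The sketch below assumes we want “cat” in the sense of the animal and keeps a post only if it shares a word with a small, invented context vocabulary (`PET_CONTEXT`); it is an illustration of the idea, not a recommended filter.

```python
import re

# Hypothetical context vocabulary for "cat" meaning the animal.
PET_CONTEXT = {"pet", "pets", "kitten", "vet", "purr", "company", "fur"}

def is_relevant(post, context=PET_CONTEXT):
    """Keep a post only if it shares at least one word with the context set."""
    words = set(re.findall(r"[a-z]+", post.lower()))
    return bool(words & context)

posts = [
    "Cats are great company.",
    "Also, the Cat is a great resort",
    "I love my aunt Cat!",
]
kept = [p for p in posts if is_relevant(p)]
```

Only the first post is kept. A filter this crude will also throw away relevant posts that happen to use none of the context words, so relevance is bought at the cost of volume.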
23
… put another way
Oh s**t!
I forgot
it’s still the internet.
24
Other challenges include…
However , commencing too early public smoking facts will just
overstress your pet ; quite a fresh pet will not learn everything from
services. Just after he has ended up perched for some a few
moments, supply him with the particular take care of, plus for instance
in advance of, make sure you compliment the pup. When dog house
teaching your dog, continue to keep the dog house in the vicinity of
the spot where you as well as the canine are usually conversing.
25
And I haven’t mentioned automated Sentiment Analysis yet!
Irony – really?
Slang/Dialect/Register
Multiple meanings – “50 strong”
Adjacent subjects – “My beautiful FIAT next to a BMW”
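
The “adjacent subjects” problem is easy to reproduce. The sketch below uses a deliberately naive word-list scorer, with an invented five-word lexicon, that credits the whole post's score to every brand mentioned; it is a strawman for illustration, not how any specific application is known to work.

```python
import re

# Tiny illustrative lexicon; real tools use far larger ones.
LEXICON = {"beautiful": 1, "love": 1, "great": 1, "worse": -1, "scam": -1}

def post_score(post):
    """Sum lexicon scores over every word in the post."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z]+", post.lower()))

def brand_sentiment(post, brands):
    """Naively credit the whole post's score to every brand it mentions."""
    lowered = post.lower()
    return {b: post_score(post) for b in brands if b.lower() in lowered}

scores = brand_sentiment("My beautiful FIAT next to a BMW", ["FIAT", "BMW"])
```

Both brands come out positive, although “beautiful” describes only the FIAT. Irony, slang and multiple meanings defeat a scorer like this in the same way.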
26
4. And what is Good, and what is not Good?
27
To Recap
• SMM tools make it very easy to “Super Google” certain Brands, people, objects and even categories or concepts – quickly generating tables and charts.
• But underneath there’s a complex story about accuracy, reach and relevance… which you only really see when you drill down… and which you only really understand by getting inside the provider’s systems and sources.
• That this isn’t blazoned across all dashboards reflects the fact that many solution providers started out somewhere else… with monitoring. It’s not that they should have anticipated our needs.
• Sentiment analysis is only part of this story – it doesn’t define it.
28
Relationships matter as much as technology
[Diagram: the supply chain from Social Media Content to Clients, with feedback loops]
• Forward flow: Social Media Content → FEEDS [Wholesalers?] → 3rd Party System [e.g. SaaS] (“Vendor”) → “Results” → MR Agency → Reports [inc post hoc analysis] → Clients
• Feedback: Queries and more refined requirements; Topic-specific feedback; Modified searches; Customise Engine; Customise Feeds
• Other labels: Dashboard-wielding; 3rd party organisation
29
Natural Language Processing [NLP] to the rescue?
Definition
“Specifically, it is the process of a computer extracting meaningful information from
natural language input and/or producing natural language output”*
Many SMM applications now claim some level of NLP.
While this may legitimately be contrasted with simpler analysis of vocabulary combinations and probabilistic methods, it sometimes means little. It may only mean that some rules of language have been ‘attended to’ in what is still essentially a pattern-matching exercise.
*Warschauer, M., & Healey, D. (1998). Computers and language learning: An overview
30
But clearly sophisticated NLP can make a big difference
• Improved Accuracy – including filtering out of unstructured spam
• More tools available to achieve/check Relevance
• Much-improved Sentiment Analysis
Trends:
• there’s more NLP – not just in social media analysis,
• there’s more commercially affordable NLP and it keeps getting better,
• some of it is even helpfully self-auditing.
Significantly, when NLP is set to retain only high-confidence classifications, volumes of results are
dramatically reduced.
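
The volume effect of a confidence threshold can be sketched as follows. The classifier output, the confidence scores and the 0.9 threshold are all invented for illustration; actual tools differ in whether and how they expose confidence.

```python
# Hypothetical classifier output: (post id, label, confidence between 0 and 1).
classified = [
    (1, "negative", 0.95),
    (2, "positive", 0.55),
    (3, "neutral", 0.62),
    (4, "negative", 0.91),
    (5, "positive", 0.48),
]

def keep_confident(rows, threshold):
    """Retain only classifications at or above the confidence threshold."""
    return [row for row in rows if row[2] >= threshold]

confident = keep_confident(classified, threshold=0.9)
retained_share = len(confident) / len(classified)
```

In this toy set only two of the five classifications survive, which is the trade-off in miniature: cleaner sentiment, far fewer results.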
31
Barking up the wrong Tree?
Researchers’ instincts have been to use, and so judge, SMM like survey data.
But “what is good”, the ancient philosophers would tell us, is really about function and purpose.
I think we’ve now learned enough about SMM to stop and ask… “what was it we were trying to do?”
32
Remind me what we are trying to do?
• Use the social web as a proxy for the population?
• Understand how the social web is responding – for the benefit of those solely interested in this sub-set of the population as a channel or marketplace?
• Access particular niches which are more concentrated online than off?
• Detect significant events?
• Measure shifts and changes?
• Make rough comparisons?
• Discover new insights, themes and connections?
33
Different client needs indicate different SMM approaches
For example - Precision Extraction vs ‘Trawl & Filter’
[Diagram: four approaches placed by data volume and amount of post-processing, with presenter annotations]
Data volume axis: Lower data volumes (from targeted & compound search terms) to Higher data volumes (from simple search terms)
Processing axis: Accept raw data output from application, to More post-processing applied to data by MR agency – to reduce noise and refine sentiment attribution
• Quantitative Brand tracking and integration with traditional research
• Exploratory Qual – more complex collection. Manually manageable volumes and ‘tuning’
• Indicative Qual – e.g. using trends and volumes to guide focus of analysis
• Crude mention & mood tracking
Annotations: “Not radical enough!”, “Sensible”, “Too much like hard work?”
© 2012 GfK NOP
34
Rather than wait for NLP utopia…
Settle, for now, on:
1. SMM as a powerful and novel Qual exploration tool
2. Big number crunching, on single terms, that takes a “hyena” approach, i.e.
• Accept all* occurrences of a brand or product name in posts as an indication of significance… even the ‘trending’ spam and the adverts and the competitions…
• Look for pure correlations between words/phrases and other words/phrases…
• Or between trends in these numbers and classes of offline events – such as sales, complaints and other behaviours… with a view to predicting, explaining or causing such events in the future.
*Except for the most obvious duplication errors such as over-indexing
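
A minimal sketch of the correlation step, assuming weekly mention counts can be lined up against an offline series such as complaints. The figures are invented, and Pearson's coefficient is just one reasonable choice of measure; the talk does not prescribe one.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented weekly figures, for illustration only.
mentions = [120, 150, 90, 300, 210]
complaints = [10, 14, 9, 31, 18]

r = pearson(mentions, complaints)
```

A high value of `r` says the two series move together. On its own it cannot say which drives which, so the “explaining or causing” ambitions need more than this.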
35
5. Some Concquestions
36
Talking Points
How will commercial SMM applications and services with the best accuracy, reach
and relevance capabilities be recognised, validated and promoted?
If you’re a researcher and you want to use this stuff, for the first time, tomorrow…
what must be done?
Fortunately – there’s enough to learn by “super-googleing”, browsing and crude
trend tracking to keep us going… and learning… for some time to come. Is that,
whilst pragmatic, enough of an ambition?
37
Babita Earle
Digital Strategy Director
GfK NOP
Tel: 020 7890 9467
E: [email protected]

Dr Nick Buckley
SoShall Consulting
Tel: 07958 516967 t: @grimbold
E: [email protected]
38