Title of Project - People Server at UNCW

1
Class 4 Outline
• Introductions
• Customer feedback
• Where does the data come from?
• Why is this analysis even more important today?
• Questionnaires with ratings
• The Likert scale
• How to visualize the results
• Are the results significant?
• Free text responses
• What can we do with it?
• Sentiment analysis
• Count words and phrases
2
Introductions
Curry Guinn
• Hokie (or Fighting Gobbler)
• Blue Devil
• Seahawk
• 10 years at RTI International
(in Research Triangle Park)
3
Customer Feedback
• Where do we get the data?
• How has this changed over the years?
• Who gets to see customer feedback?
• Why is it even more important to
understand and respond to customer
feedback now?
4
Styles of Data Collection
• Questionnaires with Ratings
• These rating scales are
often called Likert-style
questionnaires
• Free text responses
5
Likert Scaling
A group of judges rates each item on a scale where:
1=strongly disagree
2=disagree
3=undecided
4=agree
5=strongly agree
Visualizing the Results of Likert-Like Surveys
• To compare across questions, use column (or bar)
charts of the mean, median, or mode.
• You can use bar charts to compare across
products or services too.
• To compare responses within a question, use
histograms.
• Let’s see examples of these in our exercise.
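A minimal sketch of the three summary statistics mentioned above, using Python's standard library. The individual ratings are invented for illustration (chosen so the means come out to 3.4 and 2.2); they are not the exercise data.

```python
from statistics import mean, median, mode

# Hypothetical responses (1 = strongly disagree ... 5 = strongly agree).
# These individual ratings are made up; they are not the exercise data.
responses = {
    "Overall Satisfaction": [3, 4, 2, 5, 3, 4, 3, 2, 4, 4],
    "Price": [2, 1, 3, 2, 2, 3, 1, 2, 4, 2],
}

# One (mean, median, mode) triple per question.
summary = {
    question: (mean(ratings), median(ratings), mode(ratings))
    for question, ratings in responses.items()
}

for question, (avg, mid, most_common) in summary.items():
    print(f"{question}: mean={avg:.1f} median={mid} mode={most_common}")
```

Any one of the three values per question can then be fed to a column or bar chart.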
7
Column Chart Example – Single Product
(Exercise 1)
[Column chart: Supra Sedan, customer rating (1 to 5)]
Overall Satisfaction: 3.4
Price: 2.2
Performance: 4.2
Attractiveness: 4
8
Bar Chart Example – Multiple Products
(Exercise 2)
[Bar chart: Supra vs. Mega, customer rating (1-5)]
                      Supra Sedan   Mega SUV
Overall Satisfaction  3.4           3.50
Price                 2.2           4.33
Performance           4.2           2.08
Attractiveness        4             1.92
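For readers working outside Excel, here is a rough sketch that draws the Supra vs. Mega comparison as a text bar chart. The mean ratings are the ones listed on this slide; the helper names `text_bar` and `render` are hypothetical.

```python
# Mean ratings as listed on the slide for the two products.
ratings = {
    "Supra Sedan": {"Overall Satisfaction": 3.4, "Price": 2.2,
                    "Performance": 4.2, "Attractiveness": 4.0},
    "Mega SUV": {"Overall Satisfaction": 3.50, "Price": 4.33,
                 "Performance": 2.08, "Attractiveness": 1.92},
}

def text_bar(value, scale=4):
    """Return a bar of '#' characters, `scale` characters per rating point."""
    return "#" * round(value * scale)

def render(ratings):
    """Group the bars by question so the products sit side by side."""
    lines = []
    questions = next(iter(ratings.values())).keys()
    for question in questions:
        lines.append(question)
        for product, scores in ratings.items():
            bar = text_bar(scores[question])
            lines.append(f"  {product:<12}{bar:<22}{scores[question]:.2f}")
    return "\n".join(lines)

print(render(ratings))
```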
9
Histogram Example – Examining a Single
Response (Exercise 3)
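A histogram is just a count of how many respondents chose each rating. A small sketch, assuming a 5-point scale and invented ratings (not the exercise data):

```python
from collections import Counter

# Hypothetical ratings for a single question; invented for illustration.
ratings = [3, 4, 2, 5, 3, 4, 3, 2, 4, 4]

counts = Counter(ratings)
# Include every point on the 1-5 scale, even those nobody chose.
histogram = {score: counts.get(score, 0) for score in range(1, 6)}

for score, n in histogram.items():
    print(f"{score} | {'*' * n} ({n})")
```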
10
What is the true average?
• In our Supra Sedan, the overall satisfaction
had a mean of 3.4.
• That result was calculated with 10 customer
responses.
• How certain are we that the “true” mean is
3.4?
• Would the result be exactly the same with
100 customer responses? What about 1000?
11
Standard Deviation
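• What the standard deviation (and the related
statistic, the standard error) tells us is the
likely range of the true mean.
• We say that there is roughly a 95% likelihood that the
true mean is within two standard errors of the
sample mean (standard error = standard deviation
divided by the square root of the number of responses).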
• This gives us a level of confidence in our
results.
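A rough sketch of the calculation, assuming the range is taken as the sample mean plus or minus two standard errors, where the standard error is the standard deviation divided by the square root of the number of responses. The ratings are invented, and with only 10 responses the factor of 2 is itself an approximation.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ratings from 10 customers; invented for illustration.
ratings = [3, 4, 2, 5, 3, 4, 3, 2, 4, 4]

def mean_with_interval(values):
    """Return the mean, sample standard deviation, standard error,
    and an approximate 95% range (mean +/- 2 standard errors)."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)      # sample standard deviation (n - 1 in the denominator)
    se = sd / sqrt(n)       # standard error of the mean
    return m, sd, se, (m - 2 * se, m + 2 * se)

m, sd, se, (low, high) = mean_with_interval(ratings)
print(f"mean={m:.2f} sd={sd:.2f} se={se:.2f} range=({low:.2f}, {high:.2f})")
```

More responses shrink the standard error, so the range around the mean narrows.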
12
Standard Deviation
• If the standard deviation is big, then the
likelihood that we have calculated a mean
that is close to the real mean is lower.
13
Why is this important?
• Often we want to know whether the
difference between two means is significant.
• For instance, suppose some customers rate
Product 1’s overall satisfaction at 3.4 and
Product 2’s satisfaction at 2.9.
• Should we tell our company that customers
prefer Product 1 over Product 2?
• These means are just estimates of the real satisfaction.
• So, is there a chance that customers really don’t prefer
Product 1 to Product 2?
• The answer is yes!
• Fortunately, we can calculate that chance.
14
Here’s the visual
15
Using a T-Test to determine whether two sets of
data are significantly different
• A t-test returns a p-value: the probability of seeing a
difference this large if the two sets of data really
had the same mean
• If that probability is low (say, below 5%), then you
can have some confidence that the
means are truly different.
• Exercise 4:
• Is the difference in customers’ opinion of “Price” significant?
• How about attractiveness? Overall satisfaction?
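A sketch of a two-sample t-test using only the standard library. It assumes Welch's version of the test (which does not require equal variances) and gets the two-sided p-value by numerically integrating the t density; Excel's built-in test may use different options. The two rating lists are invented so that their means are 3.4 and 2.9.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (area under the density from 0 to |t|),
    with the area computed by Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)

# Hypothetical ratings; invented so the means are 3.4 and 2.9.
product1 = [3, 4, 2, 5, 3, 4, 3, 2, 4, 4]
product2 = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3]

t, df, p = welch_t_test(product1, product2)
print(f"t={t:.2f} df={df:.1f} p={p:.2f}")
```

With these made-up ratings the p-value is well above 5%, so a gap of 3.4 vs. 2.9 from 10 responses each would not be strong evidence of a real preference.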
16
What are some problems with Likert-style
surveys?
• Social Desirability – Respondents often
answer trying to put themselves (or others)
in a positive light.
• Real-world example I just encountered: Internship
Supervisor Survey.
• Question: Give your overall rating of the intern:
• Poor Fair Average Good Excellent
• No one was rated “Poor” or “Fair”
• How to interpret that?
• Bimodal distributions – Relatively common
• T-tests assume a normal distribution
• Continued on next slide
17
Bimodal distribution
• Answers tend to cluster around two different
“means”
• Real world example:
• Student grades in my programming
classes
• Where else might you expect to see a
bimodal distribution?
• Controversial issues
18
Be careful looking at the mean
• Suppose we have a 7-point scale where 1
means “Hate it!” and 7 means “Love it!”. 4
means “Neutral”.
• You ask 100 people their opinion of
Obamacare and you calculate a mean of 4.
• Does that mean that people are “Neutral” on
Obamacare?
• The histogram is important!
• Exercise 4.5: Looking at a bimodal
distribution.
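A tiny sketch of the trap, using invented responses: half the people answer 1 and half answer 7, so the mean is 4 even though nobody chose "Neutral".

```python
from collections import Counter
from statistics import mean

# Hypothetical 7-point responses: 50 "Hate it!" (1) and 50 "Love it!" (7).
responses = [1] * 50 + [7] * 50

average = mean(responses)
counts = Counter(responses)
neutral_votes = counts[4]   # how many people actually answered "Neutral"

print(f"mean = {average}, people answering 4 = {neutral_votes}")
for score in range(1, 8):
    print(f"{score} | {'*' * (counts[score] // 5)}")
```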
19
Let’s Take a Break
When we come back …
Working with Free-Text Responses
20
Text Analytics
• What can we do with all the free-form
text that comes from
sources like:
• Customer Satisfaction Surveys
• Amazon.com
• Tripadvisor.com
• Angie’s List
• Social Media
– Facebook
– Twitter
21
Tools for Text Analytics
• Goals:
– Use the structure and regularities
inherent in language to extract useful
information
– Communicate that information
22
Our Domain for this Example
• 784 customer reviews
of a particular Linksys
router (Source: Amazon.com)
– Cisco-Linksys WRT160N Wireless-N
Broadband Router
• What are some things
we can do?
– Sentiment analysis
– Topic identification
23
What is sentiment analysis?
Determine whether some text is basically
“positive”, “negative” or “neutral” in affect
24
Tools for Sentiment Analysis
• Pre-defined dictionary of words and phrases
annotated with sentiment information
– Example: Finn Årup Nielsen’s annotated list of words (AFINN)
– Advantage: Accuracy
– Disadvantages: Poor coverage, not sensitive to how
vocabulary is used in a domain, misses sarcasm
• Advanced: Use machine learning techniques to
enable computer to learn words and phrases
associated with particular sentiments
– Advantages: Can be sensitive to domain; good
accuracy
– Disadvantages: Requires corpus for analysis
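A minimal sketch of the dictionary approach: add up the scores of the words found in the text. The lexicon here is a tiny hypothetical one; the words and scores are illustrative and are not copied from any published list.

```python
import re

# Tiny hypothetical lexicon; words and scores are invented for illustration.
LEXICON = {"great": 3, "love": 3, "fine": 2, "easy": 1,
           "headache": -2, "slow": -2, "terrible": -3, "broken": -1}

def sentiment_score(text):
    """Sum the lexicon scores of the words in the text (unknown words score 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def label(score):
    """Map a numeric score to positive / negative / neutral."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

for review in ["Great router, easy setup", "Terrible and slow"]:
    score = sentiment_score(review)
    print(f"{review!r}: {score} ({label(score)})")
```

The coverage problem shows up immediately: any review using words outside the lexicon scores 0 and is labeled neutral.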
25
Exercise 6: Using a VBA macro that analyzes sentiment
• This macro was written by Mike Alexander
• Uses Finn Årup Nielsen’s list of affect words
• Documentation: http://datapigtechnologies.com/blog/index.php/quantifyingsubjective-text-with-sentiment-analysis/#more-5356
• Example Spreadsheet:
http://www.datapigtechnologies.com/downloads/Text_To_Sentiment.xlsm
26
Consolidated Sentiment Analysis Can Be Misleading
• The following was identified as being roughly
neutral:
Had a real headache to set it up but after it was
installed it works just fine. I think the
instructions could have been written better.
• Clearly, we need to identify topics within a
review AND
– Perform sentiment analysis on these subtopics
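A rough sketch of why the review above nets out to neutral, and what scoring smaller pieces looks like. Splitting on "but" and on sentence punctuation is a crude stand-in for real topic identification, and the two-word lexicon is hypothetical.

```python
import re

# Hypothetical two-word lexicon, just enough to score the review from the slide.
LEXICON = {"headache": -2, "fine": 2}

def score(text):
    """Sum the lexicon scores of the words in the text."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", text.lower()))

def clause_scores(review):
    """Split on 'but' and sentence punctuation, then score each piece."""
    pieces = re.split(r"\bbut\b|[.!?]", review)
    return [(p.strip(), score(p)) for p in pieces if p.strip()]

review = ("Had a real headache to set it up but after it was installed "
          "it works just fine. I think the instructions could have been "
          "written better.")

print("whole review:", score(review))
for clause, s in clause_scores(review):
    print(f"{s:+d}  {clause}")
```

The whole review scores 0, while the pieces show a negative setup experience and a positive result once installed.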
27
Tools for Topic Identification
• Word and phrase counts
• Advantage: Can be done using Excel tools
• Disadvantage: Does not automatically
combine related words/phrases
• Pre-defined topic vocabularies
• Advantage: Accurate
• Disadvantage: Must pre-define categories
• Cluster analysis: Computer figures out categories
• Advantage: Computer does the work
• Disadvantage: Not supported well by Excel
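The word-count approach can be sketched in a few lines outside Excel as well. The reviews and the stop-word list are invented for illustration. Note that the disadvantage listed above still applies: related words such as "setup" and "install" would be counted separately.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "it", "to", "and", "is", "i", "was", "but", "this"}

def word_counts(reviews):
    """Count how often each non-stop-word appears across all reviews."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

# Hypothetical reviews; invented for illustration.
reviews = [
    "The setup was a headache but the signal is strong.",
    "Strong signal, easy setup.",
    "Signal drops and the setup is confusing.",
]

counts = word_counts(reviews)
for word, n in counts.most_common(3):
    print(f"{word}: {n}")
```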
28
Exercise 7: Word counting in Excel
• Using Pivot Table – Cool!
29
Advanced Tools
• A number of commercial products exist for
performing more advanced text analysis. Examples:
• Semantria https://semantria.com/
• Repustate https://www.repustate.com/
• Text2Data http://text2data.org/
• QI Macros http://www.qimacros.com/
• To see what they can do, let’s look at
Semantria
30
Exercise 8: Semantria
31
Text Analytics
• With sufficient data, the customer feedback statistics can
provide meaningful and actionable data
• Performs best with substantial data
• Vulnerable if the amount of data is low
• But if the amount of data is small, you probably wouldn’t
be using automated tools anyway.
32