ISMIR2006_ResponseRates_Queries_poster - Music


FACTORS AFFECTING RESPONSE RATES FOR REAL-LIFE MIR QUERIES
Jin Ha Lee
M. Cameron Jones
J. Stephen Downie
Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
Introduction
Project: This poster reports on research conducted as part of the Human Use of Music Information Retrieval Systems (HUMIRS) project.
Objective: To improve our understanding of the factors related to whether a query is answered.
Data and Analysis
Data: 2,208 real-life music-related queries collected from the music category of the Google Answers website.
Features Examined: (1) whether the query was answered; (2) the elapsed time in minutes between when the query was posted and when it was answered; (3) the number of content words in the query (i.e., with stop words removed); and (4) the price offered for the answer.
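Feature (3) counts content words, meaning the words left after stop words are removed. The poster does not say which stop list or tokenizer was used, so the sketch below assumes a small hypothetical stop list (`STOP_WORDS`) and a simple lowercase regex tokenizer; the function name `content_word_count` is likewise illustrative.

```python
import re

# Hypothetical stop list for illustration only; the poster does not say
# which stop-word list was used.
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "i",
    "this", "that", "for", "on", "with", "was", "what", "by",
})


def content_word_count(query: str) -> int:
    """Count the words in a query that are not stop words (assumed tokenizer)."""
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    return sum(1 for token in tokens if token not in STOP_WORDS)


if __name__ == "__main__":
    print(content_word_count("What is the name of the song in this commercial?"))
```

With this stop list, the example query keeps only "name", "song", and "commercial"; a different stop list would give a different count.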
Analysis: First, we examine how the proportion of queries answered varies over time. Then we compare selected features of answered and unanswered queries, namely the price offered for the answer and the length of the query, to understand whether these variables affect the probability of a query being answered.
Statistical Analysis: A logistic regression model describes how the probability of a query being answered depends on an explanatory variable X. We test whether that probability is independent of X using the likelihood-ratio test, and treat an effect as statistically significant at the .05 level.
Results
Response Rate and Time
Total number of queries: 2,208
Number of answered queries: 1,062
Number of unanswered queries: 1,146
Overall probability of a query being answered ≈ 0.481
More than half of all answered queries (51.69%) are answered within 2 hours of being posted, and 83.71% within the first 24 hours.
The proportion of queries answered rises quickly after posting, and its rate of increase drops rapidly over time.
Price
Difference in the mean price between answered and unanswered queries: (Δµ = 0.097)
Logistic regression model (π = probability of a query being answered; X = price offered):
logit(π) = -0.079 + 0.000X
The query price has no statistically significant effect on the probability of getting an answer (likelihood-ratio test statistic = 0.013; df = 1; p = 0.908).
Query Length
Difference in the mean query length between answered and unanswered queries: (Δµ = 4.553)
Logistic regression model (π = probability of a query being answered; X = number of content words):
logit(π) = 0.099 - 0.006X
The word count has a statistically significant effect on the probability of getting an answer (likelihood-ratio test statistic = 14.342; df = 1; p < 0.001), but the substantive difference is very small.
Discussion
Possible Explanations:
Some questions are impossible or difficult to answer from the query statement alone, regardless of the price offered.
For difficult questions, which are less likely to be answered, users may also need more words to describe their information needs adequately.
As users provide more information, they may also increase the chance of providing incorrect information, which may result in search failure.
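To see how small these effects are, the fitted models can be converted from log-odds back to probabilities. A minimal sketch, assuming the rounded coefficients printed on the poster are used as-is (the unrounded estimates are not reported). The poster does not give the average query length, so instead of picking a baseline the sketch bounds the probability change for the 4.553-word gap: because dπ/d(logit) = π(1 − π) ≤ 0.25, a log-odds shift of Δ can move π by at most 0.25·|Δ|. This bound is an illustration added here, not a figure from the study.

```python
import math


def inv_logit(z: float) -> float:
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Counts reported on the poster.
TOTAL_QUERIES = 2208
ANSWERED = 1062

overall_rate = ANSWERED / TOTAL_QUERIES  # observed share of queries answered

# Price model: logit(pi) = -0.079 + 0.000X. The slope rounds to zero, so the
# 0.000X term drops out and the prediction does not depend on price.
price_model_pi = inv_logit(-0.079)


def word_count_pi(words: float) -> float:
    """Word-count model: logit(pi) = 0.099 - 0.006X, with the rounded coefficients."""
    return inv_logit(0.099 - 0.006 * words)


# Each extra content word multiplies the odds of an answer by exp(-0.006).
odds_ratio_per_word = math.exp(-0.006)

# Log-odds shift implied by the 4.553-word gap in mean query length.
logit_shift = -0.006 * 4.553
# pi * (1 - pi) <= 0.25, so pi moves by at most a quarter of the log-odds shift.
max_probability_shift = 0.25 * abs(logit_shift)

if __name__ == "__main__":
    print(f"overall answer rate: {overall_rate:.3f}")
    print(f"price model prediction: {price_model_pi:.3f}")
    print(f"odds ratio per word: {odds_ratio_per_word:.4f}")
    print(f"max probability shift for 4.553 words: {max_probability_shift:.4f}")
```

The price model's prediction (about 0.480) is essentially the overall answer rate of 1,062 / 2,208 ≈ 0.481, as expected for a slope of zero, while the query-length gap corresponds to a change of well under one percentage point.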
Conclusion and Future Work
Conclusion: Our results show that price has no statistically significant effect on the likelihood of a query being answered, while adding more words has a statistically significant but minimal negative effect.
Future Work: A qualitative analysis of the queries to determine which of the answered queries were answered correctly, and to assess the accuracy of the user-provided information in answered queries.
Special Thanks to: The Andrew W. Mellon Foundation and the
National Science Foundation (Grant No. NSF IIS-0327371)