Final Presentation


Goal recap
Implementation
Experimental Results
Conclusion
Questions & Answers

Our goal is to implement a framework that
predicts network traffic by mining
mainstream news articles.
Method

› Latent Dirichlet Allocation (LDA) identifies and
classifies popular topics in articles

› ISPs can query and pre-cache highly
popular videos to reduce overall traffic
and delay
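
The pre-caching step above can be pictured with a minimal sketch. It assumes the ISP already has a list of candidate videos for a popular topic, each with a view count. The function name `select_videos_to_precache`, the dictionary fields, and the sample values are hypothetical and are not taken from the presented system.

```python
import heapq

def select_videos_to_precache(videos, k):
    """Return the k videos with the highest view counts, most viewed first."""
    return heapq.nlargest(k, videos, key=lambda v: v["view_count"])

# Made-up candidates for one popular topic (illustrative values only).
candidates = [
    {"video_id": "a1", "view_count": 1200},
    {"video_id": "b2", "view_count": 98000},
    {"video_id": "c3", "view_count": 450},
    {"video_id": "d4", "view_count": 31000},
]

to_cache = select_videos_to_precache(candidates, k=2)
print([v["video_id"] for v in to_cache])
```

Ranking by view count alone is only one possible policy; a real cache would probably also weigh video size and how recently the topic became popular.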



Implemented a Python program to parse the
news articles and collect the title and content.
The original LDA implementation processed
random Wikipedia articles; we modified it to
accept and process news articles.
Wrote a script to extract and store YouTube
statistical data such as view counts, number of
subscribers, YouTube IDs, upload date, user
profile data, etc.
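
As a rough illustration of the article-parsing step, the sketch below collects the title and content of each item in a feed. It assumes the articles arrive as an RSS 2.0 feed and that the `<description>` element holds the content; the slides do not say which input format was actually used, and `parse_feed` and `SAMPLE_FEED` are hypothetical names.

```python
import xml.etree.ElementTree as ET

def parse_feed(rss_xml):
    """Return one {'title', 'content'} dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        # findtext returns None when the child element is missing.
        title = (item.findtext("title") or "").strip()
        content = (item.findtext("description") or "").strip()
        if title or content:
            articles.append({"title": title, "content": content})
    return articles

# Tiny made-up feed, used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Storm hits coast</title>
<description>Heavy rain and wind are expected.</description></item>
<item><title>Election results</title>
<description>Votes are still being counted.</description></item>
</channel></rss>"""

articles = parse_feed(SAMPLE_FEED)
for article in articles:
    print(article["title"], "|", article["content"])
```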

Wrote and implemented a program to sort
topics by popularity; we pick the most popular
topics and compare them with news websites
› Popular news websites (such as CNN, BBC)
generate popularity charts over time from click-view data
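
One simple way to sort topics by popularity is to count how many articles each topic dominates. The sketch below assumes LDA has already assigned a dominant topic id to every article; that popularity definition, the function name `rank_topics`, and the sample ids are assumptions made for illustration, not the presented program.

```python
from collections import Counter

def rank_topics(doc_topics):
    """Return (topic_id, article_count) pairs, most popular topic first.

    Ties are broken by the smaller topic id so the order is deterministic.
    """
    counts = Counter(doc_topics)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical dominant topic id for each of six articles.
doc_topics = [2, 0, 2, 1, 2, 0]

ranking = rank_topics(doc_topics)
print(ranking)  # [(2, 3), (0, 2), (1, 1)]
```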

Implemented the ZOOM Operation
› Wrote a program to distribute the articles by
source/category
› Query words using frequent pattern mining and
LDA results to check the relevance and accuracy of
popular topics
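
The slides do not spell out how frequent pattern mining and the LDA results are combined, so the following is only a sketch of one plausible approach: mine frequent single words and word pairs from the articles (an Apriori-style pass limited to pairs), then keep the LDA topic words that also occur in a frequent pair. The function names, the `min_support` value, and the sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Return {word tuple: support} for frequent single words and word pairs."""
    transactions = [set(doc) for doc in docs]
    item_counts = Counter(w for t in transactions for w in t)
    frequent_items = {w for w, c in item_counts.items() if c >= min_support}

    # Apriori pruning: a pair can only be frequent if both words are frequent.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(t & frequent_items)
        pair_counts.update(combinations(kept, 2))

    result = {(w,): c for w, c in item_counts.items() if c >= min_support}
    result.update({p: c for p, c in pair_counts.items() if c >= min_support})
    return result

def refine_query(topic_words, itemsets):
    """Keep the topic words that appear in at least one frequent pair."""
    in_pairs = {w for s in itemsets if len(s) == 2 for w in s}
    return [w for w in topic_words if w in in_pairs]

# Made-up tokenized articles and LDA topic words.
docs = [
    ["storm", "coast", "rain"],
    ["storm", "coast", "wind"],
    ["election", "vote"],
    ["storm", "rain"],
]
itemsets = frequent_itemsets(docs, min_support=2)
query = refine_query(["storm", "election", "coast"], itemsets)
print(query)  # ['storm', 'coast']
```

The refined word list could then serve as the search query sent to the video portal.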

[Figure: video relevance to the topic (y axis, 0 to 1) vs. number of feeds (x axis, 10 to 100); series: LDA+FP and OSLDA]

[Figure: accuracy of selecting the video with the most traffic (y axis, 0 to 1) vs. number of feeds (x axis, 10 to 100); series: LDA+FP and OSLDA]

Online LDA alone correctly chooses the most popular topic about 57%
of the time using 1k articles. With 100k articles it is about 91%
accurate. The blue line shows the accuracy using both Online LDA and
frequent pattern mining. With 1k articles the accuracy is about 92%.
With 100k articles the accuracy is close to 100%.

When using only Online LDA, there is only about a 60% chance that the
selected video is relevant to the actual topic with 10k articles.
With 100k articles the probability rises to about 87%. When using
frequent pattern mining together with Online LDA, there is about a 94%
chance that the selected video is relevant with 10k articles. With 100k
articles the probability is close to 100%.
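
For reference, percentages like the ones above can be computed as the fraction of trials in which the selected item matches the expected one. The sketch below shows that calculation under this assumption; it is not the evaluation code used for the reported results, and the video ids are made up.

```python
def selection_accuracy(predicted, actual):
    """Fraction of trials where the selected item equals the expected item."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one trial is required")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)

# Made-up trial outcomes: selected video id vs. the truly most-viewed id.
predicted = ["v1", "v7", "v3", "v4"]
actual = ["v1", "v2", "v3", "v4"]

accuracy = selection_accuracy(predicted, actual)
print(f"{accuracy:.0%}")  # 75%
```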

From these results we conclude that, by combining Online LDA with
frequent pattern mining, we can predict popular topics from
mainstream media and identify relevant videos on video portals with
high accuracy.
Thank you
Q&A
