Slide 1 - Harold Liu
PROJECT
Topics
Theoretical:
Error Performance Analysis for Partitioned Sketch Data
Structures
Survey:
Security and Privacy for Big Data: A Survey and Future
Directions
Experiments:
Citizen Behavior of 7-21 Storm in Beijing, 2012
Music Knowledge Mining
Hadoop for Video Streaming on the Web
MapReduce Jobs For Video Conversion
Your proposed one…
1. Error Performance Analysis
for Partitioned Sketch Data Structures
We talked about the time complexity already (in terms of
update time)
TASK:
What about error performance?
How to optimally allocate the depth of each sketch (for Zipfian data)?
Start by studying how the Count-Min (CM) sketch paper analyzes
its error performance (Theorem 1 and similar results)
http://dimacs.rutgers.edu/~graham/pubs/papers/cmfull.pdf
Learn about P(d)-CU
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6574663
How to determine this?
Result
Analysis (e.g., mathematical derivations)
Some initial simulation (correctness)
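A possible starting point for the initial simulation is sketched below. It implements a plain Count-Min sketch only, not the partitioned or P(d)-CU variants, with width ceil(e/epsilon) and depth ceil(ln(1/delta)), the setting under which Theorem 1 of the CM paper bounds the estimate by the true count plus epsilon times the total count with probability at least 1 - delta. The class name, the hash family, and the `zipf_stream` helper are illustrative assumptions, not taken from either paper.

```python
import math
import random


class CountMinSketch:
    """Minimal Count-Min sketch for non-negative integer items and counts."""

    def __init__(self, epsilon, delta, seed=0):
        self.width = math.ceil(math.e / epsilon)       # columns per row
        self.depth = math.ceil(math.log(1.0 / delta))  # number of rows
        self.total = 0                                 # sum of all updates
        self.prime = 2_147_483_647                     # 2^31 - 1
        rng = random.Random(seed)
        # One (a, b) pair per row for the hash ((a*x + b) mod p) mod width.
        self.hashes = [
            (rng.randrange(1, self.prime), rng.randrange(0, self.prime))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self.prime) % self.width

    def update(self, item, count=1):
        self.total += count
        for row in range(self.depth):
            self.table[row][self._bucket(row, item)] += count

    def estimate(self, item):
        # The minimum over rows never underestimates the true count.
        return min(
            self.table[row][self._bucket(row, item)]
            for row in range(self.depth)
        )


def zipf_stream(n_items, n_updates, skew=1.0, seed=1):
    """Hypothetical helper: draw item ids with Zipf-like popularity."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, n_items + 1)]
    return rng.choices(range(n_items), weights=weights, k=n_updates)
```

Comparing `estimate(x)` with exact counts over such a stream gives the empirical error, which can then be set against the derived bound for different depth allocations.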
2. Survey
Write a good survey in English on
Security and Privacy for Big Data: A Survey and Future
Directions
Cite at least 40 references (IEEE Xplore and ACM Digital Library)
Paper organization
Classify these works in different categories, from different
angles
Extensive comparisons
Identify future directions (i.e., what are the missing
pieces?)
Some Materials
http://www-03.ibm.com/security/solution/intelligence-big-data/
https://ssl.www8.hp.com/ww/en/secure/pdf/4aa4-4051enw.pdf
http://www.emc.com/collateral/industry-overview/big-data-fuelsintelligence-driven-security-io.pdf
http://www.isaca.org/Groups/Professional-English/bigdata/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://www.trendmicro.com/cloud-content/us/pdfs/business/whitepapers/wp_addressing-big-data-security-challenges.pdf
http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1/
Think about:
Storage
Analysis
Applications
Cloud, Internet-of-Things
3. Analyze Citizen Behaviors
of 7-21 Storm in Beijing, 2012
The Power of Social Networks and Public Crowd
http://v.youku.com/v_show/id_XNDM5NjY1Mzc2.html
Using social network APIs like Sina Weibo
open.weibo.com/wiki
Use the keyword search to retrieve all related data
#望京人赴机场免费救援# (Wangjing residents offering free rescue rides to the airport), #双闪车队# (hazard-light convoy) (100+)
菠菜X6, @望京网
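Once posts have been retrieved, a first analysis step could be to keep only those containing the keywords and count them per hour. The sketch below assumes each post is a dict with hypothetical `text` and `created_at` fields (the latter formatted as `YYYY-MM-DD HH:MM:SS`). This is not the actual Sina Weibo API response format, which should be checked on open.weibo.com/wiki.

```python
from collections import Counter
from datetime import datetime

# Search keywords from the slide.
KEYWORDS = ["#望京人赴机场免费救援#", "#双闪车队#"]


def matches(text, keywords=KEYWORDS):
    """Return True if the post text contains any of the keywords."""
    return any(keyword in text for keyword in keywords)


def posts_per_hour(posts, keywords=KEYWORDS):
    """Count matching posts per hour.

    `posts` is an iterable of dicts with assumed keys 'text' and
    'created_at' (hypothetical field names, see the note above).
    """
    counts = Counter()
    for post in posts:
        if matches(post["text"], keywords):
            ts = datetime.strptime(post["created_at"], "%Y-%m-%d %H:%M:%S")
            counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)
```

The hourly counts can then be plotted against the storm timeline to see when the public response peaked.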
4. Music Knowledge Mining
Million Song Dataset
http://labrosa.ee.columbia.edu/millionsong
For Example: to calculate music density
http://musicmachinery.com/2011/09/04/how-toprocess-a-million-songs-in-20-minutes/
YOUR TASK: Predict which songs a user will listen to
http://www.kaggle.com/c/msdchallenge
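A simple baseline to beat for the prediction task is to recommend the globally most popular songs that the user has not listened to yet. The sketch below assumes the listening history is available as (user, song, play count) triplets with each (user, song) pair appearing once. The function name and the tie-breaking rule are illustrative choices, not part of the challenge definition.

```python
from collections import Counter


def popularity_baseline(triplets, user, k=3):
    """Recommend the k most listened-to songs the user has not heard.

    `triplets` is an iterable of (user_id, song_id, play_count) tuples.
    Popularity is measured as the number of distinct listeners per song.
    """
    listeners = Counter()
    heard = set()
    for user_id, song_id, _play_count in triplets:
        listeners[song_id] += 1
        if user_id == user:
            heard.add(song_id)
    # Sort by descending popularity, then by song id for stable ties.
    ranked = sorted(listeners.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked if song not in heard][:k]
```

Any more elaborate model (for example, collaborative filtering) should be evaluated against this baseline on held-out listening data.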
5. Video Streaming on the Web
Store your video as chunks in HDFS
Case: the user suddenly jumps to a specific part of the video
Seek in the file to position the cursor at a specific location
HDFS can only be accessed through a Hadoop client, and the Apache
web server is not one.
Apache/FUSE: all file system operations (dir browsing, file
opening and content access) are enabled over HDFS content
through the FUSE interface.
http://internetmemory.org/en/index.php/synapse/using_hadoop_for_video_streaming/
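With HDFS mounted through FUSE, a video file can be opened like any local file, so the seek case reduces to an ordinary seek and read. The sketch below shows that operation; the mount path in the usage note is hypothetical, and `offset_for_time` uses a crude constant-bitrate assumption (a real player would map time to bytes through the format's keyframe index).

```python
import os


def read_range(path, start, length):
    """Read `length` bytes starting at byte offset `start`."""
    with open(path, "rb") as f:
        f.seek(start)  # position the cursor at the requested location
        return f.read(length)


def offset_for_time(path, seek_seconds, duration_seconds):
    """Approximate byte offset for a playback time.

    Assumes a constant bitrate, which is only a rough approximation.
    """
    size = os.path.getsize(path)
    fraction = min(max(seek_seconds / duration_seconds, 0.0), 1.0)
    return int(size * fraction)
```

For example, `read_range("/mnt/hdfs/videos/demo.flv", offset, 65536)` would return one 64 KiB piece of a file under an assumed FUSE mount point `/mnt/hdfs`.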
Result
A demo
Choose at least one video format (e.g., FLV)
A client to play video
A web server (Apache with FUSE)
HDFS to store your videos
6. MapReduce For Video Conversion
Convert a huge number of video files from one format
to another
using the open-source video converter FFmpeg
(http://ffmpeg.org/download.html).
Data stored on HDFS
Create an app that does this (running on Google App Engine)
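One way to structure the map step is to give each mapper a list of video paths, one per line, and have it run FFmpeg once per file. The sketch below assumes the files are already on the local disk of the worker (copying them out of and back into HDFS is left out) and that `ffmpeg` is on the PATH. The function names and the default target format are illustrative.

```python
import os
import subprocess


def output_name(input_path, target_ext="mp4"):
    """Derive the output file name by swapping the extension."""
    base, _ = os.path.splitext(os.path.basename(input_path))
    return f"{base}.{target_ext}"


def build_ffmpeg_command(input_path, output_path):
    """Build the FFmpeg call: -y overwrites output, -i names the input."""
    return ["ffmpeg", "-y", "-i", input_path, output_path]


def convert_all(lines, target_ext="mp4", run=subprocess.call):
    """Convert every path in `lines`; return (input, output, exit status).

    `run` defaults to subprocess.call and can be replaced for dry runs.
    """
    results = []
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank input lines
        out = output_name(path, target_ext)
        status = run(build_ffmpeg_command(path, out))
        results.append((path, out, status))
    return results
```

Used as a Hadoop Streaming mapper, `convert_all(sys.stdin)` would process the input split line by line, and the returned tuples could be printed as tab-separated records for a reducer that collects failures.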
Mechanism
Working in group: 3-5 students, clear roles
Email me ([email protected]) by this Friday (Nov 22)
Team leader, Team members
Topic
Deadline: 28 December 2013!
Deliverable: project report in Chinese
Introduction (motivation, WHY?)
Related Work (What others have done)
Your proposal (HOW?)
Performance Evaluation
Conclusion
Presentation
Suggested Arrangement
Week 1: Define your roles and start the literature
review
Weeks 2 and 3: Propose solutions
Weeks 4 and 5: Implement and obtain results
Week 6: Write the report