Slide 1 - Harold Liu


PROJECT
Topics




Theoretical:
 Error Performance Analysis for Partitioned Sketch Data
Structures
Survey:
 Security and Privacy for Big Data: A Survey and Future
Directions
Experiments:
 Citizen Behavior of 7-21 Storm in Beijing, 2012
 Music Knowledge Mining
 Hadoop for Video Streaming on the Web
 MapReduce Jobs For Video Conversion
Your proposed one…
1. Error Performance Analysis
for Partitioned Sketch Data Structures




We talked about the time complexity already (in terms of
update time)
TASK:
 What about error performance?
 How should the depth of each sketch be allocated optimally (Zipfian)?
Start by studying how the CM (Count-Min) sketch paper analyzes its error
performance (Theorem 1 and similar results)
 http://dimacs.rutgers.edu/~graham/pubs/papers/cmfull.pdf
Learn about P(d)-CU
 http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6574663
How to determine this?
Result
Analysis (e.g., mathematical derivations)
 Some initial simulation (correctness)
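As a starting point for the "initial simulation (correctness)" item, the following is a minimal sketch of a plain (unpartitioned) CM sketch, not the partitioned or P(d)-CU variant. It uses the sizing from Theorem 1 of the CM sketch paper: width w = ceil(e/epsilon) and depth d = ceil(ln(1/delta)), so that each estimate is at least the true count and exceeds it by more than epsilon times the total stream count with probability at most delta. The class name, the hash family, the Zipf exponent, and the stream sizes are all illustrative assumptions.

```python
import math
import random


class CountMinSketch:
    """Plain Count-Min sketch for non-negative integer items (illustrative)."""

    PRIME = (1 << 61) - 1  # large prime for the (a*x + b) mod p hash family

    def __init__(self, epsilon, delta, seed=0):
        # Sizing from Theorem 1: w = ceil(e / epsilon), d = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        rng = random.Random(seed)
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # sum of all counts seen so far

    def _columns(self, item):
        # One column index per row.
        return [((a * item + b) % self.PRIME) % self.width for a, b in self.hashes]

    def update(self, item, count=1):
        self.total += count
        for row, col in enumerate(self._columns(item)):
            self.table[row][col] += count

    def estimate(self, item):
        # The minimum over rows never underestimates when counts are non-negative.
        return min(self.table[row][col] for row, col in enumerate(self._columns(item)))


def zipf_stream(n_items, length, s=1.1, seed=1):
    """Draw `length` items from {0..n_items-1} with probability proportional to 1/rank^s."""
    rng = random.Random(seed)
    weights = [1.0 / (k ** s) for k in range(1, n_items + 1)]
    return rng.choices(range(n_items), weights=weights, k=length)


def run_simulation(epsilon=0.01, delta=0.01, n_items=1000, length=20000):
    """Return (min error, max error, error bound, fraction of items above the bound)."""
    sketch = CountMinSketch(epsilon, delta)
    truth = {}
    for x in zipf_stream(n_items, length):
        sketch.update(x)
        truth[x] = truth.get(x, 0) + 1
    errors = [sketch.estimate(x) - c for x, c in truth.items()]
    bound = epsilon * sketch.total
    violations = sum(e > bound for e in errors)
    return min(errors), max(errors), bound, violations / len(errors)
```

Calling `run_simulation()` gives the smallest and largest overestimate, the bound epsilon times the stream length, and the observed fraction of items whose error exceeds that bound, which can then be compared against delta. Changing the depth per partition would be the next experiment for the allocation question above.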

2. Survey



Write a good survey in English on
 Security and Privacy for Big Data: A Survey and Future
Directions
Cite at least 40 references (from IEEE Xplore and the ACM Digital Library)
Paper organization
 Classify these works in different categories, from different
angles
 Extensive comparisons
 Identify future directions (i.e., what are the missing
pieces?)
Some Materials







http://www-03.ibm.com/security/solution/intelligence-big-data/
https://ssl.www8.hp.com/ww/en/secure/pdf/4aa4-4051enw.pdf
http://www.emc.com/collateral/industry-overview/big-data-fuelsintelligence-driven-security-io.pdf
http://www.isaca.org/Groups/Professional-English/bigdata/GroupDocuments/Big_Data_Top_Ten_v1.pdf
http://www.trendmicro.com/cloud-content/us/pdfs/business/whitepapers/wp_addressing-big-data-security-challenges.pdf
http://scholarlycommons.law.northwestern.edu/njtip/vol11/iss5/1/
Think about:




Storage
Analysis
Applications
Cloud, Internet-of-Things
3. Analyze Citizen Behaviors
of 7-21 Storm in Beijing, 2012






The Power of Social Networks and Public Crowd
http://v.youku.com/v_show/id_XNDM5NjY1Mzc2.html
Use social network APIs such as Sina Weibo's
 open.weibo.com/wiki
Use the keyword search to retrieve all related data
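Hashtags: #望京人赴机场免费救援# (Wangjing residents go to the airport to offer free rescue), #双闪车队# (hazard-light convoy) (100+)
Accounts: 菠菜X6, @望京网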
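Once posts have been retrieved, a first processing step could look like the sketch below. It makes no calls to the Weibo API; it assumes the post texts are already available as plain strings and that hashtags follow the Weibo convention of a topic wrapped in a pair of '#' characters. The function names are hypothetical, and the pattern deliberately ignores topics that contain whitespace.

```python
import re
from collections import Counter

# Weibo-style hashtags are wrapped in a pair of '#' characters, e.g. #双闪车队#.
# Assumption: a topic contains no whitespace and no '#'.
HASHTAG_PATTERN = re.compile(r"#([^#\s]+)#")


def extract_hashtags(text):
    """Return the list of hashtag topics found in one post."""
    return HASHTAG_PATTERN.findall(text)


def count_hashtags(posts):
    """Count how often each hashtag topic appears across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(extract_hashtags(post))
    return counts


def filter_by_keywords(posts, keywords):
    """Keep only the posts that mention at least one of the keywords."""
    return [post for post in posts if any(k in post for k in keywords)]
```

For instance, `filter_by_keywords(posts, ["双闪车队", "望京人赴机场免费救援"])` narrows a list of retrieved posts to those about the volunteer convoy, and `count_hashtags` on the result shows which topics dominate.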
4. Music Knowledge Mining



Million Song Dataset
 http://labrosa.ee.columbia.edu/millionsong
For example: calculating music density
 http://musicmachinery.com/2011/09/04/how-toprocess-a-million-songs-in-20-minutes/
YOUR TASK: Predict which songs a user will listen to
 http://www.kaggle.com/c/msdchallenge
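A simple baseline for the prediction task is to recommend the most popular songs a user has not listened to yet. The sketch below assumes the listening data is available as (user, song, play count) triplets; that input layout and the function names are assumptions for illustration, and this is only a baseline to compare better methods against.

```python
from collections import Counter, defaultdict


def build_model(triplets):
    """Build song popularity (number of distinct listeners) and per-user history."""
    popularity = Counter()
    history = defaultdict(set)
    for user, song, _plays in triplets:
        if song not in history[user]:
            history[user].add(song)
            popularity[song] += 1
    return popularity, history


def recommend(user, popularity, history, k=10):
    """Return up to k of the most popular songs the user has not listened to."""
    seen = history.get(user, set())
    # Sort by descending popularity, breaking ties by song id for determinism.
    ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked if song not in seen][:k]
```

Users with no history simply receive the globally most popular songs, which makes the cold-start behaviour explicit.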
5. Video Streaming on the Web






Store your video as chunks in HDFS
Case: the user suddenly jumps to a specific part of the video
Seek in the file to position the cursor at a specific location
HDFS can only be accessed through a Hadoop client, and an Apache
server is not one.
Apache/FUSE: all file system operations (dir browsing, file
opening and content access) are enabled over HDFS content
through the FUSE interface.
http://internetmemory.org/en/index.php/synapse/using_hadoop_for_video_streaming/
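The seek case can be sketched as follows, assuming the video is reachable as an ordinary file path (for example through a FUSE mount of HDFS). The function names and the chunk size used in the usage note are illustrative assumptions.

```python
def locate_chunk(byte_offset, chunk_size):
    """Map an absolute byte offset to (chunk index, offset inside that chunk)."""
    if byte_offset < 0 or chunk_size <= 0:
        raise ValueError("byte_offset must be >= 0 and chunk_size must be > 0")
    return divmod(byte_offset, chunk_size)


def read_range(path, start, length):
    """Seek to `start` in the file and read up to `length` bytes."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)
```

With a chunk size of 64 bytes, `locate_chunk(150, 64)` returns `(2, 22)`: the requested position lies 22 bytes into the third chunk. A web server answering a player's range request would call something like `read_range` on the mounted file.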
Result

A demo
 Choose at least one video format (e.g., flv)
 A client to play video
 A web server (Apache with FUSE)
 HDFS to store your videos
6. MapReduce For Video Conversion
Convert a huge number of video files from one format
to another.
 Use the open-source video converter FFmpeg
(http://ffmpeg.org/download.html).
 Data stored on HDFS
 Create an app that does this (running on Google App Engine)
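One possible shape for the map step is sketched below: each input line is the HDFS path of one video, and the mapper copies it to local disk, converts it with FFmpeg, and uploads the result. It assumes the `hadoop` and `ffmpeg` command-line tools are installed on every worker; the function names, the working directory, and the output layout are hypothetical, and the App Engine part is not covered.

```python
import os
import subprocess


def output_name(src_path, target_ext):
    """Derive the converted file name, e.g. a.flv -> a.mp4."""
    stem, _ = os.path.splitext(os.path.basename(src_path))
    return stem + "." + target_ext.lstrip(".")


def build_commands(hdfs_src, hdfs_out_dir, target_ext, work_dir="."):
    """Return the three commands for one video: fetch, convert, upload."""
    local_in = os.path.join(work_dir, os.path.basename(hdfs_src))
    local_out = os.path.join(work_dir, output_name(hdfs_src, target_ext))
    return [
        ["hadoop", "fs", "-get", hdfs_src, local_in],     # HDFS -> local disk
        ["ffmpeg", "-y", "-i", local_in, local_out],      # convert, overwrite if present
        ["hadoop", "fs", "-put", local_out, hdfs_out_dir],  # local disk -> HDFS
    ]


def mapper(lines, hdfs_out_dir, target_ext):
    """Convert every video listed in `lines` (one HDFS path per line)."""
    for line in lines:
        src = line.strip()
        if not src:
            continue
        for cmd in build_commands(src, hdfs_out_dir, target_ext):
            subprocess.run(cmd, check=True)  # raises if any step fails
        print(f"{src}\tconverted")
```

Because each video is converted independently, no reduce step is needed; a job could feed `sys.stdin` to `mapper` and let the framework split the list of paths across workers.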

Mechanism





Work in groups of 3-5 students with clear roles
Email me ([email protected]) by this Friday (Nov 22) with:
 Team leader, Team members
 Topic
Deadline: 28 December 2013!
Deliverable: project report in Chinese
 Introduction (motivation, WHY?)
 Related Work (What others have done)
 Your proposal (HOW?)
 Performance Evaluation
 Conclusion
Presentation
Suggested Arrangement
 Week 1: Define your roles and start the literature review
 Weeks 2 and 3: Propose solutions
 Weeks 4 and 5: Implement and obtain results
 Week 6: Write the report
