CSE6339
DATA MANAGEMENT AND ANALYSIS FOR
COMPUTATIONAL JOURNALISM
CSE6339, Spring 2012
Lecture 1: Introduction
Department of Computer Science and Engineering, University of Texas at Arlington
©Chengkai Li, 2012
Course Page
http://crystal.uta.edu/~cli/cse6339
Syllabus, Schedule (lecture notes), Resources, Accommodation based on disability.
Course announcements will be made
through BlackBoard and email.
http://www.uta.edu/blackboard/
Lecture 1: Introduction
CSE6339 Computational Journalism, Spring 2012
UT-Arlington © Chengkai Li, 2012
Basics
Lectures: Tue/Thu, 2:00pm-3:20pm, GS 109
We may use a meeting room instead. (TBD)
Reschedule?
Instructor: Chengkai Li
Office hours: Tue/Thu 11am-12pm, 3:30-4:30pm, ERB 652
Contact: cli [at] uta.edu, (817) 272-0162 (I don’t check voice mail)
TA: ?
Essence
Project-Driven: You will do a semester-long project and will be evaluated mainly on it. You
are expected to build a prototype system and demo it at the end of the semester.
Required reading:
The reading materials will be chosen based on the project topics that you choose
to do.
Materials listed on schedule page. Mostly research papers.
You are required to read the papers before class. That’s crucial because the
lectures will emphasize discussion.
Research course and exploratory by nature:
No Textbook. Most questions we’d like to discuss do not have textbook answers.
The papers are not meant to give you instructions for doing your projects.
The projects are not simply a matter of implementing something that you are
told to implement.
Be curious and be willing to learn, think, explore, and innovate.
Tentative Grading Scheme
Paper Review
20%
Paper Presentation
20%
Class participation (attendance and discussion)
10%
Course Project
50%
No homework, quiz, or exam.
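As a rough illustration of how the tentative weights above could combine, the sketch below computes a weighted total. It assumes each component is scored out of 100 and that the final score is a plain weighted sum; the slides do not state the exact formula, and the names WEIGHTS_PERCENT and weighted_total are hypothetical.

```python
# Tentative weights from the slide, in percent. They sum to 100.
WEIGHTS_PERCENT = {
    "paper_review": 20,
    "paper_presentation": 20,
    "class_participation": 10,
    "course_project": 50,
}


def weighted_total(scores: dict) -> float:
    """Combine per-component scores (each assumed to be out of 100).

    Assumption: the final score is a plain weighted sum of the components.
    """
    missing = set(WEIGHTS_PERCENT) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total = sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT)
    return total / 100


example = {
    "paper_review": 90,
    "paper_presentation": 80,
    "class_participation": 100,
    "course_project": 70,
}
print(weighted_total(example))
```

Because the project carries half the weight, a 10-point change in the project score moves the total by 5 points under this assumption.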
Paper Review 20%
Paper reviews
Students are required to complete the reading assignments before the
lecture. Reviews are required for a subset of the papers (marked on the
schedule page).
Deadline: 11:55pm, the night before the lecture.
Each review should discuss the following (suggested length: 500-800
words):
Take-home message from the paper, i.e., a very brief summary of what
the paper studies/discovers.
What you like about the paper.
Several things that you don’t like about the paper.
Paper Presentation 20%
After the first several lectures:
Study one paper (sometimes more) in each lecture.
One student will present the paper.
Each student makes 2 presentations.
Sign up in BlackBoard (instructions will be given later).
Presentation slides:
Deadline: 11:55pm, the night before the lecture.
Should be carefully designed.
Cover 80 minutes.
The presentation should be interactive: present the papers, raise
questions, and moderate discussions.
The more discussions/debates, the better.
Class participation 10%
Attending the class is mandatory.
Students are expected to actively participate in discussion.
Course Project 50%
Be prepared to get your hands dirty.
In teams or individually (each student should contribute evenly to a team
project).
Several stages:
P1: Project Proposal
P2: Progress Report
P3: Final Report (in the format of a research paper), presentation and
demo.
Sample project topics will be provided.
Will be research-type and exploratory.
You are encouraged to propose your own topic.
BlackBoard
http://www.uta.edu/blackboard/
Announcement
Student assignment submission (we don’t accept email submission or hard-copy)
Presentation slides
Review
Project deliverables
Grades
Discussion
Deadlines
Everything will be submitted through BlackBoard.
Due time: 11:55pm
Late submission: 5-point deduction per hour, until you
reach 0. (The raw score of each assignment is 100, so
there is no point in submitting it after 20 hours.)
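The late-submission arithmetic can be sketched as follows. This is only an illustration under two assumptions that the slides do not state: the deduction is subtracted from the score the work would otherwise earn, and each started hour counts as a full hour. The name late_score is hypothetical.

```python
import math


def late_score(earned_score: float, hours_late: float,
               penalty_per_hour: float = 5.0) -> float:
    """Score after the late deduction, never below 0.

    Assumptions (not stated in the slides): the penalty is subtracted from
    the earned score, and a partial hour counts as a full hour.
    """
    if hours_late <= 0:
        return float(earned_score)  # submitted on time, no deduction
    started_hours = math.ceil(hours_late)
    return max(0.0, earned_score - penalty_per_hour * started_hours)


# A perfect assignment (raw score 100) at various delays, in hours.
for hours in (0, 1, 3, 20):
    print(hours, late_score(100, hours))
```

Under these assumptions a perfect assignment submitted 3 hours late scores 85, and one submitted 20 or more hours late scores 0, which matches the 20-hour cutoff mentioned above.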
Regrading
Within 7 days after we post scores in BlackBoard.
We usually won’t change your review score, since its
grading is subjective by nature, unless unfair grading
is obvious.
Where to find papers
Google
Google Scholar
DBLP Bibliography
Services through UTA Library
http://library.uta.edu/JDBC/DBs/dbejournal.jsp
ACM Digital Library
IEEE Xplore
Other Computer Science articles
Get bored?
Do you watch YouTube?
http://www.youtube.com/watch?v=gC2ew6qLa8U
http://www.youtube.com/watch?v=463gKcXDVzQ
Don’t do it. It’s not worth it.
We are very serious about this.
Read and sign the statement.
Specific to this course
Paper review:
Must be written by yourself. If you use materials from other sources (e.g.,
Wikipedia, other papers), you must cite the source, at every place that the
material is used. Reviews violating the rule constitute plagiarism.
Paper presentation:
It is ok to use slides that you find elsewhere. Make sure they are high-quality and
make sure to acknowledge the source.
Project:
Must be done by yourself.
It is ok to use other libraries and packages. Actually you are
encouraged to do so.
If you use source code from others, you must document it in your report.
The reports should cite various sources when applicable.
Topics potentially related to the projects
Data Cube and OLAP
Data Mining, Text Mining
Query processing
Natural language querying of databases
Web databases
Data visualization
User interface
Cloud computing
Crowdsourcing
Social computational systems
Self Introduction
Chengkai Li
http://ranger.uta.edu/~cli
Research interests:
databases, Web data management, data mining, information
retrieval
Projects:
Computational Journalism
Database Testing
WebEQ (Querying and Exploring the Web)
Now it’s your turn
o Name, program/year, where from
o Background related to:
  o Journalism
  o Database
  o Text Mining, Data Mining, Machine Learning
  o Information Retrieval
  o Web technologies
  o User Interface, Visualization
  o Cloud computing
  o Social networks
o Why do you want to take this course?
o What do you want to get from the course?
o What would make you like/hate this course?