01CS 257_0_ Int_pro1 - Department of Computer Science


CS 257
Database Systems
Dr. T Y Lin
Ultimate Goal
Data Science (Big Data)
CS 257- OverView

CS257 and Big Data
+: VLDB (Very Large Database)
+: Unstructured Data, e.g. Text/Web
 Image, Multimedia, Video, Vision
 Bio, Scientific Data Processing
Light: Cloud Computing
Light: Data Science /Knowledge Engineering

etc

Major Applications in Big Data

Medical Informatics

VLDB + Image + Cloud + Security (CS286)
Financial Informatics

VLDB + BI + Cloud + Security (CS286)
Web Engineering
 Business Intelligence (BI)
 Data Science (Knowledge Engineering in Web/Image/Bio/etc. Data)

Instructor:
IEEE Best Contribution Award in Data Mining
(ICDM 2001)
ACM/IEEE Best Service Award in Web Intelligence
(WI-2007)
Best Contribution Award Rough Set (2005)
Pioneer Award in Granular Computing (2008)
http://dl.acm.org/inst_page.cfm?id=60015609
Project Overview
Verification and Validation of the
Core Engine of
a Concept Based
Semantic Search Engine
Main Idea
A set of documents is associated with a matrix.
1) Latent Semantic Indexing (LSI): the row vectors are treated as
points in Euclidean space (each point is a TF-IDF vector)
- Google’s approach
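
To make the "row vectors as points" picture concrete, here is a minimal sketch that builds a TF-IDF document-term matrix and measures the Euclidean distance between rows. It is an illustration under stated assumptions, not the course's or any search engine's implementation: the function names `tfidf_matrix` and `euclidean_distance`, the whitespace tokenizer, the particular TF-IDF weighting (relative term frequency times log inverse document frequency), and the three sample documents are all hypothetical. The rank-reduction step (SVD) that LSI applies to such a matrix is not shown.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i] is the TF-IDF row vector of docs[i]."""
    # Assumption: lowercase whitespace tokenization is good enough for a demo.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One weighting among several in use: tf = count / doc length,
        # idf = log(N / df).
        row = [
            (counts[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


def euclidean_distance(u, v):
    """Distance between two row vectors viewed as points in Euclidean space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical sample documents, chosen only for illustration.
docs = [
    "database systems store data",
    "big data needs database systems",
    "simplicial complex of concepts",
]
vocab, matrix = tfidf_matrix(docs)
d01 = euclidean_distance(matrix[0], matrix[1])
d02 = euclidean_distance(matrix[0], matrix[2])
print(d01 < d02)  # documents 0 and 1 share terms, so they lie closer together
```

In this sketch each document becomes one row of the matrix, and documents that share vocabulary end up nearer to each other than documents that do not.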
Main Idea
2) Topological approach: a polyhedron
(combinatorially, a simplicial complex)
is built to capture and structure the concepts
An open segment is a 1-simplex, an open triangle (face) is a 2-simplex, an
open tetrahedron is a 3-simplex, and so on up to an n-simplex.
A collection of simplexes that satisfies the closed condition is called a simplicial
complex. It is a combinatorial representation of a polyhedron, an idea that led to a
“new” subject called algebraic topology. The project is an algebraic-topology-based search engine.
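
The closed condition can be illustrated with a short sketch. It assumes the usual combinatorial reading: a simplex is a finite set of vertices, and a collection is closed when every nonempty face (subset) of each simplex also belongs to the collection. The helper names `faces`, `is_simplicial_complex`, `closure`, and `dimension`, and the vertex labels, are hypothetical; how the project derives simplexes from documents is not covered here.

```python
from itertools import combinations


def faces(simplex):
    """All nonempty proper faces of a simplex given as a set of vertices."""
    vertices = sorted(simplex)
    return {
        frozenset(face)
        for k in range(1, len(vertices))
        for face in combinations(vertices, k)
    }


def is_simplicial_complex(simplexes):
    """True if every face of every simplex is also in the collection."""
    collection = {frozenset(s) for s in simplexes}
    return all(face in collection for s in collection for face in faces(s))


def closure(simplexes):
    """Smallest closed collection containing the given simplexes."""
    result = set()
    for s in simplexes:
        s = frozenset(s)
        result.add(s)
        result |= faces(s)
    return result


def dimension(simplex):
    """An n-simplex has n + 1 vertices."""
    return len(simplex) - 1


# Hypothetical vertex labels: one 2-simplex (a triangle) on its own.
triangle = {"a", "b", "c"}
print(is_simplicial_complex([triangle]))  # False: its edges and vertices are missing

closed = closure([triangle])
print(len(closed))                    # 7: three vertices, three edges, one triangle
print(is_simplicial_complex(closed))  # True
```

Representing simplexes as `frozenset` values keeps the face check a plain set-membership test, which is why the closure of a single triangle yields exactly seven simplexes.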