Multimedia Data Mining - The University of Texas at Dallas


Multimedia
Data Mining
Arvind Balasubramanian
[email protected]
Multimedia Lab
The University of Texas at Dallas
Me and My Research
• Research Interests:
– Machine Learning
– Data Mining
– Statistical Analysis
– Applications of the above in Multimedia
• I am currently working on
– Optimizing index and retrieval structures for human
motion data
– Analysis of tongue motion data to identify baseline
characteristics of pronunciations (applications:
classification, speech therapy)
Data Mining and Multimedia
• Uncovering hidden information from
data.
• Exploiting data to obtain new
knowledge and interpret results.
• Wide-ranging applications in Multimedia.
Data Mining Techniques
• Classification
• Prediction
• Cluster Analysis & Class Discovery
• Extraction and Retrieval
• Statistical Analysis
Ideas for Projects
Text Mining
• Information Extraction from Domain-specific
documents
– Involves extracting data from free-text passages and
populating a database
– Serves to organize required information that is available
only in unorganized form
– Not sufficient on its own; combine with class
discovery
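
As a minimal sketch of the extract-and-populate idea above, the following uses a regular expression to pull fields out of free text and load them into an in-memory SQLite table. The sample sentences, the pattern, and the `admissions` table are all hypothetical; the slides do not name a domain or an extraction method, and real domain-specific documents would likely need richer patterns or a trained extractor.

```python
import re
import sqlite3

# Hypothetical free-text snippets; the slides do not specify a domain.
DOCS = [
    "Patient John Smith, age 42, was admitted on 2009-03-14.",
    "Patient Mary Jones, age 67, was admitted on 2009-04-02.",
    "No structured record appears in this sentence.",
]

# Assumed pattern for this toy domain: name, age, admission date.
PATTERN = re.compile(
    r"Patient (?P<name>[A-Z][a-z]+ [A-Z][a-z]+), age (?P<age>\d+), "
    r"was admitted on (?P<date>\d{4}-\d{2}-\d{2})"
)

def extract_records(docs):
    """Return (name, age, date) tuples for every match of PATTERN."""
    records = []
    for text in docs:
        for m in PATTERN.finditer(text):
            records.append((m.group("name"), int(m.group("age")), m.group("date")))
    return records

def populate(conn, records):
    """Create the (hypothetical) admissions table and insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS admissions (name TEXT, age INTEGER, admitted TEXT)"
    )
    conn.executemany("INSERT INTO admissions VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
populate(conn, extract_records(DOCS))
rows = conn.execute("SELECT name, age FROM admissions ORDER BY age").fetchall()
print(rows)
```

Text that does not match the pattern is simply skipped, which is one reason extraction alone is not sufficient and benefits from a class-discovery step.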
Ideas for Projects
Text Mining
• New Class Discovery using Clustering
techniques
– identifying groups of keywords that do not fall
into known categories
– creating new categories and validating them
– Possibly employ clustering algorithms with a suitable
similarity measure or distance function
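
One possible way to group keywords is sketched below: each keyword is represented by a vector of co-occurrence counts, and keywords are merged into the same group when their cosine similarity exceeds a threshold (a simple single-link scheme). The keywords, the counts, and the 0.8 threshold are made up for illustration; the slides leave the choice of algorithm and similarity measure open.

```python
import math

# Hypothetical keyword vectors: co-occurrence counts with three context terms.
VECTORS = {
    "guitar": [5, 1, 0],
    "piano": [4, 2, 0],
    "camera": [0, 1, 6],
    "lens": [0, 0, 5],
    "drum": [6, 0, 1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def threshold_clusters(vectors, min_sim=0.8):
    """Single-link grouping: a keyword joins (and merges) every cluster
    that contains at least one member with similarity >= min_sim."""
    clusters = []
    for word, vec in vectors.items():
        matches = [
            c for c in clusters
            if any(cosine(vec, vectors[w]) >= min_sim for w in c)
        ]
        merged = [word]
        for c in matches:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(merged)
    return [sorted(c) for c in clusters]

clusters = sorted(threshold_clusters(VECTORS))
print(clusters)
```

A group whose members match no known category is a candidate new category, which would still need to be validated, for example by manual inspection.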
Ideas for Projects
Text Mining (contd.)
• Query-based document retrieval system
– employ one of several base models such as a
probabilistic model or a vector space model
– design an efficient indexing system
– include a relevance-ranking feature
– possibly make the system intelligent using
machine learning techniques
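
The vector space option can be sketched as follows: an inverted index maps each term to the documents that contain it, and documents are ranked against a query by TF-IDF weighted cosine similarity. The three sample documents, the whitespace tokenizer, and the smoothed IDF formula `log(1 + N/df)` are assumptions made for this example, not details from the slides.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-collection.
DOCS = {
    "d1": "human motion capture data retrieval",
    "d2": "tongue motion data for speech therapy",
    "d3": "movie recommendation from user ratings",
}

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(query, docs, index):
    """Return doc ids ranked by cosine similarity of TF-IDF vectors."""
    n = len(docs)
    # Smoothed inverse document frequency (an assumed variant).
    weights = {t: math.log(1 + n / len(p)) for t, p in index.items()}
    # Squared length of each document vector, used for normalization.
    norms = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            norms[doc_id] += (tf * weights[term]) ** 2
    # Accumulate dot products, touching only documents that share a term.
    scores = defaultdict(float)
    for term, qtf in Counter(tokenize(query)).items():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += (qtf * weights[term]) * (tf * weights[term])
    # The query norm is the same for every document, so it is omitted.
    ranked = sorted(
        ((s / math.sqrt(norms[d]), d) for d, s in scores.items()), reverse=True
    )
    return [d for _, d in ranked]

INDEX = build_index(DOCS)
print(search("motion data", DOCS, INDEX))
```

Because only the postings of the query terms are visited, the cost of a query grows with the number of matching documents rather than with the size of the whole collection.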
Ideas for Projects
Pattern Recognition in Multimedia Data
• Scope
– analyze and identify interrelationships within
Multimedia data sets
– Derive a composite score from several different subscores
• Methods
– classic techniques like Principal Component Analysis
(PCA) and Factor Analysis (FA)
– Statistical methods such as Regression analysis
Ideas for Projects
Pattern Recognition in Multimedia Data
(contd.)
• Methods
– Principal Component Analysis (PCA)
(a) Dimensionality Reduction
(b) Efficient Storage and Retrieval of Media data
(c) Applications in any multi-dimensional media: Images
(noise reduction), Video (content analysis), Audio (Voice
Signature recognition)
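
To illustrate dimensionality reduction with PCA, the sketch below centers a small data set, builds its covariance matrix, and finds the first principal component by power iteration. The data values are invented, and power iteration is just one convenient way to get the leading eigenvector without a linear algebra library; a real project would more likely use the PCA routines in Matlab or a similar tool.

```python
import math

def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_component(cov, iters=200):
    """Power iteration: converges to the eigenvector of cov with the
    largest eigenvalue (assumes the start vector is not orthogonal to it)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, v):
    """Reduce each row to a single coordinate along direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in centered]

# Hypothetical 3-attribute samples; the first two attributes are correlated.
DATA = [[2.0, 4.1, 1.0], [3.0, 6.0, 1.2], [4.0, 7.9, 0.9], [5.0, 10.1, 1.1]]

centered, means = mean_center(DATA)
cov = covariance(centered)
pc1 = first_component(cov)
scores = project(centered, pc1)
print(pc1, scores)
```

Storing only `scores` (plus `pc1` and `means`) in place of the three original columns is the kind of compact representation that makes storage and retrieval cheaper.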
Ideas for Projects
Pattern Recognition in Multimedia Data
(contd.)
• Methods
– Factor Analysis (FA)
(a) Minimize data redundancy
(b) Reveal hidden patterns
(c) Combine several attributes into a single attribute by
determining the importance and contribution of each
attribute
(d) Applications: medical analysis, IQ tests, personality tests,
software measurement, multimedia content analysis, motion
capture data analysis
Ideas for Projects
Pattern Recognition in Multimedia Data
(contd.)
• Methods
– Statistical Analysis
(a) Correlation analysis to bring out interrelationships
between data attributes
(b) Regression analysis to analyze the ability of a set of data
attributes to predict other data attributes
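
Both ideas can be sketched in a few lines: the Pearson coefficient measures how strongly two attributes move together, and a least-squares line measures how well one attribute predicts the other. The attribute names `tempo` and `energy` and their values are hypothetical, and the example covers only the single-predictor case.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx

# Hypothetical attributes of five audio clips.
tempo = [60, 80, 100, 120, 140]
energy = [1.0, 1.5, 2.0, 2.5, 3.0]

r = pearson(tempo, energy)
slope, intercept = fit_line(tempo, energy)
predicted = slope * 110 + intercept  # predict energy for an unseen tempo
print(r, slope, intercept, predicted)
```

A coefficient near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests that a linear model of one attribute on the other would predict poorly.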
Ideas for Projects
Prediction and Suggestion Systems
• An intelligent shopping application or a movie
review application that
– learns from user ratings or purchases, and
suggests other products or options
– Examples: Netflix & Amazon
– Many machine learning techniques could be
employed: Bayesian reasoning and classification
algorithms such as AdaBoost
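
As one example of the Bayesian option, the sketch below trains a naive Bayes model on a user's past likes and dislikes and ranks unseen items by the estimated probability that the user will like them. The rating history, the two item features (genre and era), and the use of add-one smoothing are assumptions for illustration; this is not how Netflix or Amazon actually work.

```python
from collections import Counter, defaultdict

# Hypothetical history: (genre, era) features and whether the user liked it.
HISTORY = [
    (("comedy", "new"), True),
    (("comedy", "old"), True),
    (("drama", "new"), False),
    (("drama", "old"), False),
    (("comedy", "new"), True),
    (("action", "new"), False),
]

def train(history):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in history)
    feature_counts = defaultdict(Counter)  # (position, label) -> value counts
    vocab = defaultdict(set)               # position -> distinct values seen
    for features, label in history:
        for pos, value in enumerate(features):
            feature_counts[(pos, label)][value] += 1
            vocab[pos].add(value)
    return class_counts, feature_counts, vocab

def prob_like(model, features):
    """Posterior probability of 'liked' under naive Bayes."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        p = count / total
        for pos, value in enumerate(features):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            seen = feature_counts[(pos, label)][value]
            p *= (seen + 1) / (count + len(vocab[pos]))
        scores[label] = p
    return scores[True] / sum(scores.values())

def suggest(model, candidates):
    """Order candidate items from most to least likely to be liked."""
    return sorted(candidates, key=lambda f: prob_like(model, f), reverse=True)

model = train(HISTORY)
ranking = suggest(model, [("drama", "new"), ("comedy", "old"), ("action", "old")])
print(ranking)
```

The model is retrained (or its counts updated) as new ratings or purchases arrive, which is the sense in which the application learns from the user.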
Ideas for Projects
Prediction and Suggestion Systems
• An intelligent media hosting application that
– learns from user queries and requests, and
accordingly suggests other media items
– Suggested items would be retrieved by querying on
the media features and metadata
– Example: eSnips music hosting
– Many machine learning techniques could be
employed: Bayesian reasoning and classification
algorithms
Ideas for Projects
• Ideas for alternative projects having to
do with applications of machine
learning, data mining and statistical
analysis in the domain of multimedia are
welcome.
• Tools – Weka, Matlab, Statistical software
packages (even Excel helps a lot!!).
Thank You
Arvind Balasubramanian
[email protected]
Multimedia Lab
The University of Texas at Dallas