Event reports
Tomáš Kliegr
• EDBT 2008
• KDD 2008
• ECML/PKDD 2008
• SSMS 2008
EDBT 2008
International Conference on Extending
Database Technology
Nantes, France
Contribution: PhD Workshop
• Higher acceptance rate than in previous years
(about 46%)
• Dimensionality Reduction of Semantically
Enriched Clickstreams
• Excellent feedback: 5 extensive reviews
• Very broad range of topics
• Post-proceedings: extended version of the
conference paper published in the ACM
DL
Selected workshop papers
• Improving the Accuracy of Entity Identification through
Refinement
– the goal of entity identification is to correctly identify all the instances
of the same entity so as to eliminate the inconsistency of data sources
during data integration.
• Full-text indexing and Information Retrieval in P2P Systems
– Distributed IR
• Reasoning about Taxonomies and Articulations
– This work formalizes taxonomies and relationships between them as
formulas in logic. This formalization concretizes notions such as
consistency and inconsistency of taxonomies and articulations
(inter-taxonomic relations) between them, enables the derivation of new
articulations based on a given set of taxonomies and articulations and
provides a framework for testing assumptions about under-specified
taxonomies.
Other Czech participants
• A Cost-based Join Selection for XML Twig
Content-based Queries
• Radim Baca, Michal Kratky
Conference focus
• P2P
• XML
• Streaming
• Caching
• Query Processing
• Data Fusion
Industrial section
• Data Challenges at Yahoo!
– Ricardo Baeza-Yates and Raghu Ramakrishnan
• Automatic Content Targeting on Mobile
Phones
KDD 2008
14th ACM SIGKDD International
Conference, Las Vegas
Contribution: MDM KDD Workshop
• Combining Image Captions and Visual Analysis
for Image Concept Classification
– Kliegr, Svátek, Nemrava, Chandramouli, Izquierdo
• As a point of interest, Pavel Praks published
at the same workshop in the past:
Multimedia Data Mining Workshop (Pavel
Praks’05): Iris Recognition Using the SVD-Free
Latent Semantic Indexing
Interesting papers from the workshop
• Annotating images and image objects using a
hierarchical Dirichlet process model
– We apply this model for predicting labels of objects in
images containing multiple objects. During training,
the model has access to an un-segmented image and
its caption, but not the labels for each object in the
image. The trained model is used to predict the label
for each region of interest in a segmented image.
• Mining the Web for Visual Concepts
– Relevance feedback on Image + text data retrieved
from the web
Interesting papers from the conference
• Building Semantic Kernels for Text
Classification using Wikipedia
– In this paper, we overcome the shortages of the
BOW approach by embedding background
knowledge derived from Wikipedia into a
semantic kernel, which is then used to enrich the
representation of documents.
• Entity Categorization Over Large Document
Collections
– In this paper, we significantly improve the
accuracy of entity categorization by (i) considering
an entity’s context across multiple documents
containing it, and (ii) exploiting existing large lists
of related entities (e.g., lists of actors, directors,
books).
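The semantic-kernel idea from the Wikipedia paper above can be illustrated with a toy sketch. It is not the authors' implementation: the term-to-concept table is a hypothetical stand-in for knowledge mined from Wikipedia, and the enrichment step (appending weighted concept features to the bag-of-words vector) is one simple way to build such a kernel.

```python
from collections import Counter

# Hypothetical term -> concept weights, standing in for Wikipedia-derived knowledge.
TERM_CONCEPTS = {
    "puma": {"big_cat": 1.0},
    "cougar": {"big_cat": 1.0},
    "sedan": {"car": 1.0},
}


def bow(text):
    """Plain bag-of-words term frequencies."""
    return Counter(text.lower().split())


def enrich(vec):
    """Keep the original terms and add weighted concept features."""
    enriched = {("term", t): float(tf) for t, tf in vec.items()}
    for term, tf in vec.items():
        for concept, weight in TERM_CONCEPTS.get(term, {}).items():
            key = ("concept", concept)
            enriched[key] = enriched.get(key, 0.0) + tf * weight
    return enriched


def dot(a, b):
    """Sparse dot product over dictionary-based vectors."""
    return sum(value * b.get(key, 0.0) for key, value in a.items())


def bow_kernel(d1, d2):
    return dot(bow(d1), bow(d2))


def semantic_kernel(d1, d2):
    # A dot product in the enriched feature space, hence still a valid kernel.
    return dot(enrich(bow(d1)), enrich(bow(d2)))


print(bow_kernel("puma sighting", "cougar tracks"))
print(semantic_kernel("puma sighting", "cougar tracks"))
```

The two documents share no words, so the plain BOW kernel sees them as unrelated, while the enriched representation links them through the shared concept.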
ArnetMiner: Extraction and Mining of
Academic Social Networks
• 1) Extracting researcher profiles automatically from the Web;
2) Integrating the publication data into the network from
existing digital libraries; 3) Modeling the entire academic
network; and 4) Providing search services for the academic
network. So far, 448,470 researcher profiles have been
extracted using a unified tagging approach. We integrate
publications from online Web databases and propose a
probabilistic framework to deal with the name ambiguity
problem. Furthermore, we propose a unified modeling
approach to simultaneously model topical aspects of
papers, authors, and publication venues. Search services
such as expertise search and people association search
have been provided based on the modeling results.
Heterogeneous Data Fusion for
Alzheimer’s Disease Study
• In this paper, we propose to integrate
heterogeneous data for AD prediction based
on a kernel method. We further extend the
kernel framework for selecting features
(biomarkers) from heterogeneous data
sources
• Experimental results show that the integration
of multiple data sources leads to a considerable
improvement in the prediction accuracy.
Febrl – An Open Source Data Cleaning, Deduplication
and Record Linkage System with a Graphical User Interface
• (Freely Extensible Biomedical Record Linkage)
• It contains many recently developed
techniques for data cleaning, deduplication
and record linkage, and encapsulates them
into a graphical user interface (GUI).
• https://sourceforge.net/projects/febrl/
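To make the deduplication task concrete, here is a minimal sketch of approximate record matching. It does not use Febrl or its API; it relies only on Python's standard `difflib`, and the records, field names and the 0.85 threshold are made-up assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; ids 1 and 2 describe the same person with a typo.
records = [
    {"id": 1, "name": "John Smith", "city": "Canberra"},
    {"id": 2, "name": "Jon Smith", "city": "Canberra"},
    {"id": 3, "name": "Mary Jones", "city": "Sydney"},
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(records, threshold=0.85):
    """Return id pairs whose averaged field similarity reaches the threshold."""
    matches = []
    for r1, r2 in combinations(records, 2):
        score = (similarity(r1["name"], r2["name"])
                 + similarity(r1["city"], r2["city"])) / 2
        if score >= threshold:
            matches.append((r1["id"], r2["id"]))
    return matches


duplicate_pairs = find_duplicates(records)
print(duplicate_pairs)
```

Comparing every pair is quadratic in the number of records; real record linkage systems typically add a blocking or indexing step to limit the candidate pairs.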
Using tagFlake for Condensing Navigable
Tag Hierarchies from Tag Clouds
• Luigi Di Caro (University of Torino)
K. Selçuk Candan (Arizona State University)
Maria Luisa Sapino (University of Torino)
Pictor: An Interactive System for
Importing Data from a Website
• demonstration of an interactive wrapper induction system, called Pictor, which is able to
minimize labeling cost, yet extract data with high
accuracy from a website. Our demonstration will
introduce two proposed technologies: record-level wrappers and a wrapper-assisted labeling
strategy. These approaches allow Pictor to exploit
previously generated wrappers, in order to
predict similar labels in a partially labeled
webpage or a completely new webpage.
Trends
• Text mining
• Advertising on the web
• The 2nd International Workshop on Data
Mining and Audience Intelligence for
Advertising (ADKDD 2008)
• Medical data mining
– Workshop on Mining Medical Data and KDD Cup
2008
Further highlights
• The 2nd SNA-KDD Workshop on Social
Network Mining and Analysis (SNA-KDD
2008)
• Workshop on Mining Medical Data and KDD
Cup 2008
• The 2nd International Workshop on Mining
Multiple Information Sources
ECML/PKDD 2008
European Conference on Machine
Learning and Principles and Practice of
Knowledge Discovery in Databases
Antwerp
Contribution at the WBBT Workshop
• Wikis, Blogs and Bookmarking tools workshop
• Chair: Bettina Berendt
• Wikipedia As the Premiere Source for Targeted
Hypernym Discovery
Tomas Kliegr, Vojtech Svatek, Krishna
Chandramouli, Jan Nemrava and Ebroul Izquierdo
• http://www.kde.cs.uni-kassel.de/ws/wbbtmine2008/pdf/all_wbbtmine2008.pdf
Selected invited talks
• The Role of Hierarchies in Exploratory Data
Mining
– In a broad range of data mining tasks, the
fundamental challenge is to efficiently explore a very
large space of alternatives. The difficulty is two-fold:
first, the size of the space raises computational
challenges, and second, it can introduce data sparsity
issues even in the presence of very large datasets. In
this talk, we'll consider how the use of hierarchies
(e.g., taxonomies, or the OLAP multidimensional
model) can help mitigate the problem.
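One generic way a hierarchy can ease data sparsity is roll-up: sparse leaf counts are aggregated to their parent until each reported node has enough support. The sketch below is only an illustration of that idea, not the method from the talk; the taxonomy, counts and `min_support` value are hypothetical.

```python
from collections import Counter

# Hypothetical taxonomy: child -> parent.
PARENT = {
    "espresso": "coffee",
    "latte": "coffee",
    "green tea": "tea",
    "coffee": "beverage",
    "tea": "beverage",
    "cola": "beverage",
}


def depth(node, parent):
    """Number of edges from the node up to its root."""
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d


def roll_up(counts, parent, min_support):
    """Merge under-supported nodes into their parents, deepest level first."""
    agg = Counter(counts)
    max_depth = max(depth(n, parent) for n in agg)
    for level in range(max_depth, 0, -1):
        # Snapshot the nodes at this level before modifying the counter.
        for node in [n for n in agg if depth(n, parent) == level]:
            if agg[node] < min_support:
                moved = agg.pop(node)
                agg[parent[node]] += moved
    return dict(agg)


leaf_counts = {"espresso": 2, "latte": 1, "green tea": 1, "cola": 9}
rolled = roll_up(leaf_counts, PARENT, min_support=3)
print(rolled)
```

Processing the deepest level first keeps the result independent of iteration order; a root node may still end up below the threshold because it has nowhere further to roll up.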
Learning Language from Its
Perceptual Context
• Raymond J. Mooney
• The training data consists of textual human
commentaries on Robocup simulation games. A set of
possible alternative meanings for each comment is
automatically constructed from game event traces. Our
previously developed systems for learning to parse and
generate natural language (KRISP and WASP) were
augmented to learn from this data and then
commentate novel games. The system is evaluated
based on its ability to parse sentences into correct
meanings and generate accurate descriptions of game
events.
Watch, Listen & Learn: Co-training on
Captioned Images and Videos
• leverage the text that often accompanies
visual data to learn robust models of scenes
and actions from partially labeled collections.
Our approach uses co-training.
Co-training
• semi-supervised learning algorithm that
requires two distinct “views” of the training
data
• First learns a separate classifier for each view
using any labeled examples
• The most confident predictions of each
classifier on the unlabeled data are then used
to iteratively construct additional labeled
training data.
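The co-training loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the system from the talk: the two views are hypothetical caption tokens and quantized "visual words", the per-view classifier is a tiny naive Bayes written for this example, and the data are made up.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes over token lists (illustrative only)."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.counts.values() for t in c}
        return self

    def predict_proba(self, tokens):
        scores = {}
        for label in self.labels:
            # Laplace smoothing; the extra +1 reserves mass for unseen tokens.
            total = sum(self.counts[label].values()) + len(self.vocab) + 1
            score = math.log(self.prior[label] / self.n)
            for t in tokens:
                score += math.log((self.counts[label][t] + 1) / total)
            scores[label] = score
        top = max(scores.values())
        exp = {l: math.exp(s - top) for l, s in scores.items()}
        norm = sum(exp.values())
        return {l: v / norm for l, v in exp.items()}


def co_train(labeled, unlabeled, rounds=3, per_round=2):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    labeled, pool = list(labeled), list(unlabeled)

    def fit_views():
        ys = [y for _, y in labeled]
        return [NaiveBayes().fit([x[v] for x, _ in labeled], ys) for v in (0, 1)]

    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for v, model in enumerate(fit_views()):
            ranked = []
            for i, x in enumerate(pool):
                proba = model.predict_proba(x[v])
                label = max(proba, key=proba.get)
                ranked.append((proba[label], i, label))
            ranked.sort(reverse=True)
            # Each view contributes its most confident predictions.
            for _conf, i, label in ranked[:per_round]:
                picked.setdefault(i, label)
        labeled += [(pool[i], label) for i, label in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return fit_views(), labeled


labeled = [
    ((["goal", "kick"], ["grass", "ball"]), "soccer"),
    ((["wave", "surf"], ["water", "board"]), "surfing"),
]
unlabeled = [
    (["goal", "crowd"], ["grass", "crowd"]),
    (["surf", "beach"], ["water", "sand"]),
    (["crowd", "cheer"], ["ball", "grass"]),
]
models, final_labeled = co_train(labeled, unlabeled)
caption_model = models[0]
print(caption_model.predict_proba(["crowd", "cheer"]))
```

In this toy run the caption classifier initially has no evidence for "crowd cheer", but the visual view labels that example confidently, so the caption view can learn from it in the next fit.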
SSMS 2008
3rd Summer School on Multimedia
Semantics, Chania, Crete
Overview presentations
• http://www.meship.eu/ssms08.aspx?Page=ssms08
• Presentations are available for download
Wrap up
• ECML 09
– Bled, Slovenia
– 7 Sep 2009 - 11 Sep 2009
• KDD 09
– Paris, France
– Jun 28-Jul 1, 2009
• EDBT/ICDT 2009
– Saint-Petersburg, Russia
– March 23-26