05_H. Kwasnicka_Some issues on mining data and

Some issues on mining data
and content-based image
retrieval
Halina Kwasnicka
[email protected]
www.ii.pwr.wroc.pl/~kwasnicka
Outline
Some issues on data mining
• Evolutionary approach to KDD process
• Grouping data as a hierarchy of objects
– The proposed method
– The proposed measures
Intelligent analysis of images
• CBIR and images similarity
• Medical images analysis
Speech Processing and NL Engineering
Research questions and possible applications
EC & KDD – the meeting places
FEATURE SELECTION / CONSTRUCTION
MICHIGAN & PITT APPROACHES; HIERARCHICAL
APPROACH; GENETIC INDUCTIVE LEARNING;
ASSOCIATION & CLASSIFICATION RULE GENERATION
– ANTLIA system
DATA VISUALIZATION
METHOD – GENRED –
the universal matrix is
generated once; it can
then be reused when more
vectors are added
Hybrid techniques: GA+FL & GA+NN
Data clustering - hierarchy of objects
Only the leaves contain
objects, other nodes
define the hierarchy
No relationship between
groups
Inheritance: an object belonging to
a given group also belongs to
its parent groups
Retention: objects do not need
to be located in the tree's leaves
Variance: objects in the child
groups are more specific than those in
the parent
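A minimal sketch of the Inheritance and Retention properties, assuming a simple tree of groups. The `Group` class and the example names are hypothetical illustrations, not part of the IRV-HC method.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A node in a hierarchy of groups (hypothetical structure for illustration)."""
    name: str
    objects: list = field(default_factory=list)   # objects retained at this node
    children: list = field(default_factory=list)  # more specific sub-groups

    def add_child(self, child):
        self.children.append(child)
        return child

    def members(self):
        """Inheritance: a group's members include those of all descendant groups."""
        found = list(self.objects)
        for child in self.children:
            found.extend(child.members())
        return found


# Retention: "unknown animal" stays at an inner node, not in a leaf.
root = Group("animals", objects=["unknown animal"])
birds = root.add_child(Group("birds", objects=["sparrow"]))
owls = birds.add_child(Group("owls", objects=["barn owl"]))

print(owls.members())   # ['barn owl']
print(birds.members())  # ['sparrow', 'barn owl']
print(root.members())   # ['unknown animal', 'sparrow', 'barn owl']
```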
The method (IRV-HC, in progress)
Is inspired by "Tree-Structured Stick Breaking
for Hierarchical Data" (R.P. Adams, Z. Ghahramani, M.I.
Jordan)
Is based on Bayesian inference with Markov
Chain Monte Carlo
Its current version works on real-valued data
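For orientation, a sketch of a plain (non-tree) stick-breaking process, the building block that the cited paper extends to trees. This is not the IRV-HC method or the tree-structured construction itself; the function name, the `alpha` value, and the truncation to a fixed number of sticks are assumptions made for illustration.

```python
import random


def stick_breaking(alpha, n_sticks, rng):
    """Return the first n_sticks weights of a stick-breaking process and the leftover mass."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(n_sticks):
        fraction = rng.betavariate(1.0, alpha)  # portion broken off the remaining stick
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


rng = random.Random(0)  # fixed seed so the run is repeatable
weights, leftover = stick_breaking(alpha=2.0, n_sticks=5, rng=rng)
print(weights, leftover)
```

The weights and the leftover mass always sum to one; a larger `alpha` tends to spread the mass over more sticks.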
Challenges:
• Appropriate measures for clustering methods
that reflect these requirements
• Projection of the hierarchy of clusters onto a given
ontology
Proposed a new measure
Hierarchical Class Purity – an external
measure
• Is based on Class Purity: each cluster is
assigned to the class that is most frequent in
the cluster, and the accuracy of this
assignment is measured by counting the
number of correctly assigned objects and
dividing by the number of all objects
• Class Purity can be redefined in a
way that recognizes the relationship between
classes, so that a child node can be assigned
only the class of its parent node or a descendant of that class.
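The flat Class Purity described above can be sketched as follows. The function name and the toy labels are assumptions; the hierarchical variant is not covered here.

```python
from collections import Counter


def class_purity(clusters):
    """clusters: list of lists of true class labels, one inner list per cluster."""
    total = sum(len(labels) for labels in clusters)
    if total == 0:
        raise ValueError("no objects to evaluate")
    correct = 0
    for labels in clusters:
        if labels:
            # The cluster is assigned to its most frequent class;
            # objects of that class count as correctly assigned.
            correct += Counter(labels).most_common(1)[0][1]
    return correct / total


# Toy data: 2 + 3 + 2 = 7 of 9 objects match their cluster's majority class.
clusters = [["a", "a", "b"], ["b", "b", "b", "c"], ["c", "c"]]
purity = class_purity(clusters)
print(purity)
```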
Proposed a new measure…
Variance Deviation – an internal measure
• The measure examines how the diversity of the data
in a child node relates to the diversity of the data in the parent
node
• The diversity of a collection of data must be non-negative
• A collection containing one element must have zero
diversity
• If the collection contains two non-identical elements, the
diversity must be greater than zero
• Duplicating the elements of the collection does not affect
the diversity of the collection
An example of such a measure is the
standard deviation
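A sketch that checks the listed requirements for the standard deviation, assuming the population form (the sample standard deviation would change when elements are duplicated). The `child_is_more_specific` helper and the toy values are hypothetical; the slides do not give the exact Variance Deviation formula.

```python
from statistics import pstdev


def diversity(values):
    """Population standard deviation used as a diversity measure (one possible choice)."""
    return pstdev(values)


def child_is_more_specific(parent_values, child_values):
    """Hypothetical check: a child node should not be more diverse than its parent."""
    return diversity(child_values) <= diversity(parent_values)


parent = [1.0, 2.0, 8.0, 9.0]
child = [8.0, 9.0]

print(diversity([5.0]))            # one element: zero diversity
print(diversity([1.0, 3.0]) > 0)   # two non-identical elements: positive diversity
# Duplicating every element leaves the population standard deviation unchanged.
print(abs(diversity(parent) - diversity(parent * 3)) < 1e-12)
print(child_is_more_specific(parent, child))
```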
Image auto-annotation
Data set:
Query image:
Questions:
• which words should be assigned to the query image
• how many words the annotation should have
The hypothesis
Images similar in appearance are likely to
share the same annotation
http://www.visible.ii.pwr.wroc.pl/
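The hypothesis suggests a nearest-neighbour style of annotation transfer. The sketch below is a generic illustration under that assumption, not the PATSI method; the feature vectors, the values of `k` and `n_words`, and the function name are all hypothetical.

```python
import math
from collections import Counter


def annotate(query_features, dataset, k=3, n_words=2):
    """Transfer the most frequent words from the k nearest annotated images.

    dataset: list of (feature_vector, words) pairs; all values are hypothetical.
    """
    # Rank annotated images by Euclidean distance to the query in feature space.
    nearest = sorted(dataset, key=lambda item: math.dist(query_features, item[0]))[:k]
    # Each neighbour votes for every word in its annotation.
    votes = Counter(word for _, words in nearest for word in words)
    return [word for word, _ in votes.most_common(n_words)]


dataset = [
    ((0.9, 0.1), ["sky", "sea"]),
    ((0.8, 0.2), ["sky", "beach"]),
    ((0.85, 0.15), ["sky", "sea"]),
    ((0.1, 0.9), ["forest", "tree"]),
]
print(annotate((0.88, 0.12), dataset))  # ['sky', 'sea']
```

Here the annotation length is a fixed parameter; the second question on the previous slide, how many words to assign, is left open by this sketch.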
PATSI – one of the proposed methods
The schema of a possible approach
Medical images analysis
Analysis of Computer Tomography images
in dementia and tumor diagnosis
Analysis of capillaroscopy images in
evaluation of health threats
Analysis of the histopathology images in
cancer diagnosis
Speech Processing and NL Engineering
In human supervision systems
• Practical applications:
– Patient supervision in healthcare,
– Elderly people monitoring,
– Children monitoring.
Aim:
• To use the acoustic signal captured by remote
microphones to support:
– determination of the supervised person's
state and
– detection of emergency situations
Speech Processing and NL Engineering
In question answering and document search
• Practical applications:
– Automatic question answering systems in call
centers
– Speech support for services in mobile systems
used in hands-free mode (e.g. for drivers,
medical doctors, etc.)
– Speech-controlled search in medical
document databases and Internet-available
resources
– Help systems for patients and disabled persons
Speech Processing and NL Engineering
In question answering and
document search
• Aim:
– To create tools supporting
speech-controlled question
answering systems that
tolerate typical ASR errors
and produce well-formed
queries as their output
Summary: some of the research
problems
Use of soft computing (e.g., evolutionary
computation) in data mining and knowledge
acquisition tasks
An accurate and efficient method of grouping objects into a
hierarchy that reflects the semantics of the objects
and can be projected onto a suitable ontology
Separation of speech from other sounds
Speech recognition in the case of slurred speech
in changed emotional states
Recognition of cough attacks, crying, and sounds specific
to breathing problems
Recognition (by voice) of persons entering the
supervised room
Summary: potential commercial applications
– some of the planned themes
Patient supervision in healthcare
Elderly people and/or children monitoring
Computer system for qualitative and
quantitative diagnosis of the expression of the
HER-2 receptor and other membrane
proteins in breast carcinomas
Decision support systems for diagnosis
System for finding similar documents or
images by visual query, e.g., to find similar
monuments and information about them in
tourism
Summary: potential commercial
applications – previous works
Computer system for analyzing medical
speech (descriptions of medical images); the
system was presented on Wroclaw TV
Computer system supporting the education of
children with dyslexia (we made a prototype)
Orator – an intelligent system supporting
stuttering therapy
VCOP – The System of Visual Communication
with Computers for Paralyzed Users (working
prototype)
Thank you for your attention!