Titel - KU Leuven
How to teach LOD?
Bettina Berendt
Dept. Computer Science
KU Leuven
Who am I?
Privacy, discrimination
Research: One persisting question
Research: One specific question –
How do blogs and tweets spread, change, create news?
Workshop series (with, among others, Markus L.-R.)
[…] synergy between semantics and semantic-web technology on the one
hand, and the analysis and mining of usage data on the other hand. […]
First, semantics can be used to enhance the analysis of usage data.
Second, usage data analysis can enhance semantic resources as well as
Semantic Web applications; traces of users can be used to evaluate, adapt or
personalize Semantic Web applications.
The emerging Web of Data demands a re-evaluation of existing evaluation
techniques: the Linked Data community is recognizing that it needs to move
beyond triple counts because the real value of Web data needs to be measured
by real use.
Another persisting question:
Data Mining for Information Literacy
Data Mining for Information Literacy:
How?
“Knowledge and the Web” course:
Curricular context
Based on experiences in the HU Master specialisation “Wirtschaftsinformatik” (Business Informatics)
• Students: Wirtschaftsinformatik, Computer Science, miscellaneous
2007-2012: KUL Master specialisation “Databases”
• Students: mostly Computer Science students NOT specialising in databases
2013+: KUL Master specialisation “Artificial Intelligence” (+ Master AI)
• Students: we’ll see!
6 ECTS
Student numbers over the years: between ~6 and ~20
… & a big thanks to the teaching assistants!
Ilija Subašić
Thomas Peetz
Concept of the course
3 blocks:
• Web data, integrating Web data
• Mining Web data
• Applications and implications
Lecture + exercise session
+ mini-workshop at the end
One invited talk
Evaluation based on homework assignments
• Progression from “exercise” to “self-defined project”
Lecture 2012
Lecture 2013
& more depth about the other topics
Homework assignments
1. Modelling
2. Populating
3. Integrating
4. (optional) Data mining basics
5. (a non-graded exercise) Reporting on data mining projects /
Reading data mining papers
6. Your own project
Semantic Web / LOD intro topics
The Semantic Web: Motivation and overview
Very brief recap of XML (& why it’s not semantic)
RDF and RDFS
OWL and ontologies
Linked (Open) Data (LOD)
Storing, accessing and combining SW data
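Here is a minimal Python sketch that makes “storing, accessing and combining SW data” concrete. It assumes a toy in-memory store where a graph is just a set of (subject, predicate, object) tuples. The prefixed names (ex:KULeuven, ex:locatedIn, ex:country) are hypothetical, and a real setup would use an RDF library and SPARQL. The point is that merging two graphs (ignoring blank nodes) is a set union. After the merge, a query can join facts that neither source contains on its own.

```python
from typing import Optional

# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = tuple[str, str, str]


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def located_in_country(graph: set[Triple], country: str) -> list[str]:
    """Two-pattern join: ?x ex:locatedIn ?city . ?city ex:country <country>."""
    cities = {s for s, _, _ in match(graph, p="ex:country", o=country)}
    return sorted(s for s, _, o in match(graph, p="ex:locatedIn") if o in cities)


# Two hypothetical sources, each holding only part of the picture.
source_a = {
    ("ex:Leuven", "rdf:type", "ex:City"),
    ("ex:Leuven", "ex:country", "ex:Belgium"),
}
source_b = {
    ("ex:KULeuven", "rdf:type", "ex:University"),
    ("ex:KULeuven", "ex:locatedIn", "ex:Leuven"),
}

# Combining graphs without blank nodes is just set union.
merged = source_a | source_b
print(located_in_country(merged, "ex:Belgium"))    # ['ex:KULeuven']
print(located_in_country(source_a, "ex:Belgium"))  # []
```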
Inference topics
Introduction / motivation; kinds of reasoning
Properties of Properties (cf. the Pizza Tutorial)
Class descriptions, cardinality, & value constraints
Does this type of knowledge exist in LOD?
Common problems in using OWL reasoning
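The “properties of properties” topic can be illustrated with a small forward-chaining sketch in Python. It reuses the toy triple-set representation and hypothetical names in the spirit of the Pizza tutorial: ex:hasIngredient is declared transitive and ex:isNextTo symmetric. This is not how an OWL reasoner works internally. A real reasoner also handles class descriptions, cardinality and value constraints under the open-world assumption.

```python
# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = tuple[str, str, str]


def infer(graph: set[Triple], transitive: set[str],
          symmetric: set[str]) -> set[Triple]:
    """Forward-chain symmetric and transitive property axioms to a fixpoint."""
    closed = set(graph)
    while True:
        new: set[Triple] = set()
        for s, p, o in closed:
            if p in symmetric:
                new.add((o, p, s))  # p(s, o) implies p(o, s)
            if p in transitive:
                # p(s, o) and p(o, o2) imply p(s, o2)
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closed:  # nothing new was derived: fixpoint reached
            return closed
        closed |= new


# Hypothetical individuals and properties.
graph = {
    ("ex:pizza1", "ex:hasIngredient", "ex:base1"),
    ("ex:base1", "ex:hasIngredient", "ex:flour1"),
    ("ex:shopA", "ex:isNextTo", "ex:shopB"),
}
closed = infer(graph, transitive={"ex:hasIngredient"}, symmetric={"ex:isNextTo"})
for t in sorted(closed - graph):
    print(t)
# ('ex:pizza1', 'ex:hasIngredient', 'ex:flour1')
# ('ex:shopB', 'ex:isNextTo', 'ex:shopA')
```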
Schema / ontology matching topics
Core ideas of federated databases
The match problem & what info to use for matching
(Semi-)automated matching: Example CUPID
(Semi-)automated matching: Example iMAP
Ontology matching, with Example BLOOMS
Evaluating matching
Involving the user: Explanations; mass collaboration
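The matching and evaluation topics can be tried out with a deliberately simple, purely name-based (linguistic) matcher, sketched below in Python. This is not CUPID, iMAP or BLOOMS, which combine more kinds of evidence such as structure, instances and background knowledge. The schema element names, the 0.9 threshold and the reference alignment are all hypothetical. The evaluation uses precision, recall and F-measure against a reference alignment.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Split camelCase and snake_case names into lowercase words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    return spaced.replace("_", " ").lower()


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between normalised element names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_schemas(source: list[str], target: list[str],
                  threshold: float = 0.9) -> set[tuple[str, str]]:
    """Map each source element to its most similar target element, if similar enough."""
    alignment = set()
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        if similarity(s, best) >= threshold:
            alignment.add((s, best))
    return alignment


def evaluate(found: set, reference: set) -> tuple[float, float, float]:
    """Precision, recall and F-measure of an alignment against a reference alignment."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f


# Hypothetical schemas and reference alignment.
source = ["firstName", "lastName", "birthDate"]
target = ["first_name", "last_name", "date_of_birth"]
reference = {("firstName", "first_name"), ("lastName", "last_name"),
             ("birthDate", "date_of_birth")}

found = match_schemas(source, target)
print(sorted(found))  # [('firstName', 'first_name'), ('lastName', 'last_name')]
print([round(x, 2) for x in evaluate(found, reference)])  # [1.0, 0.67, 0.8]
```

The matcher misses birthDate/date_of_birth because it sees only string similarity. This is one reason richer matchers add structural and instance-level evidence. Lowering the threshold typically raises recall at the cost of precision.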
Identity, inconsistency, provenance topics
Introduction: The promise and risks of openness
Identity crises: owl:sameAs
Inconsistencies and provenance
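owl:sameAs is reflexive, symmetric and transitive, so a set of sameAs links partitions resources into equivalence classes: the connected components of the link graph. The Python sketch below computes these classes with a union-find structure; the IRIs are hypothetical. It shows the “identity crisis” in miniature. A single wrong link (here, a city linked to a person of the same name) merges two unrelated resources, after which every statement about one is inferred to hold of the other.

```python
class UnionFind:
    """Disjoint sets over IRIs; each set is one owl:sameAs equivalence class."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def same_as_clusters(links: list[tuple[str, str]]) -> list[list[str]]:
    """Group IRIs into the equivalence classes implied by owl:sameAs links."""
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    clusters: dict[str, set[str]] = {}
    for iri in list(uf.parent):
        clusters.setdefault(uf.find(iri), set()).add(iri)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical links; the second one is the erroneous "identity crisis" link.
links = [
    ("a:Paris_city", "b:Paris"),
    ("b:Paris", "c:Paris_Hilton"),
    ("a:Leuven", "b:Leuven"),
]
print(same_as_clusters(links))
# [['a:Leuven', 'b:Leuven'], ['a:Paris_city', 'b:Paris', 'c:Paris_Hilton']]
```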
Privacy topics: (1) Preparatory questions
What does privacy mean for you concretely? Can you remember situations
where it was important for you to show yourself in a different way than you
are? Do you expect such situations in the future?
Privacy also involves the possibility of lying. Is this possibility a right? Give
concrete examples and discuss them.
Think of a case where someone would want to not disclose some
information and where you would think "this is not right". Does this person
claim their privacy? Would your desired outcome be a privacy violation?
Who do you think should be watched most closely when it comes to
handling personal information: the government? companies? anyone else?
Why?
So what does privacy mean for databases and data mining? What problems
would you like to see addressed?
(Questions from/inspired by Martens, B., Dierick, G., & Noot, W. (2008). Ethiek en weerbarheid in de
informatiesamenleving, Uitgeverij LannooCampus, Leuven & Academic Service, Den Haag, p. 75)
Privacy topics (2): Lecture agenda
Three types of privacy
… and how the law respects them
Societal conventions that allow for secrecy
Surveillance, democracy, and …
Whose privacy?
and: when privacy is traded off against other goods
“Data aggregation and record linkage”
Trackers and anti-trackers
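“Data aggregation and record linkage” can be demonstrated with the classic linkage attack. A release that drops names but keeps quasi-identifiers (here, hypothetically, ZIP code, birth year and sex) is joined with a named public register. The Python sketch below uses invented toy records. A released record that matches exactly one person is re-identified. The two men who share all three values stay protected only because they are indistinguishable from each other, which is the intuition behind k-anonymity.

```python
def link_records(released: list[dict], public: list[dict],
                 keys: tuple[str, ...] = ("zip", "birth_year", "sex")) -> list[tuple[str, str]]:
    """Re-identify de-named records that match exactly one public record on the keys."""
    # Index the named register by its quasi-identifier values.
    index: dict[tuple, list[dict]] = {}
    for rec in public:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    links = []
    for rec in released:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the person
            links.append((candidates[0]["name"], rec["diagnosis"]))
    return links


# Hypothetical toy data: a "de-identified" release and a named public register.
released = [
    {"zip": "3000", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
    {"zip": "3001", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
]
public = [
    {"name": "Alice", "zip": "3000", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "3001", "birth_year": 1975, "sex": "M"},
    {"name": "Carl", "zip": "3001", "birth_year": 1975, "sex": "M"},
]
print(link_records(released, public))  # [('Alice', 'flu')]
```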
Homework 1: Modelling
Homework 2: Populating
Homework 3: Integrating
Homework 4 (optional): Basic data mining
Reading exercise (1)
(from Justin Zobel: Writing for Computer Science)
Reading exercise (2)
4. Now consider the guidelines for structuring a data-mining exercise
from the CRISP-DM model and manual. A good description of a
data-mining project will contain sections on each of the main phases
in CRISP-DM.
5. Please identify and highlight passages in the paper you have read
that correspond to those phases.
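A rough automated first pass can precede the careful manual reading in this exercise. The Python sketch below tags each paragraph of a paper with the CRISP-DM phases (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment) whose cue words it contains. It then lists the phases that no paragraph seems to cover. The cue-word lists and example paragraphs are hypothetical and will miss many phrasings, so the highlighting asked for above still has to be done by hand.

```python
# The six CRISP-DM phases, each with hypothetical cue words
# (the cue words are an assumption, not part of CRISP-DM).
CRISP_DM_CUES = {
    "Business Understanding": ["research question", "goal", "motivation"],
    "Data Understanding": ["dataset", "data source", "statistics"],
    "Data Preparation": ["cleaning", "preprocessing", "feature"],
    "Modeling": ["classifier", "clustering", "model"],
    "Evaluation": ["accuracy", "precision", "recall", "significan"],
    "Deployment": ["deploy", "application", "release"],
}


def tag_passages(paragraphs: list[str]) -> list[list[str]]:
    """For each paragraph, list the phases whose cue words it contains."""
    return [
        [phase for phase, cues in CRISP_DM_CUES.items()
         if any(cue in para.lower() for cue in cues)]
        for para in paragraphs
    ]


def missing_phases(tags: list[list[str]]) -> list[str]:
    """Phases for which no paragraph was tagged: candidates for a closer read."""
    found = {phase for phase_list in tags for phase in phase_list}
    return [phase for phase in CRISP_DM_CUES if phase not in found]


# Hypothetical paragraphs from a paper.
paper = [
    "Our goal is to predict which customers leave.",
    "We removed duplicates during cleaning.",
    "We report the accuracy of the classifier.",
]
tags = tag_passages(paper)
print(tags)  # [['Business Understanding'], ['Data Preparation'], ['Modeling', 'Evaluation']]
print(missing_phases(tags))  # ['Data Understanding', 'Deployment']
```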
Homework 6: Your own project (1)
This homework is your final project for this course. It will take you
through much of what you learned throughout the semester, and
result in a small yet genuine data mining project.
With the proposal you have sent us, and the feedback you got
during the discussion, you should by now have a clear idea of what you will
be doing. If you run into any problems that you cannot solve in a
reasonable amount of time, please contact us as soon as possible.
This homework consists of two parts.
Homework 6: Your own project (2)
Sharing your data
The first three homework sets [… a reminder of what this was …]
Any scientific work is only as good as its reproducibility. If you report the results of data
mining without disclosing the data used, you are asking the reader for blind faith. In
order to make data mining meaningful, its data sources must be available for follow-up work. Your first task is to do precisely this. Describe the ontology that you have
built, and specifically the subset of it that you are using for this project, in terms of its
purpose, its schema, and basic statistics about its entities. Important questions
include the following.
Note: You may copy the answers to these questions from your previous homework sets if they
are there already. We mention them again here so you can critically check and, if
applicable, extend what you have written earlier.
Where exactly did the data originate?
Are there any problems with these sources? (Example: Do the creators of the source
follow a political agenda, only listing Muslims as terrorists? Are you even allowed to
redistribute the data?)
What is the overall schema of the ontology?
How did you map and match the ontologies/schemas you found? Which strategies
did you use? Which problems arose?
Which attributes are guaranteed to exist for members of the most important classes?
Which attributes may exist, but are not always present?
How many individuals have these attributes?
Which decisions did you make for selecting the subset of data you are working with
for Homework 6 from the “full” ontology you built? For example, did you select
classes, instances, attributes? Did you aggregate attributes? If so, how?
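Several of the questions above can be answered mechanically from the data: which attributes are guaranteed, which are optional, and how many individuals have them. The Python sketch below assumes the ontology has been exported as (subject, predicate, object) tuples, with class membership given by direct rdf:type assertions. Memberships a reasoner would infer via rdfs:subClassOf are not counted. The class and property names are hypothetical. With a real triple store, the same counts would typically come from SPARQL COUNT queries.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def attribute_coverage(triples: list[tuple[str, str, str]]) -> dict:
    """For each class, map each property to (#members having it, #members)."""
    members = defaultdict(set)  # class -> its (directly typed) individuals
    props = defaultdict(set)    # individual -> properties it has
    for s, p, o in triples:
        if p == RDF_TYPE:
            members[o].add(s)
        else:
            props[s].add(p)
    report = {}
    for cls, individuals in members.items():
        counts = defaultdict(int)
        for ind in individuals:
            for p in props[ind]:
                counts[p] += 1
        report[cls] = {p: (n, len(individuals)) for p, n in counts.items()}
    return report


def guaranteed_and_optional(report: dict, cls: str) -> tuple[list[str], list[str]]:
    """Split a class's properties into always-present and sometimes-present ones."""
    stats = report[cls]
    guaranteed = sorted(p for p, (n, total) in stats.items() if n == total)
    optional = sorted(p for p, (n, total) in stats.items() if n < total)
    return guaranteed, optional


# Hypothetical toy data.
triples = [
    ("ex:a1", "rdf:type", "ex:Attack"),
    ("ex:a2", "rdf:type", "ex:Attack"),
    ("ex:a1", "ex:attackType", "ex:TypeA"),
    ("ex:a2", "ex:attackType", "ex:TypeB"),
    ("ex:a1", "ex:victimType", "ex:TypeC"),
]
report = attribute_coverage(triples)
print(sorted(report["ex:Attack"].items()))
# [('ex:attackType', (2, 2)), ('ex:victimType', (1, 2))]
print(guaranteed_and_optional(report, "ex:Attack"))
# (['ex:attackType'], ['ex:victimType'])
```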
[…]
Homework 6: Your own project (3)
Data mining
In the second step, you carry out the project that you have prepared so
far. A good report will include the following:
A very clear description of the research question you seek to answer.
A good motivation for this research question.
A critical review of the data, especially if you can expect it to contain the
answer to your research question.
A precise description of your experiments and their validation, with a
motivation for the chosen setup. The reader must be able to obtain the
exact same result, so they need to know every single parameter.
A discussion of the results, given the data review and the description of
the experiments.
A conclusion that gives an answer to the research question.
A list of things you would have liked to do, but didn’t due to time constraints.
Do not forget to carefully evaluate the results of the experiments, using
whatever metric is applicable (significance, confidence, accuracy,
precision/recall, etc.) in order to supplement the qualitative assessment of
the experiments.
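One way to back the qualitative assessment with a significance statement is to compare a classifier's test accuracy with a majority-class baseline, using a one-sided exact binomial test. The Python sketch below assumes independent test instances and uses hypothetical labels. The baseline is computed on the test labels, which is a conservative choice: no constant predictor can do better on the test set. Other setups, such as comparing two classifiers on the same test set, call for other tests.

```python
from collections import Counter
from math import comb


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of test instances whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y: list) -> float:
    """Accuracy of always predicting the most frequent label in y."""
    return Counter(y).most_common(1)[0][1] / len(y)


def binomial_p_value(correct: int, n: int, baseline: float) -> float:
    """One-sided exact binomial test: P(X >= correct) for X ~ Binomial(n, baseline)."""
    return sum(comb(n, k) * baseline**k * (1 - baseline)**(n - k)
               for k in range(correct, n + 1))


# Hypothetical test labels and predictions (only the last prediction is wrong).
y_true = ["yes", "no"] * 5
y_pred = y_true[:9] + ["yes"]

correct = sum(t == p for t, p in zip(y_true, y_pred))
baseline = majority_baseline(y_true)
p_value = binomial_p_value(correct, len(y_true), baseline)
print(accuracy(y_true, y_pred), baseline)  # 0.9 0.5
print(round(p_value, 4))                   # 0.0107
```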
Homework 6: Example topics from 2012 (Terrorism) and 2011 (Twitter)
2012 (Terrorism):
• Relation between oil and war
• The relation between a politician, his country, and terrorist attacks
• Predict attack type and victim type for new organizations
2011 (Twitter):
• Converting tweets about mobile speed checks into a historical overview on a map
• Where should I go on vacation based on recent tweets?
• Seasonal sentiment analysis in tweets (data sets: Libya, Syria)
What’s good: students …
like the course
… are surprised
participate very actively
get hands-on experience
are creative!
reflect on data and on methods
obtain insights
• E.g., from the goal “predicting who’s a terrorist” to the goal “finding correlations between a country’s military expenditure, level of schooling, and incidence of terrorism”
What’s not so good / challenges (1)
Prerequisites
• To be able to interpret the results properly, students would need
o Proper background in statistics
2013+: better, given students with more DM background?
o Background knowledge about the application area
Idea for 2013+: tailor the invited talk more closely to the project
• To be able to make more of Semantic Web reasoning, students would need
o More background in logics
Idea for 2013+: interface more closely with the parallel logics course
Didactical method: capacity limitations and “cue-based learning”?!
• Breadth vs. depth …
• Practical learning tends to overtake theory learning
• Difficult to integrate background reading with the project (easier for Twitter than for terrorism)
What’s not so good / challenges (2)
Die Mühen der Ebene (“the difficulties on the ground”) in data handling and analysis
• Sparsity of data and lacking empirical regularities are frustrating
• Preference for mashups vs. data integration?!
• Laborious data preparation is boring and time-intensive
Outlook: Next possible student-project field?
ParlBench
An LOD dataset of Dutch parliamentary proceedings (Tarasova & Marx, Proc. USEWOD/BerSys 2013)
See also (Juric, Hollink, & Houben, Proc. DeRIVE 2012)
OR
Use a similar, but not yet semantified, Flemish dataset
Dutch language: + and –
Outlook: curricular changes in 2013+, 2014+(?)
FROM mandatory course in the specialization “Databases”, taken largely by a non-database, heterogeneous audience
TO optional course in the specialization “Artificial Intelligence”, presumably taken by a more homogeneous, largely AI audience
The 6-ECTS course can also be taken, as a 4-ECTS course, by students in the Master of Artificial Intelligence, with
• the Web mining option (focus on modelling and mining), or
• the Web data fusion option (focus on modelling and integrating)
To be supplemented by a data course in the Master Digital Humanities (currently under review)
• Chance of joint projects in which expertise can be pooled
Outlook: sharing