Not CoveRED because of ILLNESS

LIS618 lecture 0
Introduction to the course
Thomas Krichel
2011-04-21
structure
• me
• the way I see it
• you
• the way you see it.
me
• I am Thomas Krichel.
• My homepage is
http://openlib.org/home/krichel. You can also
use http://wotan.liu.edu/home/krichel; it
contains almost the same content at all
times.
my courses page
• My courses are at
http://wotan.liu.edu/home/krichel/courses.
• These contain material for all current and
previous editions of all courses that I ran at
the Palmer School.
• I am an open access supporter.
me and LIS618
• In 2003, the course was called “database
searching”.
• Since 2004, it has been called “online
information retrieval techniques”.
• Let me try to clarify both terms.
term “database”
• A database is an organized collection of data
for one or more purposes, usually in digital
form. The data are typically organized to
model relevant aspects of reality (for example,
the availability of rooms in hotels), in a way
that supports processes requiring this
information (for example, finding a hotel with
vacancies).
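To make this definition concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name rooms, its columns, and the sample hotels are all invented for illustration; they do not come from any real system.

```python
import sqlite3


def hotels_with_vacancies(conn):
    """Return the names of hotels that have at least one free room, sorted."""
    rows = conn.execute(
        "SELECT DISTINCT hotel FROM rooms WHERE occupied = 0 ORDER BY hotel"
    )
    return [hotel for (hotel,) in rows]


# Hypothetical sample data: (hotel, room number, occupied flag).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rooms (hotel TEXT, room INTEGER, occupied INTEGER)")
conn.executemany(
    "INSERT INTO rooms VALUES (?, ?, ?)",
    [
        ("Seaview", 101, 1),
        ("Seaview", 102, 0),
        ("Grand", 201, 1),
        ("Grand", 202, 1),
        ("Harbor", 301, 0),
    ],
)

vacant = hotels_with_vacancies(conn)
print(vacant)
```

The table models one aspect of reality (room availability), and the query supports one process that needs it (finding a hotel with vacancies).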
this is not our database
• The previous definition is not what librarians
mean when they talk about databases, as in
the term “database searching”.
• What they mean by “database” is any type of
resource, usually accessed remotely, that the
library has purchased.
• Searching Google is not “database searching”.
searching
• By “searching” in “database searching” we
mean the following process
– a user has an information need
– the user formulates a query
– the user is presented with a set of results
• Librarians love searching. Users love finding.
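The three steps of the search process can be sketched as a toy program. This is only an illustration under simple assumptions: the records are invented, and the matching rule (every query word must appear in the record) is one of many possible rules, not how any particular database works.

```python
def search(records, query):
    """Return the records that contain every word of the query.

    Matching is case-insensitive and works on whole words only.
    """
    terms = query.lower().split()
    return [
        record
        for record in records
        if all(term in record.lower().split() for term in terms)
    ]


# Hypothetical records standing in for a database's contents.
records = [
    "Hotel vacancies in Toulouse",
    "Library database searching",
    "Searching the web",
]

# The information need is expressed as a query; the result set comes back.
results = search(records, "database searching")
print(results)
```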
why study database searching?
• There are historical reasons.
• There are pedagogical reasons.
• There are reasons of transparency.
historical reasons
• When libraries first licensed remote content, it
was very expensive and difficult to use.
– The telecommunications charges were high.
– The cost of the system access was high. There
was often a per-minute charge.
– The systems were difficult to use. They were not
suitable for a non-trained user.
• Database searching by a librarian is a way to
save cost.
historical reasons today
• The historical reasons don’t seem to apply.
• There are still reasons why you have
intermediated searching.
• One important one is to save the searcher (a
high-salary individual) time and have the
search conducted by someone with a lower
salary.
• A lot of these jobs are outsourced.
pedagogical reasons
• As librarians, we need to teach people how to
use online information resources.
• Unless they can do this themselves.
• Many (most) think they can.
• The pedagogical reasons seem to disappear
over time.
• There are however serious problems of
transparency.
transparency
• In the days of old, library databases were
proprietary.
• The engines provided documentation on how
to search contents to a detailed level. The
release of this information did not damage the
business.
• In the days of search engines (the new
“database”) the algorithms to search are
secret.
secrecy in search
• The search engines give some indications of
how they do their work.
• But overall the algorithms are secret.
• The monopoly of Google makes for a serious
threat to the information culture.
• The solution would be to build and operate
open-source engines. I have done some
pioneering but small-scale work in this area.
information retrieval
• Deals with how to build systems that allow
users, even untrained ones, to obtain
complicated information.
• This is big business. Google, arguably the most
successful business of the early 21st century,
owes it all to information retrieval.
• In particular, to web information retrieval.
online information retrieval techniques
• This is different from database searching
because we are talking about techniques.
• Successful database searching requires
techniques at the level of query formulation.
• But even more, it requires an overall knowledge
of the database, its contents and structure.
• This is more the subject of a sources and
services type course.
http://openlib.org/home/krichel
Please shut down the computers when
you are done.
Thank you for your attention!
not cover
NOT COVERED BECAUSE OF ILLNESS
Proposed Organization
• Normal lecture
• Quiz at the beginning of every lecture
– Factually oriented, around 15 minutes
– Remove worst performance
– Average to form 50%
• Search exercise 50%
• I may make some adjustments to the syllabus
this week.
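The grading arithmetic proposed above can be sketched as follows. This assumes that all scores are on a 0 to 100 scale and that exactly one quiz, the worst, is removed; the slide does not state the scale, so treat these as assumptions.

```python
def final_grade(quiz_scores, exercise_score):
    """Combine quiz and search-exercise scores (assumed 0-100 scale).

    The worst quiz is dropped, the rest are averaged to form 50% of
    the grade, and the search exercise forms the other 50%.
    """
    if len(quiz_scores) < 2:
        raise ValueError("need at least two quizzes to drop the worst one")
    kept = sorted(quiz_scores)[1:]  # drop the single worst quiz
    quiz_average = sum(kept) / len(kept)
    return 0.5 * quiz_average + 0.5 * exercise_score


# Hypothetical scores: the quiz scored 40 is dropped, the rest average 80.
grade = final_grade([60, 80, 100, 40], exercise_score=90)
print(grade)
```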
Search exercise
• Find victim of an information need
• Best to take someone you know in a
professional capacity
• Conduct interview about an information need
experienced by the victim, write down
expectations
• Search in formal database and on web
• Discuss results with the victim
• Write essay, no longer than 5 pages.
about the course
• This course is new wine in an old bottle
• Officially a merger of
– lis566 information resources on the Internet
• mailing lists
• usenet news
• web searching
– lis618 database searching
• access and use of commercial databases
mix of theory and practice
• I am not a database search practitioner.
• Each database is different, practical skills are
not easily transferable.
• Thus my emphasis in the course is more on
theory.
• In the past, I did theory first, then practice.
• These days I mix. Some theory and some
practice in every session.
What online retrieval systems?
• Dialog has been the traditional database
covered.
– They were the market leaders in online databases in
the past.
– Nowadays the field is much more open.
– They remain a very good teaching tool for
command based database searching.
• Nexis: a news database I have covered every
year.
• Google: a well-known search engine that I
started to cover two years ago.
other stuff
• Other online IR systems that I have covered in the
past
– OCLC FirstSearch
– Factiva (briefly)
– WestLaw (external speaker)
• New developments
– Peer-to-peer networks
– an introduction to reference linking using OpenURL
• Old developments with library potential
– relational databases
About me
• Born 1965, in Völklingen (Germany)
• Studied economics and social sciences at the
Universities of Toulouse, Paris, Exeter and
Leicester.
• PhD in theoretical macroeconomics
• Lecturer in Economics at the University of
Surrey from 1993 to 2001
• Since 2001 assistant professor at the Palmer
School
Why?
• During my research assistantship period (1990 to
1993), I was constantly frustrated with difficult
access to scientific literature.
• At the same time, I discovered easy access to
freely downloadable software over the
Internet.
• I decided to work towards downloadable
scientific documents. This led to my library
career (eventually).
Steps taken I
• 1993 founded the NetEc project at
http://netec.mcc.ac.uk, later available at
http://netec.ier.hit-u.ac.jp as well as at
http://netec.wustl.edu.
• These are networking projects targeted to the
economics community. The bulk is
– Information about working papers
– Downloadable working papers
– Journal articles were added later
Steps taken II
• Set up RePEc, a digital library for economics
research. Catalogs
– Research documents
– Collections of research documents
– Researchers themselves
– Organizations that are important to the research
process
• Decentralized collection, model for the open
archives initiative
Steps taken III
• Co-founder of Open Archives Initiative
• Work on the Academic Metadata Format
• Co-founded rclis (Research in Computing,
Library and Information Science), a RePEc
clone
• Currently working on the Konz project. It uses
a database of titles of journal-published papers
and tries to find them on the Internet.
my interest in databases
• An important emphasis of the course is still on
commercial databases.
• From my point of view I have two interests in
database searching
– As a provider, I must understand how people
search in order to provide some data that they can
use and will use.
– As an economist, I have a strong interest in
information as a commodity. The database market
is an important market place.
online information retrieval
• This subject can be thought of as a subset of
information retrieval (IR). Most IR is online or
digital.
• IR concentrates on textual data.
• We can think of online IR as falling into two
categories
– database IR
– web IR
database / web IR
• Database IR looks at systems that have
– a controlled set of records
– low heterogeneity
– use requires authentication
– advanced search features
• Web IR has opposite characteristics
traditional social model
• User goes to a library
• Describes problem to the librarian
• Librarian does the search
– without the user present
– with the user present
• Hands over the result to the user
• User fetches full-text or asks a librarian to
fetch the full text.
economic rationale for traditional model
• In olden days the cost of telecommunication
was high.
• Database use costs
– cost of communication
– cost of access time to the database
• The traditional model puts an upper limit
on the costs.
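A small worked example of this cost argument, as a sketch only. It assumes both charges accrue per connected minute and that the intermediary ends the session after a fixed number of minutes; the rates and the cap are hypothetical figures, not historical prices.

```python
def session_cost(minutes, telecom_rate, access_rate):
    """Cost of one online session when both charges accrue per minute."""
    return minutes * (telecom_rate + access_rate)


def capped_cost(minutes_wanted, minute_cap, telecom_rate, access_rate):
    """Cost when the session is ended after at most minute_cap minutes."""
    minutes = min(minutes_wanted, minute_cap)
    return session_cost(minutes, telecom_rate, access_rate)


# Hypothetical rates, in currency units per minute.
uncapped = session_cost(45, telecom_rate=0.5, access_rate=1.5)
capped = capped_cost(45, minute_cap=15, telecom_rate=0.5, access_rate=1.5)
print(uncapped, capped)
```

However long an untrained user might have stayed connected, the capped session never costs more than the cap multiplied by the combined per-minute rate.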
disintermediation
• With access time costs gone, the traditional
model is under threat
• There is disintermediation, where the librarian
loses her role of doing the search.
• But that may not be good news for
information retrieval results
– user knows subject matter best
– librarian knows searching best
Web searching
• IR has received a lot of impetus through the
web, which poses unprecedented search
challenges.
• With more and more data appearing on the
web, database searching may be a subject in decline
– It is primarily concerned with non-web databases
– There are more and more web-based methods of
searching
Public access vs quality
• Now the public at large is able to do online
searching.
• At the same time, the need for quality answers
has grown.
• Quality-filtered services will become more
important.
• In the current databases, there is a lot that would
already be available for free, mixed with
quality-controlled stuff.
• Publishers have direct offerings and intermediated
vending is in decline.
http://openlib.org/home/krichel
Thank you for your attention!
