information policy in general

LIS510 lecture 8
Thomas Krichel
2006-11-08
introduction
• Reading
– Rubin chapter 2.
– Rubin chapter 4 until page 153
– Library of Congress "Copyright Basics",
available at
http://www.copyright.gov/circs/circ1.html
• Structure
– information science
– information policy
Taylor’s 1966 definition
“Information science is the science that
investigates the properties and behavior of
information, the forces governing the flow of
information, and means of processing
information for optimum accessibility and
usability. The processes include the
originations, dissemination, collection,
organization, storage retrieval and use of
information.”
Rubin’s organization
1. Information needs, information seeking,
information use and information users.
2. Information storage and retrieval.
3. Defining the nature of information and its
use.
4. Bibliometrics and citation analysis
5. Management and administrative issues.
?. new areas
JITA classification
• This is about the only publicly available
library and information science
classification scheme
http://eprints.rclis.org/jita.html
• It was done for the E-LIS system of Library
and Information Science (LIS) eprints at
http://eprints.rclis.org.
area 1: information needs
• Much of this literature waffles vaguely about
imprecise concepts.
• “Information seeking in context” is now
popular.
• Some broad trends
– People prefer personal to institutional sources.
– People seldom see librarians as a source.
– People make little effort.
Berrypicking (Bates 1989)
• Users sift through information like pickers of
berries.
– The query is constantly shifting.
– Users may move through a variety of sources.
– New information may give people new ideas
and direction
– The value of information is all the bits and
pieces gathered during the process.
• This contrasts sharply with information
retrieval research.
Kuhlthau (1991)
• Proposes a 6-stage information seeking
process
– Initiation
– Selection
– Exploration
– Formulation
– Collection
– Presentation
• IMHO perfectly useless.
area 2: information retrieval
• Information storage is not really much of an
issue anymore.
• When I dealt with it I meant storage as
including the organization of the
information, which is a bit of a stretch
• Ideally, one needs to know the retrieval
needs before designing the organization of
the information
information retrieval
• This covers everything to do with how the
user gets information out of an
information system.
• It is different from data retrieval since the
retrieved data has to be “relevant” to the
user.
• It is very difficult to say what “relevance” is,
objectively.
typical research
• This usually involves looking at a set of
documents that have been classified.
• Then we can pick the computer algorithms that
best separate the documents satisfying the user
need from those that don’t.
• Usually this stuff is heavily
mathematical/computational.
• I have been applying work from that area.
information retrieval performance
• How was it for you?
• The traditional methods are
– precision = number of relevant documents
retrieved divided by total number of retrieved
documents
– recall = number of relevant documents
retrieved divided by total number of relevant
documents
• They only evaluate a search!
• I have done some work in that area.
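The two measures above can be sketched in a few lines of Python. This is only an illustration: the function name `precision_recall` and the document identifiers are made up for the example, and relevance is assumed to be already judged.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 6 relevant documents exist.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d2", "d3", "d5", "d6", "d7"]
p, r = precision_recall(retrieved_docs, relevant_docs)
print(p, r)
```

Three of the four retrieved documents are relevant, and three of the six relevant documents were found. Both numbers describe this one search only, not the system as a whole.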
information retrieval models
• They give a formal account of the retrieval
process.
• There are three basic flavors
– Boolean information retrieval
– Vector information retrieval
– Probabilistic information retrieval
• All are mathematical models.
• I would also add web information retrieval
as a new type
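To make one of these models concrete, here is a minimal sketch of vector retrieval. It assumes raw term counts as vector components and cosine similarity as the ranking function, which is one common textbook setup and not necessarily the one used in the reading. The function names and the three toy documents are invented for the example.

```python
import math
from collections import Counter

def term_vector(text):
    """Represent a text as a bag of lower-cased words with counts."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return document names, most similar to the query first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), name) for name, text in docs.items()]
    return [name for score, name in sorted(scored, reverse=True)]

# Toy collection, made up for illustration.
docs = {
    "a": "copyright law and libraries",
    "b": "citation analysis of journals",
    "c": "library copyright policy and copyright law",
}
ranking = rank("copyright law", docs)
print(ranking)
```

A Boolean model would only say whether a document matches the query. The vector model returns a graded ranking instead.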
web information retrieval
• This has become big business now
because finding a user’s need is a way to
connect them with advertising.
• One thing that has made Google such a
success is that they discovered a way to
make quality web sites appear at the top.
• Basically, a quality web site is one that has
many links to it from other quality sites.
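This circular definition of quality is usually formalized as PageRank. The following is a minimal sketch of the idea, not Google's production method. The damping factor of 0.85 is the conventional textbook value, and the tiny link graph and function name are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score pages: a page is good if good pages link to it."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical site graph: page -> pages it links to.
links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
    "blog": ["home", "news"],
}
scores = pagerank(links)
best = max(scores, key=scores.get)
print(best)
```

In this graph "home" comes out on top because every other page links to it.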
information storage
• It can mean the preparation of information
before searching
– which fields are searchable
– can there be a variety of means to rank
searches?
– is there use of a controlled vocabulary?
• It is difficult to draw general conclusions
beyond saying that advanced search features
are not much used.
human-computer interface
• Tries to understand how users work with
computer systems.
• The idea is to build “user-friendly” systems.
• But don’t leave that to a “computer
designer” as suggested by Rubin.
• Note that information systems go way
beyond computers.
• This area is usually connected to
psychology.
natural language processing
• Rubin classifies this as a part of computer-human interface.
• Natural language processing is still in its
infancy.
• Speech recognition is the best developed
part.
• Others are working on connecting
computers to the brain.
artificial intelligence
• This has been around for a while.
• The field has developed a number of
theoretical tools.
• Some of them are being used in practice
now. Things like RDF, the Resource
Description Framework, are based on
artificial intelligence theory. It is a tool to
aggregate knowledge from web resources.
• Still no practical application that
demonstrates the use of AI on the web.
Area 3: defining information & its
value
• There is debate on the nature of
– data (Thomas: things that can be processed in
the information system)
– knowledge (Thomas: stuff that is in people’s
heads)
– information (something between data and
knowledge). Rubin says it is meaning given to
data.
• Rubin also talks about wisdom as
“knowledge applied for the benefit of
humanity”
scientific view of information
• Usually information is modeled as
something that reduces uncertainty
• People have a rough idea about something,
say tomorrow’s temperature
• The information is the fact that this
something will actually take a precise value,
when we know what the temperature is or
when we have less uncertainty.
• There is an approach to measuring
information through the concept of entropy.
• Thomas used to teach such stuff.
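A small sketch of the entropy idea, using Shannon's formula with base-2 logarithms so that the result is in bits. The four temperature bands and their probabilities are invented for the example.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical belief about tomorrow's temperature: four equally likely bands.
before = [0.25, 0.25, 0.25, 0.25]
# After a forecast that rules out two of the bands.
after = [0.5, 0.5, 0.0, 0.0]

information_gained = entropy(before) - entropy(after)
print(entropy(before), entropy(after), information_gained)
```

Uncertainty falls from 2 bits to 1 bit, so under this model the forecast carried 1 bit of information.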
value of information
• Using a probabilistic model, economists
can set out an approach that puts a value
on information.
• But their definition is useless for practical
purposes.
• Much of the work then involves some
cost/benefit analysis. In such analysis one
can reach almost any result one wants.
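One standard textbook version of such a model is the expected value of perfect information: compare the best expected payoff when deciding blind with the expected payoff when the state is known before deciding. The sketch below assumes this approach. It is not necessarily the one the reading has in mind, and all payoffs and probabilities are made up.

```python
def expected_value(payoffs, probs, action):
    """Expected payoff of one action over the uncertain states."""
    return sum(p * payoffs[action][state] for state, p in probs.items())

def value_of_perfect_information(payoffs, probs):
    """Gain from learning the state before choosing an action."""
    best_without = max(expected_value(payoffs, probs, a) for a in payoffs)
    best_with = sum(
        p * max(payoffs[a][state] for a in payoffs)
        for state, p in probs.items()
    )
    return best_with - best_without

# Hypothetical decision: carry an umbrella or not.
probs = {"rain": 0.5, "dry": 0.5}
payoffs = {
    "umbrella": {"rain": 8, "dry": 6},
    "no umbrella": {"rain": 0, "dry": 10},
}
voi = value_of_perfect_information(payoffs, probs)
print(voi)
```

The result depends entirely on the payoffs one assumes, which is exactly why such analysis can reach almost any result.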
elements of value-added in libraries
• access to resources
• accuracy (for example of bibliographic data)
• browsing (like in library stacks)
• currency (things are up-to-date)
• flexibility (through human interaction)
• formatting (laying out the collection, signs)
• interfacing (probably close to flexibility)
• ordering (buy access to things)
• access to means to get to resources
area 4: bibliometrics
• Is the application of quantitative methods to
the study of information resources
• Mainly concerned with the structure of the
resources. The typical example is citation
analysis.
• Quantitative studies of use fall more to the
first area of interest.
• An expanding area is the use of network
analysis.
bibliometric laws
• Zipf’s law relates to the usage of terms in
text.
• Lotka’s law relates to the number of papers
written by authors.
• Bradford’s law relates to the distribution of
articles in a field across a number of
periodicals.
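As an illustration of the first of these laws: in its usual form, Zipf’s law says that the frequency of the r-th most common term is roughly proportional to 1/r. The sketch below only builds the rank/frequency table one would inspect. The function name and the sample sentence are made up, and a text this short is far too small to show the law.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency) rows, most frequent word first."""
    counts = Counter(text.lower().split())
    ordered = counts.most_common()
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]

# Toy text for illustration only.
sample = "the cat and the dog and the bird"
table = rank_frequency(sample)
for rank, word, freq in table:
    print(rank, word, freq)
```

On a large corpus, rank times frequency should stay roughly constant down the table if the law holds.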
citation analysis
• This is the heart of bibliometrics.
• Two important concepts
– bibliographic coupling means two documents
share some references
– co-citation means two documents are cited by
the same documents
• Citation analysis is also important for
scientific activity evaluation.
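The two concepts can be sketched over a small citation graph. The dictionary below maps each citing document to the documents in its reference list. The data and function names are invented for the example.

```python
def bibliographic_coupling(references, a, b):
    """Number of references that documents a and b share."""
    return len(set(references.get(a, [])) & set(references.get(b, [])))

def co_citation(references, a, b):
    """Number of documents that cite both a and b."""
    return sum(1 for cited in references.values() if a in cited and b in cited)

# Hypothetical citation data: citing document -> documents it cites.
references = {
    "p1": ["x", "y", "z"],
    "p2": ["y", "z"],
    "p3": ["p1", "p2"],
    "p4": ["p1", "p2", "x"],
}
print(bibliographic_coupling(references, "p1", "p2"))
print(co_citation(references, "p1", "p2"))
```

Coupling is fixed once both documents are published, because reference lists do not change. Co-citation can grow over time as new documents cite the pair.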
area 5: management & admin
• This is an expanding area in libraries.
• Rather than collecting physical books,
libraries have to negotiate on-line access.
• Area covers all of information policy.
Example problems are
– copyright
– censorship
• Measuring performance is part of user
studies
service evaluation
• This is an important area in libraries.
• Libraries need to demonstrate value in
order to fight for their continued existence.
• They also need to examine usage of the
systems that the vendors propose.
area 6: information architecture
• art and science of organizing information
and its interfaces so that seekers find what
they want quickly
• mainly used with respect to large web sites.
it looks at the contents rather than technical
factors or the look-and-feel
• A related idea is usability
area 7: knowledge management
• this comes from the business environment
• it is a management fad that has overstayed
its welcome.
information policy
• This is any
– law
– regulation
– practice
• that affects the
– creation
– acquisition
– evaluation
– organization
– dissemination
• of information.
private value of information
• Information has value for its creators.
• Some creators require that you pay them in
order to use that information.
• US law encourages the private creation of
information and knowledge.
• There is a market for information.
limiting access to information
• Commercial information providers are
concerned about unpaid
access.
• Other companies that do not primarily
produce information may also be
concerned about leaking of data such as
– R&D data
– financial data
– product information
protecting privacy
• This is a major issue in society in general.
• Financial, health and other data are
protected by law.
• In libraries, the concern has been the
protection of circulation records.
• The Patriot Act has created fairly loose
conditions under which law enforcement
agencies can access circulation records.
freedom of information
• This refers to the idea that government
information, other than
– military secrets
– law enforcement records
– private medical and financial information
• should be made
available to the citizens so they can
scrutinize government.
• This should be an important task for public
libraries with respect to local government.
private dissemination of public
information
• There has been a tendency to move the
distribution of government
documents away from the Government Printing
Office to private companies.
• This has caused some concern, as the companies
charge the taxpayers for something that
has already been produced at the
taxpayers’ expense.
• Such companies can copyright the
information that the government could not.
example: legal information
• In principle, the text of laws and legal
information should be free.
• Some of the older material is still in print form and
cannot be circulated without some cost.
• Recent data could all be made available on
the web.
• The judicial system does not handle the
upload and organization of data
well.
national security
• Protecting the cyber infrastructure has been
made a priority. But there is not much
that the government can do directly to
protect private installations.
• There have been some restrictions on the
distribution of formerly available government data
that has been considered to provide
information on potential terror targets.
the library awareness program
• The FBI started to monitor the use of
libraries by foreign individuals in the 70s.
• Libraries were believed to be places where
foreign agents could get critical intelligence
to gain a technological edge.
• Since the material held in libraries is
published and sold commercially, it seems
quite silly to monitor its use.
the Patriot Act
• some knee-jerk legislation to try to protect
the USA
– increases power to monitor citizens’ behavior
– authorizes roving wiretaps
– intelligence authorities can require any
“business record”
• the person concerned in the record must not be
informed
• a gag order forbids disseminating information
about the request
• no independent judicial review of the request
control of expressions
• There has been a long history of
censorship in all countries at all times.
– IRA/Sinn Fein
– sexually explicit material
• Sometimes the pressure on artistic works is
indirect, e.g. through funding channels.
• Libraries generally fight censorship, but
they have to keep their target communities
in mind.
http://openlib.org/home/krichel
Please shut down the computers now.
Thank you for your attention!