ECT 250: Survey of e-commerce technology
Searching, images, frames, and markup languages
Searching the WWW
• Exploring the Web can be very time-consuming.
• Search engines and directories enable you to locate
relevant web pages more quickly and efficiently.
• A search engine is software that allows you to type
in keywords. The engine scans a database of
Web pages and displays a list of pages that meet
your criteria.
• A directory organizes Web pages into categories.
You can click on appropriate categories until you
find a Web page that matches your chosen topic.
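The keyword lookup described above can be sketched in a few lines of Python. This is only an illustration: the page URLs and texts are made up, and a real engine uses a far larger database and a ranking step that is not shown here.

```python
from collections import defaultdict

# Hypothetical mini "database" of Web pages: URL -> page text.
PAGES = {
    "http://example.com/climb": "rock climbing routes in Illinois and Wisconsin",
    "http://example.com/gear": "climbing gear and rope reviews",
    "http://example.com/cook": "recipes for cooking at home",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, keywords):
    """Return the sorted URLs that contain every keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

INDEX = build_index(PAGES)
print(search(INDEX, ["climbing"]))
print(search(INDEX, ["climbing", "Illinois"]))
```

Adding a second keyword shrinks the result list, which is the effect the next slide relies on.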
Search engines/directories
• Altavista (http://www.altavista.com)
• Excite (http://www.excite.com)
• DirectHit (http://www.directhit.com/)
• Fast Search (http://www.ussc.alltheweb.com/)
• Go (http://www.go.com)
• Google (http://www.google.com)
• HotBot (http://www.hotbot.com)
• Northern Light (http://www.northernlight.com)
• Yahoo (http://www.yahoo.com)
• Web Crawler (http://www.webcrawler.com)
Naïve searches
• A single keyword search can yield thousands
of sites, many of which are irrelevant.
Example: A search for climbing yields
2,400,000 hits.
• Multiple keywords can help.
Example: Illinois, Wisconsin, climbing yields
only 32,500 hits.
• To save time and effort it pays to construct a
more sophisticated search that will yield fewer
hits with a higher percentage of relevant pages.
Searching tips
• Use a directory to find information on a general
topic. Use keywords in a search engine for
specific information or narrow topics.
• Use the searching tips to construct a precise query.
• Use multiple, specific keywords and synonyms.
• Use advanced search features to make your query
more focused.
• Try multiple search engines/directories or use a
meta-search engine (e.g. DogPile).
• Use a specialized search engine (e.g. a business
search engine).
Advanced search options
• Special operators (and, or, not, near)
• Search for phrases, not just keywords
• Domain specific searches: include or exclude
pages based on their domain
• Specify the language of the search
• Page specific searches: pages that link to or are
similar to a given page
• Give a bound on the most recent update
• Specify whether the site contains images, audio,
or visual information
Example: www.google.com
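One way to picture the and, or, and not operators is as set operations on the pages that match each keyword. The sketch below assumes three made-up pages; the phrase check is a plain substring test, which is a simplification of what a real engine does.

```python
# Hypothetical page texts keyed by short names.
PAGES = {
    "a": "ice climbing in wisconsin",
    "b": "rock climbing in illinois",
    "c": "illinois state parks and hiking",
}

def pages_with(word):
    """Set of page names whose text contains the word."""
    return {name for name, text in PAGES.items() if word in text.split()}

def pages_with_phrase(phrase):
    """Set of page names whose text contains the exact phrase."""
    return {name for name, text in PAGES.items() if phrase in text}

both = pages_with("climbing") & pages_with("illinois")     # climbing AND illinois
either = pages_with("climbing") | pages_with("illinois")   # climbing OR illinois
without = pages_with("climbing") - pages_with("illinois")  # climbing NOT illinois
phrase = pages_with_phrase("rock climbing")                # phrase search

print(sorted(both), sorted(either), sorted(without), sorted(phrase))
```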
Limitations
Search engines examine only a fraction of the web
pages available on the World Wide Web.
A study released in 1998 estimated that the best
engines indexed only 33% of the publicly indexable
Web. The 1999 follow-up study found the coverage
had decreased to only 16%.
More important are the techniques used by the search
engine in ranking and updating pages.
Loading efficiency
• Most Web pages contain graphical images to
add interest, make navigation easier, and
convey necessary information.
• Most Web users will wait only a short time for
a page to load, so efficiency considerations
are important.
Graphic formats
• Graphic formats are usually referred to by their file
extensions, such as .tif, .bmp, .gif, .jpg, and .png.
• Web page images are commonly in the .gif,
.jpg, or .png format.
• Graphic formats are usually compressed. File
compression can be either lossless, which does
not decrease image quality, or lossy, which does
reduce image quality.
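The difference between lossless and lossy compression can be seen with a toy example. The pixel values below are made up, zlib is only a stand-in for a lossless method, and dropping the low bits of each pixel is a crude stand-in for a lossy step. It is not how JPEG actually works.

```python
import zlib

# Hypothetical 8-bit grayscale pixel values standing in for an image.
pixels = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 32)

# Lossless: decompressing gives back exactly the original bytes.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Lossy (illustrative only): throw away the low 4 bits of every pixel.
# The discarded detail cannot be recovered afterwards.
quantized = bytes(p & 0xF0 for p in pixels)
packed_lossy = zlib.compress(quantized)

print("lossless round trip identical:", restored == pixels)
print("lossy version identical:", quantized == pixels)
print("sizes:", len(pixels), len(packed), len(packed_lossy))
```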
GIF
• The Graphics Interchange Format (GIF) is the
standard format for Web page images and is
supported by all browsers that display images.
• It is an efficient, compressed format that allows
up to 256 colors. It uses lossless compression.
• GIF images are always rectangular, but a
transparent background can be used to make
the images appear to be non-rectangular.
• GIF images can be interlaced, which means that
the image is displayed initially at low resolution
and its quality is increased as it downloads.
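Interlacing works by storing the rows of the image out of order, in four passes, so a coarse version of the whole picture arrives first. The function below is a small sketch of that row order (the function name is ours); it only computes row numbers and does not read or write GIF files.

```python
def gif_interlace_order(height):
    """Row indices in the order an interlaced GIF stores them."""
    # (first row, step) for each of the four passes.
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    order = []
    for start, step in passes:
        order.extend(range(start, height, step))
    return order

print(gif_interlace_order(8))   # [0, 4, 2, 6, 1, 3, 5, 7]
```

After the first pass only every eighth row is present, which is why the image first appears at low resolution.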
JPEG
• The Joint Photographic Experts Group (JPEG)
format is supported by most browsers that
display images.
• JPEG images use lossy compression. The amount
of compression ranges from 0% to 100%. The
higher the compression, the smaller the file size
and the lower the image quality.
• JPEG cannot be made transparent, but it can be
specified as a progressive JPEG, which is loaded
the same way as an interlaced GIF.
PNG
• The Portable Network Graphics (PNG) format is
a new(ish) format created for Web page images.
• It is expected that it will eventually replace GIF.
• PNG images use a lossless compression that is
more efficient than GIF.
• It can use a color palette of 256 colors or fewer, like
GIF, or support true color, like JPEG images.
• PNG images can be interlaced and transparent.
Selecting a format
• The GIF or PNG format is usually used for line
art such as clip art, logos, etc.
• JPEG is chosen for photographs because true
color is desirable and selecting the amount of
compression can result in smaller sized files.
• One approach is to save an image in several
formats and choose the one with the smallest
file size that produces acceptable quality.
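The "save in several formats and compare" approach amounts to a small selection rule. In the sketch below the file sizes and the quality judgments are invented for illustration; in practice a person looks at each saved file and decides whether it is acceptable.

```python
# Hypothetical results of saving one image several ways:
# (format, file size in bytes, quality judged acceptable by a person).
candidates = [
    ("gif", 48_000, True),
    ("jpg", 21_000, True),
    ("png", 39_000, True),
    ("jpg-high-compression", 9_000, False),
]

def pick_format(options):
    """Return the acceptable option with the smallest file size, or None."""
    acceptable = [o for o in options if o[2]]
    return min(acceptable, key=lambda o: o[1], default=None)

print(pick_format(candidates))
```

The smallest file overall is rejected here because its quality was judged unacceptable.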
Size considerations
• GIF, JPEG, and PNG images are all bitmapped
formats, which means that the images are
made of a rectangular grid of pixels.
• Web images are measured in pixels.
Example: 500 x 55
• Do not make images too wide. Images that do
not fit into a single screen will force scrolling.
• For efficiency considerations, you may choose
to create a thumbnail image. This is a smaller
version of an image that allows a preview of
the picture. Example: LLBean
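Thumbnail dimensions are usually chosen so the picture keeps its proportions. A minimal sketch of that arithmetic, assuming we cap the longer side at a pixel limit of our choosing (the function name and the limit are ours):

```python
def thumbnail_size(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side pixels."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave unchanged
    scale = max_side / longest
    # Never let a side round down to zero pixels.
    return max(1, round(width * scale)), max(1, round(height * scale))

print(thumbnail_size(500, 55, 100))   # (100, 11)
```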
Frames
Frames allow more than one Web page to be
displayed within the browser window at a time.
When frames are used, the page opened in the
browser is a special page containing instructions
about how the browser window is to be divided
into separate regions and which page should be
initially displayed into each region. This special
page is called the frames page or frameset.
Navigating with frames
When frames are used, clicking on a link in one
frame can:
• Change the contents of that frame
• Change the contents of a different frame
• Display a page without using the frames page
An application of frames is for a table of contents
or a navigation bar. Frames allow the contents or
navigation bar to be visible at all times.
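A sketch of what a frames page and its links might look like, generated here from Python so the pieces can be varied. The file names (toc.html, welcome.html, chapter1.html) and the frame names nav and main are made up for illustration.

```python
def frames_page(nav_url, main_url, nav_width=150):
    """Return the HTML for a two-column frames page (frameset)."""
    return (
        "<html>\n"
        "<head><title>Frames example</title></head>\n"
        f'<frameset cols="{nav_width},*">\n'
        f'  <frame name="nav" src="{nav_url}">\n'
        f'  <frame name="main" src="{main_url}">\n'
        "</frameset>\n"
        "</html>\n"
    )

def nav_link(url, label, target="main"):
    """A link for the nav frame; target names the frame the page loads into."""
    return f'<a href="{url}" target="{target}">{label}</a>'

print(frames_page("toc.html", "welcome.html"))
print(nav_link("chapter1.html", "Chapter 1"))            # changes a different frame
print(nav_link("chapter1.html", "Chapter 1", "_self"))   # changes the link's own frame
print(nav_link("chapter1.html", "Chapter 1", "_top"))    # replaces the frames page
```

The three target values correspond to the three outcomes listed above.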
Examples
Sites that use frames:
• Macromedia: www.macromedia.com
• National Discount Brokers: www.ndb.com
• XSL Tutorial:
http://www.zvon.org/xxl/XSLTutorial/Books/Book1/index.html
• A personal page: Jim Jacobson
Some sites that do not use frames:
• Amazon: www.amazon.com
• DePaul CTI: www.cs.depaul.edu
• Gap: www.gap.com
• NY Times: www.nytimes.com
Frames: good or evil?
There is significant controversy about whether
the use of frames is a good or bad thing.
What are some of the issues surrounding frames?
For a longer discussion of some of the issues see:
• Aren’t frames bad?
http://www.gooddocuments.com/techniques/areframesbad.htm
• Web design: frames – good or bad?
http://dionaea.com/web/frames.html
Some problems with frames
• Search engines do not deal well with frames
• Printing becomes more difficult
• Saving pages is more complicated
• Creating browser bookmarks may not work
• Frames can require a high screen resolution
Why use frames at all?
Benefits of frames
• Navigation can be easier
• Easier updating of pages
Many of the problems given on the previous page
are technology issues. Once a solution is found,
frames may become more attractive.
Example: MS IE 5.0 supports frames better than
previous versions.
Conclusions about frames
• Use frames only when the benefits outweigh the
disadvantages.
• Tables or shared borders can be used instead of
frames to place a navigation bar, table of
contents, or other item on the edge of the page.
• Frames have become much less popular at large
web sites.
Markup languages
• FrontPage is an HTML editor.
• HTML stands for hypertext markup language.
• It is an example of a markup language.
• Historically markup has described annotations
and handwritten notes found on manuscript
pages that tell a typist how a particular page
should be laid out or typeset.
• Electronic markup languages use tags to
govern the display, formatting, and
organization of text elements.
Three markup languages
Three markup languages are of particular interest:
1. SGML (Standard Generalized Markup Language)
is the parent language from which the other two
are derived. It is a meta language used to define
other markup languages.
2. HTML (Hypertext Markup Language)
3. XML (Extensible Markup Language) is another
descendant of SGML. It defines data structures
important for a wide range of data exchange
activities.
HTML
An HTML document contains both document
content and tags.
• The content consists of all the information that
appears in the browser window, including
text, graphics, and video.
• Tags are the HTML codes that specify how
the document should be formatted.
Example:
http://facweb.cs.depaul.edu/asettle/
HTML tags
• Each HTML tag is enclosed in angle brackets.
• Two-sided HTML tags come in pairs.
The general form of a two-sided tag is:
<tagname properties>Content</tagname>
The opening tag is <tagname properties>.
The closing tag is </tagname>.
• Some HTML tags are one-sided, requiring
only the opening tag.
• Tags are not case-sensitive.
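To see these rules in action, a page can be fed through Python's standard html.parser module, which reports each opening and closing tag it meets. The class name TagLogger and the sample markup are ours; the parser reports tag and attribute names in lowercase, which reflects the fact that HTML tags are not case-sensitive.

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record opening and closing tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

def log_tags(html):
    """Parse the markup and return the list of tag events."""
    parser = TagLogger()
    parser.feed(html)
    parser.close()
    return parser.events

# <P>...</P> is two-sided; <BR> is one-sided and has no closing tag.
events = log_tags('<P ALIGN="center">Hello<BR>world</P>')
for event in events:
    print(event)
```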
Types of tags
There are a large number of tags. Some examples:
• Document tags: specify the parts of the document
such as the head, title, and body.
<title></title>, <html></html>
• Text structure tags: determine the layout of the
text found in the body of the document.
<h1></h1>, <p></p>, <br>
• Style tags: specify how text will be shown by the
browser. <center></center>, <em></em>
• Image tag: <img src="name" other-attributes>
• Anchor tag: <a href="URL"></a>
The meta tag
Search engines catalog sites by following links from
page to page and saving identification information
for each page visited.
The main HTML element that interacts with search
engines is the Meta tag.
Using the Meta tag you can list information about
your page that allows a search engine to better
classify the contents of your page.
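The "follow links from page to page" step can be sketched as a breadth-first walk over a table of links. The site below is invented and held in memory; a real crawler fetches pages over the network and extracts their links, which is not shown.

```python
from collections import deque

# Hypothetical site: page name -> names of the pages it links to.
LINKS = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["widget", "home"],
    "widget": [],
    "orphan": ["home"],  # no page links here, so the crawl never finds it
}

def crawl(start):
    """Visit pages breadth-first by following links; return the visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)  # a real engine would save page details here
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

print(crawl("home"))   # ['home', 'about', 'products', 'widget']
```

The page named orphan is never visited because nothing links to it, which is one reason engines see only part of the Web.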
Attributes of the meta tag
The Meta tag has two attributes that should always
be used:
1. The Name attribute identifies the type of Meta
tag you are including.
2. The Content attribute provides information the
search engine will be cataloging about your site.
Example:
<Meta Name = "keywords" Content = "algorithms,
complexity, quantum, information, retrieval,
kolmogorov, security, arrays, cryptography, faculty,
combinatorics">
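A rough sketch of how a program might read the keywords out of a Meta tag, using Python's standard html.parser module. The class name MetaReader and the shortened sample page are ours; how any particular search engine actually uses this information is not something the sketch claims to show.

```python
from html.parser import HTMLParser

class MetaReader(HTMLParser):
    """Collect Name/Content pairs from Meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            fields = dict(attrs)
            name = fields.get("name")
            content = fields.get("content")
            if name and content:
                self.meta[name.lower()] = content

def read_keywords(html):
    """Return the list of keywords from the page's keywords Meta tag."""
    reader = MetaReader()
    reader.feed(html)
    reader.close()
    raw = reader.meta.get("keywords", "")
    return [k.strip() for k in raw.split(",") if k.strip()]

page = ('<html><head><Meta Name="keywords" '
        'Content="algorithms, complexity, quantum"></head></html>')
print(read_keywords(page))   # ['algorithms', 'complexity', 'quantum']
```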
History of HTML
• HTML 1.0: Introduced in 1991 by Berners-Lee.
At that time there was no standard for HTML.
• HTML 2.0: Released in 1995.
Began to move to a standard. Released at the
same time were MS IE 2.0 and Netscape’s
Navigator 2.0.
Recall that the World Wide Web Consortium
(W3C) serves as a leader in maintaining Web
standards and common protocols. It was founded
in 1994.
History of HTML
• HTML 3.2: Introduced in 1997 by the W3C.
Supported tables, applets, and text
flow around images.
• HTML 4.0: Released by W3C in 1997.
Included support for cascading style sheets,
and added international features such as the
ability to render text right to left.
• HTML 4.01: Released by W3C in 1999.
Supported more multimedia options,
scripting languages, and documents more
accessible to users with disabilities
History of HTML
• XHTML 1.0: Released by W3C in January 2000 as
a reformulation of HTML 4 in XML.
• XHTML Basic: Released in December 2000 by
W3C, incorporating elements of XML into
HTML to allow development on a wider set
of devices such as TVs, PDAs, pagers, and
cellular phones.
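Reformulating HTML in XML means a page has to follow XML's stricter syntax rules: every tag must be closed and tag names must match exactly. A small sketch of the difference, using Python's standard XML parser as the checker (the sample markup and function name are ours):

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

html_style = "<p>First line<br>second line</p>"      # one-sided <br>, never closed
xhtml_style = "<p>First line<br />second line</p>"   # <br /> closes itself
mixed_case = "<P>Some text</p>"                      # XML tag names are case-sensitive

print(is_well_formed(html_style))    # False
print(is_well_formed(xhtml_style))   # True
print(is_well_formed(mixed_case))    # False
```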
SGML
• Work on the definition of a Generalized Markup
Language for describing electronic documents
and their format was begun in the 1960s.
• In 1986, the International Organization for
Standardization (ISO) adopted a version of the
standard called Standard Generalized Markup Language.
• SGML includes a standard that defines
device-independent and machine-independent methods
for representing electronic documents.
Advantages of SGML
• SGML is good for organizations with special or
complex requirements for the management of
documents. Examples: U.S. DOD, HP
• It is stable since it was standardized in 1986.
• It is platform independent and will outlive most
current applications.
• It supports user-defined tags and architecture.
Why is SGML not used by everyone?
Disadvantages of SGML
• SGML’s tools are relatively expensive when
compared to HTML.
• SGML has a steep learning curve.
• It is costly to set up and maintain, requiring
extensive training and expertise.
• Creating document type definitions with SGML
can be expensive in terms of human labor.
XML
• Extensible Markup Language is also derived from
SGML, although it is newer than HTML.
• It represents an effort to define what information
is on a Web page. This contrasts with HTML
where the emphasis is on the format of the data.
• XML allows designers to easily describe and
deliver structured data from any application in
a standard, consistent way.
Idea behind XML
• XML is both a markup language and a meta
markup language.
• XML allows you to create new tags for each
type of document you are storing.
• In this way, XML stores information in a
structured manner.
• It is also interoperable with both HTML and
SGML. This allows data stored in XML to
be displayed (using HTML) and integrated
with SGML documents.
XML example I
<article>
  <title>Some XML</title>
  <date>April 25, 2001</date>
  <author>
    <fname>Amber</fname>
    <lname>Settle</lname>
  </author>
  <summary>Sample XML</summary>
  <content>XML is not for displaying information
  but for managing information.
  </content>
</article>
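Because the tags say what each piece of information is, a program can pull out individual fields by name. A minimal sketch using Python's standard XML parser on an abridged copy of the article above (only some of its elements are reproduced here):

```python
import xml.etree.ElementTree as ET

# Abridged copy of the article example.
ARTICLE = """<article>
<title>Some XML</title>
<date>April 25, 2001</date>
<author>
<lname>Settle</lname>
</author>
<summary>Sample XML</summary>
</article>"""

root = ET.fromstring(ARTICLE)
title = root.findtext("title")
date = root.findtext("date")
last_name = root.findtext("author/lname")   # a path into the nested author element

print(title, "|", date, "|", last_name)
```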
XML example II
<list>
  <employee>
    <fname>Simone</fname>
    <lname>Settle</lname>
    <ssn>123-00-5454</ssn>
    <salary>70000</salary>
    <position>network administrator</position>
    <hire-year>1999</hire-year>
  </employee>
  <employee>
    <fname>Joon</fname>
    <lname>Elam</lname>
    <ssn>456-88-7654</ssn>
    <salary>62000</salary>
    <position>web designer</position>
    <hire-year>2000</hire-year>
  </employee>
</list>
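The employee list above is pure data, so the same file can be used for calculation and for display. The sketch below, on an abridged copy of the list, totals the salaries and also turns the records into an HTML table, which is one simple way XML data might be shown using HTML. The function names are ours.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abridged copy of the employee list (ssn and hire-year omitted).
STAFF = """<list>
<employee><fname>Simone</fname><lname>Settle</lname>
<salary>70000</salary><position>network administrator</position></employee>
<employee><fname>Joon</fname><lname>Elam</lname>
<salary>62000</salary><position>web designer</position></employee>
</list>"""

def total_salary(xml_text):
    """Add up the salary element of every employee."""
    root = ET.fromstring(xml_text)
    return sum(int(emp.findtext("salary")) for emp in root.iter("employee"))

def to_html_table(xml_text):
    """Render each employee as one row of an HTML table."""
    rows = []
    for emp in ET.fromstring(xml_text).findall("employee"):
        name = f'{emp.findtext("fname")} {emp.findtext("lname")}'
        position = emp.findtext("position")
        rows.append(f"<tr><td>{escape(name)}</td><td>{escape(position)}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(total_salary(STAFF))   # 132000
print(to_html_table(STAFF))
```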