Transcript Platforms

NITISH MANOCHA
Platforms




AIX workstation
OS/390
Sun Solaris
Windows NT
Tools to Use
 Topic Categorization Tool


Categorizing emails
Categorizing Web Pages
Text Analysis Tool
 Topic Categorization Tool

Category 1 (AI Schedule)
Text Analysis Tool

Category 2 (Database Schedule)
Text Analysis Tool
 Target Category (Data Mining Schedule)
Text Analysis Tool
 Result - Category 2 (Databases)
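The categorization walk-through above (two trained categories, one target document, one winning category) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses bag-of-words cosine similarity, which the slides do not name as the tool's method, and the training and target texts are invented for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(document, training):
    """Return (best_category, scores) given {category name: sample text}."""
    doc_vec = bag_of_words(document)
    scores = {name: cosine(doc_vec, bag_of_words(sample))
              for name, sample in training.items()}
    return max(scores, key=scores.get), scores


# Hypothetical training samples and target text, invented for this sketch.
training = {
    "Category 1 (AI Schedule)":
        "lecture on search heuristics, planning, neural networks and agents",
    "Category 2 (Database Schedule)":
        "lecture on sql queries, relational tables, indexing and transactions",
}
target = "lecture on mining association rules from relational tables with sql queries"

best, scores = categorize(target, training)
print(best)
```

With these toy texts the target shares more vocabulary with the database sample, so it lands in Category 2, mirroring the outcome shown on the slide.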
Tools to Use
 Clustering Tool (Finding Similar Information)




Dividing Documents into Groups
Identifying hidden similarities in documents
Identifying duplicate documents from a collection
Finding Documents that are out of place
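The uses listed above (grouping documents, spotting duplicates, finding documents that are out of place) can be illustrated with a small sketch. It assumes Jaccard similarity on word sets and a simple single-link grouping with a hypothetical threshold of 0.5; it is not the algorithm used by the clustering tool, and the sample documents are invented.

```python
import re


def word_set(text):
    """Set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(docs, threshold=0.5):
    """Single-link grouping: documents at or above the threshold share a group."""
    names = list(docs)
    sets = {name: word_set(docs[name]) for name in names}
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(sets[a], sets[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical documents, invented for this sketch.
docs = {
    "a.txt": "database systems course schedule for fall",
    "b.txt": "database systems course schedule for the fall",
    "c.txt": "recipe for tomato soup",
}
groups = group_similar(docs)
print(groups)
```

Groups with more than one member are candidate duplicates or near-duplicates, while a singleton group such as the one holding c.txt is a candidate for a document that is out of place.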
Text Analysis Tool
 Hierarchical Clustering - imzhclst
Text Analysis Tool
 Binary Clustering - imzcrlst
Text Analysis Tool
 Results
Tools to Use
 Feature Extraction Tool



Name Extraction
Abbreviation Extraction
Relation Extraction
Text Analysis Tool
 Using Feature Extraction tool to extract names

imzxrun -b 2 -f C -x n -o faculty.out faculty.htm
Text Analysis Tool
Tools to Use
 Language Identification Tool


Organize collection of documents by language
Restrict Search Results to documents in a particular language
Text Analysis Tool
 Using Language Identification tool

imzlgini -b 2 -v < mydoc.htm
Text Analysis Tool
 Language Identification Tool Results

Supports 13 languages; new languages can be trained
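The idea that a language can be "trained" from sample text can be sketched as follows. This is a minimal illustration that assumes character-trigram profiles; the slides do not say which technique the Language Identification tool uses, and the training sentences are invented.

```python
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams of the lowercased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def train(samples):
    """Build one trigram profile per language from {language: sample text}."""
    return {lang: trigram_profile(text) for lang, text in samples.items()}


def identify(text, profiles):
    """Return the language whose profile overlaps most with the text's trigrams."""
    doc = trigram_profile(text)

    def overlap(profile):
        total = sum(profile.values())
        if not total:
            return 0.0
        # Weight shared trigrams by their frequency in both text and profile.
        return sum(doc[g] * profile[g] for g in doc) / total

    return max(profiles, key=lambda lang: overlap(profiles[lang]))


# Hypothetical training sentences; adding a language means adding one entry.
profiles = train({
    "english": "the quick brown fox jumps over the lazy dog and the cat sleeps in the house",
    "german": "der schnelle braune fuchs springt vor den faulen hund und die katze liegt im haus",
    "french": "le renard brun rapide saute par dessus le chien paresseux et le chat dort dans la maison",
})

print(identify("the dog is in the house", profiles))
print(identify("der hund und die katze", profiles))
```

Real profiles would be trained on far more text than one sentence per language; the point here is only that supporting a new language amounts to supplying new training material.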
Text Analysis Tool
 Using Summarizer tool

imzsum -l 4 project.html
Text Analysis Tool
 Summarizer tool - Results
Tools to Use
 Web Crawler




Follows the Link topology for a fast search
Produces a Web Site Map
Used to Recognize Authoritative Pages
Provides a filtered collection of pages
Web Crawler
 imyclean - to define a web space

Created include.re, exclude.re, types.re
 imycrawl - to crawl a defined web space

imycrawl url webspace
 imystat - to track what happens during a crawl
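How include and exclude rules might define a web space can be sketched as below. This assumes that include.re and exclude.re hold one regular expression per line, which the slides do not confirm; the patterns and URLs are hypothetical, and the sketch filters an in-memory list rather than calling any of the crawler commands.

```python
import re


def compile_patterns(lines):
    """Compile each non-empty line as a regular expression."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def in_web_space(url, include, exclude):
    """Keep a URL when it matches some include pattern and no exclude pattern."""
    if not any(p.search(url) for p in include):
        return False
    return not any(p.search(url) for p in exclude)


# Hypothetical rule sets standing in for the contents of include.re / exclude.re.
include = compile_patterns([r"^http://www\.example\.edu/"])
exclude = compile_patterns([r"/cgi-bin/", r"\.(gif|jpg)$"])

urls = [
    "http://www.example.edu/faculty.htm",
    "http://www.example.edu/cgi-bin/search",
    "http://www.example.edu/logo.gif",
    "http://other.example.com/index.htm",
]
kept = [u for u in urls if in_web_space(u, include, exclude)]
print(kept)
```

Checking the include rules first and the exclude rules second gives the "filtered collection of pages" behavior described earlier: only pages inside the site that are not scripts or images survive.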
Tools to Use
 Text Search Engine




Complicated Text Search
Powerful Linguistic Capabilities
Fuzzy searches
Query based on structure of document
Text Search Engine
 Operates on a Previously Built Index
Text Search Engine
 Types of Index





Linguistic Index ("bought" indexed as "buy")
Feature Index (Linguistics + Names)
Precise Index ("bought" indexed as "bought")
Normalized Precise Index (Case Insensitive)
Ngram Index
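The difference between the index types comes down to which key a word is stored under. The sketch below illustrates that with toy functions; the lemma table is a hypothetical stand-in for real linguistic processing, the lowercasing in the linguistic and n-gram keys and the trigram size are assumptions, and the Feature Index is left out because the slides give no detail beyond "Linguistics + Names".

```python
# Toy lemma table standing in for real linguistic analysis (hypothetical).
LEMMAS = {"bought": "buy", "buys": "buy", "buying": "buy"}


def precise_key(token):
    """Precise index: the word is stored exactly as written."""
    return token


def normalized_precise_key(token):
    """Normalized precise index: same word form, case folded."""
    return token.lower()


def linguistic_key(token):
    """Linguistic index: the word is reduced to its base form."""
    word = token.lower()
    return LEMMAS.get(word, word)


def ngram_keys(token, n=3):
    """N-gram index: the word is stored as overlapping character n-grams."""
    word = token.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


for name, key in [
    ("precise", precise_key("Bought")),
    ("normalized precise", normalized_precise_key("Bought")),
    ("linguistic", linguistic_key("Bought")),
    ("ngram", ngram_keys("Bought")),
]:
    print(name, "->", key)
```

A query for "buy" would therefore match "bought" only through the linguistic key, while an exact-form query is served by the precise keys.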
Combining Tools for Solutions
 Searching with Categories

combining Text Search Engine and Topic Categorization Tool
 Surviving a flood of email

by using Topic Categorization Tools
 Selectively indexing Web Pages

by combining Web Crawler, Topic Categorization Tool & Text Search Engine
Views of the Tool




Command Line (Good for Unix)
Not very useful on Windows NT
Not a good stand-alone Tool
Should be viewed as a Library