Transcript Platforms
NITISH MANOCHA
Platforms
AIX workstation
OS/390
Sun Solaris
Windows NT
Tools to Use
Topic categorization tool
Categorizing emails
Categorizing Web Pages
Text Analysis Tool
Topic Categorization Tool
Category 1 (AI Schedule)
Text Analysis Tool
Category 2 (Database Schedule)
Text Analysis Tool
Target Category (Data Mining Schedule)
Text Analysis Tool
Result - Category 2 (Databases)
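The categorization step above can be illustrated with a minimal sketch. This is not how the Topic Categorization Tool works internally. It assumes a plain bag-of-words cosine similarity, and the training snippets, the TRAINING table, and the function names are all hypothetical, invented only to show a target document being assigned to the closest trained category.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(document, training):
    """Return (best_category, scores) for a document against per-category training text."""
    doc_vec = bag_of_words(document)
    scores = {name: cosine(doc_vec, bag_of_words(text))
              for name, text in training.items()}
    return max(scores, key=scores.get), scores


# Hypothetical training text per category; not taken from the slides.
TRAINING = {
    "AI Schedule": "search heuristics planning logic agents neural networks reasoning",
    "Database Schedule": "relational model sql queries transactions indexing storage tables",
}

if __name__ == "__main__":
    target = "association rules over large tables, sql queries, indexing and storage"
    best, scores = categorize(target, TRAINING)
    print(best, scores)
```

A real categorizer would be trained on many sample documents per category. The sketch only shows the shape of the decision: score the target against each category and keep the best match.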
Tools to Use
Clustering Tool (Finding Similar Information)
Dividing Documents into Groups
Identifying hidden similarities in documents
Identifying duplicate documents in a collection
Finding Documents that are out of place
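The uses listed above (grouping, duplicate detection, spotting out-of-place documents) can be sketched with a simple similarity measure. This sketch assumes Jaccard similarity over word sets and a greedy one-pass grouping. It is a stand-in technique, not the hierarchical or binary clustering the tool itself performs, and the document names and the 0.5 threshold are hypothetical.

```python
import re


def tokens(text):
    """Set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_similar(docs, threshold=0.5):
    """Greedy grouping: each document joins the first group whose seed
    document is at least `threshold` similar, else it starts a new group."""
    groups = []  # list of (seed_tokens, [document names])
    for name, text in docs.items():
        toks = tokens(text)
        for seed, members in groups:
            if jaccard(seed, toks) >= threshold:
                members.append(name)
                break
        else:
            groups.append((toks, [name]))
    return [members for _, members in groups]


if __name__ == "__main__":
    # Hypothetical documents, invented for illustration.
    docs = {
        "a.txt": "Data mining course schedule for fall",
        "b.txt": "data mining course schedule for Fall",
        "c.txt": "Cafeteria lunch menu",
    }
    print(group_similar(docs))
```

Groups with several members point to similar or duplicate documents. A group with a single member is a candidate for a document that is out of place.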
Text Analysis Tool
Hierarchical Clustering - imzhclst
Text Analysis Tool
Binary Clustering - imzcrlst
Text Analysis Tool
Results
Tools to Use
Feature Extraction Tool
Name Extraction
Abbreviation Extraction
Relation Extraction
Text Analysis Tool
Using Feature Extraction tool to extract names
imzxrun -b 2 -f C -x n -o faculty.out faculty.htm
Text Analysis Tool
Tools to Use
Language Identification Tool
Organize collection of documents by language
Restrict Search Results to documents in a particular language
Text Analysis Tool
Using Language Identification tool
imzlgini -b 2 -v < mydoc.htm
Text Analysis Tool
Language Identification Tool Results
Supports 13 languages; can be trained on new languages
Text Analysis Tool
Using Summarizer tool
imzsum -l 4 project.html
Text Analysis Tool
Summarizer tool - Results
Tools to Use
Web Crawler
Follows the Link topology for a fast search
Produces a Web Site Map
Use to Recognize the Authoritative pages
Provides a filtered collection of pages
Web Crawler
imyclean - to define a web space
Creates include.re, exclude.re, types.re
imycrawl - to crawl a defined web space
imycrawl url webspace
imystat - to track what happens during a crawl
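The idea of a web space defined by include and exclude rules can be sketched as follows. The slides do not show the contents or format of include.re and exclude.re, so this sketch assumes one regular expression per line. The function names, the patterns, and the example URLs are hypothetical.

```python
import re


def compile_patterns(lines):
    """Compile one regular expression per non-empty line (assumed file format)."""
    return [re.compile(line.strip()) for line in lines if line.strip()]


def in_web_space(url, include, exclude):
    """A URL belongs to the web space if it matches at least one include
    pattern and no exclude pattern."""
    included = any(p.search(url) for p in include)
    excluded = any(p.search(url) for p in exclude)
    return included and not excluded


if __name__ == "__main__":
    # Hypothetical rules: stay on one site, skip image files.
    include = compile_patterns([r"^http://www\.example\.edu/"])
    exclude = compile_patterns([r"\.(gif|jpg)$"])
    for url in ("http://www.example.edu/faculty.htm",
                "http://www.example.edu/logo.gif",
                "http://other.example.com/"):
        print(url, in_web_space(url, include, exclude))
```

A crawler would apply a check of this kind to every link it discovers before following it, which is what keeps the crawl inside the defined web space.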
Tools to Use
Text Search Engine
Complicated Text Search
Powerful Linguistic Capabilities
Fuzzy searches
Query based on structure of document
Text Search Engine
Operates on a previously built index
Text Search Engine
Types of Index
Linguistic Index (bought as buy)
Feature Index (Linguistics + Names)
Precise Index (bought as bought)
Normalized Precise Index (Case Insensitive)
Ngram Index
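The difference between these index types comes down to how a word is normalized before it is stored. The sketch below shows one plausible key for each type. It is an illustration under assumptions, not the engine's implementation: the LEMMAS table is a hypothetical stand-in for a real linguistic dictionary, the n-gram length of 3 is arbitrary, and the Feature Index is omitted because it also depends on name extraction.

```python
# Hypothetical lemma table standing in for a real linguistic dictionary.
LEMMAS = {"bought": "buy", "buys": "buy", "buying": "buy"}


def precise_key(token):
    """Precise index: the word is stored exactly as written."""
    return token


def normalized_precise_key(token):
    """Normalized precise index: the exact word form, with case folded."""
    return token.lower()


def linguistic_key(token):
    """Linguistic index: the word is reduced to its base form
    (assumes case is folded first)."""
    low = token.lower()
    return LEMMAS.get(low, low)


def ngram_keys(token, n=3):
    """N-gram index: the word is stored as overlapping character sequences."""
    low = token.lower()
    return [low[i:i + n] for i in range(len(low) - n + 1)] or [low]


if __name__ == "__main__":
    word = "Bought"
    print(precise_key(word))
    print(normalized_precise_key(word))
    print(linguistic_key(word))
    print(ngram_keys(word))
```

Under these assumptions, a query for "buy" would match "Bought" only through the linguistic key, while a misspelled query could still share several n-grams with the stored word.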
Combining Tools for Solutions
Searching with Categories
by combining Text Search Engine and Topic Categorization Tool
Surviving a flood of email
by using Topic Categorization Tools
Selectively indexing Web Pages
by combining Web Crawler, Topic Categorization Tool & Text Search Engine
Views of the Tool
Command Line (Good for Unix)
Not very useful on Windows NT
Not a good stand-alone Tool
Should be viewed as a Library