NLP & Compilers topics

Research Topics
CSC 3990
Parallel Computing &
Compilers
CSC 3990
What is a Compiler?
• Compiler
– Converts source code into machine code
– Automatic
– Relieves the programmer from having to know
about the machine (processor)
What is a Parallel Compiler?
• Parallel Compiler
– Converts source code into machine code to
run on a parallel computer
– Centralized shared memory computer or
supercomputer
– Distributed computer
– Anything where a single program will run on
more than one processor
Compiler Structure
source code
compiler front-end
intermediate code
loop optimization
register allocation
code generation
code scheduling
machine code
Phases of a Compiler
Source program
Lexical Analyzer (Scanner)
Tokens
Syntax Analyzer (Parser)
Parse tree
Semantic Analyzer
Abstract Syntax Tree w/ Attributes
Intermediate-code Generator
Non-optimized Intermediate Code
Intermediate-code Optimizer
Optimized Intermediate Code
Target-code Generator
Target machine code
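The first phase above, the scanner, can be sketched in a few lines of C. This is only an illustration under stated assumptions: the token kinds (identifier, number, single-character symbol) and the names Token, next_token, and count_tokens are hypothetical and do not come from the slides.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, chosen for illustration only. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_SYMBOL, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the token in the source */
    size_t length;      /* number of characters in the token */
} Token;

/* Reads one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p = *cursor;
    Token tok;

    while (isspace((unsigned char)*p))      /* skip whitespace */
        p++;

    tok.start = p;
    if (*p == '\0') {
        tok.kind = TOK_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        tok.kind = TOK_SYMBOL;              /* e.g. [ ] = * ; */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Counts the tokens in a source string, excluding the end marker. */
int count_tokens(const char *source)
{
    int n = 0;
    while (next_token(&source).kind != TOK_END)
        n++;
    return n;
}
```

A real scanner would also handle multi-character operators, comments, and string literals; the sketch only shows how a character stream becomes the token stream that the parser consumes.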
Nanocompiler – an initial vision
[Diagram components: Source code, Front end, Machine requirements analysis, Machine description generation, Processor generator, Processor, Back end, Executable code, Dynamic profiler]
• Machine description
generated from IR
• Processor generated from
machine description
• Executable runs on
generated processor
• Dynamic profiler feeds
back to analyzer
• Processor reconfigured at
run-time
Example: Loop Unrolling
• Loops are popular places for identifying
“parallelism”
• Can separate iterations of the same loop
execute at the same time?
• If so, how can the code be modified…
automatically… to make that happen?
for (i=0; i<100; i++)
A[i] = B[i] * C[i];
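One way the loop above could be rewritten is sketched below. Every iteration writes a different element of A and reads only B and C, so the iterations are independent, provided A does not overlap B or C. The unroll factor of 4, the double element type, and the function names are assumptions made for this sketch, not details from the slides.

```c
#define N 100

/* The original loop from the slide, wrapped in a function. */
void multiply(double *A, const double *B, const double *C)
{
    for (int i = 0; i < N; i++)
        A[i] = B[i] * C[i];
}

/* The same computation unrolled by an assumed factor of 4.
   The four statements in the body do not depend on one another,
   so a compiler or processor is free to execute them in parallel. */
void multiply_unrolled(double *A, const double *B, const double *C)
{
    int i;
    for (i = 0; i + 3 < N; i += 4) {
        A[i]     = B[i]     * C[i];
        A[i + 1] = B[i + 1] * C[i + 1];
        A[i + 2] = B[i + 2] * C[i + 2];
        A[i + 3] = B[i + 3] * C[i + 3];
    }
    /* Remainder loop for trip counts that are not a multiple of 4;
       it does no work here because 100 is divisible by 4. */
    for (; i < N; i++)
        A[i] = B[i] * C[i];
}
```

Both functions produce the same contents in A. A compiler that applies this transformation automatically must first prove that the iterations are independent.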
Natural Language Processing
CSC 3990
What is NLP?
• Natural Language Processing (NLP)
– Computers use (analyze, understand,
generate) natural language
– A somewhat applied field
• Computational Linguistics (CL)
– Computational aspects of the human
language faculty
– More theoretical
Why Study NLP?
• Human language interesting & challenging
– NLP offers insights into language
• Language is the medium of the web
• Interdisciplinary: Ling, CS, psych, math
• Help in communication
– With computers (ASR, TTS)
– With other humans (MT)
• Ambitious yet practical
Goals of NLP
• Scientific Goal
– Identify the computational machinery
needed for an agent to exhibit various
forms of linguistic behavior
• Engineering Goal
– Design, implement, and test systems
that process natural languages for
practical applications
Applications
• speech processing: get flight information or book
a hotel over the phone
• information extraction: discover names of people
and events they participate in, from a document
• machine translation: translate a document from
one human language into another
• question answering: find answers to natural
language questions in a text collection or
database
• summarization: generate a short biography of
Noam Chomsky from one or more news articles
General Themes
• Ambiguity of Language
• Language as a formal system
• Rule-based vs. Statistical Methods
• The need for efficiency
Topic Ideas
1. Textual Analysis – readability
2. Plagiarism Detection – candidate selection
3. Intelligent Agents – machine interaction
Textual Analysis - Readability
• Text Input
• Analyze text & estimate “readability”
– Grade level of writing
– Consistency of writing
– Appropriateness for certain educ. level
• Output results
• Research question: How can a computer
analyze text and measure readability?
• Opportunities for hands-on research
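As one possible starting point for estimating grade level, the sketch below applies the Flesch-Kincaid grade-level formula. The slides do not name a readability measure, so this choice is an assumption. The syllable count is a rough heuristic (runs of vowels), and the names TextStats, analyze_text, and grade_level are hypothetical.

```c
#include <ctype.h>

/* Counts of the text features used by the formula. */
typedef struct {
    int sentences;
    int words;
    int syllables;
} TextStats;

static int is_vowel(int c)
{
    c = tolower(c);
    return c == 'a' || c == 'e' || c == 'i' ||
           c == 'o' || c == 'u' || c == 'y';
}

/* Scans the text once. Words are runs of letters, sentences end at
   '.', '!' or '?', and syllables are approximated as runs of vowels
   (at least one per word). This is a heuristic, not a dictionary. */
TextStats analyze_text(const char *text)
{
    TextStats s = {0, 0, 0};
    int in_word = 0, prev_vowel = 0, word_syllables = 0;

    for (;; text++) {
        unsigned char c = (unsigned char)*text;
        if (isalpha(c)) {
            if (!in_word) {              /* a new word starts here */
                in_word = 1;
                s.words++;
                word_syllables = 0;
                prev_vowel = 0;
            }
            if (is_vowel(c) && !prev_vowel)
                word_syllables++;        /* start of a vowel run */
            prev_vowel = is_vowel(c);
        } else {
            if (in_word) {               /* the current word just ended */
                s.syllables += word_syllables > 0 ? word_syllables : 1;
                in_word = 0;
            }
            if (c == '.' || c == '!' || c == '?')
                s.sentences++;
            if (c == '\0')
                break;
        }
    }
    return s;
}

/* Flesch-Kincaid grade level; returns 0 for text with no sentences. */
double grade_level(TextStats s)
{
    if (s.sentences == 0 || s.words == 0)
        return 0.0;
    return 0.39 * ((double)s.words / s.sentences)
         + 11.8 * ((double)s.syllables / s.words) - 15.59;
}
```

Very simple text can score below zero, and an ellipsis is counted as three sentence endings, so the result is best read as a rough estimate.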
Plagiarism Detection
• Text Input
• Analyze text & locate “candidates”
– Find one or more passages that might be plagiarized
– Algorithm tries to do what a teacher does
– Search on Internet for candidate matches
• Output results
• Research question: What algorithms work like
humans when finding plagiarism?
• Experimental CS research
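A very simple stand-in for candidate selection is to flag a passage when it shares a long run of identical characters with a known source. The sketch below computes the longest common substring with dynamic programming. This technique, the threshold parameter min_run, and the function names are assumptions for illustration and are not methods named in the slides.

```c
#include <stdlib.h>
#include <string.h>

/* Length of the longest run of consecutive characters that appears
   in both a and b (longest common substring, by dynamic programming). */
size_t longest_shared_run(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b), best = 0;
    /* prev[j] holds the length of the common run ending at a[i-2], b[j-1];
       curr[j] is the same for a[i-1], b[j-1]. Index 0 always stays 0. */
    size_t *prev = calloc(lb + 1, sizeof *prev);
    size_t *curr = calloc(lb + 1, sizeof *curr);

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return 0;   /* allocation failed: report no shared run */
    }

    for (size_t i = 1; i <= la; i++) {
        for (size_t j = 1; j <= lb; j++) {
            curr[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            if (curr[j] > best)
                best = curr[j];
        }
        size_t *tmp = prev;   /* the current row becomes the previous row */
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return best;
}

/* Returns 1 when the passage shares a run of at least min_run
   characters with the source, 0 otherwise. */
int is_candidate(const char *passage, const char *source, size_t min_run)
{
    return longest_shared_run(passage, source) >= min_run;
}
```

Exact character matching misses paraphrased text, which is part of what makes the research question above interesting: a teacher notices reworded passages that this kind of check cannot.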
Intelligent Agents
• Example: ELIZA
• AIML: Artificial Intelligence Markup Language
• Human types something
• Computer parses, “understands”, and generates
response
• Response is viewed by human
• Research question: How can computers
“understand” and “generate” human writing?
• Also good area for experimentation
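The type, parse, respond cycle above can be sketched as a keyword lookup in the general spirit of ELIZA. The rules, the responses, and the names Rule, RULES, and respond below are invented for illustration; they are not taken from ELIZA's actual script or from AIML.

```c
#include <string.h>

/* Hypothetical keyword/response rules, invented for this sketch. */
typedef struct {
    const char *keyword;
    const char *response;
} Rule;

static const Rule RULES[] = {
    { "mother",   "Tell me more about your family." },
    { "sad",      "Why do you feel sad?" },
    { "computer", "Do computers worry you?" },
};

/* Returns the response of the first rule whose keyword occurs in the
   input, or a generic prompt when no rule matches. Matching is
   case-sensitive and looks at substrings only. */
const char *respond(const char *input)
{
    size_t n = sizeof RULES / sizeof RULES[0];
    for (size_t i = 0; i < n; i++) {
        if (strstr(input, RULES[i].keyword) != NULL)
            return RULES[i].response;
    }
    return "Please go on.";
}
```

Nothing here understands the input; the program only reacts to surface patterns, which is why “understands” appears in quotation marks on the slide.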