Coach - Computer Science and Engineering

Preparing for the Poster Session
Gagan Agrawal
Outline




Background on the proposal
Overall research focus
Equipment requested
Preparing for the Site Visit
Background



A proposal submitted to the National Science
Foundation's (NSF) CISE Research Infrastructure
program
The program targets research equipment for multi-investigator
teams doing experimental computer
science; it typically funds 4-5 US universities each
year
After initial review of proposals, a set of universities
receives a site visit. Final selection is based upon the
site visit
History on the Proposal






Proposal involving 14 faculty / senior researchers across CIS,
BMI, and OSC (Principal Investigators: Panda, Agrawal,
Sadayappan, Shen, Saltz)
Proposal submitted in October 2002 (a 105-page document in
all!)
Total request to NSF: $1,350,000 (+ matching from state of
Ohio and OSU)
All funds are for equipment and one full-time support person to
manage the equipment
Rated as one of the top three proposals among 22 submissions
this year
8 universities are getting a site visit; 4-6 are to be funded
Site Visit Schedule


Scheduled for 10th March; will involve two NSF
program managers and two experts from other
universities
Agenda:





Presentations about the department, our research,
requested equipment
Discussion about our education programs, diversity, etc.
Meeting with Dean and Vice Provost for research
Tour of facilities and demos
A student poster session
Motivation / Goals for Poster Session




Graduate education is a key mission of NSF – they
want to fund where it will make a difference to
graduate education
Opportunity to show research beyond talks from PIs
A further opportunity to demonstrate a vibrant group
of experimental computer science researchers
A further opportunity to stress our need for
equipment
Why Should You Care



New equipment should help your research
Having an award like this will give more visibility to
our group / department (will help you when you look
for a job)
A good opportunity to present your work


Posters can be reused for open houses, etc.
Your advisor will be unhappy if you don’t do a good
job 
Rest of this talk

A big research picture that was put in the proposal




Required to show an overall vision / synergy among the
investigators
Some details of the equipment and configuration
requested
Things to bring out in your poster
The kinds of questions you should be prepared for
Overall Research Focus

Science and high performance computing are becoming data-driven
well recognized, for example in the cyberinfrastructure report
Clusters are a cost-effective way for
storing large datasets (i.e. serve as data repositories)
compute-intensive processing of data
SMPs are also popular architectures for compute-intensive tasks
Processing of data may not always be feasible or desirable
where data is hosted
data repositories may be shared resources
may not be the best configuration for compute-intensive tasks
Grid and Cluster Computing Context


Separating processing of data from the cluster hosting the data
will be the norm in a wide-area (grid) environment
However, it may also be done within an organization



many users accessing the data
different configuration may be better for compute-intensive tasks
Support for hosting data at a cluster, and processing the data
at another cluster or an SMP machine is critically required


a challenging problem
Our overall focus
Research Challenges






Better intra-cluster communication and I/O support for data
intensive and interactive applications, and for allowing shared
access to data repositories
Need scheduling and resource sharing policies for such an
environment
Need high-level programming support to use such an
environment (middleware, compilers)
Algorithms from data-intensive application areas (data mining,
visualization) need to be modified or tuned for such an environment
Need to work with real applications and real datasets to drive
the work
Many existing individual projects in these directions, but a
common infrastructure will help integrate and evaluate the work
The Equipment we are asking for






Storage cluster - 24 nodes, 80 TB of storage, located at BMI
Compute cluster – 32 nodes, various interconnects (Myrinet,
Quadrics, InfiniBand), located at CIS
SMP machine - approx. 16-CPU machine, located at CIS
Visualization equipment (graphics cards, haptic devices)
High-speed networking (1.0 Gb) between CIS and BMI, CIS and
OSC, and BMI and OSC
Storage and compute clusters will be upgraded during the 4th
year of the grant - inter-site networking up to 10 Gb
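A rough sense of scale for the numbers above: the sketch below estimates how long it would take to move the full 80 TB over a 1 Gb or a 10 Gb inter-site link. It is only a back-of-the-envelope illustration. The function name `transfer_seconds`, the decimal units, and the idealized, fully utilized link with no protocol overhead are all assumptions, not part of the proposal.

```python
# Back-of-the-envelope transfer times for the requested storage and links.
# Assumptions (not from the proposal): decimal units (1 TB = 1e12 bytes,
# 1 Gb/s = 1e9 bits/s) and a fully utilized link with no protocol overhead.

def transfer_seconds(terabytes: float, gigabits_per_second: float) -> float:
    """Ideal time in seconds to move `terabytes` of data over one link."""
    bits = terabytes * 1e12 * 8
    return bits / (gigabits_per_second * 1e9)

if __name__ == "__main__":
    for link_gbps in (1.0, 10.0):
        seconds = transfer_seconds(80, link_gbps)
        print(f"80 TB over {link_gbps:g} Gb/s: {seconds / 3600:.1f} hours "
              f"({seconds / 86400:.1f} days)")
```

Under these idealized assumptions the full repository would take about 7.4 days to move at 1 Gb and just under 18 hours at 10 Gb, which is one way to see why where the data is processed matters.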
Overall Configuration
Configuration Within CIS
[Figure: configuration diagram. Labels recoverable from the figure: Visualization Server; Video Wall; Compute Servers (16-Quad Pentium 700 MHz, 16-Dual Pentium 300 MHz, 8-Dual Pentium 2.4 GHz, 16-Dual Pentium 1.0 GHz); Data Server (9-Dual Pentium 1.0 GHz + terabytes of storage); interconnects: Myrinet (Lanai 3, Lanai 7, Lanai 9), GigaNet, Gigabit Ether (8), InfiniBand (4), Quadrics (4); Gigabit Ethernet; Ohio Supercomputer Center (Production Clusters + Storage Cluster)]
Rationale





Need to experiment with applications on a distributed collection
of compute, storage, and visualization resources
We want to study architectures for storage clusters and
compute clusters, and therefore, want crashable resources
Need to work with data-intensive applications with very large
datasets, and need sufficient storage for those
We want to evaluate system software in a distributed and
heterogeneous environment, but need a setup that will allow
repeatable experiments
Research will focus on networked clusters (and SMP machines)
but is extendable to a more wide area environment through
links to OSC, OSC machines, and links from OSC to elsewhere
Proposed Research


Overall theme: an integrated approach – support at
low-level, incorporated into appropriate programming
systems, driven or enhanced by research at
algorithms level, and tested by end applications
Four components:




Communication and I/O (Panda, Lauria, Wyckoff)
Middleware and Programming Systems (Saltz, Kurc,
Catalyurek, Agrawal, Saday)
Data-intensive algorithms (or application areas): Srini,
Hakan, Agrawal, Han-Wei, Raghu, Stredney (?)
End applications: Saltz et al., Stredney, Raghu, Saday, Han-Wei (?)
Area 1: Communication and I/O

Need to enhance communication and I/O
mechanisms



Both at the intra-cluster and inter-cluster level
Specific needs for data-intensive and interactive applications
Components:



Support for point-to-point and collective communication, and
synchronization – incorporated at the MPI, DSM layers
(Panda)
Support for intra- and inter-cluster QoS (Panda)
Support for efficient and parallel I/O at the intra- and inter-cluster level (Lauria)
Area 2: Middleware and Programming
Systems


Goal: High-level programming systems and policies
are required to utilize multiple clusters and SMP
machines
Components:







DataCutter (Saltz, Kurc, Catalyurek)
Compiler support on top of DataCutter (Agrawal et al.)
Scheduling task graphs (Saday et al.)
Scheduling across multiple tasks (Saday)
Multiple Query Optimization (Saltz et al.)
Middleware for data mining (Agrawal)
Indexing and declustering for data repositories (Hakan)
Area 3: Data Intensive Algorithms

Need to develop and/or fine-tune and/or evaluate
algorithms and techniques in the areas of





data mining
scientific data analysis, and
visualization
in our proposed environment and on top of the
programming systems developed
Components:



Parallel data mining algorithms, particularly shared memory
(Srini, Agrawal)
Scientific data analysis (Machiraju, Srini)
Visualization and imaging etc. (Han-Wei, Raghu)
Area 4: End Data Intensive Applications

We are working with end data-intensive, data-driven,
interactive, and/or collaborative applications to




evaluate our work at the communication and I/O, programming
systems, and algorithm levels
to obtain large datasets
to demonstrate that our research can benefit real end applications
Components:





Time-varying scientific data visualization (Han-Wei)
Oil reservoir simulation (Saltz)
Medical applications (Saltz, Shen, Stredney, Machiraju)
Scientific (chemistry) application (Saday)
3-D human scan analysis (Machiraju)
Things to bring out in your posters

Interesting experimental computer science research
Involving system software,
Large datasets,
Careful performance analysis on dedicated systems, or
Involving a distributed environment
Preferably some preliminary experimental results
Show we can do quality experimental research
Demonstrate need for more equipment, if
appropriate (part of future work?)
Mention existing or potential collaborations, if
appropriate
Some Questions to be Prepared for



What equipment have you used so far?
Do you feel the need for any additional equipment?
For systems posters: what benchmarks/applications
might you use in the future?


See if any existing work in the areas of visualization,
data mining, or end applications may be appropriate
For algorithm / application posters: what system
support could you use for scaling your work, or going
to distributed environments?

See if any of the work on QoS, DataCutter, FREERIDE, or
scheduling may be relevant
Some Logistics






A rehearsal session on 28th Feb, 3:30 – 4:30, DL 480
Final site visit on 10th March; poster session 1:30 – 2:30; set
up from 11:30 onwards; plan to be available till 3:30; room
TBA
Poster size – 30 inch width, 36 inch height – can have 9-12
slides
Can use department poster printer (ask your advisor) – don’t
use it for rehearsal
Be professional during the site visit – no unnecessary talking
among yourselves, no use of Hindi / Chinese / …
Dress code - ?
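To check how 9-12 slides might fit on the 30 x 36 inch poster mentioned above, here is a small layout sketch. It is only an illustration: the helper name `slide_size`, the 4:3 slide aspect ratio, the uniform grid, and the absence of margins or gaps are all assumptions, not requirements from the talk.

```python
# Rough layout check for a 30 x 36 inch poster made of printed slides.
# Assumptions (not from the talk): slides have a 4:3 aspect ratio, are laid
# out in a uniform grid, and margins and gaps between slides are ignored.

def slide_size(poster_w: float, poster_h: float, cols: int, rows: int,
               aspect: float = 4 / 3) -> tuple[float, float]:
    """Largest slide (width, height) in inches that fits a cols x rows grid."""
    cell_w = poster_w / cols
    cell_h = poster_h / rows
    # The slide is limited by whichever cell dimension is tighter.
    width = min(cell_w, cell_h * aspect)
    return width, width / aspect

if __name__ == "__main__":
    for cols, rows in ((3, 3), (3, 4)):
        w, h = slide_size(30, 36, cols, rows)
        print(f"{cols * rows} slides ({cols} x {rows}): {w:.1f} x {h:.1f} in each")
```

Under these assumptions both a 3 x 3 grid (9 slides) and a 3 x 4 grid (12 slides) give slides of about 10 x 7.5 inches; the 12-slide layout uses 30 of the 36 inches of height, leaving roughly 6 inches for a title.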