Accelerating Discovery in Science and Engineering
Fabrizio Gagliardi
Director – EMEA & LATAM
Technical Computing
Microsoft Corporation
Introduction
• Some personal introductory remarks
• Progress in grid computing
• Microsoft progress in HPC
• Microsoft technology for science
• Engagements in science
• Conclusions
Some personal introductory remarks
• I am here again: since 2001 I have not missed this event a single time!
• Happy to be associated with the pioneering work of Poland in HPC, networking and Grid computing
• Honoured to witness the present success
• Good opportunity to review the progress of my activity since last year
• Last year I spoke about e-infrastructure, Grids and Microsoft plans for Science
• Let’s review the progress now
Progress in grid computing
• Microsoft sponsored GGF16 and GGF17 and took the initiative of proposing an HPC profile within the OGSA WG; a Data Management profile is also being discussed
• On the application side, we were prime sponsor at HealthGrid in Valencia, with a keynote by David Heckerman (AIDS vaccine research)
• Rapid adoption by the IT industry is essential for the future of Grid technology: GGF and EGA have merged into the Open Grid Forum (OGF), which held its first conference in Washington in early September this year
• Industry is now represented on a board of directors: all major vendors, including Microsoft (Tony Hey)
• Microsoft is also participating in the AdCom (myself) and in some of the WGs (OGSA and Security)
Progress in grid computing 2
• Major issues that still have to be resolved to bring grid computing from academia to industry and commerce:
  • Security
  • Interoperability
  • Ease of integration and use
  • Reliability of the infrastructure
  • Adequate new business models
• Microsoft is now considering most of those issues in the context of OGF
Microsoft progress in HPC
• Windows Compute Cluster Server released
• Successful experience with the Microsoft HPC institutes around the world
Microsoft Compute Cluster Server
What it does:
[Architecture diagram. Labels: Head Node (Job Mgmt, Cluster Mgmt, Scheduling, Resource Mgmt); Active Directory; Desktop App; User; User Console; Admin Console; Cmd line (user and admin); Policy, reports; Management; Job; Input; Data; DB/FS; Node Manager (Job Execution, User App, MPI); High speed, low latency interconnect]
Key advantages:
• Solution for high-performance computing applications at the low-to-medium range of the scale
• Simplified administration and job management
• Built-in job scheduler and MPI library
• Four basic job scheduling policies supported in V1
• Fully integrated cluster solution
• Interoperability with Unix systems
• Leverages existing Windows infrastructure and security
Institutes for High Performance Computing
• TACC – University of Texas, Austin, TX, USA
• University of Virginia, Charlottesville, VA, USA
• University of Utah, Salt Lake City, UT, USA
• Cornell Theory Center, Ithaca, NY, USA
• University of Tennessee, Knoxville, TN, USA
• Southampton University, Southampton, UK
• Nizhni Novgorod University, Nizhni Novgorod, Russia
• Tokyo Institute of Technology, Tokyo, Japan
• HLRS – University of Stuttgart, Stuttgart, Germany
• Shanghai Jiao Tong University, Shanghai, PRC
HPC Market Trends
Top 500 Supercomputer Trends
• Industry usage rising
• GigE is gaining
• Clusters over 50%
• x86 is leading
Supercomputing Goes Personal
             | 1991 | 1998 | 2005
System       | Cray Y-MP C916 | Sun HPC10000 | Small Form Factor PCs
Architecture | 16 x Vector, 4GB, Bus | 24 x 333MHz UltraSPARCII, 24GB, SBus | 4 x 2.2GHz Athlon64, 4GB, GigE
OS           | UNICOS | Solaris 2.5.1 | Windows Server 2003 SP1
GFlops       | ~10 | ~10 | ~10
Top500 #     | 1 | 500 | N/A
Price        | $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
Customers    | Government Labs | Large Enterprises | Every Engineer & Scientist
Applications | Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media
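The "40x drop" and "250x drop" figures in the 1991/1998/2005 comparison can be checked with a few lines of Python. This is a minimal sketch: the function and variable names are illustrative, the 2005 price uses the slide's upper bound of $4,000, and the "~10" GFlops values are treated as exactly 10.

```python
# Figures copied from the slide: year -> (system, price in USD, GFlops).
# The 2005 price is an upper bound ("< $4,000"); GFlops values are approximate ("~10").
systems = {
    1991: ("Cray Y-MP C916", 40_000_000, 10),
    1998: ("Sun HPC10000", 1_000_000, 10),
    2005: ("Small Form Factor PCs", 4_000, 10),
}


def price_per_gflops(price_usd, gflops):
    """Cost in USD of one GFlops of capability."""
    return price_usd / gflops


def drop_factors(table):
    """Price ratio between each system and the next one in time."""
    years = sorted(table)
    return {
        (earlier, later): table[earlier][1] / table[later][1]
        for earlier, later in zip(years, years[1:])
    }


for year in sorted(systems):
    name, price, gflops = systems[year]
    print(year, name, price_per_gflops(price, gflops), "USD per GFlops")

print(drop_factors(systems))
```

Because all three systems deliver roughly the same performance, the price ratios are also the ratios of cost per GFlops.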
Technology challenges
• Moore’s law continues, but power consumption and heat dissipation are reaching their limits
• The memory and data access gap is widening
• Applications are becoming more data intensive
The Future: Supercomputing on a Chip
• IBM Cell processor
  • 256 Gflops today
  • 4 node personal cluster => 1 Tflops
  • 32 node personal cluster => Top100
• MS Xbox
  • 3 custom PowerPCs + ATI graphics processor
  • 1 Tflops today
  • $300
  • 8 node personal cluster => “Top100” for $2500 (ignoring all that you don’t get for $300)
• Intel many-core chips
  • “100’s of cores on a chip in 2015” (Justin Rattner, Intel: http://www.hpcwire.com/hpc/629783.html)
  • “4 cores”/Tflop => 25 Tflops/chip
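The back-of-the-envelope numbers on this slide follow from simple multiplication. The sketch below reproduces them under the slide's own assumptions (peak figures, perfect scaling across nodes, 100 cores per chip); the helper name `cluster_tflops` is illustrative, not taken from the talk.

```python
def cluster_tflops(nodes, gflops_per_node):
    """Aggregate peak performance in Tflops, assuming perfect scaling."""
    return nodes * gflops_per_node / 1000.0


# IBM Cell: 256 Gflops per processor
cell_4_nodes = cluster_tflops(4, 256)    # about 1 Tflops
cell_32_nodes = cluster_tflops(32, 256)  # about 8 Tflops

# Xbox: 1 Tflops and $300 per node, 8 nodes
xbox_8_nodes = cluster_tflops(8, 1000)
xbox_cluster_cost = 8 * 300              # hardware only, in USD

# Intel many-core: 100 cores per chip at 4 cores per Tflop
tflops_per_chip = 100 / 4

print(cell_4_nodes, cell_32_nodes, xbox_8_nodes, xbox_cluster_cost, tflops_per_chip)
```

Eight nodes at $300 come to $2400, so the slide's $2500 is presumably a rounded figure.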
The Microsoft project in Barcelona
• Microsoft is interested in helping computer scientists develop new computing architectures with a high level of parallelism
• Mateo Valero and his BSC centre in Barcelona are leaders in this field in Europe
• Microsoft will collaborate with BSC to research and develop an entirely new parallel computing ecosystem
• http://www.hpcwire.com/hpc/633342.html
Microsoft Technical Computing:
• Radical Computing
  • Research in potential breakthrough technologies
• Advanced Computing for Science and Engineering
  • Application of new algorithms, tools and technologies to scientific and engineering problems
• High Performance Computing and tools
  • Application of high performance clusters and database technologies to industrial applications
  • Application of existing and new tools for science
Can “Here and Now” technologies accelerate discovery?
Can “Business” Tools and techniques
for dealing with
be used in scientific research to allow
researchers to be scientists and not
computer scientists…
[Diagram: Computational Modeling; Persistent Distributed Data; Workflow, Data Mining & Algorithms; Interpretation & Insight; Real-world Data]
The Problem for the e-Scientist
[Diagram: Experiments & Instruments, Other Archives, Literature and Simulations supply facts; questions go in and answers come out]
• Data ingest
• Managing a petabyte
• Common schema
• How to organize it?
• How to reorganize it?
• How to coexist & cooperate with others?
• Data query and visualization tools
• Support/training
• Performance
  • Execute queries in a minute
  • Batch (big) query scheduling
[Diagram: Persistent Distributed Storage; Visual Programming; Distributed Computation; Interoperability & Legacy Support via Web Services; Searching & Visualization; Live Documents; Reputation & Influence]
• Faster Time to Insight
• Better integration with existing Windows infrastructure
• Integrated and familiar development environment
Research
• Integrate
  • Data acquisition from source systems and integration
  • Data transformation and synthesis
• Analyze
  • Data enrichment, with business logic, hierarchical views
  • Data discovery via data mining
• Report
  • Data presentation and distribution
  • Data access for the masses
Comparison of soil moisture
[Two scatter plots, "Water Content at 5 cm" and "Water Content at 20 cm", with axes labelled Vaira and Tonzi, each running from 0.0 to 0.6. Linear fits shown: y = 0.4712x with R2 = 0.7039, and y = 0.5854x with R2 = 0.9163.]
Thanks to Gretchen Miller – UC Berkeley & Catharine Van Ingen (MSR)
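The fits quoted on the soil moisture plots have the form y = ax, i.e. a line forced through the origin. A minimal sketch of how such a fit and its R2 could be computed is given below. The sample values are made up for illustration and are not the Vaira/Tonzi measurements, and the R2 convention used (total sum of squares taken about the mean of y) is an assumption, since tools differ on this for no-intercept fits.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope a for the model y = a * x (no intercept)."""
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        raise ValueError("need at least one non-zero x value")
    return sum(x * y for x, y in zip(xs, ys)) / sum_xx


def r_squared(xs, ys, slope):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up water content values for two hypothetical sites (not the real data)
site_a = [0.10, 0.20, 0.30, 0.40, 0.50]
site_b = [0.06, 0.09, 0.16, 0.19, 0.26]

slope = fit_through_origin(site_a, site_b)
print("slope:", slope, "R2:", r_squared(site_a, site_b, slope))
```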
SharePoint Products and Technologies
Microsoft Office SharePoint Server 2007
• Business Intelligence: server-based Excel spreadsheets and data visualization, Report Center, BI Web Parts, KPIs/Dashboards
• Collaboration: docs/tasks/calendars, blogs, wikis, e-mail integration, project management “lite”, Outlook integration, offline docs/lists
• Business Forms: rich and Web forms based front-ends, LOB actions, enterprise SSO
• Platform Services: Workspaces, Mgmt, Security, Storage, Topology, Site Model
• Content Management: integrated document management, records management, and Web content management with policies and workflow
• Portal: Enterprise Portal template, Site Directory, My Sites, social networking, privacy control
• Search: enterprise scalability, contextual relevance, rich people and business data search
Excel Services Overview
• Excel 2007: design and author, then publish spreadsheets
• SharePoint platform and Excel Services
  • Spreadsheets stored in document libraries
  • Spreadsheet calculation and rendering
  • External data retrieval and caching
  • 100% calculation fidelity
• View and interact (browser)
  • High quality web rendering
  • Zero-footprint
  • Interactive: set parameters, sort, filter, explore
  • Limit to browser access
• Export/snapshot into Excel (Excel 2007)
  • Open in Excel for rich exploration and analysis
  • Open snapshots
• Programmatic access (custom applications)
  • Set values, perform calculations, get updated values via web services
  • Retrieve full workbook file
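The programmatic access pattern described here (set values, perform calculations, get updated values) can be illustrated with a toy model. The class below is a hypothetical in-memory stand-in written for this sketch; it is not the Excel Services web service API, and its names (`ToyWorkbookSession`, `set_value`, `calculate`, `get_value`) are assumptions chosen only to mirror the three steps.

```python
class ToyWorkbookSession:
    """Hypothetical stand-in for a server-side workbook session (not a real API)."""

    def __init__(self, formulas):
        # formulas maps an output cell name to a function of the input cells
        self._formulas = dict(formulas)
        self._inputs = {}
        self._results = {}

    def set_value(self, cell, value):
        """Step 1: the client sets an input cell."""
        self._inputs[cell] = value

    def calculate(self):
        """Step 2: the 'server' recalculates every formula cell."""
        self._results = {
            cell: formula(self._inputs)
            for cell, formula in self._formulas.items()
        }

    def get_value(self, cell):
        """Step 3: the client reads a value; formula cells refresh only on calculate()."""
        if cell in self._results:
            return self._results[cell]
        return self._inputs[cell]


# One formula cell, C1 = A1 * B1, driven by two inputs
session = ToyWorkbookSession({"C1": lambda cells: cells["A1"] * cells["B1"]})
session.set_value("A1", 0.5)
session.set_value("B1", 4)
session.calculate()
result = session.get_value("C1")
print(result)
```

The point of the pattern is that the calculation logic stays in the published spreadsheet on the server, while custom applications only exchange inputs and results.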
• Development: .NET & Visual Studio, F#, IronPython
• Data: SQL Server, SQL Server Analysis Services
• Workflow: Windows Workflow
• Collaboration: SharePoint Server 2007, Instant Messenger, ConferenceXP
• Publications: Academic Live, Onfolio…
Questions to our scientist colleagues?
• Can these tools/technologies provide value/insight to scientists?
• What’s missing?
  • I.e. on HPC, analysis, etc.?
• How best to test/integrate these technologies?
• How to communicate these ideas?
  • Conferences, workshops, website?
  • Shared code, samples
Conclusions
• Industry is moving HPC to commodity
• Microsoft is a world leader in commodity computing and will play a major role in scientific and technical computing solutions
• Key figures in scientific computing, such as Burton Smith and Tony Hey, have recently joined the company in senior strategic positions
• We are interested in getting your opinion and collaborating with you to develop the most productive computing environment for science
• Thanks again for the invitation and see you next year!
More info:
• Windows HPC: www.windowshpc.net
• Data mining: www.sqlserverdatamining.com/
• Develop without Borders Challenge: www.developwithoutborders.com
• Technical Computing Blog: http://blogs.msdn.com/eScience