Microsoft Research

Download Report

Transcript Microsoft Research

Microsoft Research
Jim Gray
Researcher
Microsoft Research
Microsoft Corporation
Outline


Highlights of MS
Research
DB-Related Research
in some detail
Microsoft Research
http://Research.Microsoft.com/




Goal: pursue strategic technologies for MS
231 Researchers in 20 areas
Plan to grow to 600 in 3 years
Internationally recognized research teams



Hundreds of publications, conference presentations
Leadership in professional societies, journals,
conferences
Virtually all MS products use technology from
MS Research
Broad Research Areas

Systems & architecture


Make programmers more productive


Tools, compilers
Human-oriented interaction



Distributed & scaleable systems,
databases, cryptography, telepresence
Speech, natural language, vision, intelligence
3D Graphics, moving beyond GUI
Fundamental problems
Development Tools

Analysis of executables



Automatic reorganization of executables





Dynamic analysis driven by user scenarios
Instrumented code
Reduce code working set size
Branch straightening
Boot ordering for boot time reduction
Reduced code working sets up to 50%
Improved throughput by 10%
Smaller Working Sets
Faster Startup & Execution
Pages Referenced
600
50%
Original
Optimized
500
39%
40%
35%
33%
400
30%
300
20%
200
7% 8%
10%
100
10%
2%
0%
0
0
50
Seconds
100
TxRd$NoL
TxRW$NoL
TxWrt$
NT Network Drivers
Throughput Speedup
Speech Technology

Speech recognition



Trainable speech synthesis


Speaker independent, Command and Control
Dictation
 Speaker independent, large vocabulary
 Discrete and continuous speech
Prosody and concatenative speech units
learned from corpus
Demos and SDKs at web site
http://research.microsoft.com/research/srg/
Natural Language Understanding




Broad-coverage syntax analyzer.
Dictionary-based semantic network
provides growing knowledge base.
Underlying system is multi-lingual.
Contributions to MS word-processing
Levels of writing critiques
We scheduled teh next meeting for noon.
Each of the products are designed to help.
I saw the Grand Canyon flying to Arizona.
Ladies are requested not to have children
in the bar. (from a sign in a Norwegian
cocktail lounge)
Natural Language
Roadmap and Mile Stones
Robotics
Vision
Machine
learning
University
Discourse/
pragmatics
Library
UI
2nd F
Dictionary
SR
Probs
(DT)
fjeiofjdksl fjeiofjdksl
eriowe.asm eriowe.asm
qweiqpo ero qweiqpo ero
oei iqpwe iio oei iqpwe iio
qwpe ec,l;aklqwpe ec,l;akl
Concept
normalizing
Sense
choosing
Logical form
Revised syntax
Initial syntax
Morphology
Interactive movies
Info highway cruiser
Advanced summary
Interactive games
Peedy+
NL query
Improved IR
Bob+
Enhanced help
Improved SR
Semantic critiques
Auto indexing
Syntactic
critiques
Phrase spacing
Find and replace
3D Graphics Research

Bring very high performance,
high quality graphics to PCs



Modeling



Interactive
Uniform treatment of multimedia
Representation of 3D models
Automatic simplification
Animation
Simplification Problem
70,000
8,700
34,100
4,200
2,600
2,300
Competing goals: accuracy & conciseness
Simplification Results
(creases)
34,100
difficult to author directly!
2,300
original
2,300
simplified
(~10 minutes)
Vision Projects
 3D
Reconstruction from Video and Images
 Motion
analysis for video compression
 Model acquisition for rendering
 Visual
Human/Computer Interaction
 Communication
by gestures and expressions
 Multimodal speech/vision interfaces
Motion Analysis
 convert
masked images into a background
sprite for content-based coding [Scrunch]
+
+
=
 working
on motion tracking
+
Outline


Highlights of MS
Research
DB-Related Research
University of Washington
& Microsoft Research
Summer Institute 1997
Data Mining & Knowledge
Discovery
July 6-11, 1997
A collection of 75
researchers from around
the world: –Statistics
–Databases
–AI/pattern recognition
–Visualization
–Systems
–HPC
http://research.Microsoft.com/uwmsrdmi
Data Mining and OLAP

The Query formulation problem:
difficult to formulate as DB queries




which records are fraudulent transactions?
which households are likely to prefer a Ford
over a Toyota?
Who’s a good credit risk in my customer DB?
Yet database contains the information


good/bad customer, profitability, response rate
known objects of interest (stars/galaxies),
results in medical studies with higher details,
etc...
DM helps



Use data to build predictors
 regression, classification, clustering etc.
Generate summaries to aid understanding
 find “easy to describe” partitions in data
automatically (via clustering/segmentation)
 find clusters not well-known to analyst
Example:
Analyze web server hits to categorize clients.
A new mode of interacting with database
Data Base Research

http://research.Microsoft.com/db/
Databases



Recovery
 Recover application state after sys fail.
Data Structures
 High concurrency data structures
 Key range locking in SS 7 based on Lomet’s
work.
Auto DB Design & Admin
 Index DB designer started as research
project.
 More coming
Scaleable Servers

High Availability SQL Server



Billion Transactions Per Day demo




Virtual Server design
Always Up prototype
Combined SQL 6.5 with MTS, DTC
RAGS: stress test NT DBs
TerraServer: a giant single node SQL7 DB
Always Up prototype
Scaleable Servers
BOTH SMP and Cluster
Grow Up with SMP
4xP6 is now standard
SMP
Super Server
Grow Out with Cluster
Cluster has inexpensive parts
Departmental
Server
Personal
System
Cluster
of PCs
Scale OUT
Clusters Have Advantages

Fault tolerance:


Modular growth without limits


Grow by adding small modules
Parallel data search


Spare modules mask failures
Use multiple processors and disks
Clients and servers made from the same stuff

Inexpensive: built with commodity CyberBricks
Billion Transactions per Day Project

Built a 45-node Windows NT Cluster
(with help from Intel & Compaq)






> 900 disks
All off-the-shelf parts
Using SQL Server &
DTC distributed transactions
DCOM & ODBC clients
on 20 front-end nodes
DebitCredit Transaction
Each server node has 1/20 th of the DB
Each server node does 1/20 th of the work
15% of the transactions are “distributed”
1 B tpd: So What?

Shows what is possible, easy to build

Grows without limits

Shows scaleup of DTC, MTS, SQL…
Shows (again) that shared-nothing clusters
scale

Next task: make it easy.




auto partition data
auto partition application
auto manage & operate
Parallelism
The OTHER aspect of clusters

Clusters of machines
allow two kinds
of parallelism



Many little jobs: online
transaction processing
 TPC-A, B, C…
A few big jobs: data
search and analysis
 TPC-D, DSS, OLAP
Both give
automatic parallelism
Kinds of Parallel Execution
Pipeline
Partition
outputs split N ways
inputs merge M ways
Any
Sequential
Program
Any
Sequential
Program
Any
Sequential
Program
Any
Sequential
Program
Data Rivers
Split + Merge Streams
N X M Data Streams
M Consumers
N producers
River
Producers add records to the river,
Consumers consume records from the river
Purely sequential programming.
River does flow control and buffering
does partition and merge of data records
River = Exchange operator in SQL 7
N x M way Parallelism
Merge
Merge
Merge
Sort
Sort
Sort
Sort
Sort
Join
Join
Join
Join
Join
A...E
F...J
K...N
O...S
T...Z
N inputs, M outputs, no bottlenecks.
Partitioned Data
Partitioned and Pipelined Data Flows
Clustered DBs

Pieces are in place with SQL 7.0






Manage arrays of SQL Servers
Partition data among files
Disk and CPU Parallel query plans
Distributed query via OLE DB
High Availability with MSCS failover
Next Parallel Databases on NT clusters




Parallel Query
Transparent access to partitioned tables
Grow without limits (16 x 2 TB)
Do not add complexity

Design, Administer, Operate, Use
Scaleup - Big Database

Build a 1 TB SQL Server database



Data must be





1 TB
Unencumbered
Interesting to everyone everywhere
And not offensive to anyone anywhere
Loaded






Show off Windows NT and
SQL Server scalability
Stress test the product
1.1 M place names from Encarta World Atlas
1 M Sq Km from USGS (1 meter resolution)
2 M Sq Km from Russian Space agency (2 m)
Will be on web (world’s largest atlas)
Sell images with commerce server.
USGS CRDA: 3 TB more coming.
TerraServer
World’s Largest PC!

324 disks (2.9 terabytes)

NT EE & SQL 7.0

8 x 440Mhz Alpha CPUs


10 GB DRAM
Photo of the planet
USGS and Russian
images
Background

Earth is 500 Tera-meters square



100 TM2 land in 70ºN to 70ºS
We have pictures of 6% of it






USA is 10 tm2
3 tsm from USGS
2 tsm from Russian Space Agency
Compress 5:1 (JPEG) to 1.5 TB.
Slice into 10 KB chunks
Store chunks in DB
Navigate with
 Encarta™ Atlas
 globe
 gazetteer
 StreetsPlus™ in the USA

Someday



multi-spectral
image
of everywhere
once a day / hour
1.8x1.2 km2 tile
10x15 km2 thumbnail
20x30 km2 browse image
40x60 km2 jump image
USGS Digital Ortho Quads




(DOQ)
US Geologic Survey
3 TeraBytes
Most data not yet published
Based on a CRADA
TerraServer makes data available.
1x1 meter
4 TB
Continental
US
New Data
Coming

USGS “DOQ”
Russian Space Agency(SovInfomSputnik)
SPIN-2 (Aerial Images is Worldwide Distributor)






1.5 Meter Geo Rectified imagery of (almost) anywhere
Almost equal-area projection
De-classified satellite photos (from 200 KM),
More data coming (1 m)
Want to sell imagery on Internet.
Putting 2 tm2 onto TerraServer.
SPIN-2
Demo
http://www.TerraServer.com
Microsoft
BackOffice
SPIN-2
Hardware
1TB Database Server
AlphaServer 8400 4x400. 10 GB RAM
324 StorageWorks disks
10 drive tape library (STC Timber Wolf DLT7000 )
SPIN-2
Software
Web Client
Image
Server
Active Server Pages
Internet
Information
Server 4.0
Java
Viewer
broswer
MTS
Terra-Server
Stored Procedures
HTML
The Internet
Internet Info
Server 4.0
Sphinx
(SQL Server)
Microsoft Automap
ActiveX Server
Terra-Server DB
Automap Server
Terra-Server Web Site
Internet Information
Server 4.0
Microsoft
Site Server EE
Image Delivery SQL Server
Application
7
Image Provider Site(s)
System
Management &
Maintenance

Backup and Recovery





STC 9717 Tape robot
Legato NetWorker™
Sphinx Backup/Restore Utility
Clocked at 80 MBps!!
SQL Server Enterprise Mgr


DBA Maintenance
SQL Performance Monitor
TerraServer File Group Layout

Convert 324 disks to 28 RAID5 sets
plus 28 spare drives

Make 4 NT volumes (RAID 50)
595 GB per volume


Build 30 20GB files on each volume
DB is File Group of 120 files
E:
F:
G:
H:
Gazetteer Design


Classic Snowflake Schema
Fast First hint to Optimizer
PlaceGrid
Place
CountrySearch
AlternateName
CountryID
GazSrcID
1148
Country
CountryID
CountryName
UNcode
264
StateSerach
AlternateName
CountryID
StateID
FreatureID
GazSrcID
State
StateID
CountryID
StateName
1083
PlaceID
ImageFlag
AlternateName
Name
CountryID
StateID
TypeID
GazSourcID
Latitude
Longitude
UGridID
ZGridID
DOQdate
SPIN2date
3776
1,089,897
ZGridID
BestPlaceName
XDistance
YDistrance
50,000,000
FeatureType
TypeID
Description
13
GazetteerSource
GazSrcID
Description
1
Image Data Design
Image pyramid stored in DBMS (250 M recs)

OriginalMetaData
ImageMeta
OrigMetaID
SrcID
ImageSource
Agency
SourcePhotoID
SourcePhotoDate
SourceDEMDate
MetaDataDate
ProductionSystem
ProductionDate
DataFileSize
Compression
HeaderBytes
…
80 other fields
ImgMetaID
OrigMetaID
ImgStatus
ImgDate
ImgTypeID
JumpPixHeight
JumpPixWidth
BrowsePixHeight
BrowsePixWidth
ThumbPixWidth
ThumbPixHeight
CutCol
CutRow
MidLat
MidLong
NELat
NELong
NWLat
NWLong
SELat
SELong
SWLat
SWLong
UGridID
UTMZone
XUtmID
YUtmID
XGridID
YGridID
ZGridID
650 k SPIN2
2 M USGS
ImgSource
ImgType
ImgTypeID
ImgFileDesc
ImgFileExt
MimeStr
SrcID
SrcName
SrcTblName
SrcDescription
GridSysID
ImgTypeID
4
2
650 k SPIN2
2 M USGS
Pick
Log
UGridHits
Name
Description
Link
PickDate
URL
Time
<extensive
list of action
parameters
URL
UGridID
ZTileGridID
count
10
TileMeta
xxx
xxx
Jump
Browse
Thumb
Tile
UGridID
ZGridID
ZTileGridID
ImgData
ImgDate
ImgTypeID
ImgMetaID
SrcID
EncryptKey
File Name
UGridID
ZGridID
ZTileGridID
ImgData
ImgDate
ImgTypeID
ImgMetaID
SrcID
EncryptKey
File Name
UGridID
ZGridID
ZTileGridID
ImgData
ImgDate
ImgTypeID
ImgMetaID
SrcID
EncryptKey
File Name
UGridID
ZGridID
ZTileGridID
ImgData
1
ImgDate
ImgTypeID
ImgMetaID
SrcID
EncryptKey
File Name
.65 M SPIN2
1.5 M USGS
.65 M SPIN2
1.5 M USGS
.65 M SPIN2
1.5 M USGS
16 M SPIN2
96 M USGS
ImgMetaID
OrigMetaID
SrcID
ImgStatus
ImgDate
ImgTypeID
TilePixHeight
TilePixWidth
CutCol
CutRow
MidLat
MidLong
NELat
NELong
NWLat
NWLong
SELat
SELong
SWLat
SWLong
UGridID
UTMZone
XUtmID
YUtmID
XGridID
YGridID
ZGridID
16 M SPIN2
96 M USGS
Image Delivery and Load
DLT
Tape
DLT
Tape
“tar”
NT
DoJob
\Drop’N’
LoadMgr
DB
Wait 4
Load
Backup
LoadMgr
LoadMgr
ESA
Alpha
Server
4100
100mbit
EtherSwitch
60
4.3 GB
Drives
Alpha
Server
4100
ImgCutter
\Drop’N’
\Images
Enterprise Storage Array
STC
DLT
Tape
Library
108
9.1 GB
Drives
108
9.1 GB
Drives
108
9.1 GB
Drives
Alpha
Server
8400
10: ImgCutter
20: Partition
30: ThumbImg
40: BrowseImg
45: JumpImg
50: TileImg
55: Meta Data
60: Tile Meta
70: Img Meta
80: Update Place
...
Terra-Server Tables

USGS DOQ Data




SPIN-2 Data




3200 278 MB images are input (approximate size)
Creates 620,800 Jump, Thumb, & Browse images (2.5 m rows)
Creates 15.5 m Tile images (31 m rows)
Gazetteer Data



48,000 DOQ images are input (45-55mb / image)
Creates 864,000 Jump, Thumb, & Browse images (3.5 m rows)
Creates 55.3 m Tile images (110.6 m rows)
1.1 m named places (Encarta World Atlas)
45 m cell names
Total Rows = 228 M today (and growing fast)
Other Details

Active Server pages


faster and easier than DB stored procedures.
Commerce Server is interesting

Images the Inventory



USGS built their own






no SKU,
millions of them
they are very smart, but it is easy
masquerade as a credit-card reader.
The earth is a geoid, and
Every Geographer has a coordinate system
Tapes are still a nightmare.
Everyone is a UI expert.
(or two).
SQL 7 Testimonial

We started using it March 4 1997





Loaded the DB twice





SQL 7 Pre-Alpha
SQL 7 Alpha
SLQ 7 Beta 1
SQL 7 Beta
(we made application mistakes)
Now doing it “right”
Reliability: Great! SQL 7 never lost data
Ease of use: Great!
Functionality: Great!
RAGS:
RAndom SQL test Generator


Microsoft spends a LOT of money on testing.
Idea: test SQL by




generating random correct queries
executing queries against database
compare results with SQL 6.5, DB2, Oracle, Sybase
Being used in SQL 7.0 testing.

Very productive test tool
Sample
Rags
Generated
Statement
SELECT TOP 3 T1.royalty , T0.price , "Apr 15 1996 10:23AM" , T0.notes
FROM titles T0, roysched T1
WHERE EXISTS (
SELECT DISTINCT TOP 9 $3.11 , "Apr 15 1996 10:23AM" , T0.advance , (
"<v3``VF;" +(( UPPER(((T2.ord_num +"22\}0G3" )+T2.ord_num ))+("{1FL6t15m" +
RTRIM( UPPER((T1.title_id +((("MlV=Cf1kA" +"GS?" )+T2.payterms )+T2.payterms
))))))+(T2.ord_num +RTRIM((LTRIM((T2.title_id +T2.stor_id ))+"2" ))))),
T0.advance , (((-(T2.qty ))/(1.0 ))+(((-(-(-1 )))+( DEGREES(T2.qty )))-(-((
-4 )-(-(T2.qty ))))))+(-(-1 ))
FROM sales T2
WHERE EXISTS (
SELECT "fQDs" , T2.ord_date , AVG ((-(7 ))/(1 )), MAX (DISTINCT
-1 ), LTRIM("0I=L601]H" ), ("jQ\" +((( MAX(T3.phone )+ MAX((RTRIM( UPPER(
T5.stor_name ))+((("<" +"9n0yN" )+ UPPER("c" ))+T3.zip ))))+T2.payterms )+
MAX("\?" )))
FROM authors T3, roysched T4, stores T5
WHERE EXISTS (
SELECT DISTINCT TOP 5 LTRIM(T6.state )
FROM stores T6
WHERE ( (-(-(5 )))>= T4.royalty ) AND (( (
( LOWER( UPPER((("9W8W>kOa" +
T6.stor_address )+"{P~" ))))!= ANY (
SELECT TOP 2 LOWER(( UPPER("B9{WIX" )+"J" ))
FROM roysched T7
WHERE ( EXISTS (
This statement yields an error (now fixed)
SELECT (T8.city +(T9.pub_id +((">" +T10.country )+
UPPER( LOWER(T10.city))))), T7.lorange ,
((T7.lorange )*((T7.lorange )%(-2 )))/((-5 )-(-2.0 ))
FROM publishers T8, pub_info T9, publishers T10
WHERE ( (-10 )<= POWER((T7.royalty )/(T7.lorange ),1))
AND (-1.0 BETWEEN (-9.0 ) AND (POWER(-9.0 ,0.0)) )
)
--EOQ
) AND (NOT (EXISTS (
SELECT MIN (T9.i3 )
FROM roysched T8, d2 T9, stores T10
WHERE ( (T10.city + LOWER(T10.stor_id )) BETWEEN (("QNu@WI" +T10.stor_id
)) AND ("DT" ) ) AND ("R|J|" BETWEEN ( LOWER(T10.zip )) AND (LTRIM(
UPPER(LTRIM( LOWER(("_\tk`d" +T8.title_id )))))) )
GROUP BY T9.i3, T8.royalty, T9.i3
HAVING -1.0 BETWEEN (SUM (-( SIGN(-(T8.royalty ))))) AND (COUNT(*))
)
--EOQ
) )
)
--EOQ
) AND (((("i|Uv=" +T6.stor_id )+T6.state )+T6.city ) BETWEEN ((((T6.zip +(
UPPER(("ec4L}rP^<" +((LTRIM(T6.stor_name )+"fax<" )+("5adWhS" +T6.zip ))))
+T6.city ))+"" )+"?>_0:Wi" )) AND (T6.zip ) ) ) AND (T4.lorange BETWEEN (
3 ) AND (-(8 )) ) )
)
--EOQ
GROUP BY ( LOWER(((T3.address +T5.stor_address )+REVERSE((T5.stor_id +LTRIM(
T5.stor_address )))))+ LOWER((((";z^~tO5I" +"" )+("X3FN=" +(REVERSE((RTRIM(
LTRIM((("kwU" +"wyn_S@y" )+(REVERSE(( UPPER(LTRIM("u2C[" ))+T4.title_id ))+(
RTRIM(("s" +"1X" ))+ UPPER((REVERSE(T3.address )+T5.stor_name )))))))+
"6CRtdD" ))+"j?]=k" )))+T3.phone ))), T5.city, T5.stor_address
)
--EOQ
ORDER BY 1, 6, 5
)




Always UP: System pairs
Two clusters at two sites
Changes from one
sent to other
When one site fails
other provides service
Masks



Hardware/Software faults
Operations tasks (reorganize, upgrade
move
Environmental faults (power fail)
Outline


Highlights of MS
Research
DB-Related Research