Microsoft Office97 and Unicode
Chris Pratley
Group Program Manager
Microsoft Word
Overview
Office Unicode history and strategy
Implementation
Benefits of Unicode to Office users
Demo of Word
Office97 Unicode Strategy
Office97 driving factors
– Customers operate world-wide (US only 40%)
– Need to handle multiple code pages in Europe
Office97 goals
– Enable loss-less file exchange world-wide
– Solve code page problems in Europe
– Development efficiency for Asian and Euro versions
• Unified source code base – but still different executables
• Unified development process
• Delta between language versions shrinks from 18 to 2 months
– Lay foundation for future
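The code page problem behind these goals can be illustrated with a minimal sketch. It is not taken from Office; it only uses Python's built-in Windows code page codecs to show that one byte value stands for a different letter depending on the code page assumed by the reader.

```python
# One byte value, three Windows code pages, three different letters.
raw = bytes([0xE9])

# Code pages chosen for illustration only.
CODE_PAGES = {
    "cp1252": "Western European",
    "cp1251": "Cyrillic",
    "cp1253": "Greek",
}

def decode_everywhere(data):
    """Decode the same bytes under every code page in CODE_PAGES."""
    return {cp: data.decode(cp) for cp in CODE_PAGES}

decoded = decode_everywhere(raw)
for cp, text in decoded.items():
    print(f"{cp} ({CODE_PAGES[cp]}): U+{ord(text):04X} {text}")

# In Unicode each of these letters has its own code point, so a document
# that stores code points no longer depends on the reader's code page.
```

A file that stores Unicode code points instead of code page bytes avoids this ambiguity, which is what makes loss-less exchange between language versions possible.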
Office2000 Unicode Strategy
Office2000 goals
– Reduce Total Cost of Ownership for large corporations
• Single version to deploy and administer globally
• Configurable interface to handle local needs
– Language of User Interface can be changed
– Additional language features can be enabled as needed
– Emulate any localized version
• “Français”, “日本語”, “한글”, “عربي”, “עברית”, etc.
– Streamline development process further
• Core “US” team ships global product
• Integrate bi-directional version team (Arabic, Hebrew)
– Focus on needs of bilingual and multilingual users
Office “10” Unicode Strategy
Office10 goals
– Finish the globalization work begun in Office2000
• Extend functionality to all applications
• Integrate Complex Scripts support (Indic, Thai, Vietnamese)
– हिन्दी, தமிழில், ภาษาไทย, Việt
– Streamline development process further
• Single build process from start to finish
• Integrate complex scripts team
– Deepen Unicode support
• Unicode 3.0 languages (ᐃᓄᒃᑎᑐᑦ, አማርኛ, etc.)
• UTF-16 (esp. plane 2: 𧆓𨣓𨲄𪀒)
• More complex script and limited combining diacritic coverage
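Plane 2 characters lie above U+FFFF, so UTF-16 stores each one as a pair of 16-bit surrogate code units. The following is a minimal sketch of the standard calculation, checked against Python's own UTF-16 encoder; the function names are illustrative and not part of any Office API.

```python
def to_surrogates(code_point):
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def utf16_units(text):
    """Return the 16-bit code units Python's UTF-16 encoder produces."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# U+20000 is the first code point of plane 2.
high, low = to_surrogates(0x20000)
print(hex(high), hex(low))
print([hex(u) for u in utf16_units(chr(0x20000))])
```

Code that counts or indexes 16-bit units has to treat such a pair as one character, which is why surrogate support is listed separately from basic Unicode support.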
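The Word Family Tree
[Diagram: Word code bases by version and language edition, converging on a single Unicode release]
Editions: US/Euro, JPN, KOR, CHT, CHS, Bi-Di, Thai/Indic
2.0/5.0: SBCS, DBCS, DBCS, DBCS, DBCS, SBCS, SBCS
6.0/95: SBCS, Wide, Wide, SBCS, SBCS
97, 2000, “10”: Unicode, Unicode, Unicode, Wide, Unicode, Wide, Unicode, Unicode (now w/ Indic)
A single Unicode release!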
Implementation
Core applications are Unicode internally
– Word, Excel, PowerPoint (Office97)
– Access, Publisher (new in Office2000)
• Databases and drivers are Unicode
– Outlook, FrontPage (new in Office10)
• New Outlook local storage is Unicode
Implementation
Difficulties encountered with Unicode
– Lack of full system support in Win9x
– Every app needed different solution
• MFC-based apps were hardest
– Missing system services (e.g. font-linking)
– Interoperation with code-page based systems
– Educating test team about Unicode
• Testing issues differ from MBCS
• Lack of expertise in uncommon languages
Implementation
Office shared code services
– Central Win32 Unicode text API “wrappers”
• Simulate nearly full support on Win9x
– ExtTextOutW and others
• Provide optional font-linked output
– Hardcode “preferred fonts” by script, style
– User-specified font-fallbacks via reg key (if any)
– Font categorization by script range (use MLANG.DLL)
• Font substituted if glyph not available
– Word modifies font settings in the document
– Other apps substitute only at display time
– Insert Symbol dialog (Unicode 3.0 support)
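The font-linking idea above (preferred fonts by script, optional user fallbacks, substitution only when the current font lacks the glyph) can be sketched roughly as follows. This is not the Office implementation: the script ranges, the font coverage table, and the font choices are all hypothetical stand-ins for what the real code gets from hard-coded tables, the registry, and MLANG.DLL.

```python
# Hypothetical script ranges: (first code point, last code point, script).
SCRIPT_RANGES = [
    (0x0000, 0x024F, "Latin"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x30FF, "Kana"),
    (0x4E00, 0x9FFF, "Han"),
    (0xAC00, 0xD7A3, "Hangul"),
]

# Hypothetical hard-coded preferred font per script.
PREFERRED = {
    "Latin": "Times New Roman",
    "Cyrillic": "Times New Roman",
    "Kana": "MS Mincho",
    "Han": "MS Mincho",
    "Hangul": "Batang",
}

# Hypothetical glyph coverage, expressed per script rather than per glyph.
COVERAGE = {
    "Times New Roman": {"Latin", "Cyrillic"},
    "MS Mincho": {"Latin", "Kana", "Han"},
    "Batang": {"Latin", "Hangul"},
}

def script_of(ch):
    """Categorize a character by script range."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Unknown"

def pick_font(ch, base_font, user_fallbacks=None):
    """Keep the base font if it covers the character, else substitute."""
    script = script_of(ch)
    if script in COVERAGE.get(base_font, set()):
        return base_font
    if user_fallbacks and script in user_fallbacks:
        return user_fallbacks[script]       # user-specified fallback wins
    return PREFERRED.get(script, base_font)  # hard-coded preferred font

def font_runs(text, base_font, user_fallbacks=None):
    """Split text into (font, substring) runs for output."""
    runs = []
    for ch in text:
        font = pick_font(ch, base_font, user_fallbacks)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs

print(font_runs("Word 日本語", "Times New Roman"))
```

In this sketch the runs are computed on the fly and the text itself is untouched, which corresponds to substituting at display time; storing the substituted font back into the document would correspond to the Word behavior described above.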
Office Users Benefit
Single binary world-wide
Shared world-wide file formats
Multilingual word/data processing
Unicode HTML
Unicode e-mail (HTML, RTF, plain)
Single Binary
Easier to deploy, administer
– One set-up image to install world-wide
– One set of service packs for all machines
All features available in all “versions”
– Still have local version packages
– Multilingual users can use “foreign” features
User Interface language is configurable
– Your language follows you when you travel
Major cost savings for customers
– Less testing of corporate solutions
– Lower internal tech support costs
Single File Format
Multinational corporations use Office
– Need to exchange documents company-wide
Office unified file formats via Unicode
– Word95 had 7 different file formats
– Word97 had 1 file format but no editing, layout for languages covered by other versions
– Word2000 adds editing, layout, and full-roundtrip
– Word10 adds full complex script support
Multilingual Usage
English Office10: input/display/edit/layout of
– European languages
• any similar left-right scripts if fonts/NLS available
– E.g. Canadian Syllabics (Inuktitut), Ethiopic, Cherokee
• Some combining diacritic support (African languages)
– East Asian languages (including UTF-16 “surrogates”)
• Chinese (Traditional and Simplified), Japanese, Korean
– Complex Script and Bi-directional scripts (need enabled system)
• Arabic (incl. Farsi, Urdu), Hebrew
• Thai
• Hindi, Tamil, Oriya, Telugu, Punjabi, Bengali, Gujarati, etc.
Multilingual Usage
Most documents are monolingual
– Most users are bilingual
• Local language
• English
Optimize UI for using one, two or three languages
– Over 100 supported – rare usage
Detect 20+ languages while typing (Word)
– Automatically install and use the correct proofing tools
Plain text I/O in any encoding (Word, Excel)
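Plain text I/O in any encoding amounts to converting between the file's bytes and Unicode at the boundary. A minimal sketch, assuming that a byte order mark identifies UTF-8 or UTF-16 and that anything else is read in a caller-supplied code page (the default of cp1252 is an assumption, and UTF-32 is not handled):

```python
import codecs

# Byte order marks checked on input, with the codec that consumes each one.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def read_text(data, default_encoding="cp1252"):
    """Decode file bytes to Unicode, preferring a BOM over the default."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data.decode(encoding)   # codec strips the BOM
    return data.decode(default_encoding)

def write_text(text, encoding):
    """Encode Unicode text to the encoding the user picked."""
    return text.encode(encoding)

sample = "Français 日本語"
for enc in ("utf-8-sig", "utf-16"):
    assert read_text(write_text(sample, enc)) == sample

# Without a BOM the caller has to say which code page the bytes are in.
print(read_text(write_text("日本語", "shift_jis"), "shift_jis"))
```

Text that the target encoding cannot represent raises an error in `write_text`; a real application would have to warn the user or substitute characters at that point.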
Multilingual Word Processing
Proofing tool interfaces are Unicode
– SDKs available for 3rd party development
Tools for over 35 languages available
– European languages, Japanese, Chinese, Korean, Arabic, Hebrew, Thai, Hindi…
– Spelling, Grammar, Hyphenation, Thesaurus
• Traditional/Simplified Chinese conversion
• Japanese character usage consistency checker
• Hangul/Hanja conversion
– Translation dictionaries (available offline)
– Automatic translation web services
Multilingual Data Processing
Access databases are Unicode
– Hook up to SQL Server 7.x/2000 Unicode databases
Excel workbooks are Unicode
– Hook up to Unicode databases using OLE-DB
– Create Pivot lists and manipulate Unicode data
PowerPoint creates multilingual multimedia
– Web sites, animations
Web sites
URLs transmitted in UTF-8 (before the “?”)
FrontPage
– Create and edit web pages in Unicode
Word
– WYSIWYG Web pages
– Save in full or “filtered” HTML
IE5+
– Display Unicode 3.0 pages
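Sending the part of a URL before the “?” in UTF-8 means percent-encoding the UTF-8 bytes of each non-ASCII character. A minimal sketch using Python's standard library; the `example.com` address is hypothetical, the input is assumed not to be escaped already, and only the path is assumed to contain non-ASCII text.

```python
from urllib.parse import quote

def encode_url(url):
    """Percent-encode everything before the "?" as UTF-8; leave the query alone."""
    head, sep, query = url.partition("?")
    # quote() encodes non-ASCII characters as UTF-8 by default.
    return quote(head, safe=":/") + sep + query

print(encode_url("http://example.com/café.htm?x=1"))
print(encode_url("http://example.com/日本語.htm"))
```

The query string is passed through unchanged here because the slide only states the encoding of the part before the “?”.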
Mail and PIM
Outlook
– New local storage is Unicode
• Contacts, Calendar, Tasks etc.
– Display and message handling all Unicode
– Send/receive mail in any encoding
Unicode HTML
HTML is a companion file format
– Roundtrip all formatting
• Optional HTML Filter cuts file size for publishing
– Save to web servers directly
– Roundtrip Unicode data in any encoding
• UTF-8 and UTF-16 are supported too
– HTML is tagged with encoding
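One way to round-trip Unicode data through an arbitrary encoding while tagging the HTML is sketched below. It illustrates the general technique and is not a description of Office's HTML output: the page declares its charset in a meta tag, and characters the chosen encoding cannot represent are written as numeric character references, so no content is lost.

```python
from html import escape

def save_html(body_text, encoding="utf-8"):
    """Return an encoded HTML page that declares its own encoding."""
    page = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={encoding}">'
        "</head><body>"
        f"<p>{escape(body_text)}</p>"
        "</body></html>"
    )
    # Unrepresentable characters become &#NNNN; references instead of being lost.
    return page.encode(encoding, errors="xmlcharrefreplace")

print(save_html("Français 日本語", "iso-8859-1"))
print(save_html("Français 日本語", "utf-8").decode("utf-8"))
```

With a Unicode encoding such as UTF-8 the text is stored directly; with a legacy encoding the same text survives as references that a browser renders identically.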
Unicode e-mail
Office2000/10 provides fully multilingual email
– HTML mail uses internet standards
– All Unicode content preserved
Plugs into Outlook, Outlook Express, Exchange
– Use Word to compose replies and new messages
– Send in plain text, RTF, or HTML
All applications can mail documents as HTML
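Sending mail “in any encoding” implies choosing a charset that can carry the message and labeling the message with it. A minimal sketch with Python's standard `email` package; the candidate list and the rule of picking the first charset that fits are assumptions made for illustration, not Outlook's actual logic.

```python
from email.message import EmailMessage

# Hypothetical preference order: smallest charset that can hold the text.
CANDIDATE_CHARSETS = ["us-ascii", "iso-8859-1", "utf-8"]

def pick_charset(text):
    """Return the first candidate charset that can encode the text."""
    for charset in CANDIDATE_CHARSETS:
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    return "utf-8"

def build_message(subject, body):
    """Build a plain-text message labeled with the chosen charset."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(body, charset=pick_charset(body))
    return msg

for body in ("hello", "Français", "日本語"):
    print(body, "->", build_message("Test", body).get_content_charset())
```

Falling back to UTF-8 keeps all Unicode content intact when no legacy charset covers the whole message.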
Future Directions
Help Windows build a worldwide platform
– Ensure system support is useful to app writers
– Unicode 3.0 languages too
Extend Unicode support to more apps
– Visual Basic Editor and Forms
Microsoft “Word10” on “Whistler”
Demo
Questions
Answers