Unicode Migration


Module 6:
Global Deployments using Unicode with SQL Server
Siebel Global Deployments
©Siebel Systems 2005 – Do not distribute or re-use without permission
Global Deployments using Unicode with SQL Server
What is a “Global Deployment”?
Challenges
Underlying technologies
Siebel implementation
How to get there
Agenda
Overview
Introduction to Code Pages
Introduction to Unicode
Unicode and Siebel
Unicode Migrations
The Business Challenge
Global Organisations require Global Solutions
View the User Interface in Multiple Languages
Store data from many languages
Display the data using regional preferences
Different data for different regions
Customers have a single view of the company,
and the company has a single view of its
customers
The Business Challenge
Reduced cost of ownership
Centralised IT infrastructure
Centralised Data (single source)
Development and testing only happen once
Increased Customer Satisfaction
Traditional Code Pages
Tables that relate binary values to graphical characters and
symbols
Binary values are transmitted as single or multiple bytes
 Western character sets are single-byte
 Many Asian character sets are multi-byte
Sometimes called “character sets” or “code sets”
 The term "code" emphasizes the binary value aspect
 The term "character" emphasizes the graphical
representations (bitmaps or glyphs)
Code Page Examples
1252 Western European
Code Pages and Supported Languages
All standard code pages support English
Windows Code Page Number | Languages | Commonly Used Code Page Name
1252 | English, Albanian, Basque, Catalan, Afrikaans, Danish, Dutch, Finnish, French, German, Icelandic, Italian, Norwegian, Portuguese, Spanish, Swedish | 8859-1
1250 | English, Czech, German, Hungarian, Polish, Romanian, Slovak, Slovenian | 8859-2
1254 | Same as 1252, but with Turkish replacing Icelandic | 8859-9
1253 | English and Greek | 8859-7
1255 | English, Hebrew, Yiddish | 8859-8
1256 | English and Arabic |
1251 | English and Russian | 8859-5
932 | English and Japanese | Shift-JIS
949 | English and Korean | KS C 5601
950 | English and Traditional Chinese | Big 5
936 | English and Simplified Chinese | GBK
Codepage Comparison: ASCII
Codepage Comparison: ANSI 1252 (Western European)
 Windows 1252 Codepage – Known as WE8MSWIN1252
 Characters are allocated in the 80 - 9F area
Codepage Comparison: ISO 8859-1 (Western European)
 Known as WE8ISO8859P1
 Chars between 80 and 9F not allocated.
 Characters A0 to FF are the same as in MS CP 1252
 Except: There is no Euro codepoint!
Codepage Comparison: ISO 8859-15 (Western European)
 Known as WE8ISO8859P15
 Chars between 80 and 9F not allocated.
 Characters A0 to FF are NOT the same as in MS CP 1252
 8 characters differ, all of which are in the ANSI 80 - 9F area.
 ANSI 1252 positions of these 8 characters: 80, 8A, 8C, 8E, 9A, 9C, 9E, 9F
Codepage Comparison: ANSI 1250 (Eastern European)
 Chars between 80 and 9F are allocated.
Codepage Comparison: ISO 8859-2 (Eastern European)
 Chars between 80 and 9F not allocated.
 Characters between A0 and FF are NOT the same
 There are 15 differences, 10 of which are in the ANSI 80 - 9F area.
 ANSI 1250 positions of these 15 characters: A1, A5, B9, BC, BE, 8A, 8C, 8D, 8E, 8F, 9A, 9C, 9D, 9E, 9F
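The differences called out on the comparison slides can be recomputed rather than read off a chart. The following is a minimal sketch, assuming Python's built-in codecs (cp1252, cp1250, iso8859_15, iso8859_2) are acceptable stand-ins for the ANSI and ISO code pages named above. The helper names are illustrative and not part of any Siebel or SQL Server tool.

```python
def moved_characters(iso_codec, ansi_codec):
    """Find characters whose A0-FF position differs between two code pages.

    Returns a dict mapping each such ISO character to the byte value
    where the ANSI code page keeps it.
    """
    moved = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        iso_char = raw.decode(iso_codec)
        if iso_char != raw.decode(ansi_codec):
            # Look up where the ANSI code page stores this ISO character.
            moved[iso_char] = iso_char.encode(ansi_codec)[0]
    return moved


def in_ansi_area(moved):
    """Count the characters whose ANSI position falls in the 80-9F area."""
    return sum(1 for position in moved.values() if 0x80 <= position <= 0x9F)


west = moved_characters("iso8859_15", "cp1252")
east = moved_characters("iso8859_2", "cp1250")
print(len(west), in_ansi_area(west))
print(len(east), in_ansi_area(east))
```

The same helper can be pointed at any pair of single-byte code pages when planning an interface between two systems.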
Character Conversion
[Figure: the same byte stream looked up in the Western Europe and the Japan code page tables]
Byte Stream: ‘44 C4 46 C5’
WE Output: ‘DÄFÅ’
Japanese Output: ‘D聞?’
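The effect on this slide, one byte stream read through different code pages, can be reproduced with a few lines. This is a sketch only, assuming Python's cp1252, cp932 and cp1251 codecs as stand-ins for the Western European, Japanese and Cyrillic code pages. The characters a real code page 932 decoder returns may differ from the illustrative ones on the slide.

```python
# The byte stream from the slide: 44 C4 46 C5
BYTE_STREAM = bytes([0x44, 0xC4, 0x46, 0xC5])


def decode_as(raw, codec):
    """Interpret raw bytes using the given code page.

    errors="replace" substitutes U+FFFD for any byte sequence the
    code page cannot map, instead of raising an exception.
    """
    return raw.decode(codec, errors="replace")


western = decode_as(BYTE_STREAM, "cp1252")
japanese = decode_as(BYTE_STREAM, "cp932")
cyrillic = decode_as(BYTE_STREAM, "cp1251")

# The bytes never changed; only the interpretation did.
results_differ = len({western, japanese, cyrillic}) == 3
```

The point of the sketch is that nothing in the byte stream records which code page produced it, so the reader has to be told.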
Unicode Codepages
What is Unicode?
Unicode is a codepage intended to support all
languages.
Unicode contains characters from all traditional
codepages and satisfies the need for virtually all
world languages.
Unicode Flavors
UTF-16
 Variable-length encoding; uses one or two double-byte (16-bit) units to represent the available characters.
 Supports Unicode Standard 2.0 and above
 Supports Surrogate characters
 Encoding standard used by Siebel internally for executables
 Often treated as synonymous with UCS2, but also supports characters consisting of multiple 2-byte blocks
UCS2
 Double-byte encoding
 Supports Unicode Standard 1.x
 Official codepage name used in support statements by Microsoft for
Microsoft SQL and Windows Server 2003 and 2000
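The practical difference between the two flavors shows up only for characters outside the range a single double-byte unit can address. A minimal sketch, assuming Python's utf-16-be codec; the function names are illustrative.

```python
def utf16_units(text):
    """Return the 16-bit UTF-16 code units of text as hex strings."""
    raw = text.encode("utf-16-be")
    return [raw[i:i + 2].hex().upper() for i in range(0, len(raw), 2)]


def fits_in_ucs2(text):
    """True if every character fits in one double-byte unit.

    Characters above U+FFFF need a surrogate pair (two units), which
    UTF-16 supports and plain UCS2 does not.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)
```

For example, utf16_units("A") gives one unit, while a character such as U+20000 gives two (a surrogate pair) and fails fits_in_ucs2.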
Unicode - What it gives you
Consolidation Possibilities
 Customers can consolidate data that would previously
have to be in separate Enterprises due to codepage
restrictions.
 Customers are not bound by codepage restrictions and
can consolidate HW. Performance may still dictate
distributed HW.
Data Sharing
 Customers can share data that could not be shared before.
Easier Management
 For Siebel deployments crossing languages and regions.
Unicode - What it doesn’t give you
Don’t expect Unicode to…
 Be the savior of all Global Deployment problems.
Unicode is only a codepage.
 Do language translations.
If my friend in Japan enters Japanese text, I don’t automatically see it
in English here.
 Remove the need to implement solid Business Practices for
managing data in multiple languages.
Do you want your Italian users to get contact information for Chinese
contacts in Chinese?
Should a customer in Japan be able to see products available only in
Spain in Spanish?
Siebel 7.5 Global Unicode-enabled Deployment
[Figure: deployment diagram showing Mobile Clients (Local DB), Web Clients, the Internet, Web Server / Web Engine, Gateway Server, Application Server (OM, SRFs / Locales), Actuate Report Servers, File System, Siebel Database (User Data, Repository) and Siebel Tools, with components labelled by language: ENU, FRA, ESN, JPN (and ENG, FRS)]
Siebel and Unicode
Unicode support from Siebel 7.5 on
 This presentation refers to Siebel 7.5 or later
Supporting characters from more than one Codepage
 One Siebel Enterprise -> One Database -> One Codepage
 Requires Unicode Database
Single Executable encoding
 All internal processing within Siebel in Unicode
 Data converted as necessary
Siebel still supports non Unicode databases
 Western European (1252)¹
¹ - See Siebel Systems Requirements and Supported Platforms Guide on SupportWeb for the latest information
What does Siebel v7.5 with Unicode look like?
Locales
Web Clients (ENU, DEU, ESN):
From Siebel 7.5, one physical server can support multiple locale-specific data formats because Siebel locale settings are independent of those used by the Server OS.
For Mobile Clients, the locale can be changed by the user through the Regional Settings/Options in Windows Control Panel.
Actuate
The Actuate Reports Server is fully Unicode enabled
This enables consolidated handling of all languages and locales on a single server, with the language selected at runtime.
Configuration Considerations
Data Segregation
 Which data is supposed to be entered in local languages and which in the corporate language? Is the following view acceptable to end users?
Integration Character Conversion
[Figure: characters stored in the UCS2 Unicode database (Ä, Å, Æ at 00C4 - 00C6; Д, Е, Ж in the 04xx range) looked up in the Western Europe and Cyrillic code page tables. Characters missing from a code page show as ‘?’.]
Data Intact
Integration Character Conversion
[Figure: the same UCS2 Unicode, Western Europe and Cyrillic code page tables as on the previous slide]
Solution: Updates not allowed. Un-Displayable Data is read-only
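The read-only rule on these slides can be illustrated with a short sketch. It assumes Python's cp1252 and cp1251 codecs as stand-ins for the Western Europe and Cyrillic code pages, and the helper names are hypothetical. It does not show how Siebel itself enforces the rule.

```python
def display_in_code_page(text, codec):
    """Show text as a system limited to one code page would see it.

    Characters the code page cannot represent come back as '?'.
    """
    return text.encode(codec, errors="replace").decode(codec)


def is_updatable(text, codec):
    """Allow updates only if nothing was lost in the conversion.

    If the displayed value differs from the stored value, writing it
    back would overwrite good Unicode data with '?' characters.
    """
    return display_in_code_page(text, codec) == text
```

Keeping un-displayable data read-only is what leaves the stored Unicode value intact: the ‘?’ exists only on the display side and is never written back.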
Siebel eBusiness Application Integration
Integration
 Systems with a multitude of code pages
 Even with Unicode, company may have different ‘flavors’
 External partners may have data in different code pages
 Often the cause of data corruption when moving to Unicode.
Solution
 Investigate ALL interfaces whilst planning Unicode migration
 Re-develop Interfaces and upgrade external systems/middleware as
necessary to support Unicode
 Use the Siebel Transcode Business Service to validate data before
sending it and do not send if it cannot be stored.
 TEST, TEST and TEST again!
Siebel eBusiness Application Integration
Transcode Business Service
 Business Service to convert data between codepages
Two modes
 Validate – Only checks if a conversion could be performed
without character conversion error
 Convert – Converts data between code pages
Can be employed in Workflow Processes
Need to implement error handling
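The two modes can be pictured with a small stand-in. This is NOT the Siebel Transcode Business Service API; it is a hedged Python sketch of the same validate-then-convert idea, with hypothetical names (validate, convert, send_if_storable, ConversionError) and Python codec names in place of Siebel code page names.

```python
class ConversionError(Exception):
    """Raised when data cannot be converted between two code pages."""


def validate(text, target_codec):
    """Validate mode: only check whether the conversion would succeed."""
    try:
        text.encode(target_codec)
    except UnicodeEncodeError:
        return False
    return True


def convert(raw, source_codec, target_codec):
    """Convert mode: re-encode bytes from one code page to another."""
    try:
        return raw.decode(source_codec).encode(target_codec)
    except (UnicodeDecodeError, UnicodeEncodeError) as exc:
        raise ConversionError(
            f"cannot convert from {source_codec} to {target_codec}"
        ) from exc


def send_if_storable(text, target_codec, send):
    """Send data only if the receiving system can store it unchanged."""
    if not validate(text, target_codec):
        return False  # error handling: the caller decides what to do
    send(text.encode(target_codec))
    return True
```

Returning a flag from the validate path, and raising from the convert path, is one way to make the error handling explicit; a Workflow Process would branch on the equivalent outcome.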
Data Growth
SQL Server uses different data types for Unicode and Non-Unicode data
Unicode data types require more space than Non-Unicode.
Data | Fixed Length | Variable Length | Large | Bytes per character
Non-Unicode | char | varchar | text | 1
Unicode (UCS-2) | nchar | nvarchar | ntext | 2
Could cause rows to exceed SQL Server page size (8KB)!
 May need to move some columns to extension tables
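A rough check for rows at risk can be sketched as follows. The assumptions are loud ones: only character columns are counted, per-row overhead and non-character columns are ignored, and the limit used is SQL Server's 8060-byte in-row maximum within the 8KB page. Column names and lengths in any call are hypothetical.

```python
# SQL Server's maximum in-row data size within an 8KB page.
MAX_ROW_BYTES = 8060


def row_bytes(columns, unicode_db):
    """Worst-case bytes used by the character columns of one row.

    columns is a list of (name, declared_length) pairs for char and
    varchar columns. Each character takes 1 byte in a code page
    database and 2 bytes once the column becomes nchar or nvarchar.
    """
    bytes_per_char = 2 if unicode_db else 1
    return sum(length * bytes_per_char for _name, length in columns)


def fits_on_page(columns, unicode_db):
    """True if the character columns alone stay within the row limit."""
    return row_bytes(columns, unicode_db) <= MAX_ROW_BYTES
```

A table whose character columns total 4600 declared characters fits before the migration (4600 bytes) but not after it (9200 bytes), which is the case where columns may need to move to extension tables.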
Database Growth moving to Unicode
Language | National Code Page | UTF-16/UCS2 | Growth
English | 1 byte | 2 bytes | ~70%
Western European | 1 byte | 2 bytes | ~70%
Eastern European | 1 byte | 2 bytes | ~70%
Asian | 2 bytes | 2 bytes | 0%
varchar/char/text data expansion only!
Note: Database expansion will vary substantially with profile of data
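The per-character figures in the table can be measured directly on sample strings. A minimal sketch, assuming Python's cp1252, cp1250 and cp932 codecs for the national code pages and utf-16-le for the Unicode column storage. It measures character data only, so a single single-byte string grows by 100%, while the table's ~70% figures appear to describe the database as a whole.

```python
def growth(text, legacy_codec):
    """Fractional growth of one string when moved to UTF-16.

    0.0 means no growth, 1.0 means the string doubles in size.
    """
    before = len(text.encode(legacy_codec))
    after = len(text.encode("utf-16-le"))
    return (after - before) / before
```

Running this over a representative extract of real data gives a better estimate than any fixed percentage, since the result depends on the mix of single-byte and double-byte text.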
Collation Sequences / Sort Order
What is the Collation Sequence / Sort Order?
The way in which characters are ordered changes for different locales.
A collation sequence is not the same as a sort order. A collation sequence combines a Unicode and a non-Unicode sort order together with a non-Unicode code page, so it has a greater impact than a simple sort order. A collation sequence can also affect unique keys, because it affects string comparisons.
SQL Server Collation Sequences/Sort Order
Siebel Enterprise Database (Development System)
 Binary
Siebel Enterprise Database (Production System)
 Binary (Recommended)
This is also the only supported collation sequence for UCS2 (Unicode)
 Dictionary case-sensitive and case-insensitive
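The effect of the collation choice on ordering and on unique keys can be approximated outside the database. This is a sketch under stated assumptions: Python's default string comparison (by code point) stands in for a binary collation, and str.casefold stands in for a case-insensitive dictionary order. It is not SQL Server's implementation, and the helper names are illustrative.

```python
def binary_sort(values):
    """Order strings by raw code point value, as a binary sort would."""
    return sorted(values)


def case_insensitive_sort(values):
    """Order strings while ignoring case differences."""
    return sorted(values, key=str.casefold)


def unique_key_collisions(values, key=lambda value: value):
    """Return pairs of values that compare as equal under the given key.

    With the default key only exact duplicates collide; with a
    case-insensitive key, values differing only in case collide too.
    """
    seen = {}
    collisions = []
    for value in values:
        normalized = key(value)
        if normalized in seen:
            collisions.append((seen[normalized], value))
        else:
            seen[normalized] = value
    return collisions
```

Under the binary stand-in, "Zebra" sorts before "apple", and "Smith" and "SMITH" are distinct keys; under the case-insensitive stand-in the two names collide, which is how a collation change can break a unique index.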
Moving to Unicode
New Install:
 Create Unicode database
 Select Unicode as code page during install.
Migration from standard codepage
 Unicode migration is not a simple matter of altering the codepage. It REQUIRES the assistance of Siebel Expert Services to protect your data from corruption during the migration process. The length of this engagement will vary depending on the complexity of the environment and the size of the database.
 Get details from TAM or Practice Manager
Preparation (Source database)
Run ‘dbchck’ to validate that the Siebel Schema matches the repository
Check for conversion errors
Delete inactive repositories
If this is a Development Environment, check in ALL Projects
Create “awkward” test records that use unusual characters
Backup the source database
Preparation (Target database)
Create new database with Code Page UCS-2
Should be in same SQL server instance as source
database
Increase storage space
Ensure large space for tempdb
Run grantusr.sql to create default users and roles
Migrate all users (not created during migration)
SQL Server – Unicode Migration (migrate.bat)
[Figure: CP (Source) database with char and varchar columns, e.g. char(1), varchar(40) -> schema.ddl -> UNI (Target) database with nchar and nvarchar columns, e.g. nchar(1), nvarchar(40); insert.sql moves the data]
1. ddldict - Generate schema.ddl with schema definitions from source (CP) database
2. ddlimp /Z Y - Apply schema.ddl to target (UNI) database through ddlimp with /Z Y option (converts char to nchar, varchar to nvarchar and text to ntext)
3. Genload – Creates insert.sql file for migrating data from source to target.
4. insert.sql – Runs SQL statements loading the data.
insert into target..table (col...)
select col... from source..table
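The type conversion described for the /Z Y option in step 2 amounts to a simple mapping. The following is an illustrative Python sketch of that mapping only; it is not how ddlimp is implemented, and the function name is hypothetical.

```python
# The conversions named on the slide: char -> nchar, varchar -> nvarchar,
# text -> ntext. Every other type is left alone.
UNICODE_TYPES = {"char": "nchar", "varchar": "nvarchar", "text": "ntext"}


def to_unicode_type(declared):
    """Map a declared column type to its Unicode equivalent.

    The length, if any, is carried over unchanged: it counts
    characters, not bytes.
    """
    base, paren, rest = declared.strip().partition("(")
    base = base.strip()
    mapped = UNICODE_TYPES.get(base.lower(), base)
    return mapped + paren + rest
```

For example, to_unicode_type("varchar(40)") gives "nvarchar(40)", while numeric and already-Unicode types pass through unchanged.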
Preparation of ‘migrate.bat’
Update environment specific variables
Users, Passwords, ODBC DSNs, etc.
Ensure ‘ddldict’ call includes ‘/A Y /T DCIER’
Not included in all versions of ‘migrate.bat’
Steps in ‘migrate.bat’ script
1. Create schema definition (ddldict)
2. Create physical schema for new database (ddlimp)
3. Create sequence (SQL)
4. Create clustered indexes (SQL)
5. Create scripts to migrate data (genload)
6. Run script to migrate data (insert.sql)
 insert into dbo.S_ETL_CTRYREGN (COUNTRY, REGION)
select COUNTRY, REGION from <sourcedb>..S_ETL_CTRYREGN;
Steps in ‘migrate.bat’ script (contd.)
7. Create non-clustered indexes (ddlimp)
8. Update data type fields in Siebel tables (SQL)
System Preference 'Enterprise DB Server Code Page' = 'utf-16'
S_APP_VER.UNICD_DATATYPS_FLG = 'Y'
9. Create views (SQL)
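The statements run in step 6 follow one regular pattern per table, which is why they can be generated in step 5. The following is a hedged Python sketch of that pattern, modelled on the S_ETL_CTRYREGN example from the slide. It is not genload itself, and the function name and parameters are hypothetical.

```python
def build_insert(table, columns, source_db, target_schema="dbo"):
    """Build an insert-select statement copying one table across databases.

    Both databases sit in the same SQL Server instance, so the source
    table is reached with the three-part name <database>..<table>.
    """
    column_list = ", ".join(columns)
    return (
        f"insert into {target_schema}.{table} ({column_list})\n"
        f"select {column_list} from {source_db}..{table};"
    )
```

Because the target columns are already nchar and nvarchar by the time these statements run, SQL Server converts the character data to Unicode during the insert.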
Post Unicode Migration Tests
Check migrated data
 Dump out binary values of “awkward” data & check values
are correct
 View records for “awkward” data
Check Unicode Database has sufficient spare space
Re-run ‘dbchck’
Ensure that Unicode characters can be entered and
displayed correctly
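Checking the binary values of the “awkward” records can be scripted. This is a sketch, assuming the source bytes were dumped from a code page 1252 column and the target bytes from an nvarchar column stored as little-endian UTF-16; both assumptions should be confirmed against the actual databases. The helper names are illustrative.

```python
def hex_dump(raw):
    """Format bytes as space-separated, two-digit hex values."""
    return " ".join(f"{byte:02X}" for byte in raw)


def check_migrated(source_bytes, source_codec, target_bytes):
    """True if the migrated value holds the same characters as the source.

    source_bytes come from the code page database, target_bytes from
    the Unicode database (assumed to be UTF-16, little-endian).
    """
    return source_bytes.decode(source_codec) == target_bytes.decode("utf-16-le")
```

A record containing Ä and the Euro sign is a useful test: in code page 1252 it is stored as C4 80, and after a correct migration as C4 00 AC 20. A target value of C4 00 80 00 would mean the Euro byte was copied without conversion.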
Post Unicode Migration Tests
Test connectivity via a Siebel Web Client.
Test connectivity via a Siebel Tools Client.
Generate a new database template.
Create a new database extract.
Initialize a new Tools local database and perform a GET
operation.
Create a new SRF by performing a full compilation.
Generate browser scripts
Post Unicode Migration Steps
Backup Unicode Database
Update Web Template file (jctrl.css) with Unicode Font
Details
 i.e. For Windows 2000, all occurrences of ‘JFONT-FAMILY=Arial’ replaced by ‘JFONT-FAMILY=Tahoma’
Direct Siebel Servers to Unicode Database
 i.e. Update ODBC DSN settings
Re-Extract Mobile Clients and Developer Databases
 Carry out Full Repository Get on Developer Databases
Summary
Overview
Introduction to Code Pages
Introduction to Unicode
Unicode and Siebel
Unicode Migrations
Additional Resources
Siebel Expert Services Offerings
 Global Deployment Workshop
 Unicode Migration Workshop
 Unicode Migration
 Unicode Migration Validation
Technical Note 455: How can EAI processes be enabled
for Global Deployment?
Alert 573: Tools Repository Compilation Performance
Issue on Microsoft SQL Server Unicode Database in 7.5
Additional Resources
Titus Unicode Charts, reference for Unicode codepoints:
 http://titus.uni-frankfurt.de/unicode/unitest.htm
Unicode Primer:
 http://www.menteith.com/unicode/primer/
Unicode Overview:
 http://www.basistech.com/papers/unicode/overview.html
Unicode Consortium Resources:
 http://www.unicode.org
Any Questions….
Siebel 7.7 Enhancements for Global Deployments
“Smart Charset”
 Responds to email with the same code page (character set) as it was received in.
 Useful for Asian browsers that do not support the Unicode encoding of
emails
“Symbolic String model” for translation of UI data
 Application specific translation table for key terms
 Translation stored only once and reused throughout application
 Reduces the size of the Repository
 Reduces the complexity of adding new languages to the User Interface
Locale Management Utility (LMU) enhancement
 Native LMU XLIFF support. XLIFF is an XML-based localization industry standard
 Standards-based support for integration with localization tools greatly
reduces localization engineering time spent on import/export file
transformation tasks
