NoCOUG Persistance of Memory Issues

Download Report

Transcript NoCOUG Persistance of Memory Issues

Siebel CRM Unicode
Conversion – The DBA
Perspective
Brian Hitchcock
OCP 8, 8i, 9i DBA
Sun Microsystems
[email protected]
[email protected]
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 1
CRM Unicode Conversion
 Three separate presentations
–
–
–
1) The overall conversion process
 What we had, what we wanted, how to get there
 Issues that come up during conversion
2) Multi-byte data in the existing CRM db
 What’s the issue, how did it happen
 A general method to find and fix this problem
3) The actual conversion
 What really happened
 Issues that came up and how they were resolved
 Focus on DBA issues, not Siebel application
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 2
How Did I Get Involved?
 Sleeping in a meeting…
 Heard someone say
–
“We told the users to stop entering Japanese into
the CRM system but we aren’t sure they stopped”
 Woke up, said
–
–
“I’ve done that before…”
See “Case of the Missing Kanji”
 Don’t wake up in meetings…
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 3
What’s The Issue?
 Existing Siebel CRM system
–
–
Oracle 8.1.7.4
Single-byte character set (WE8ISO8859P1)
 Interface systems
–
–
Multi-byte character set(s) (UTF8)
Handle data between single,multi-byte apps
 Want to convert to Unicode
–
–
Siebel, database, interfaces all should be UTF8
Eliminate interface systems
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 4
What we had
Amer
8859P1
Users
Emea
Apac
UTF8
UTF8
Tcustdb Apac
Custdb Apac
UTF8
UTF8
Tcustdb Emea
Siebel CRM
8859P1
Oracle Db
Custdb Emea
Custdb Amer
WE8ISO8859P1
Ordering System
WE8ISO8859P1
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 5
What we wanted
UTF8
Amer
Users
Emea
Apac
UTF8
Custdb Apac
UTF8
Custdb Emea
Siebel CRM
UTF8
Oracle Db
Custdb Amer
WE8ISO8859P1
Ordering System
AL32UTF8
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 6
What We Wanted
 All data in one database
–
–
All languages
Unicode
 Eliminate interface systems
–
Reduce support costs
 Support increased CRM functionality
–
–
All data in one place
Supports new business functionality
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 7
Would you like fries with that?
 Unicode conversion includes
–
Oracle db
 Convert to AL32UTF8 character set
- Required by Siebel for Unicode
 Upgrade to 9.2.0.4
- Required to get AL32UTF8 character set
–
Remove Tcustdb databases
 Modify triggers that link source db to Tcustdb
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 8
And A Shake?
 And, while you’re at it…
–
Application GUI
 Retrieve different data, multi-byte, local language
–
Clients
 Upgrade to Oracle 9.2.0.4 (SQL*Plus)
 Lots of changes all at once
–
–
Testing
How to know impact of each change?
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 9
Converting to Unicode
 It’s easy – right?
–
Siebel CRM
 make some configuration changes
–
Oracle database
 Export from single-byte database
 Import into new db created with UTF8 char set
–
–
Testing
Done
 This is the ‘management’ view
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 10
What Is Unicode?
 International standard
 Collection of characters
–
Covers most of the world’s languages
 Chinese poetry?
–
All characters have unique byte-code
 Application developers
–
–
Support Unicode
No need to worry about specific languages
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 11
You Make This Stuff Up!
 What follows can be found in
–
–
–
Oracle9i Database Globalization Support Guide
Release 2 (9.2)
Part Number A96529-01
 Or, you can trust me…
 Character sets, Unicode
–
–
Consist of set of characters
Encoding of the characters to byte-codes
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 12
Single Byte Encoding Schemes
 7-bit encoding schemes
–
–
–
Single-byte 7-bit up to 128 characters
normally support just one language
US7ASCII
 8-bit encoding schemes
–
–
–
Single-byte 8-bit up to 256 characters
often support a group of related languages
WE8ISO8859P1
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 13
8859P1 Character set
Oracle Character Set WE8ISO8859P1
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
Hex 0x41 is A
DCSIT Technical Services DBA
Page 14
Multi-byte Encoding Schemes
 Fixed-width
–
–
–
each character occupies a fixed number of bytes
Faster text processing
AL16UTF8
 Variable-width
–
–
–
one or more bytes to represent a single character
Saves disk space (typically lots of disk space)
UTF8, AL32UTF8
 Shift-sensitive variable-width
–
use control codes to differentiate single-byte multi-byte
characters with the same code values
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 15
UTF8 Byte Storage
Different characters occupy 1, 2, 3 or 4 bytes
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 16
AL32UTF8
 UTF8
–
–
–
Supports Unicode 3.0 since 8.1.7.4
Up to 3 bytes per character
Supplemental characters
 Pairs of 3 byte character codes
 AL32UTF8
–
–
Supports Unicode 3.1 (latest version?), since 9i
Up to 4 bytes per character
 Supplemental characters
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 17
Confused?
 Unicode, a set of characters
 Character set, encoded set of characters
 Encoding scheme, UTF-8, ISO standard for
variable width encoding of Unicode character
set
 UTF8, Oracle implementation of UTF-8
 If you’re not confused, you aren’t paying
attention!
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 18
Changing Character Set
 You can simply alter the database (right?)
 Only works if
–
–
new character set is strict superset of existing
character set
For all characters in existing character set
 All exist in new character set
 All have exact same code in new character set
 Example
–
–
WE8MSWIN1252 (superset, includes euro)
WE8ISO8859P (subset)
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 19
Complexities
 Even for the same character
–
Different encoding in different character set
 Example
–
–
–
Latin (Western European) character á
E1 in WE8ISO8859P1
C391 in UTF8
 If existing character not in new char set
–
? (replacement character) displayed
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 20
Cure
 Create new database
–
Using new character set
 Extract data from old database
 Insert data into new database
 Export/import is most often used
–
Could use other methods
 Extract data to flat files
 SQL*Loader
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 21
Database Conversion
 Serial
–
Upgrade source, export, drop schemas, import
 Parallel
–
–
–
Create target
Export source
Import to target
 Chose Parallel
–
Source still available after target in use
 User tablespace issue for example
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 22
Impact of Unicode
 Table columns must be widened
 Existing column
–
–
–
Holds up to 20 Latin characters
WE8ISO8859P1, each Latin character 1 byte
VARCHAR2(20)
 New column
–
–
–
UTF8
Each Latin character occupies 2 bytes
Need VARCHAR2(40)
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 23
Impact of Unicode
 Worst case
–
–
–
UTF8 can have up to 4 bytes per character
For all existing character columns
Need to expand by 4x
 Disk space
–
–
CHAR – 4x disk space
VARCHAR2 – 1x to 4x
 Depends on specific characters inserted
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 24
Impact of Unicode
 Tables
–
–
Columns must be wider
Each character can be up to 4 bytes
 Triggers, PL/SQL code
–
Modify to handle multi-byte data
 End-user front-end (browser)
–
Reconfigure to
 Display multi-byte data, accept multi-byte data
 All app components must handle Unicode
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 25
User Impact
 VARCHAR2, AL32UTF8
–
4000 byte limit
 How many characters can I enter?
–
–
Latin, 2000
Japanese, 4000/3
 If moving from Japanese character set
 2 bytes per character
 Max characters reduced by 1/3
–
Supplemental characters, 1000
 Characters like ‘treble clef’
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 26
Disk Space
 How much multi-byte data do you have?
–
–
–
We found all of ours
Typically, 5-10%
See 2) Multi-byte data in the existing CRM db
 Compute disk space requirement
–
–
If you have 5% multi-byte character data
Need maximum of 20% more disk space
 Will you add more multi-byte data?
–
Once you have converted to Unicode…
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 27
Expanding Columns
 Need to expand lots of columns
–
–
Individual SQL statements
Lots of SQL to generate
 How to make Oracle do this for us?
–
–
Export existing database
New database has init.ora parameter
 NLS_LENGTH_SEMANTICS = CHAR
–
Import into new database
 All character columns widened as tables created
- VARCHAR(10) becomes VARCHAR(40)
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 28
Character Semantics – 9i
 Change column data types
–
–
–
VARCHAR2(10 byte)
VARCAHR2(10 char)
Requires SQL statement for each column
 NLS_LENGTH_SEMANTICS
–
–
–
–
–
Init.ora parameter
What happens if init.ora changed?
BYTE or CHAR
All character columns created with byte or char
Handles PL/SQL code as well
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 29
The Siebel Process








Create target database
Shutdown app
Upgrade Oracle client
Source db character set
Run migrate.sh script
Full export source
Import to target db
Modify target db
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 30
Create target database




Oracle 9.2.0.4
Character set AL32UTF8
Character semantics CHAR
Tablespace names same as source db
–
15% more space than source db
 Locally managed, uniform 130k
 Auto UNDO, tablespace
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 31
Shutdown app




Shutdown various app servers
Shutdown source db
Cold backup
Upgrade source db to 9.2.0.4
–
Migrate 8.1.7.4 to 9.2.0.4
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 32
Upgrade Oracle client
 Upgrade Oracle client software to 9.2.0.4
–
–
–
For all machines that have SQL*Plus
Upgrade to 9.2.0.4
Install 9.2.0.4
 Client install only
–
–
Tar up 9.2.0.4 client ORACLE_HOME
ftp, untar on machines that need SQL*Plus
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 33
Source db character set
 Fix any user tablespace issues
–
Import won’t fix them for you
 Change source db character set
–
WE8MSWIN1252
 Siebel requirement
 Contains euro symbol
 Is a strict superset of WE8ISO8859P1
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 34
Run migrate.sh script
 Siebel supplied script
–
Generates various scripts
 Expand.ksh
- Widen columns for Unicode
 Impexp06.ksh
- Import individual tables for large dbs
- We use full export/import instead
 Run sun_expand.sql
–
Widen columns in tables outside Siebel schemas
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 35
Export Source, Import Target
 Full export of source db
–
Source db is now 9.2.0.4
 NLS_LANG
- AMERICAN_AMERICA.AL32UTF8
 Import into target db
–
Target db created as 9.2.0.4
 NLS_LANG
- AMERICAN_AMERICA.AL32UTF8
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 36
The conversion setup
WE8ISO8859P1
Source Db
AL32UTF8
Target Db
WE8MSWIN1252
Source Db
WE8MSWIN1252
import
WE8MSWIN1252
export
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 37
Modify target db
 Run impexp06.ksh
–
Handles sequences etc.
 Run check_schema.sql
–
Find columns that didn’t get widened
 Various changes on Siebel App side
 Verify db links to Custdb databases
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 38
Conversion Complete?
 Siebel process is done
 Fix any data issues
–
–
Multi-byte character data in source db
Convert properly to AL32UTF8
 Testing Unicode changes
–
–
GUI changes
Performance
 Unicode processing
 Users accessing different data
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 39
Multi-byte Data In Source Db?
 Source db is WE8ISO8859P1
–
–
Single-byte character set
Doesn’t support multi-byte characters
 That’s the official story
 The reality is somewhat different
 What, if any multi-byte data is in source db?
–
–
–
How to determine correct character set?
How to find, how to fix?
Japanese, Russian, others?
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 40
CRM Unicode Conversion
 Three separate presentations
–
–
–
1) The overall conversion process
 What we had, what we wanted, how to get there
 Issues that come up during conversion
2) Multi-byte data in the existing CRM db
 What’s the issue, how did it happen
 A general method to find and fix this problem
3) The actual conversion
 What really happened
 Issues that came up and how they were resolved
 Focus on DBA issues, not Siebel application
www.brianhitchcock.net
Brian Hitchcock September 15, 2004
DCSIT Technical Services DBA
Page 41