Ten More Database Mysteries

Download Report

Transcript Ten More Database Mysteries

Ten More Database Mysteries
Chris Lawson
Database Specialists, Inc.
www.dbspecialists.com
[email protected]
1
The Case of the Unwanted Services
• DBA detects listener starting extra services NOT defined!
> lsnrctl status
Services Summary…
database1
has 1 service handler(s)
database2
has 1 service handler(s)
databasez
has 5 service handler(s)
• They appear to be harmless, but what are these extra services?
• Restarted Listener. At first all OK, then 5 extra services activated.
• Confirmed that listener.ora file does NOT list the unwanted
database.
• DBA confirmed using correct listener.ora file.
• Puzzle: There is no entry whatsoever for databasez.
Unwanted Services: Solution
• Multiple listener services is indicative of MTS (Multi-threaded
Server) Dispatchers; however, MTS was not being used!
• DBA recalled that databasez was actually on a different server.
• Solution: Database on different server started MTS services.
• Key init.ora parameter:
MTS_LISTENER_ADDRESS =
"(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1526))"
• Services match init.ora parameter MTS_DISPATCHERS.
• How it happened: The remote init.ora file had originated on
‘our’ server, and the MTS parameter entries were not removed.
• Resolution: On other server, DBA corrected the init.ora entry,
bounced database. Extra services automatically stopped.
• But - that’s not all...
Unwanted Services:
The Rest of the Story
• These extra services turned out to be not so “harmless” after all.
• The ‘unwanted’ services are fully functional; they will intercept
connection requests and redirect them to a different server!
• Production users were redirected to a “clone” testing database
on another server, and used it for several hours!
• This occurs because each MTS service points to a particular
dispatcher on a particular server.
> lsnrctl services
Services Summary...
demo
has 3 service handler(s)
D001 <machine: hohp2, pid: 3385>
(ADDRESS=(PROTOCOL=tcp)(DEV=17)(HOST=[1.2.3.4)(PORT=1179))
• Perhaps could be used for rapid failover to standby database?
2
Incognito DBA Privileges
• Application testers need to “refresh” the test database
often.
• They will need to start/shut database as the oracle user.
• DBA, being suspicious (justly) by nature, is reluctant to
provide oracle account.
• As punishment, users frequently ask DBA to run the
refresh.
• Question: How can developers perform tasks as oracle,
without actually having the oracle account?
Incognito DBA Privileges: Solution
• Create script to refresh database, including startup, shutdown, etc.
• UNIX setuid feature allows programs to run with another identity.
But setuid feature does not propagate to commands within the script.
• Trick: Use a ‘wrapper’ C program that runs as oracle and calls the
refresh script:
# include <stdio.h>
main ()
{
system (“./refresh.ksh”);
}
• Activate setuid for the wrapper program: chmod 6711 refresh
• Users simply enter refresh to run the refresh program as oracle.
• Thanks to Brian Keating of Database Specialists for this solution.
3
NT Remote Control
• NT box is running an important database. It appears that
the Listener is down.
• NT server is remote.
• Question: How can you check the listener remotely?
NT Remote Control: Solution
• Use ‘SC’ command. It is part of the NT Resource Kit.
• The SC utility is very similar to the NET commands.
• First enable a security "context" to the remote box; e.g.
map a network drive to the remote server, and provide the
administrator account/password.
• The network drive may be disconnected after SC use is
done.
• Note: A mapped network drive is not required if you are
connected locally as administrator, and the remote server
uses the same administrator password.
• Run SC using the format:
SC \\[SERVER] [CMD] SERVICE
NT Remote Control: Solution
(continued)
Some common SC commands are:
query--------Queries the status for a service
start--------Starts a service
pause--------Sends a PAUSE control request
continue-----Sends a CONTINUE request
stop---------Sends a STOP request to a service
config-------Changes the configuration of a service
qc-----------Queries the configuration information
delete-------Deletes a service (from the registry)
create-------Creates a service (add to registry)
NT Remote Control: Solution
(continued)
EXAMPLE:
Check status of the Oracle 8 LISTENER service.
NT server named ‘TSUNAMI’
SC \\TSUNAMI QUERY ORACLETNSLISTENER80
SERVICE_NAME: ORACLETNSLISTENER80
TYPE
: 10 WIN32_OWN_PROCESS
STATE
: 4 RUNNING
(STOPPABLE,NOT_PAUSABLE,IGNORES_SHUTDOWN)
WIN32_EXIT_CODE
: 0 (0x0)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT
: 0x0
WAIT_HINT
: 0x0j
4
The Sad Case of the Failing Failover
• Very critical Customer Support application for large HMO.
• 24 x 7 crucial. Millions of monthly customers interactions.
• IBM RS-6000 High-availability cluster (HACMP)
• Scenario: Sudden hardware glitch causes failover to
backup node.
• Secondary node fails to start database! Database
complains of missing datafile.
• Critical application is down!
• Objective: Early-rising DBA (from different project) not
familiar with the setup needs to get database up!
The Failing Failover: Solution
• Investigation shows all file systems mounted. Bad News:
file in question is simply not there.
• Good News: File is not supposed to be there--only a
symbolic link.
• Each node has symbolic links pointing from “nicelooking” paths to the actual location of raw device:
/u04/oradata/data01.dbf > /dev/r123
• The primary node had all the links; the backup node was
missing 2 symbolic links. Simply creating the links
allowed database startup.
The Failing Failover: Solution
(continued)
• Moral of the story: High reliability hardware only as good
as the weakest link--including maintenance requirements.
• This strategy was very susceptible to human error. Every
new .dbf file required new symbolic link on both nodes.
One omission completely destroyed the high-availability
plan.
5
The Case of the Impatient
Insurance Agent
• Query to find details for expired insurance policies only
returns 50 rows, but takes 2 minutes to finish.
• Query is simple join of 2 tables: Response and Policy
RESP
policy_id
5 where-clause ‘filters’
result set: 20,000 rows
POL
1 ‘filter’
result set: 15,000 rows
result:
only 50 rows
• Puzzle: How can query ever be quick, since the ‘super
filter’ to reduce set to 50 rows is spread across two tables ?
Impatient Agent: Solution
The Trick:
1) Perform pre-processing that uses filters, but retrieves only
the key (policy_id) that is used as the join column.
2) This field is retrieved via index read only--thus avoiding the
table reads that account for the delays.
3) Use this result set as the starting point for the original join.
pre-processing driving “table”
RESP
index reads only
POL
original join
RESP
POL
table reads on small set only
6
Daffy Database Links
• Version 7.3.4 database. Database links working normally.
Database has been running straight for 3 weeks.
• Suddenly, the links fail:
ORA-12154: TNS:could not resolve service name
• Investigation shows tnsnames.ora file unchanged. Link
definition unchanged.
• Listener is running Oracle 8.0.4. Adding entry to tnsnames.ora
file in Oracle 8 directory causes the links to work again!
• Tests with new links show that the tnsnames.ora file for 7.3.4
is being ignored!
• Further, if Oracle 8 tnsnames file is removed, the one in Oracle
7 is used instead!
Database Links: Solution
• Clue: Listener had been restarted recently. But why should the
Listener process (a server side function) affect finding the tns alias?
• Metalink analyst suggested checking TNS_ADMIN; but how is that
relevant? We are not creating a UNIX session.
• Oracle Note 37808.1 clears-up the confusion:
For database links, TNS_ADMIN takes its value from
the value defined when the listener was started
• That is, a client process (link) is influenced by a server-side process!
• The listener had been started most recently with TNS_ADMIN set to
8.0.4, causing links to look in 8.0.4 tns file.
• Unsetting TNS_ADMIN, then restarting listener caused all operations
to run as expected.
7
All Primary Keys are Equal
(But some are more equal than others)
• Background: Application tuning often requires DBA to ‘lure’
optimizer into using certain indexes. Hints not always successful
or possible.
• Optimizer ‘likes’ certain indexes, because they typically are
faster than others. The favorite: an index on Primary Key.
• Scenario: 20 gigabyte insurance billing system. Oracle 8.0.4.
• Particular query joins to a table called POLICY. Join uses the
PK index (policy_ID).
• Query speed-up requires that join use new index on (PK, other
col).
• Created new double index--but optimizer would not cooperate.
It foolishly insists on using the PK index, even if hint used.
• Question: How can we get the optimizer to obey the DBA?
All Primary Keys are Equal: Solution
• We need a way to ‘trick’ the optimizer into not using the PK index,
but instead, use our ‘extra column’ index.
• This is difficult, because the PK index is the #1 choice.
• Trick: ‘Disguise’ the PK index as a unique index.
• Oddity: If unique index already exists, addition of a matching PK
will use the existing index.
• In determining execution plan, optimizer will treat the new index as
if it were a unique index, not a PK index.
• So, drop the PK, build a matching unique index, then rebuild the
PK.
• Optimizer no longer stubbornly insists on using this ‘PK’ index,
because it is not a true, pedigreed PK index--merely a unique index.
8
The Sad Case of the
Homeless Archive Logs
• Scenario: Large medical application; hundreds of connections
into database. This is a critical, 24x7 server.
• Users suddenly complain of database ‘hanging.’
• Alert log shows database unable to write archive log.
ORA-00255: error archiving log 1 of thread 1, sequence # 200
ORA-00270: error creating archive log /demo/arch/1_200.dbf
ORA-19504: failed to create file "/demo/arch/1_200.dbf"
• Investigation reveals disk crash on disk housing archive logs.
• No .dbf files on that disk--only archive logs.
• Problem: How can database operation be resumed, with
minimal disruption to hundreds of users?
Homeless Archive Logs: Solution
• DBA decided to dynamically redirect the archive logs, thereby
avoiding need to shutdown database.
• To change destination of archive logs:
alter system archive log start to ‘[new path]’
• For example:
SVRMGRL> alter system archive log start to '/demo/arch2/'
• Once the ‘backlog’ of archive logs is corrected, database
automatically resumes normal operation.
• Alert log now shows:
Fri Feb 11 09:00:47 2000
ARCH: Archival started
Archiver process freed from errors. No longer stopped.
9
The Puzzling Affair of the
Old Archive Logs
• With database in Archive Mode, old archive logs
must be purged.
• On Unix, this is simple, using the find command
(in cron).
• For example, to purge archive logs greater than 6
days old:
find /logdir/arch -name “arch*” -mtime +6 -exec rm {} \;
• Problem: How can we similarly purge the old archive
logs on NT?
Archive Logs: Solution
• On NT, there is apparently no ‘find’ command, as in Unix.
• The archive logs can be found indirectly, because the database ‘knows’
when/where they were written.
• Logs listed in V$ARCHIVED_LOG
• Example, to remove logs older than 2 days:
spool del_logs.sql
select 'del ‘ || name from v$archived_log where completion_time < (sysdate - 2);
spool off
host del_logs
• Similar query could be used to copy archive logs to backup.
10
SQL*Plus Madness
• Scenario: Connection ‘hangs’ upon trying to connect in
SQL*Plus.
• Also get error message re DBMS_APPLICATION_INFO
• DBA investigates. He finds:
1) Connect via Svrmgrl ok;
2) System user connects ok
• Problem: How can DBA ‘fix’ SQL*Plus?
SQL*Plus Madness: Solution
• Package DBMS_APPLICATION_INFO is used to ‘register’
a running application, for tracking/analysis purposes.
• Once an application is ‘registered’ it will show up in
v$session in ‘module’ field.
• SQL*Plus is one of few applications that try to ‘register.’
This explains why svrmgrl still works.
• This all points to problem with the ‘register’ of SQL*Plus.
• The set-up script to build necessary tables is called
‘pupbld.sql’
SQL*Plus Madness: Solution
(continued)
• Action: Try to run pupbld.sql as SYSTEM but it hangs!
• Investigation shows: product_profile table missing, but
synonym still there!
• Solution: drop public synonym for product_profile, then
rebuild pupbld. All OK now.
Contact Information
Chris Lawson
[email protected]
http://www.dbspecialists.com
Database Specialists, Inc.
388 Market Street, Suite 400
San Francisco, CA 94111