| Topic |
Discussion |
| Terms and Definitions |
| Term |
Definition |
| CSI |
Customer Service Identifier |
| MOS |
MyOracleSupport. The horrible, by improving, website that provides Oracle on-line support. |
| RDA |
Remote Diagnostic Agent |
| Sev |
Severity. An outage is Sev 1 |
| SR |
Service Request |
| TAR |
Technical Assistance Request: The old name for a Service Request (SR) |
|
| My Oracle Support: Topic 1 |
Set a timer. If you can not get everything back online in less than 3 minutes then your first step is to open an SR with Oracle Support.
If by some chance you fix the problem before they respond then close it: No harm done. If you have not then use Oracle Support to sanity check what you are thinking of doing.
It is not uncommon for DBAs to make a bad situation far worse by doing the wrong thing. You want Oracle to agree with your proposed course of action and, in addition to helping
you with your decision making, having them involved will keep your employer from making your life a living hell (see CYA). |
| My Oracle Support: Topic 2 |
Not all DBAs are of equal quality and not all support engineers with Oracle are either.
If you have uploaded an RDA and a support engineer asks you for information that is in that RDA ask to speak to an escalation manager: Doing so is your right as a customer.
Do not tolerate this person as they are either lazy or not sufficiently skilled to read what you uploaded.
If you have a support engineer that is asking you to run diagnostic test after diagnostic test, multiple system state dumps, etc. and it is not getting you anywhere ...
ask to speak to an escalation manager.
The time you are wasting is your own. |
| Change Management |
If you do not have the ability to, within one or two minutes walk through changes made to your environment by system admins,
network admins, storage admins, vendors, yourself, and other DBAs on your team your organization flunks by definition. Did a network engineer reconfigure the DNS server?
You do not know. Did a storage admin patch the software on your SAN? You have no idea. Did a UNIX SA decide to fence CPUs? You are blind. Another member of your team
change an init parameter? Perhaps. And exactly how perfect is anyone's memory about what they did yesterday much less on Friday of last week? Please don't make me laugh
(or cry).
Change management is a serious topic and you need to implement change management practices that include not just "here's what I did" but also the actual keystrokes,
spooled to a file, for later review. |
| RDA |
If you do not know what an RDA is know that I would never hire you as a DBA on any project and neither should anyone else.
If you do and you do not have a current one ready to send to Oracle attached to an SR know that I'd have you in my office immediately after the outage for a serious
discussion of your career path choices. If you have one ... then make sure it is updated every time you make a configuration change, zip it up, and have it ready to upload.
An RDA on a big system can take hours to create. Trying to create one during an outage is a guaranteed non-starter.
Not having a current, RDA ready to use when opening an SR, should be a firing offense on any production system. |
| Database Alert Log |
If you have not reviewed your alert log every day prior to this outage you are starting with a negative rating.
You have no idea what errors are unrelated to the current situation and which ones may be warnings prior to the current outage.
Find the alert log entries for the current outage and then review the previous 48 hours to see if there are any warnings that might be related.
If you see a warning or error that might be related and you do not know what it is ... use google. You do not have time to mess around with MyOracleSupport.
Copy those that might be related to an electronic scratchpad and hold them. |
| ASM Alert Log |
If you are using ASM then you have a separate Oracle instance that is the ASM home with its own alert log.
If you do not know where it is then you need to learn that now and document it as you should be reading the alert log every day for warnings and errors. |
| Clusterware Logs |
If you have a RAC cluster, by definition, you have Oracle Clusterware installed which means you have log files for the nodeapps and you
need to know where to find these and need to read them every day for warnings and errors. |
| Services |
If a user or application connects using a service, as they all should, and the service is not running the user or application will not be able to connect.
If you do not know what services should be running, and with RAC on what nodes, then one of your tasks is to document this using the following SQL statement when everything is fine. |
SELECT inst_id, name, goal, dtp, blocked, clb_goal
FROM gv$active_services
ORDER BY 2, 1; |
| Resource Manager |
Are you using Resource Management (DBMS_RESOURCE_MGR)? |