Oracle Real Application Clusters (RAC)
Version 11.1.0.7 - 11.2.0.3
 
Acronyms
RAC Related Acronyms and Definitions
Acronym Short For Comment
ASM Automated Storage Management Oracle instance for managing raw partitions
CRS Cluster Ready Services  
FCF Fast Connection Failover  
GSD Global Services Daemon  
OCFS Oracle Cluster File System Open-source file system that supports sharable files
SPAN A Single VIP Address spanning all clsuter VIPS  
TAF Transparent Application Failover  
VIP Virtual Internet Protocol (address) Required
Exceptions
Error Code Reason
ORA-025408 can not safely replay call Indicative of a session level failover from one node to another. Not necessarily indicative of node or server reboot
 
Checklist
Note: I have been involved in building hundreds of RAC clusters as well as troubleshooting problem with many tier 1 enterprise's clusters. Through all of this experience I have found some very common issues that keep recurring so I am bring them together here by category. What all have in common is a failure to read the docs and understand the technology before building the cluster.

RAC is a good technology ... it does what the marketing claims say it will do. But then so will an F22 jet but you'd best not take off without being a very experience pilot. Much like the jet ... RAC is complex ... it has a lot of interconnected moving pieces ... and if you have not already set up DR with DataGuard I recommend you not touch RAC in production until such time as you have. When things go wrong, and they will, if you do not have the ability to fail over to something stable you are going to find you are spending more time with an outage bridge than with your family.
Network Admin Issues
Every RAC cluster requires a minimum of three separate networks and, built properly, four with a total of eight separate connectors and eight separate physical switches. These are:
  1. Cluster Interconnect ... this should be a minimum of 10GbEth with two paths to two separate interconnected switches. It should not be a VLAN and should carry no traffic other than the a single cluster interconnect. I know network admins have a long list of reasons why this is totally unnecessary but history teaches that the vast majority of RAC outages are the direct result of network admins who think they are smarter than Oracle: None of them are. If a VLAN is forced on you then make sure that the ports chosen for the VLAN are vertically aligned on the switch not horizontally aligned. In other words plug up/down top-row to bottom row and not horizontally ... ports 1 then 2 then 3. Keep the VLAN on cards solely dedicated to VLAN traffic. Finally this MUST be configured using jumbo frames ... that means MTU=9000. If your network admins do not understand why then get them a pair of reading glasses and an Oracle Support Services read-only account.
  2. Storage ... you've really two choices here ... fibre to a SAN or fibre or copper to NAS. If fibre get something equivalent to the Brocade SAN Switch 300, get two of them, and multipath. If copper to NAS again multipath but be sure to use Cat 5e or 6.
  3. Public ... this network is how applications and users connect to the cluster nodes via a VIP or SCAN address. There are no special requirements here unless you think you are also going to use this network for RMAN backups, Data Guard replication, etc. My preference is always a fourth network
  4. Administration ... this network should have its own listener for Data Guard replication and backups should either be done on this network or, preferably, directly from your Data Guard physical standby.
Finally you MUST, and this is not optional, provide to the cluster an NTP server or punch a hole through the firewall to an NTP server so the cluster nodes are ALWAYS time synchronized. Let the time synchronization start wandering and you will have no one to blame other than yourself for what will happen next.
Storage Admin Issues
One of the best ways to mess up a RAC cluster, or for that matter any database, is to think you are going to incrementally give the DBA space a few of disk GB at at time adding lots of luns and probably, to make things even worse, RAID 5. If you do then likely you also start fires while pouring petrol.
  1. Give the DBAs two LUNs with all the space they will need for at least 12 months. One LUN should be RAID 10 to be used for redo logs and the temp and undo tablespaces. The second should also be RAID 10 but it is tolerable to use RAID 5 if the application's I/O demands are small to moderate. If you try to hoard the space and parcel it out you will be creating an I/O bottleneck you will not be able to fix unless your burn it all to the ground and rebuild it on another SAN.
  2. Stop your whining about the DBAs managing their space using ASM. With ASM the DBAs are far more capable of space management than you are. This is especially true with virtualized storage such as is available on EMC VMax and VNC SANs.
System Admin Issues
A RAC cluster is not "just" and Oracle database and an Oracle database is not an App server or a bunch of files in a file system. If you have not studied Oracle storage, and do not have a CSI number and an account with Oracle Support Services at MyOracleSupport please don't waste everyone's time trying to overrule the DBAs. Here's what you need to do.
  1. A RAC cluster that isn't just a joke can not be built using blades ... neither can it be built on a server that does not provide a minimum of more interface cards than you can stuff in a 1U server. Pay attention to the network requirements above and think 3U or greater. RAC is about eliminating single points of failure ... don't create new ones.
  2. Read the Oracle installation documents with very careful attention to the advice given for kernel parameters. If on Linux and you don't know what rmem and wmem are ... read the docs. If on Solaris you don't know what rsize and wsize are ... read the docs. If you don't know the difference between TCP/IP and UDP ... read the docs. Mess these up and you'll be cleaning up the resulting outages. Note also that the values in the Oracle install docs are often minimum starting values. Have the DBAs read the KnowledgeBase and based on reported issues give you the higher values where appropriate.
Database Admin Issues
Are we having fun yet?
  1. You are not going to run a successful implementation if you do not have node affinity. And you are not going to have node affinity without understanding services, resource management, and consumer groups. One of Oracle's big weaknesses is that every tool they've created is incapable, in and of itself, of creating a complete service. You use SRVCTL as root to create services at the clusterware level, you can use OEM Grid to do some things, and you can use the DBMS_SERVICE package too. Create all services using server control being use to use the -r and -a flags to define primary and available nodes: Then modify the services using DBMS_SERVICE to have some intelligence.
  2. Once you have your services use DBMS_RESOURCE_MANAGER to define consumer groups and resource pools.
  3. Make sure your scheduled jobs, DBMS_SCHEDULER, use job classes and have defined instances and priorities.
  4. Monitor interconnect traffic (demo code below on this page)
  5. Monitor remastering using v$gcspfmaster_info (demo code below on this page)
  6. Keep in mind that the SQL access plans created by the optimizer and stored in each node are not necessarily the same. If a query is run on multiple nodes then it may be advantageous to be sure that the plan is identical or corresponds to the normal workload on that node. You can do this with AFTER STARTUP system event triggers. If you have not used these before ... read the docs and use the demos here in the library.
  7. If you are not familiar with BIGFILE tablespaces because you are managing Database 11gR1 like it was 7.3.4 ... read the docs. Avoid a proliferation of datafiles unless the are specifically designated for date-range partitioning ... for example one tablespace per day ... and performing maintenance by dropping older tablespaces.
Developer Issues
Oracle claims that any application that will run on a stand-alone database will also run on RAC. They are correct ... it will. That does not mean it will run well or be tunable. So here's what you need to do:
  1. Whether writing SQL, PL/SQL, Java, C#, Perl or something else your biggest single issue is "Node Affinity. "If blocks are being transferred across the interconnect to often you can bring it all to its knees. To enforce node affinity you need to partition your application so that users or processes connecting to different nodes are not simultaneously trying to access the same block (data or index). An insert statement being executed on multiple nodes, using the same trigger and sequence object to create a surrogate key, and then access the same block of the B*Tree index that supports the primary key is, by definition, bad practice.
  2. If your logging does not tell you if you are running on a RAC cluster, how many nodes are active, which nodes they are, and on which node your code executed or failed your logging is of minimal or no value.
  3. If your sequence caches less than 1000 numbers at a time, and likley if it is ORDERED, drop and recreate it after reading Tom Kyte's advice on the subject.
 
Cluster Verify
Cluster Verify Commands
OptionDescription
-post hws Run following hardware and operating system installation
-pre cfs Run before cluster file system (ASM) installation
-post cfs Run following cluster file system (ASM) installation
-pre crsinst Run before clusterware installation
-post crsinst Run following clusterware installation
-pre dbinst Run before installation of the database instance
-post dbinst Run following database instance installation
# whoami

# root

# clvfy -post dbinst
 
CRSCTL Commands
Oracle Refs:
http://www.oracle.com/pls/db112/search?remark=quick_search&word=CRSCTL
http://download.oracle.com/docs/cd/E11882_01/server.112/e17120/restart006.htm#ADMIN13262
http://download.oracle.com/docs/cd/E11882_01/rac.112/e16794/crsref.htm#CWADD91147
Help crsctl -h
# ..crsctl -h
       crsctl check
       crsctl config
       crsctl debug
       crsctl delete
       crsctl disable
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
       crsctl
Register a VIP as a resource with Oracle Clusterware crsctl add resource <resource_name> -type <resource_type>
[-file <file_path> | "attribute_name=attribute_value,attribute_name=attribute_value,..."] [-i] [-f]
# crsctl add resource app.appvip -type app.appvip.type -attr "RESTART_ATTEMPTS=2,
START_TIMEOUT=100, STOP_TIMEOUT=100, CHECK_INTERVAL=10, USR_ORA_VIP=172.16.0.0,
START_DEPENDENCIES=hard(ora.net1.network)pullup(ora.net1.network),
STOP_DEPENDENCIES=hard(ora.net1.network)"
Register a resources based on the test_type1 resource type # crsctl add resource r1 -type test_type1 -attr "PATH_NAME=/tmp/r1.txt"

# crsctl add resource r1 -type test_type1 -attr "PATH_NAME=/tmp/r2.txt"
Create a Clusterware resource type crsctl add type type_name -basetype base_type_name {-attr
"ATTRIBUTE=attribute_name | -file file_path,TYPE={string | int}
[,DEFAULT_VALUE=default_value][,FLAGS=[READONLY][|REQUIRED]]"}
# crsctl add type app.appvip.type -basetype cluster_resource -attr "ATTRIBUTE=FOO,TYPE=integer,DEFAULT_VALUE=0"
Check status of OHAS and CS stack crsctl check crs
./crsctl check crs
Check status of CRS stack crsctl check cluster [[-all] | [-n <server>[...]]]
TBD
Check status of Cluster Time Synchronization Service crsctl check ctss
./crsctl check ctss
Check status of resources crsctl check resource {<resource_name> [...] | -w <filter>} [-n <server] [-k <cid> [-d did>]
TBD
Check status of Cluster Synchronization Services crsctl check css
./crsctl check css
Check status of Event Manager crsctl check evm
./crsctl check evm
Check ctss (cluster time synchronization services) crsctl check <nodeapp>
# crsctl check ctss

# crsctl cluvfy comp clocksync -n all

# cluvfy comp clocksync -verbose
Check nodeapp status crsctl check <nodeapp>
# crsctl check cssd

# crsctl check css

# crsctl check evm
Delete resource crsctl delete resource resource_name [-i] [-f]
# crsctl delete resource app.appvip
Delete type crsctl delete type type_name [-i]
# crsctl delete type app.appvip.type
Display CRS config information crsctl config <nodeapp>
???
Disable automatic CRS restart crsctl disable <nodeapp>
???
Enable automatic CRS restart crsctl enable <nodeapp>
???
Get Hostname crsctl get hoststname
# crsctl get hoststname
Display the user and group permissions for a resource crsctl getperm resource resource_name [ {-u user_name | -g group_name} ]
# crsctl getperm resource app.appvip

# crsctl getperm resource app.appvip -g dba
Get Permanent Type crsctl getperm type resource_type [-u user_name] | [-g group_name]
# crsctl getperm type app.appvip.type
Modify Resource crsctl modify resource resource_name -attr "attribute_name=attribute_value" [-i] [-f] [-delete]
???
Modify Type crsctl modify type type_name -attr "ATTRIBUTE=attribute_name,TYPE={string | int}
[,DEFAULT_VALUE=default_value [,FLAGS=[READONLY][| REQUIRED]]" [-i] [-f]]
???
Set Permanent Resource crsctl setperm resource resource_name {-u acl_string | -x acl_string | -o user_name | -g group_name}
???
Set Permanent Type crsctl setperm type resource_type_name {-u acl_string | -x acl_string | -o user_name | -g group_name}
???
Start the CRS service crsctl start <nodeapp>
???
Status Resource crsctl status resource
???
Status Type crsctl status type
???
Stop the CRS service
-f = Force
crsctl stop <nodeapp> [-f]
$ sudo /u01/app/oracle/product/11.1.0/asm/bin crsctl stop crs -f
 
CRS_STAT Commands
Cluster Ready Services Tool
OptionDescription
-a resource_name Used with the -g or -r option to verify whether the specified resource is registered or running under Oracle Clusterware. Specify the name of the resource as listed in its application profile.
-c cluster_node Displays information about applications or application resources on the specified cluster node.
-f Displays all information about the resource including extended information (-v) and the in-memory profile (-p).
-g Returns 0 (zero) if the specified application or application resource is registered with Oracle Clusterware; returns 1 if it is not. This option only works if a single resource is specified with the -a option.
-l Displays the status in a list (nontabular) format. This is the default format.
-ls Lists resources, owners, and permissions for all resources.
-ls resource_name Lists resources, owners, and permissions for a specified resource.
-p resource_name [...]] Displays the status of the in-memory profile for the specified resources. If no resources are specified, then the status of the in-memory profile for all registered resources is displayed. If a resource is not registered, its status is not displayed.
-q Runs the command in quiet mode (no messages are displayed on the console).
-r Returns 0 (zero) if the specified application or application resource is running under Oracle Clusterware; returns 1 if it is not. This option can only be successful if a single resource is specified with the -a option.
-t Lists information for all resources in a tabular form.
-v Lists how many times a resource has been restarted or has failed within the resource failure interval, the maximum number of times that a resource can be restarted or can fail, and the target state of the application. Also, this option lists normal status information.

This option displays extended information about the status of resources. Extra attributes shown include RESTART_COUNT and FAILURE_COUNT. The RESTART_COUNT attribute shows the number of restarts of the application that have been attempted since it was started. The FAILURE_COUNT attribute shows how many failures have occurred during the last FAILURE_THRESHOLD in seconds. The attribute FAILOVER_STATUS shows the time at which an application resource started while waiting to relocate due to a cluster node failure if the resource has a FAILOVER_DELAY value greater than 0. It is not displayed otherwise.
 
CRS_RELOCATE Commands
Relocate applications and application resources
Option Description
-c cluster_node Relocates each indicated application or application resource spcified node regardless of its placement policy. If required resources are not available on the destination node or if the application resource is restricted from that node, then the CRS_LOCATE command fails and the application or application resource remains on the current node.
-f Forces relocation of the specified applications, all applications dependent upon them and all applications that they are dependent upon. This option is necessary for reloating any application that requires another application or one that is required by any ONLINE application.
-q Runs the command in quiet mode (no messages are displayed on the console).
-s source_node Relocates all running applications or application resources from the source_node. If you do not also specify the -c option, then the CRS_RELOCATE command relocates each resource according to its placement policy and required resource list.
-t Lists information for all resources in a tabular form
[USR_attribute_name=value] When starting the resource, this option sets the named attribute to the given value in the CRS resource's environment.
CRS_RELOCATE resource_name [...] [-c cluster_node] [-f] [-q]
CRS_RELOCATE resource_name [-c cluster_node] [-q]
CRS_RELOCATE [USR_attribute_name=value] [...] resource_name [-c cluster_node] [-q]
CRS_RELOCATE -s source_name [-c cluster_node] [-q]
 
Cluster Health Monitor (CHM) OCLUMON Commands
Oracle Refs:
http://download.oracle.com/docs/cd/E11882_01/rac.112/e16794/troubleshoot.htm#CHDEEIGI
Debug oclumon debug [log daemon module:log_level] [version]
$ oclumon debug log osysmond CRFMOND:3
Dumps views from indicated nodes over the time period indicated oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]
$ oclumon dumpnodeview -n node1 node2 node3 -last "12:00:00"

$ oclumon dumpnodeview -allnodes -last "00:15:00"
Relocate the CHM repository oclumon manage [[-repos {resize size | changesize memory_size | reploc new_location [[-maxtime size] | [-maxspace memory_size]]}] | [-get key1 key2 ...]]
$ oclumon manage -repos reploc /shared/oracle/chm
Retrieves the path to the CHM repository $ oclumon manage -get reppath
Retrieves the name of the master node $ oclumon manage -get master
Retrieve the CHM repository size $ oclumon manage -get repsize
Version oclumon version
$ oclumon version
 
Dictionary Stats
Dependent Objects
GV_$CLUSTER_INTERCONNECTS GV_$CONFIGURED_INTERCONNECTS
 
Interconnect Tuning Views
GV$CACHE_TRANSFER Monitor blocks transferred by object
GV$CLASS_CACHE_TRANSFER Monitor block transfer by class
GV$FILE_CACHE_TRANSFER Monitor the blocks transferred per file
GV$TEMP_CACHE_TRANSFER Monitor the transfer of temporary tablespace blocks
FAN Notification Related recent_resource_incarnations$
Global Views SELECT view_name
FROM dba_views
WHERE view_name LIKE 'GV%'
ORDER BY 1;
 
Failover
Failover comparison
FCF TAF
Works for Both Client-side drivers Works only for JDBC-OCI
Rapid Node/Service Failure Detection Rapid Node/Service Failure Detection
Connection Retry by Application or Container Transparent Connection Retry and Query Retry
In-Flight Transactions Automatically Rolled Back Application must Roll Back and notify TAF
Supports Dynamic Work Load Balancing No Support for Work Load Balancing
 
Initialization Parameters
RAC Modifiable Parameters set linesize 121
col name format a40
col value format a50

SELECT name, value
FROM gv$parameter
WHERE isinstance_modifiable = 'TRUE';
 
Links
Related Links
Topic Description Link
NIC Bonding on RHEL 4 Hewlett/Packard Resource Center Forums Click Here
 
Load Balancing
Follow the link below to the DBMS_SERVICE package.

Set the connection Load Balancing Goal to LONG for applications with long-lived connections (default). Set the goal to SHORT for short duration connections such as applications where a connection is made, a small discrete unit of work is performed, and the connection is then terminated.
 
Net Services
Note: Oracle Database 11g release 1 or 2 clients connect to the database using SCANs. The SCAN and its associated IP addresses provide a stable name for clients to use for connections, independent of the nodes that make up the cluster. SCAN addresses, virtual IP addresses, and public IP addresses must all be on the same subnet.

The SCAN is a virtual IP name, similar to the names used for virtual IP addresses, such as node1-vip. However, unlike a virtual IP, the SCAN is associated with the entire cluster, rather than an individual node, and associated with multiple IP addresses, not just one address.

Link: http://download.oracle.com/docs/cd/E11882_01/install.112/e10812/concepts.htm#sthref791
Maximum Availability Parameters
Option Valid Values
FAILOVER on, off, yes, no, true, false
LOAD_BALANCE on, off, yes, no, true, false
SOURCE_ROUTE Use the parameter SOURCE_ROUTE to enable routing through multiple protocol addresses.
When set to on or yes, Net Services tries each address in order until the destination is reached. The parameter can be embedded under either DESCRIPTION_LIST DESCRIPTION or the ADDRESS_LIST.

DEMO_RAC =
  (DESCRIPTION=
  (SOURCE_ROUTE=YES)
  (ADDRESS=                                        # hop 1
    (PROTOCOL=TCP)
    (HOST=vipalpha2.mlib.org)
    (PORT=1521))
  (ADDRESS_LIST=
    (FAILOVER=ON)
    (LOAD_BALANCE=off)
    (ADDRESS=                                      # hop 2
      (PROTOCOL=tcp)
      (HOST=vipalpha3.mlib.org)
      (PORT=1521))
    (ADDRESS=
      (PROTOCOL=tcp)
      (HOST=vipalpha1.mlib.org)(PORT=1630)))
  (ADDRESS=                                        # hop 3
    (PROTOCOL=tcp)
    (HOST=vipalpha4.mlib.org)
    (PORT=1521))
  (CONNECT_DATA=(SERVICE_NAME=tfta_alpha.mlib.org)))
Failover-Mode Parameters
Option Valid Values
BACKUP Specify the failover node by its net service name. A separate net service name must be created for the failover node.
DELAY Specify the number of times to attempt to connect after a failover. If DELAY is specified, RETRIES defaults to five retry attempts.
METHOD Specify how fast failover is to occur from the primary node to the backup node:
  • BASIC: Establishes connections at failover time. This option requires almost no work on the backup database server until failover time.
  • PRECONNECT: Pre-establishes connections. This provides faster failover but requires that the backup instance be able to support all connections from every supported instance.
RETRIES<number_of_retries>
SESSION Fails over the session; that is, if a user's connection is lost, a new session is automatically created for the user on the backup. This type of failover does not attempt to recover selects.
TYPE Specify the type of failover. Three types of Oracle Net failover functionality are available by default to Oracle Call Interface (OCI) applications:
Identify and test cluster interconnect SQL> desc gv$configured_interconnects;

Name               Null?    Type
------------------ -------- --------------
INST_ID                     NUMBER
NAME                        VARCHAR2(15)
IP_ADDRESS                  VARCHAR2(16)
IS_PUBLIC                   VARCHAR2(3)
SOURCE                      VARCHAR2(31)

SQL> SELECT *
   2 FROM gv$configured_interconnects
   3 ORDER BY by 1,2;

INST_ID  NAME         IP_ADDRESS       IS_PUBLIC
-------- ------------ ---------------- ----------
       1 bond0        10.20.70.11      YES
       1 bond0:1      10.20.70.10      YES
       1 bond0:2      10.20.70.101     YES
       1 bond1:1      169.254.85.254   NO
       2 bond0        10.20.70.13      YES
       2 bond0:1      10.20.70.12      YES
       2 bond0:2      10.20.70.101     YES
       2 bond1:1      169.254.44.94    NO
 
 
Oracle Cluster Registry Configuration Tool
ocrconfig -repair Repair an OCR file
ocrconfig -repair ocr [device_name]
ocrconfig -repair Repair and OCR mirror file
ocrconfig -repair ocrmirror [device_name]
ocrconfig -replace Replace the primary OCR using the desintation_file to designate the target location
ocrconfig -replace ocr destination_file
ocrconfig -replace Replace the primary OCR using the disk to designate the target location
ocrconfig -replace ocr disk
ocrconfig -replace Replace a secondary OCR mirror using the desintation_file to designate the target location
ocrconfig -replace ocrmirror destination_file
ocrconfig -replace Replace a secondary OCR mirror using the disk to designate the target location
ocrconfig -replace ocrmirror disk
 
Oracle Cluster Registry Utility
ocrcheck [root@mlibprod bin]# ./ocrcheck

Status of Oracle cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       7144
         Available space (kbytes) :     254976
         ID                       : 1687916673
         Device/File Name         :       +DG2
                                    Device/File integrity check succeeded
         Device/File Name         :       +DG3
                                    Device/File integrity check succeeded

                                    Device/File integrity not configured

                                    Device/File integrity not configured

                                    Device/File integrity not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded
ocr.loc cat /etc/oracle/ocr.loc

#Device/file +OCRVOTING getting placed by device +DG2
ocrconfig_log=+DG2
ocrmirrorconfig_log=+DG3
local_only=false
   
   
 
Oracle Interface Configuration
oifcfg -help  
 
oifcfg delif  
 
oifcfg getif  
 
oifcfg  iflist  
oifcfg iflist
oifcfg setif  
 
 
Server Configuration Tool
Downgrade configuration srvconfig -downgrade -dbname <database_name> -orahome <oracle_home>
srvconfig -downgrade -dbname TSTA -orahome $ORACLE_HOME
 
Server Control Commands
Add instance srvctl add instance -d <database_name> instance -n <instance_name>
srvctl add instance -d TSTA instance -n tsta3
Add TAF service srvctl add service -d <database_unique_name> -s <service_name> -r <preferred_instances_list> -a <available_instances_list> -P <failover_type [NONE | BASIC | PRECONNECT]>
srvctl add service -d TSTA -s TSTA_TAF -r TSTA1,TSTA2 -P BASIC
Remove Clusterware node applications from a node srvctl remove nodeapps -n <node_name>
srvctl remove nodeapps -n omega1
Remove database srvctl remove database -d <database_name>
srvctl remove database -d TSTA
Remove instance srvctl remove instance -d <database_name> -i <instance_name>
srvctl remove instance -d TSTA -i tsta2
Start nodeapps running on a node and check status srvctl start nodeapps -n <node_name>
srvctl start nodeapps -n TSTA1
crs_stat -t
Start service running on a node srvctl start service -d <database_name> -s <service_name>
srvctl start service -d TSTA -s TSTA_TAF
Show all services running on a node srvctl status nodeapps -n <node_name>
srvctl status nodeapps -n omega1
Stop Clusterware node applications on a node srvctl stop nodeapps -n <node_name>
srvctl stop nodeapps -n omega1
Start and stop database srvctl <status | start | stop> -d <database_name>
srvctl stop database -d MLIBRAC1

srvctl start database -d MLIBRAC1
Status, start and stop instance srvctl <status | start | stop> instance -d <database_name> -n <node_name>
srvctl status instance -d MLIBRAC1 -n mlib1n2

srvctl stop instance -d MLIBRAC1 -n mlib1n2

srvctl start instance -d MLIBRAC1 -n mlib1n2
Retrieve the current configuration srvctl config nodeapps -n <node_name> -a
srvctl configure nodeapps -n omega1 -a
Change the IP Address of a VIP srvctl modify nodeapps -n <node_name> -a <new_vip_ip_or_name>/<subnet_ip>/<interface_name>
srvctl modify nodeapps -n omega1 -a 10.1.5.3/255.255.255.0/Public
 
RAC Class Files
BASH.ORACLE PS1='[\u@\h \W]# '
export PS1

TEMP=/tmp
TMPDIR=/tmp
export TEMP TMPDIR

ORACLE_BASE=/oracle/app
export ORACLE_BASE

CRS_HOME=$ORACLE_BASE/product/11.1.0/crs_1
export CRS_HOME

ORACLE_HOME=$ORACLE_BASE/product/11.1.0/db_1
export ORACLE_HOME

PATH=$ORACLE_HOME/bin:$PATH
export PATH

ORACLE_SID=TSTxx
export ORACLE_SID

LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/jdbc/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

alias ob='cd $ORACLE_BASE'
alias oh='cd $ORACLE_HOME'
alias ch='cd $CRS_HOME'
alias crs='$CRS_HOME/bin/crs_stat'
alias sql='sqlplus "/ as sysdba"'

umask 025
BASH.ROOT PS1='[\u@\h \W]# '
export PS1

TEMP=/tmp
TMPDIR=/tmp
export TEMP TMPDIR

ORACLE_BASE=/oracle/app
export ORACLE_BASE

CRS_HOME=$ORACLE_BASE/product/11.1.0/crs_1
export CRS_HOME

ORACLE_HOME=$ORACLE_BASE/product/11.1.0/db_1
export ORACLE_HOME

PATH=$ORACLE_HOME/bin:$PATH
export PATH

ORACLE_SID=TSTxx
export ORACLE_SID

LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/jdbc/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

alias ob='cd $ORACLE_BASE'
alias oh='cd $ORACLE_HOME'
alias ch='cd $CRS_HOME'
alias crs='$CRS_HOME/bin/crs_stat'
alias sql='sqlplus "/ as sysdba"'

umask 022
CMCFG.ORA TBD
CVUQDISK CVUQDISK_GRP=oinstall; export CVUQDISK_GRP
echo $CVUQDISK_GRP
cd /stage/clusterware/rpm
rpm -iv cvuqdisk-1.0.1-1.rpm
FSTAB # Uncomment the line only for YOUR team!

fstab
-- note for Solaris replace nolock with llock.

/* NEW 270C setup alphas
ntap270a:/vol/alpha/u0 nfs rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,tcp,vers=3 0 0
*/

/* NEW 270C setup betas
ntap270a:/vol/beta/u01 nfs rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,tcp,vers=3 0 0
*/

/*
ntap270b:/vol/oraapps/u01 nfs rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,tcp,vers=3 0 0
*/

# ALPHA - Uncomment the following
#netapp1:/vol/vol2/alpha /u01 nfs rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,tcp,vers=3 0 0

# BETA - Uncomment the following
#netapp1:/vol/vol2/beta /u01 nfs rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,tcp,vers=3 0 0

# GAMMA - Uncomment the following
#netapp1:/vol/vol2/gamma /u01 nfs rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,tcp,vers=3 0 0

# DELTA - Uncomment the following
#netapp1:/vol/vol2/delta /u01 nfs rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,tcp,vers=3 0 0

# Oracle documentation - do not edit
netapp1:/vol/vol0/stage.11gR1.extras/docs /oradoc nfs ro,bg,intr,hard,rsize=32768,noac,nolock,tcp,vers=3 0 0
hosts # Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost

#Public Physical Port Addresses
192.168.1.211 alpha1.mlib.org alpha1
192.168.1.212 alpha2.mlib.org alpha2
192.168.1.221 beta1.mlib.org  beta1
192.168.1.222 beta2.mlib.org  beta2
192.168.1.231 gamma1.mlib.org gamma1
192.168.1.232 gamma2.mlib.org gamma2
192.168.1.241 delta1.mlib.org delta1
192.168.1.242 delta2.mlib.org delta2

192.168.1.213 alpha3.mlib.org alpha3
192.168.1.214 alpha4.mlib.org alpha4
192.168.1.223 beta3.mlib.org  beta3
192.168.1.224 beta4.mlib.org  beta4
192.168.1.233 gamma3.mlib.org gamma3
192.168.1.234 gamma4.mlib.org gamma4
192.168.1.243 delta3.mlib.org delta3
192.168.1.244 delta4.mlib.org delta4

# Private Interconnect Addresses
10.0.1.1 node1_alpha
10.0.1.2 node2_alpha
10.0.2.1 node1_beta
10.0.2.2 node2_beta
10.0.3.1 node1_gamma
10.0.3.2 node2_gamma
10.0.4.1 node1_delta
10.0.4.2 node2_delta

10.0.1.3 node3_alpha
10.0.1.4 node4_alpha
10.0.2.3 node3_beta
10.0.2.4 node4_beta
10.0.3.3 node3_gamma
10.0.3.4 node4_gamma
10.0.4.3 node4_delta
10.0.4.4 node4_delta

# Shared Storage NetApp NAS Server
192.168.2.200 netapp1
192.168.1.200 netappweb
192.168.2.201 ntap270a
192.168.2.202 ntap270b


# Oracle VIP Addresses
192.168.1.11 vipalpha1.mlib.org vipalpha1
192.168.1.12 vipalpha2.mlib.org vipalpha2
192.168.1.21 vipbeta1.mlib.org vipbeta1
192.168.1.22 vipbeta2.mlib.org vipbeta2
192.168.1.31 vipgamma1.mlib.org vipgamma1
192.168.1.32 vipgamma2.mlib.org vipgamma2
192.168.1.41 vipdelta1.mlib.org vipdelta1
192.168.1.42 vipdelta2.mlib.org vipdelta2

192.168.1.13 vipalpha3.mlib.org vipalpha3
192.168.1.14 vipalpha4.mlib.org vipalpha4
192.168.1.23 vipbeta3.mlib.org vipbeta3
192.168.1.24 vipbeta4.mlib.org vipbeta4
192.168.1.33 vipgamma3.mlib.org vipgamma3
192.168.1.34 vipgamma4.mlib.org vipgamma4
192.168.1.43 vipdelta3.mlib.org vipdelta3
192.168.1.44 vipdelta4.mlib.org vipdelta4

# MLIB Class Support Addresses
10.0.1.1 node1
10.0.1.2 node2
192.168.1.119 bigdog.mlib.org bigdog
hosts.equiv # Comment/Uncomment those entries for your cluster below.
#alpha1       oracle
#alpha2       oracle
#beta1        oracle
#beta2        oracle
#gamma1       oracle
#gamma2       oracle
delta1        oracle
delta2        oracle

#alpha-node1  oracle
#alpha-node2  oracle
#beta-node1   oracle
#beta-node2   oracle
#gamma-node1  oracle
#gamma-node2  oracle
delta-node1   oracle
delta-node2   oracle

# DO NOT Comment out this entry!
bigdog       oracle
limits.conf # Oracle Entries

* soft nproc 2047
* hard nproc 16384
* soft nofile 1024
* hard nofile 65536
* soft memlock 3145728
* hard memlock 3145728
listener.ora TBD
login session required /lib/security/pam_limits.so
profile # For oracle install
if [ $USER = "oracle" ]; then
  if [ $SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
  else
    ulimit -u 16384 -n 65536
  fi
fi
rc.local # Oracle RAC Entries
insmod hangcheck-time hangcheck_tick=30 hangcheck_margin=180
/sbin/service nscd start
srvConfig.loc TBD
sysctl.conf # Oracle 11gR1 Parameters
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 4194304
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 262144
TNSNAMES.ORA
11.1.0.7
RedHat Linux AS 4 U2
TSTAB_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha2.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha3.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha4.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA_TAF)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

TSTA_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA_TAF)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

TSTAB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha2.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha3.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha4.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA)
    )
  )

TSTA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA)
    )
  )

TSTA4 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha4.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA)
      (INSTANCE_NAME = TSTA4)
    )
  )

TSTA3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha3.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA)
      (INSTANCE_NAME = TSTA3)
    )
  )

TSTA2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha2.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA)
      (INSTANCE_NAME = TSTA2)
    )
  )

TSTA1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipalpha1.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTA)
      (INSTANCE_NAME = TSTA1)
    )
  )

TSTB_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTB_TAF)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

TSTB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTB)
    )
  )

TSTB4 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta4.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTB)
      (INSTANCE_NAME = TSTB4)
    )
  )

TSTB3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta3.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTB)
      (INSTANCE_NAME = TSTB3)
    )
)

TSTB2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta2.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTB)
      (INSTANCE_NAME = TSTB2)
    )
)

TSTB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipbeta1.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTB)
      (INSTANCE_NAME = TSTB1)
    )
  )

TSTCD_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma2.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma3.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma4.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC_TAF)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

TSTC_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC_TAF)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

TSTCD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma2.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma3.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma4.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC)
    )
  )

TSTC =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC)
    )
 )

TSTC4 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma4.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC)
      (INSTANCE_NAME = TSTC4)
    )
  )

TSTC3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma3.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC)
      (INSTANCE_NAME = TSTC3)
    )
  )

TSTC2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma2.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC)
      (INSTANCE_NAME = TSTC2)
    )
  )

TSTC1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipgamma1.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTC)
      (INSTANCE_NAME = TSTC1)
    )
  )

TSTD_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTD_TAF)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

TSTD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta1.mlib.org)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta2.mlib.org)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTD)
    )
  )

TSTD4 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta4.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTD)
      (INSTANCE_NAME = TSTD4)
    )
  )

TSTD3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta3.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTD)
      (INSTANCE_NAME = TSTD3)
    )
  )

TSTD2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta2.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTD)
      (INSTANCE_NAME = TSTD2)
    )
  )

TSTD1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = vipdelta1.mlib.org)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TSTD)
      (INSTANCE_NAME = TSTD1)
    )
  )
TNSNAMES.ORA
10.1.0.4
Mac OS X
ORCL4 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac4-vip)(PORT = 1521))
      (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
      (INSTANCE_NAME = orcl4)
    )
  )

ORCL3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac3-vip)(PORT = 1521))
      (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
      (INSTANCE_NAME = orcl3)
    )
  )

ORCL2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac2-vip)(PORT = 1521))
      (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
      (INSTANCE_NAME = orcl2)
    )
  )

ORCL1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac1-vip)(PORT = 1521))
      (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
      (INSTANCE_NAME = orcl1)
    )
  )

ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac2-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac3-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = oracle-rac4-vip)(PORT = 1521))
      (LOAD_BALANCE = yes)
      (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
    )
  )

BAKBONE =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = bakbone-oracle)(PORT = 1521))
      (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = bakbone)
    )
  )
Verify TAF Status SELECT machine, failover_type, failover_method, failed_over, COUNT(*)
FROM gv$session
GROUP BY machine, failover_type, failover_method, failed_over;
 
Scripts
Related Scripts in /rdbms/admin
File Name Location Description
addmrpt.sql $ORACLE_HOME/rdbms/admin ADDM Report creation
addmrpti.sql $ORACLE_HOME/rdbms/admin ADDM Report creation
catawrtb.sql $ORACLE_HOME/rdbms/admin RAC Statistics data dictionary table creation
catclust.sql $ORACLE_HOME/rdbms/admin Create cluster-related views and DBMS_CLUSTDB package
catfusrg.sql $ORACLE_HOME/rdbms/admin Catalog registration of feature usage
clsrus.msb $ORACLE_HOME/racg/mesg RAC error messages
clsrus.msg $ORACLE_HOME/racg/mesg RAC error messages
cmpdbdwg.sql $ORACLE_HOME/rdbms/admin Downgrade RAC database
dbmssrv.sql $ORACLE_HOME/rdbms/admin Builds DBMS_SERVICE package
e1002000.sql $ORACLE_HOME/rdbms/admin Drop RAC related views
olse102.sql $ORACLE_HOME/rdbms/admin Downgrade for RAC enabled OLS
spcpkg.sql $ORACLE_HOME/rdbms/admin Builds STATSPACK package
sprepcon.sql $ORACLE_HOME/rdbms/admin StatsPack report configuration
sprepsins.sql $ORACLE_HOME/rdbms/admin Reports on differences between StatsPack snapshots 
sprsqins.sql $ORACLE_HOME/rdbms/admin StatsPack resource usage report
spup1101.sql $ORACLE_HOME/rdbms/admin stats$segstat_obj creation for RAC stats
utlclust.sql $ORACLE_HOME/rdbms/admin Dump cluster database related information
utlprp.sql $ORACLE_HOME/rdbms/admin Recompile invalid objects in a cluster environment
utlsyxsz.sql $ORACLE_HOME/rdbms/admin Estimates space requirement for SYSAUX tablespace
 
Time Server Creation
Time Server Setup

RedHat Linux
$ vi ntp.conf

# Prohibit general access to the ntpd service
restrict default ignore

# Permit loopback access
restrict 127.0.0.1

# --- CLIENT NETWORK ---
# Permit systems on the 192.168.1 network to use the service
# Do not use those systems as peers for synchronization
restrict 192.168.1.0 mask 255.255.255.0 notrust nomodify notrap

# --- NTP MULTICASTCLIENT ---
multicastclient
restrict 244.0.1.1 mask 255.255.255.255 notrust nomodify notrap
restrict 192.168.1.0 mask 255.255.255.0 notrust nomodify notrap

# --- GENERAL CONFIGURATION ---
server 127.127.1.0             #local clock
fudge  127.127.1.0 stratum 10

driftfile /var/lib/ntp/drift
broadcstdelay 0.008
authenticate yes
keys /etc/ntp/keys
Time Server Management

RedHat Linux
# chkconfig ntpd on

-- To start, stop, and restart NTP
# service ntpd start
# service ntpd stop
# service ntpd restart

-- To verify NTP is running
# pgrep ntpd
 
Cache Fusion Interconnect
Cache Fusion Interconnect and Jumbo Frames Verify SDU, TDU, and MTU specifically for your operating system and your switches before attempting to implement. Any MTU value greater than 9000 will lead to failure. With Linux, due to the size of the packet header do not exceed 8972 unless testing clearly establishes that a larger number (still 9000 or less) works well in stability and scalability testing.

# ping -M do -s 8972 10.0.0.4
(-M do = do not fragment)

# traceroute -U 10.0.0.4 -F 8972
(U = UDP, s = source address, F = don't fragment, v = verbose output

# nslookup oras1n2.mlib.org
<server_name>)

Note also that Oracle recommends that the cluster interconnect be done with non-routable IP addresses which means specifically (class A) 10.0.0.#, (class B) 172.16.0.#, or (class C) 192.168.0.#. No other address group should be used.
Down converts query SELECT inst_id, cr_requests, light_works, data_requests, fairness_down_converts
FROM gv$cr_block_server;
Tuning Queries -- current block transfer statistics

col "AVG RECVD TIME (ms)" format 9999999.9
col inst_id format 9999
prompt GCS CURRENT BLOCKS

SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME", ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
FROM gv$sysstat b1, gv$sysstat b2
WHERE b1.name = 'gc current block receive time'
AND b2.name = 'gc current blocks received'
AND b1.inst_id = b2.inst_id;

-- block contention can be measured by using block transfer time

col "AVG RECVD TIME (ms)" format 9999999.9
col inst_id format 9999

SELECT b1.inst_id, b2.value RECEIVED, b1.value "RECEIVE TIME", ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
FROM gv$sysstat b1, gv$sysstat b2
WHERE b1.name = 'gc cr block receive time'
AND b2.name = 'gc cr blocks received'
AND b1.inst_id = b2.inst_id;
Interconnect Time Outs and Waits SELECT inst_id, event, total_waits, total_timeouts, time_waited
FROM gv$system_event
WHERE total_timeouts > 1000000
AND event like '%gc%';
Interconnect Transfer Metrics SELECT ss1.inst_id, ss2.value "GCS CR BLOCKS RECVD", ss1.value "GCS CR BLOCK RECV TIME",
((ss1.value / ss2.value) * 10) "AVG CR BLOCK RECV TIME (ms)"
FROM gv$sysstat ss1, gv$sysstat ss2
WHERE ss1.name = 'gc cr block receive time'
AND ss2.name = 'gc cr blocks received'
AND ss1.inst_id = ss2.inst_id;

SELECT ss1.instance_number,
       ss2.value "GCS CR BLOCKS RECVD",
       ss1.value "GCS CR BLOCK RECV TIME",
     ((ss1.value / ss2.value) * 10) "AVG CR BLOCK RECV TIME (ms)"
FROM dba_hist_sysstat ss1, dba_hist_sysstat ss2
WHERE ss1.instance_number = ss2.instance_number
AND ss1.snap_id = ss2.snap_id
AND ss1.stat_name = 'gc cr block receive time'
AND ss2.stat_name = 'gc cr blocks received'
AND ss1.snap_id BETWEEN 6688 AND 6700
AND ss1.value > 0
AND ss2.value > 0
ORDER BY ss1.snap_id, instance_number;
Historic Interconnect Transfer Metrics

The following two queries are converted from the above two queries and based on historical snapshot information.
SELECT dhs.begin_interval_time, b1.instance_number, b2.value RECEIVED,
       b1.value "RECEIVE TIME", ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
FROM sys.wrh$_sysstat b1, sys.wrh$_sysstat b2, dba_hist_snapshot dhs
WHERE b1.stat_id = '1388758753'
AND b2.stat_id = '326482564'
AND b1.instance_number = 10
AND b1.instance_number = b2.instance_number
AND b1.snap_id >= 23000
AND b1.snap_id = b2.snap_id
AND b1.snap_id = dhs.snap_id
ORDER BY b1.snap_id;

SELECT dhs.begin_interval_time, b1.instance_number, b2.value RECEIVED,
       b1.value "RECEIVE TIME", ((b1.value/b2.value)*10) "AVG RECEIVE TIME (ms)"
FROM sys.wrh$_sysstat b1, sys.wrh$_sysstat b2, dba_hist_snapshot dhs
WHERE b1.stat_id = '1759426133'
AND b2.stat_id = '2877738702'
AND b1.instance_number = 10
AND b1.instance_number = b2.instance_number
AND b1.snap_id >= 23000
AND b1.snap_id = b2.snap_id
AND b1.snap_id = dhs.snap_id
ORDER BY b1.snap_id;

-- another variant: adjust for the number of nodes in the cluster
SELECT datetime,
       SUM("1_RCVD_TIME")/SUM("1_RCVD")*10 RCV_latency_1,
       SUM("2_RCVD_TIME")/SUM("2_RCVD")*10 RCV_latency_2,
       SUM("3_RCVD_TIME")/SUM("3_RCVD")*10 RCV_latency_3,
       SUM("4_RCVD_TIME")/SUM("4_RCVD")*10 RCV_latency_4,
       SUM("5_RCVD_TIME")/SUM("5_RCVD")*10 RCV_latency_5,
       SUM("6_RCVD_TIME")/SUM("6_RCVD")*10 RCV_latency_6
FROM (
  SELECT TRUNC(dhs.begin_interval_time) datetime,
         b1.instance_number,
         SUM(b2.value) RECEIVED,
         SUM(b1.value) RECEIVE_TIME
  FROM sys.wrh$_sysstat b1, sys.wrh$_sysstat b2, dba_hist_snapshot dhs, (
    SELECT MIN(snap_id) snap_id
    FROM dba_hist_snapshot
    WHERE begin_interval_time>sysdate-31) mx
  WHERE b1.stat_id = '1388758753'
  AND b2.stat_id = '326482564'
  AND b1.instance_number = b2.instance_number
  AND b1.snap_id >= mx.snap_id
  AND b1.snap_id = b2.snap_id
  AND b1.snap_id = dhs.snap_id
  GROUP BY TRUNC(begin_interval_time),
  b1.instance_number)
PIVOT(SUM(receive_time) rcvd_time, SUM(received) rcvd
FOR instance_number IN (1,2,3,4,5,6))
GROUP BY datetime
HAVING datetime < TRUNC(SYSDATE)
ORDER BY datetime DESC;
gc Lost Block Diagnostics If you have a CSI and can access the MyOracleSupport website [ Click Here ] to review Doc 563566.1
Block Transfer Waits SELECT inst_id , event , total_waits, time_waited
FROM gv$system_event
WHERE event IN ('gc buffer busy acquire', 'gc buffer busy release')
ORDER BY time_waited DESC;
Remastering

For more on remastering [Click Here]
SELECT inst_id, current_master, previous_master, COUNT(*)
FROM gv$gcspfmaster_info
WHERE current_master <> previous_master
GROUP BY inst_id, current_master, previous_master
ORDER BY 1,2,3;

   INST_ID CURRENT_MASTER PREVIOUS_MASTER   COUNT(*)
---------- -------------- --------------- ----------
         5              4               5          7
         5              4               6         60
         5              4           32767       2350
         5              5               4         47
         5              5               6         24
         5              5           32767        171
         5              6               4        196
         5              6               5         15
         5              6           32767       1430
         6              4               5          7
         6              4               6         60
         6              4           32767       2350
         6              5               4         47
         6              5               6         24
         6              5           32767        171
         6              6               4        196
         6              6               5         15
         6              6           32767       1430
         7              4               5          7
         7              4               6         60
         7              4           32767       2350
         7              5               4         47
         7              5               6         24
         7              5           32767        171
         7              6               4        196
         7              6               5         15
         7              6           32767       1430

SELECT current_master, previous_master, COUNT(*)
FROM v$gcspfmaster_info
WHERE current_master <> previous_master
GROUP BY current_master, previous_master
ORDER BY 1,2;

CURRENT_MASTER PREVIOUS_MASTER   COUNT(*)
-------------- --------------- ----------
             4               5          7
             4               6         60
             4           32767       2355
             5               4         46
             5               6         24
             5           32767        171
             6               4        196
             6               5         15
             6           32767       1430

SELECT *
FROM v$gcspfmaster_info
WHERE current_master <> previous_master;

-- sample output: the original was 4304 rows
 FILE_ID    OBJECT_ID       TYPE CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT
---------- ---------- ---------- -------------- --------------- ------------
         0 4294950913          0              5           32767            1
         0 4294950914          0              5           32767            1
         0 4294950915          0              4           32767            0
         0   27115525          0              4           32767            1
         0   27541513          0              6               4            2
         0   26492941          0              6           32767            0
         0   16080980          0              6               4            2
         0   27115599          0              4               6            3
         0   27115611          0              6               4            4
         0   27115623          0              4               6            4
         0   27115643          0              6               5            5
         0   27115677          0              6               5            6
         0   26935899          1              4           32767            1
         0   27116474          0              6               5            7
         0   27116765          0              4               6            1
         0   27116794          0              6               5            2
         0   24669848          0              6               4            5
         0   24669852          0              6               4            4
         0   27184827          0              6               4            3
         0   21860252          0              6               4            3
         0   27185414          0              6               4            4
         0   27185527          0              4               5            3
         0   26489485          0              6               4            4
         0   25965349          0              6               4            3
         0   24319918          0              6               4            4
         0   24319920          0              6               4            4
         0   22861745          0              4           32767            0
         0   27113425          0              6               4            5
         0   27113630          0              4               6            3
         0   27187706          0              6               4            3
         0   27187717          0              6               4            3
         0   27187732          0              6               4            3
         0   27114148          0              6               4            4
         0   27114150          0              4               6            3
         0   27114328          0              4               6            3
         0   27188112          0              6               5            3
         0   27114387          0              4               6            3
         0   26049468          0              6               4            3
         0   27114434          0              4               6            5
         0   27114436          0              4               6            5
         0   27114468          0              4               6            4
         0   27114470          0              6               4            5
         0   27114532          0              6               4            3
         0   24575026          0              4               6            5
         0   24575030          0              6               4            4
         0   24575031          0              6               4            5
         0   27114592          0              6               4            4
         0   27115395          0              4               6            3
 
Miscellaneous
ALTER SYSTEM CHECK DATAFILES Updates one or all instance SGAs from a database control file to reflect information on all online data files. See ALTER SYSM link below.
Log sync events In a RAC environment, cache fusion transfers sometimes causes the log file sync event. This is due to the write ahead logging mechanism. When a foreground process requests a block from another instance, all the redo entries associated with the block must be flushed to disk prior to transferring the block. In this case, there is no commit involved, but buffer cahe synchronization and waits on the log file sync event.
Max commit propagation delay Max Commit Propagation Delay (MCPD) tells which algorithm to choose to propagate commits in a RAC environment. When you set it to default (10.1 and earlier) 700 cs, it uses lamport algorithm for SCN propagation. Setting this to lower values enables the Broadcast on Commit (BOC) which is little bit more expensive on messages, but commits are visible immediately to the other nodes.

In current versions (10.2 and above) the BOC algorithms are optimized and is the default SCN propagation mechanism.
TAF Restart

Comments from Kuassi Mensah
TAF will not always successfully restart a query in progress. It reopens the cursors and attempts to discard rows already returned. In order to achieve that, it performs a chekcsum of to-be-discarded rows and compares that checksum against a checksum for the rows already returned. If the checksums are different, TAF knows the discarded rows are not the same as the rows already returned. In such a case, it will not resume returning rows and returns an error. Checksum discrepancies are more likely to happen with replica databases which are not block-for-block identical.

There are some queries that will not survive failover. OCI implicitly replays the query on the surviving instance using the original SCN and discards the rows that were previously returned. If, after failover, the same set of rows is not returned, then we throw an error indicating that the query must be re-executed. With use of the original SCN *and* the checksum verification on the re-fetched rows, it is unlikely that OCI will erroneously re-return rows that had been previously seen.

Prepared Statements and their cursor are re-openned/executed. However If the application uses any stored procedures (i.e., CallableStatement), then the state of those procedures is lost after failover.

The surviving PreparedStatement assumes that the Statement handle's bind values have not changed but this unlikey as different bind may return a different result set. At any rate, OCI transparently replays the execute-SQL/fetch during TAF assuming same set of rows is returned from server.

If there is a failure you will most likely see an ORA-25402 error. Applications should be prepared to handle errors in the 25400-25425 range and rollback appropriately.
Starting and stopping an 11gR2 RAC Cluster Stopping the database

1) Login into database instance owner and stop the database instance
$ORACLE_HOME/bin/srvctl stop database -d <dbname>

2) As root user, stop the cluster
$CRS_ORACLE_HOME/bin/crsctl stop crs

Start the instance
1) As root user, start the cluster
$CRS_ORACLE_HOME/bin/crsctl start crs

1) Login into database instance owner and start the database instance
$ORACLE_HOME/bin/srvctl start database -d <dbname>
Queries to determine if  connections are using TAF SELECT DISTINCT resource_consumer_group, failover_type, failover_method, failed_over
FROM gv$session;

SELECT machine, failover_type, failover_method, failed_over, COUNT(*)
FROM gv$session
GROUP BY machine, failover_type, failover_method, failed_over;
Modify Interconnect UDP High Water Mark to prevent buffer overflow. If done do it on all nodes. This syntax from Solaris 10 U6 # ndd -set /dev/udp udp_xmit_hiwat 1048576

# ndd -set /dev/udp udp_recv_hiwat 1048576
 
 
Morgan's Library Page Footer
This site is maintained by Dan Morgan. Last Updated: This site is protected by copyright and trademark laws under U.S. and International law. © 1998-2014 Daniel A. Morgan All Rights Reserved