Tuesday, January 26, 2021

Cluvfy Usage

 

1. Download  location for 12c cluvfy

http://www.oracle.com/technetwork/database/options/clustering/downloads/index.html

  • Cluster Verification Utility Download for Oracle Grid Infrastructure 12c
  • Always download the newest cluvfy version from the link above
  • The latest CVU version (July 2013) can be used with all currently supported Oracle RAC versions, including Oracle RAC 10g, Oracle RAC 11g  and Oracle RAC 12c.

Impact of latest Cluvfy version

Nothing is more annoying than debugging a RAC problem that finally turns out to be a cluvfy bug.
The latest download from January 2015 reports the following version
  [grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy -version
  12.1.0.1.0 Build 112713x8664
whereas my current 12.1 installation reports the following version  
  [grid@gract1 ~/CLUVFY-JAN-2015]$ cluvfy -version
  12.1.0.1.0 Build 100213x8664
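
If you want to compare two cluvfy builds in a script, the build tag can be cut out of the -version output. The following is only a minimal sketch: the function names are made up for this example, and it assumes that the first six digits of the build tag are a date in MMDDYY form, which is my reading of the two tags shown here and not something I have seen documented.

```shell
#!/bin/bash
# Hypothetical helpers, not part of cluvfy.

# Print the build tag of a "cluvfy -version" line,
# e.g. "12.1.0.1.0 Build 112713x8664" -> "112713x8664".
cvu_build_tag() {
    printf '%s\n' "$1" | awk '$2 == "Build" { print $3 }'
}

# ASSUMPTION: the first six digits of the tag are a date (MMDDYY).
# Print a sortable YYMMDD key, e.g. "112713x8664" -> "131127".
cvu_build_key() {
    printf '%s\n' "$1" | sed 's/^\(..\)\(..\)\(..\).*/\3\1\2/'
}

# Succeed if the first version line is newer than the second one.
cvu_is_newer() {
    a=$(cvu_build_key "$(cvu_build_tag "$1")")
    b=$(cvu_build_key "$(cvu_build_tag "$2")")
    [ "$a" -gt "$b" ]
}
```

A possible call would be: cvu_is_newer "$(bin/cluvfy -version)" "$(cluvfy -version)" && echo "downloaded cluvfy is newer"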

Cluvfy trace Location

If you have installed cluvfy in /home/grid/CLUVFY-JAN-2015, the related cluvfy traces can be found
in the cv/log subdirectory

[root@gract1 CLUVFY-JAN-2015]# ls /home/grid/CLUVFY-JAN-2015/cv/log
cvutrace.log.0  cvutrace.log.0.lck

Note that some cluvfy commands, for example:
# cluvfy comp dhcp -clustername gract -verbose
must be run as root! In that case the default trace location may not have the correct permissions.
If so, use the script below to set the trace level and trace location.

Setting Cluvfy trace File Location and Trace Level in a bash script

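The following bash script sets the cluvfy trace location and the cluvfy trace level.
Either source it or append the cluvfy call to the script, because exported variables are only
visible to the script itself and its child processes.
#!/bin/bash
# Start with an empty trace directory
rm -rf /tmp/cvutrace
mkdir -p /tmp/cvutrace
# Trace file location and trace level picked up by cluvfy
export CV_TRACELOC=/tmp/cvutrace
export SRVM_TRACE=true
export SRVM_TRACE_LEVEL=2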
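
If you only want tracing for a single cluvfy call, a small wrapper may be more convenient than exporting the variables for the whole session. This is a sketch under the assumption that cluvfy reads the three variables from its environment as described above; run_with_cvu_trace is a name I made up for this example.

```shell
#!/bin/bash
# Hypothetical wrapper: run one command with cluvfy tracing enabled.
# Usage: run_with_cvu_trace TRACE_DIR COMMAND [ARGS...]
run_with_cvu_trace() {
    trace_dir=$1
    shift
    # Create the trace directory if needed; existing traces are kept.
    mkdir -p "$trace_dir" || return 1
    # env sets the variables for this one command only.
    env CV_TRACELOC="$trace_dir" SRVM_TRACE=true SRVM_TRACE_LEVEL=2 "$@"
}
```

For the DHCP check this would be called as root like this: run_with_cvu_trace /tmp/cvutrace bin/cluvfy comp dhcp -clustername gract -verbose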

Why the cluvfy version matters

Yesterday I debugged a DHCP problem, starting with cluvfy:
[grid@gract1 ~]$  cluvfy -version
12.1.0.1.0 Build 100213x8664
[root@gract1 network-scripts]# cluvfy comp dhcp -clustername gract -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
<null>
At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
<null>
Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip"
<null>
..
DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was unsuccessful on all the specified nodes. 
 
--> As the verification was unsuccessful I started network tracing using tcpdump.
    But the network trace looked good, and I got a bad feeling about cluvfy!

What to do next?
Install the newest cluvfy version and rerun the test!
[grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy -version
12.1.0.1.0 Build 112713x8664

Now rerun the test:
[root@gract1 CLUVFY-JAN-2015]#  bin/cluvfy  comp dhcp -clustername gract -verbose

Verifying DHCP Check 
Checking if any DHCP server exists on the network...
DHCP server returned server: 192.168.5.50, loan address: 192.168.5.150/255.255.255.0, lease time: 21600
At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
DHCP server returned server: 192.168.5.50, loan address: 192.168.5.150/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip"
..
released DHCP server lease for client ID "gract-gract1-vip" on port "67"
DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was successful. 
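
The two runs above differ in their last line, so a script can decide on that summary line. The sketch below reads cluvfy output on stdin; it relies only on the message wording seen in this post ("Verification of ...", "Pre-check for ...", "Post-check for ..."), which is an assumption and not an official interface, and cvu_verdict is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: read cluvfy output on stdin and succeed only if
# a summary line reports success and none reports a failure.
cvu_verdict() {
    awk '
        /^(Verification of|Pre-check for|Post-check for) .* was unsuccessful/ { bad = 1 }
        /^(Verification of|Pre-check for|Post-check for) .* was successful/   { good = 1 }
        END { if (good && !bad) exit 0; exit 1 }
    '
}
```

Example: bin/cluvfy comp dhcp -clustername gract -verbose | cvu_verdict || echo "DHCP check failed, review the traces"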

Why you should always review your cluvfy logs

By default, cluvfy logs are written to CV_HOME/cv/log

[grid@gract1 ~/CLUVFY-JAN-2015]$  cluvfy  stage -pre crsinst -n gract1 
Performing pre-checks for cluster services setup 
Checking node reachability...
Node reachability check passed from node "gract1"
Checking user equivalence...
User equivalence check passed for user "grid"
ERROR: 
An error occurred in creating a TaskFactory object or in generating a task list
PRCT-1011 : Failed to run "oifcfg". Detailed error: []
PRCT-1011 : Failed to run "oifcfg". Detailed error: []
This error message is not helpful at all!

Reviewing cluvfy logfiles for details:
[root@gract1 log]#  cd $GRID_HOME/cv/log

Cluvfy log cvutrace.log.0 : 
[Thread-49] [ 2015-01-22 08:51:25.283 CET ] [StreamReader.run:65]  OUTPUT>PRIF-10: failed to initialize the cluster registry
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:144]  runCommand: process returns 1
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:161]  RunTimeExec: output>
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:164]  PRIF-10: failed to initialize the cluster registry
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:170]  RunTimeExec: error>
[main] [ 2015-01-22 08:51:25.286 CET ] [RuntimeExec.runCommand:192]  Returning from RunTimeExec.runCommand
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:884]  retval =  1
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:885]  exitval =  1
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:886]  rtErrLength =  0
[main] [ 2015-01-22 08:51:25.286 CET ] [CmdToolUtil.doexecuteLocally:892]  Failed to execute command. Command = [/u01/app/121/grid/bin/oifcfg, getif, -from, gpnp] env = null error = []
[main] [ 2015-01-22 08:51:25.287 CET ] [ClusterNetworkInfo.getNetworkInfoFromOifcfg:152]  INSTALLEXCEPTION: occured while getting cluster network info. messagePRCT-1011 : Failed to run "oifcfg". Detailed error: []
[main] [ 2015-01-22 08:51:25.287 CET ] [TaskFactory.getNetIfFromOifcfg:4352]  Exception occured while getting network information. msg=PRCT-1011 : Failed to run "oifcfg". Detailed error: []

Here we get a better error message: PRIF-10: failed to initialize the cluster registry
and we can extract the failing command: /u01/app/121/grid/bin/oifcfg getif

Now we can rerun the failing command at OS level
[grid@gract1 ~/CLUVFY-JAN-2015]$ /u01/app/121/grid/bin/oifcfg getif
PRIF-10: failed to initialize the cluster registry
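
Picking the failing command out of a long trace file by hand is tedious. A minimal sketch that does it with sed, assuming the trace lines look like the "Failed to execute command. Command = [...]" line shown above (cvu_failed_commands is a made-up name; arguments that themselves contain ", " would be altered):

```shell
#!/bin/bash
# Hypothetical helper: print every command that cluvfy failed to
# execute, as recorded in a cvutrace.log file read from stdin.
cvu_failed_commands() {
    # First sed: keep only the text between "Command = [" and "]".
    # Second sed: turn the ", " separated list into a command line.
    sed -n 's/.*Failed to execute command\. Command = \[\([^]]*\)\].*/\1/p' |
        sed 's/, / /g'
}
```

Usage: cvu_failed_commands < cvutrace.log.0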

By the way, if you have installed the new cluvfy version you get a much better error message
[grid@gract1 ~/CLUVFY-JAN-2015]$ bin/cluvfy  stage -pre crsinst -n gract1
ERROR: 
PRVG-1060 : Failed to retrieve the network interface classification information from an existing CRS home at path "/u01/app/121/grid" on the local node
PRCT-1011 : Failed to run "oifcfg". Detailed error: PRIF-10: failed to initialize the cluster registry

To fix the PRVG-1060, PRCT-1011 and PRIF-10 errors reported by the cluvfy commands above, please read
the following article: Common cluvfy errors and warnings

Run cluvfy before CRS installation by passing network connections for PUBLIC and CLUSTER_INTERCONNECT

$ ./bin/cluvfy stage -pre crsinst -n grac121,grac122  -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect
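
Judging from the command above, the -networks value is a list of interface:subnet:type entries joined by "/". If you build this string in a script, a small helper avoids typos. This is only a sketch of that format as read off the example; cvu_networks_arg is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: join "interface:subnet:type" entries with "/"
# to build the value of the cluvfy -networks option.
cvu_networks_arg() {
    out=
    for spec in "$@"; do
        # Add the "/" separator only between entries.
        out="${out:+$out/}$spec"
    done
    printf '%s\n' "$out"
}
```

Usage: ./bin/cluvfy stage -pre crsinst -n grac121,grac122 -networks "$(cvu_networks_arg eth1:192.168.1.0:PUBLIC eth2:192.168.2.0:cluster_interconnect)"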

Run cluvfy before doing an UPGRADE

[grid@grac41 /]$  cluvfy stage -pre crsinst -upgrade -n grac41,grac42,grac43 -rolling -src_crshome $GRID_HOME \
                 -dest_crshome /u01/app/grid_new -dest_version 12.1.0.1.0  -fixup -fixupdir /tmp -verbose

Run cluvfy 12.1 for preparing a 10.2.1.0 CRS installation

Always install the newest cluvfy version, even for 10gR2 CRS validations!
[root@ract1 ~]$  ./bin/cluvfy  -version
12.1.0.1.0 Build 112713x8664

Verify OS setup on ract1
[root@ract1 ~]$ ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract1 -verbose -fixup
--> Run required scripts
[root@ract1 ~]# /tmp/CVU_12.1.0.1.0_oracle/runfixup.sh
All Fix-up operations were completed successfully.

Repeat this step on ract2
[root@ract2 ~]$ ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract2 -verbose -fixup
--> Run required scripts
[root@ract2 ~]# /tmp/CVU_12.1.0.1.0_oracle/runfixup.sh
All Fix-up operations were completed successfully.

Now verify System requirements on both nodes
[oracle@ract1 cluvfy12]$  ./bin/cluvfy comp sys -p crs -r 10gR2 -n ract1 -verbose -fixup
Verifying system requirement
..
NOTE:
No fixable verification failures to fix

Finally run cluvfy to test CRS installation readiness 
$ cluvfy12/bin/cluvfy stage -pre crsinst -r 10gR2 \
  -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect \
  -n ract1,ract2 -verbose
..
Pre-check for cluster services setup was successful.

Run cluvfy comp software to check file protections for GRID and RDBMS installations

  • Note: Not all files are checked (shell scripts like ohasd are missing), see Bug 18407533: CLUVFY DOES NOT VERIFY ALL FILES
  • Config file: $GRID_HOME/cv/cvdata/ora_software_cfg.xml
Run cluvfy comp software to verify the GRID stack
[grid@grac41 ~]$  cluvfy comp software  -r  11gR2 -n grac41 -verbose  
Verifying software 
Check: Software
  1178 files verified                 
Software check passed
Verification of software was successful. 

Run cluvfy comp software to verify the RDBMS stack
[oracle@grac43 ~]$  cluvfy comp software  -d $ORACLE_HOME -r 11gR2 -verbose 
Verifying software 
Check: Software
  1780 files verified                 
Software check passed
Verification of software was successful.

Run cluvfy before CRS installation on a single node and create a script for fixable errors

$ ./bin/cluvfy comp sys -p crs -n grac121 -verbose -fixup
Verifying system requirement 
Check: Total memory 
  Node Name     Available                 Required                  Status    
  ------------  ------------------------  ------------------------  ----------
  grac121       3.7426GB (3924412.0KB)    4GB (4194304.0KB)         failed    
Result: Total memory check failed
... 
*****************************************************************************************
Following is the list of fixable prerequisites selected to fix in this session
******************************************************************************************
--------------                ---------------     ----------------    
Check failed.                 Failed on nodes     Reboot required?    
--------------                ---------------     ----------------    
Hard Limit: maximum open      grac121             no                  
file descriptors                                                      
Execute "/tmp/CVU_12.1.0.1.0_grid/runfixup.sh" as root user on nodes "grac121" to perform the fix up operations manually
--> Now run "runfixup.sh" as root on node "grac121"
Press ENTER key to continue after execution of "/tmp/CVU_12.1.0.1.0_grid/runfixup.sh" has completed on nodes "grac121"
Fix: Hard Limit: maximum open file descriptors 
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac121                               successful              
Result: "Hard Limit: maximum open file descriptors" was successfully fixed on all the applicable nodes
Fix up operations were successfully completed on all the applicable nodes
Verification of system requirement was unsuccessful on all the specified nodes.

Note that errors like too little memory or swap need manual intervention:
Check: Total memory 
  Node Name     Available                 Required                  Status    
  ------------  ------------------------  ------------------------  ----------
  grac121       3.7426GB (3924412.0KB)    4GB (4194304.0KB)         failed    
Result: Total memory check failed
Fix that error at OS level and rerun the above cluvfy command
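
To see how much memory is missing before you resize the VM or add RAM, the numbers can be taken from /proc/meminfo. The sketch below assumes that the "Available" value printed by cluvfy corresponds to MemTotal, which fits the output above but is my assumption; mem_shortfall_kb is a made-up name.

```shell
#!/bin/bash
# Hypothetical helper: print how many KB of memory are missing.
# ASSUMPTION: cluvfy "Available" memory equals MemTotal in /proc/meminfo.
# Usage: mem_shortfall_kb REQUIRED_KB < /proc/meminfo
mem_shortfall_kb() {
    awk -v need="$1" '
        /^MemTotal:/ {
            short = need - $2
            if (short < 0) short = 0    # enough memory: nothing missing
            print short
        }
    '
}
```

With the values from the output above (3924412 KB available, 4194304 KB required) this prints 269892, so roughly 264 MB are missing on grac121.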

Performing post-checks for hardware and operating system setup

  • cluvfy stage -post hwos also tests multicast communication with multicast group "230.0.1.0"
[grid@grac42 ~]$  cluvfy stage -post hwos -n grac42,grac43 -verbose 
Performing post-checks for hardware and operating system setup 
Checking node reachability...
Check: Node reachability from node "grac42"
  Destination Node                      Reachable?              
  ------------------------------------  ------------------------
  grac42                                yes                     
  grac43                                yes                     
Result: Node reachability check passed from node "grac42"

Checking user equivalence...
Check: User equivalence for user "grid"
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac43                                passed                  
  grac42                                passed                  
Result: User equivalence check passed for user "grid"

Checking node connectivity...
Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac43                                passed                  
  grac42                                passed                  
Verification of the hosts config file successful

Interface information for node "grac43"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:38:10:76 1500  
 eth1   192.168.1.103   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth1   192.168.1.59    192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth1   192.168.1.170   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth1   192.168.1.177   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:F6:18:43 1500  
 eth2   192.168.2.103   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:1C:30:DD 1500  
 eth2   169.254.125.13  169.254.0.0     0.0.0.0         10.0.2.2        08:00:27:1C:30:DD 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  

Interface information for node "grac42"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:6C:89:27 1500  
 eth1   192.168.1.102   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.165   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.178   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.167   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth2   192.168.2.102   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 eth2   169.254.96.101  169.254.0.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  

Check: Node connectivity for interface "eth1"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac43[192.168.1.103]           grac43[192.168.1.59]            yes             
  grac43[192.168.1.103]           grac43[192.168.1.170]           yes             
  ..     
  grac42[192.168.1.165]           grac42[192.168.1.167]           yes             
  grac42[192.168.1.178]           grac42[192.168.1.167]           yes             
Result: Node connectivity passed for interface "eth1"

Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.1.102            grac43:192.168.1.103            passed          
  grac42:192.168.1.102            grac43:192.168.1.59             passed          
  grac42:192.168.1.102            grac43:192.168.1.170            passed          
  grac42:192.168.1.102            grac43:192.168.1.177            passed          
  grac42:192.168.1.102            grac42:192.168.1.165            passed          
  grac42:192.168.1.102            grac42:192.168.1.178            passed          
  grac42:192.168.1.102            grac42:192.168.1.167            passed          
Result: TCP connectivity check passed for subnet "192.168.1.0"

Check: Node connectivity for interface "eth2"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac43[192.168.2.103]           grac42[192.168.2.102]           yes             
Result: Node connectivity passed for interface "eth2"
Check: TCP connectivity of subnet "192.168.2.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.2.102            grac43:192.168.2.103            passed          
Result: TCP connectivity check passed for subnet "192.168.2.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed for subnet "192.168.2.0".
Subnet mask consistency check passed.
Result: Node connectivity check passed

Checking multicast communication...
Checking subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0" passed.
Checking subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0" passed.
Check of multicast communication passed.

Checking for multiple users with UID value 0
Result: Check for multiple users with UID value 0 passed 
Check: Time zone consistency 
Result: Time zone consistency check passed

Checking shared storage accessibility...
  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdb                              grac43                  
  /dev/sdk                              grac42                  
..        
  Disk                                  Sharing Nodes (2 in count)
  ------------------------------------  ------------------------
  /dev/sdp                              grac43 grac42           
Shared storage check was successful on nodes "grac43,grac42"

Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Checking if "hosts" entry in file "/etc/nsswitch.conf" is consistent across nodes...
Checking file "/etc/nsswitch.conf" to make sure that only one "hosts" entry is defined
More than one "hosts" entry does not exist in any "/etc/nsswitch.conf" file
All nodes have same "hosts" entry defined in file "/etc/nsswitch.conf"
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed

Post-check for hardware and operating system setup was successful. 

Debugging Voting disk problems with:  cluvfy comp vdisk

As your CRS stack may not be up on the failing node, run these commands from a node where the CRS stack is up and running
[grid@grac42 ~]$ cluvfy comp ocr -n grac41
Verifying OCR integrity 
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR: 
PRVF-4194 : Asm is not running on any of the nodes. Verification cannot proceed.
OCR integrity check failed
Verification of OCR integrity was unsuccessful on all the specified nodes. 

[grid@grac42 ~]$ cluvfy comp vdisk -n grac41
Verifying Voting Disk: 
Checking Oracle Cluster Voting Disk configuration...
ERROR: 
PRVF-4194 : Asm is not running on any of the nodes. Verification cannot proceed.
ERROR: 
PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdf1"
ERROR: 
PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdg1"
ERROR: 
PRVF-5157 : Could not verify ASM group "OCR" for Voting Disk location "/dev/asmdisk1_udev_sdh1"
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
UDev attributes check for Voting Disk locations started...
UDev attributes check passed for Voting Disk locations 
Verification of Voting Disk was unsuccessful on all the specified nodes. 
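
When the output gets long, it helps to reduce it to the distinct error codes before searching for them. A minimal sketch, assuming the messages start with a code of the form PRxx-nnnn followed by a colon as in the output above (cvu_error_codes is a made-up name):

```shell
#!/bin/bash
# Hypothetical helper: read cluvfy output on stdin and print each
# distinct error code (PRVF-..., PRCT-..., PRIF-..., PRVG-...) once.
cvu_error_codes() {
    sed -n 's/^\(PR[A-Z][A-Z]-[0-9][0-9]*\) *:.*/\1/p' | sort -u
}
```

Usage: cluvfy comp vdisk -n grac41 | cvu_error_codes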

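Debugging steps at OS level
Verify the disk protections and use kfed to read the disk header.
In the listing below the device files have no permission bits set at all (b---------), so the grid user cannot open them.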
[grid@grac41 ~/cluvfy]$ ls -l /dev/asmdisk1_udev_sdf1 /dev/asmdisk1_udev_sdg1 /dev/asmdisk1_udev_sdh1
b---------. 1 grid asmadmin 8,  81 May 14 09:51 /dev/asmdisk1_udev_sdf1
b---------. 1 grid asmadmin 8,  97 May 14 09:51 /dev/asmdisk1_udev_sdg1
b---------. 1 grid asmadmin 8, 113 May 14 09:51 /dev/asmdisk1_udev_sdh1

[grid@grac41 ~/cluvfy]$ kfed read  /dev/asmdisk1_udev_sdf1
KFED-00303: unable to open file '/dev/asmdisk1_udev_sdf1'

Debugging file protection problems with:  cluvfy comp software

  • Related BUG: 18350484 : 112042GIPSU:”CLUVFY COMP SOFTWARE” FAILED IN 112042GIPSU IN HPUX
Investigate file protection problems with cluvfy comp software

Cluvfy checks file protections against ora_software_cfg.xml
[grid@grac41 cvdata]$ cd  /u01/app/11204/grid/cv/cvdata
[grid@grac41 cvdata]$ grep gpnp ora_software_cfg.xml
      <File Path="bin/" Name="gpnpd.bin" Permissions="0755"/>
      <File Path="bin/" Name="gpnptool.bin" Permissions="0755"/>

Change protections and verify with cluvfy
[grid@grac41 cvdata]$ chmod 444  /u01/app/11204/grid/bin/gpnpd.bin
[grid@grac41 cvdata]$ cluvfy comp software -verbose | grep gpnpd
    /u01/app/11204/grid/bin/gpnpd.bin..."Permissions" did not match reference
        Permissions of file "/u01/app/11204/grid/bin/gpnpd.bin" did not match the expected value. [Expected = "0755" ; Found = "0444"]

Now correct problem and verify again 
[grid@grac41 cvdata]$ chmod 755  /u01/app/11204/grid/bin/gpnpd.bin
[grid@grac41 cvdata]$ cluvfy comp software -verbose | grep gpnpd
--> No errors were reported anymore
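
The same comparison can be sketched at OS level, which is handy for files cluvfy does not check. The helpers below are hypothetical and only assume the attribute layout of the two XML lines shown above (Path, Name, Permissions in that order); the real ora_software_cfg.xml may contain other entry forms that this sed expression skips.

```shell
#!/bin/bash
# Hypothetical helpers, sketched from the two XML lines shown above.

# Read <File .../> lines on stdin and print "relative/path expected_mode".
cfg_expected_modes() {
    sed -n 's/.*<File Path="\([^"]*\)" Name="\([^"]*\)" Permissions="\([^"]*\)".*/\1\2 \3/p'
}

# Print every file below HOME_DIR whose mode differs from the expected
# one. Missing files are reported as well.
# Usage: cfg_expected_modes < ora_software_cfg.xml | check_modes HOME_DIR
check_modes() {
    home_dir=$1
    while read -r rel mode; do
        # find prints the file only if its mode matches exactly.
        if [ -z "$(find "$home_dir/$rel" -prune -perm "$mode" 2>/dev/null)" ]; then
            echo "MISMATCH $home_dir/$rel (expected $mode)"
        fi
    done
}
```

Usage: grep gpnp ora_software_cfg.xml | cfg_expected_modes | check_modes /u01/app/11204/grid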

Debugging CTSSD/NTP problems with:  cluvfy comp clocksync

[grid@grac41 ctssd]$ cluvfy comp clocksync -n grac41,grac42,grac43 -verbose
Verifying Clock Synchronization across the cluster nodes 
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac43                                passed                  
  grac42                                passed                  
  grac41                                passed                  
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
  Node Name                             State                   
  ------------------------------------  ------------------------
  grac43                                Observer                
  grac42                                Observer                
  grac41                                Observer                
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
Checking daemon liveness...
Check: Liveness for "ntpd"
  Node Name                             Running?                
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
Result: Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
Checking NTP daemon command line for slewing option "-x"
Check: NTP daemon command line
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
Result: 
NTP daemon slewing option check passed
Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
Check: NTP daemon's boot time configuration
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
Result: 
NTP daemon's boot time configuration check for slewing option passed
Checking whether NTP daemon or service is using UDP port 123 on all nodes
Check for NTP daemon or service using UDP port 123
  Node Name                             Port Open?              
  ------------------------------------  ------------------------
  grac43                                yes                     
  grac42                                yes                     
  grac41                                yes                     
NTP common Time Server Check started...
NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Checking on nodes "[grac43, grac42, grac41]"... 
Check: Clock time offset from NTP Time Server
Time Server: .LOCL. 
Time Offset Limit: 1000.0 msecs
  Node Name     Time Offset               Status                  
  ------------  ------------------------  ------------------------
  grac43        0.0                       passed                  
  grac42        0.0                       passed                  
  grac41        0.0                       passed                  
Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[grac43, grac42, grac41]". 
Clock time offset check passed
Result: Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful. 

At OS level you can run ntpq -p 
[root@grac41 dev]# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ns1.example.com LOCAL(0)        10 u   90  256  377    0.072  -238.49 205.610
 LOCAL(0)        .LOCL.          12 l  15h   64    0    0.000    0.000   0.000
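
The offset column of ntpq -p is given in milliseconds, so it can be compared with the 1000 msecs limit that cluvfy printed above. A minimal sketch with made-up function names, assuming the default ntpq -p column layout shown here and that the peer in use is the line marked with "*":

```shell
#!/bin/bash
# Hypothetical helper: read "ntpq -p" output on stdin and print the
# offset (in milliseconds) of the peer marked with "*".
ntp_sync_offset() {
    awk '/^\*/ { print $9 }'
}

# Succeed if a "*" peer exists and its absolute offset is below
# LIMIT_MS (default 1000, the "Time Offset Limit" cluvfy printed above).
ntp_offset_ok() {
    limit=${1:-1000}
    ntp_sync_offset | awk -v limit="$limit" '
        { off = ($1 < 0) ? -$1 : $1; found = 1 }
        END { if (found && off < limit) exit 0; exit 1 }
    '
}
```

Usage: ntpq -p | ntp_offset_ok || echo "NTP offset too large or no peer selected"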

Running cluvfy stage -post crsinst   after a failed Clusterware startup

  • Note: you should run cluvfy from a node which is up and running to get the best results
 
CRS resource status
[grid@grac41 ~]$ my_crs_stat_init
NAME                           TARGET     STATE           SERVER       STATE_DETAILS   
-------------------------      ---------- ----------      ------------ ------------------
ora.asm                        ONLINE     OFFLINE                      Instance Shutdown
ora.cluster_interconnect.haip  ONLINE     OFFLINE                       
ora.crf                        ONLINE     ONLINE          grac41        
ora.crsd                       ONLINE     OFFLINE                       
ora.cssd                       ONLINE     OFFLINE         STARTING      
ora.cssdmonitor                ONLINE     ONLINE          grac41        
ora.ctssd                      ONLINE     OFFLINE                       
ora.diskmon                    OFFLINE    OFFLINE                       
ora.drivers.acfs               ONLINE     OFFLINE                       
ora.evmd                       ONLINE     OFFLINE                       
ora.gipcd                      ONLINE     ONLINE          grac41        
ora.gpnpd                      ONLINE     ONLINE          grac41        
ora.mdnsd                      ONLINE     ONLINE          grac41        

Verify CRS status with cluvfy ( CRS on grac42 is up and running )
[grid@grac42 ~]$ cluvfy stage -post crsinst -n grac41,grac42 -verbose
Performing post-checks for cluster services setup 
Checking node reachability...
Check: Node reachability from node "grac42"
  Destination Node                      Reachable?              
  ------------------------------------  ------------------------
  grac42                                yes                     
  grac41                                yes                     
Result: Node reachability check passed from node "grac42"

Checking user equivalence...
Check: User equivalence for user "grid"
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                passed                  
  grac41                                passed                  
Result: User equivalence check passed for user "grid"

Checking node connectivity...
Checking hosts config file...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                passed                  
  grac41                                passed                  
Verification of the hosts config file successful

Interface information for node "grac42"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:6C:89:27 1500  
 eth1   192.168.1.102   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.59    192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.178   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth1   192.168.1.170   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:63:08:07 1500  
 eth2   192.168.2.102   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 eth2   169.254.96.101  169.254.0.0     0.0.0.0         10.0.2.2        08:00:27:DF:79:B9 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  
Interface information for node "grac41"
 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:82:47:3F 1500  
 eth1   192.168.1.101   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:89:E9:A2 1500  
 eth2   192.168.2.101   192.168.2.0     0.0.0.0         10.0.2.2        08:00:27:6B:E2:BD 1500  
 virbr0 192.168.122.1   192.168.122.0   0.0.0.0         10.0.2.2        52:54:00:ED:19:7C 1500  

Check: Node connectivity for interface "eth1"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42[192.168.1.102]           grac42[192.168.1.59]            yes             
  grac42[192.168.1.102]           grac42[192.168.1.178]           yes             
  grac42[192.168.1.102]           grac42[192.168.1.170]           yes             
  grac42[192.168.1.102]           grac41[192.168.1.101]           yes             
  grac42[192.168.1.59]            grac42[192.168.1.178]           yes             
  grac42[192.168.1.59]            grac42[192.168.1.170]           yes             
  grac42[192.168.1.59]            grac41[192.168.1.101]           yes             
  grac42[192.168.1.178]           grac42[192.168.1.170]           yes             
  grac42[192.168.1.178]           grac41[192.168.1.101]           yes             
  grac42[192.168.1.170]           grac41[192.168.1.101]           yes             
Result: Node connectivity passed for interface "eth1"

Check: TCP connectivity of subnet "192.168.1.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.1.102            grac42:192.168.1.59             passed          
  grac42:192.168.1.102            grac42:192.168.1.178            passed          
  grac42:192.168.1.102            grac42:192.168.1.170            passed          
  grac42:192.168.1.102            grac41:192.168.1.101            passed          
Result: TCP connectivity check passed for subnet "192.168.1.0"

Check: Node connectivity for interface "eth2"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42[192.168.2.102]           grac41[192.168.2.101]           yes             
Result: Node connectivity passed for interface "eth2"

Check: TCP connectivity of subnet "192.168.2.0"
  Source                          Destination                     Connected?      
  ------------------------------  ------------------------------  ----------------
  grac42:192.168.2.102            grac41:192.168.2.101            passed          
Result: TCP connectivity check passed for subnet "192.168.2.0"

Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "192.168.1.0".
Subnet mask consistency check passed for subnet "192.168.2.0".
Subnet mask consistency check passed.
Result: Node connectivity check passed

Checking multicast communication...
Checking subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.1.0" for multicast communication with multicast group "230.0.1.0" passed.
Checking subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.2.0" for multicast communication with multicast group "230.0.1.0" passed.
Check of multicast communication passed.

Check: Time zone consistency 
Result: Time zone consistency check passed

Checking Oracle Cluster Voting Disk configuration...
ERROR: 
PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.
grac41
--> Expected error as lower CRS stack is not completely up and running
Oracle Cluster Voting Disk configuration check passed

Checking Cluster manager integrity... 
Checking CSS daemon...
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                running                 
  grac41                                not running             
ERROR: 
PRVF-5319 : Oracle Cluster Synchronization Services do not appear to be online.
Cluster manager integrity check failed
--> Expected error as lower CRS stack is not completely up and running

UDev attributes check for OCR locations started...
Result: UDev attributes check passed for OCR locations 
UDev attributes check for Voting Disk locations started...
Result: UDev attributes check passed for Voting Disk locations 

Check default user file creation mask
  Node Name     Available                 Required                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        22                        0022                      passed    
  grac41        22                        0022                      passed    
Result: Default user file creation mask check passed

Checking cluster integrity...
  Node Name                           
  ------------------------------------
  grac41                              
  grac42                              
  grac43                              

Cluster integrity check failed This check did not run on the following node(s): 
    grac41

Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR: 
PRVF-4193 : Asm is not running on the following nodes. Proceeding with the remaining nodes.
    grac41
--> Expected error as lower CRS stack is not completely up and running

Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful
ERROR: 
PRVF-4195 : Disk group for ocr location "+OCR" not available on the following nodes:
    grac41
--> Expected error as lower CRS stack is not completely up and running
NOTE: 
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check failed

Checking CRS integrity...

Clusterware version consistency passed
The Oracle Clusterware is healthy on node "grac42"
ERROR: 
PRVF-5305 : The Oracle Clusterware is not healthy on node "grac41"
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
CRS integrity check failed
--> Expected error as lower CRS stack is not completely up and running

Checking node application existence...
Checking existence of VIP node application (required)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        yes                       yes                       passed    
  grac41        yes                       no                        exists    
VIP node application is offline on nodes "grac41"

Checking existence of NETWORK node application (required)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        yes                       yes                       passed    
  grac41        yes                       no                        failed    
PRVF-4570 : Failed to check existence of NETWORK node application on nodes "grac41"
--> Expected error as lower CRS stack is not completely up and running

Checking existence of GSD node application (optional)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        no                        no                        exists    
  grac41        no                        no                        exists    
GSD node application is offline on nodes "grac42,grac41"

Checking existence of ONS node application (optional)
  Node Name     Required                  Running?                  Comment   
  ------------  ------------------------  ------------------------  ----------
  grac42        no                        yes                       passed    
  grac41        no                        no                        failed    
PRVF-4576 : Failed to check existence of ONS node application on nodes "grac41"
--> Expected error as lower CRS stack is not completely up and running

Checking Single Client Access Name (SCAN)...
  SCAN Name         Node          Running?      ListenerName  Port          Running?    
  ----------------  ------------  ------------  ------------  ------------  ------------
  grac4-scan.grid4.example.com  grac43        true          LISTENER_SCAN1  1521          true        
  grac4-scan.grid4.example.com  grac42        true          LISTENER_SCAN2  1521          true        

Checking TCP connectivity to SCAN Listeners...
  Node          ListenerName              TCP connectivity?       
  ------------  ------------------------  ------------------------
  grac42        LISTENER_SCAN1            yes                     
  grac42        LISTENER_SCAN2            yes                     
TCP connectivity to SCAN Listeners exists on all cluster nodes

Checking name resolution setup for "grac4-scan.grid4.example.com"...

Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
Checking if "hosts" entry in file "/etc/nsswitch.conf" is consistent across nodes...
Checking file "/etc/nsswitch.conf" to make sure that only one "hosts" entry is defined
More than one "hosts" entry does not exist in any "/etc/nsswitch.conf" file
All nodes have same "hosts" entry defined in file "/etc/nsswitch.conf"
Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed

  SCAN Name     IP Address                Status                    Comment   
  ------------  ------------------------  ------------------------  ----------
  grac4-scan.grid4.example.com  192.168.1.165             passed                              
  grac4-scan.grid4.example.com  192.168.1.168             passed                              
  grac4-scan.grid4.example.com  192.168.1.170             passed                              

Verification of SCAN VIP and Listener setup passed

Checking OLR integrity...
Checking OLR config file...
ERROR: 
PRVF-4184 : OLR config file check failed on the following nodes:
    grac41
    grac41:Group of file "/etc/oracle/olr.loc" did not match the expected value. [Expected = "oinstall" ; Found = "root"]
Fix : 
[grid@grac41 ~]$ ls -l /etc/oracle/olr.loc
-rw-r--r--. 1 root root 81 May 11 14:02 /etc/oracle/olr.loc
[root@grac41 Desktop]#  chown root:oinstall  /etc/oracle/olr.loc

Checking OLR file attributes...
OLR file check successful
OLR integrity check failed

Checking GNS integrity...
Checking if the GNS subdomain name is valid...
The GNS subdomain name "grid4.example.com" is a valid domain name
Checking if the GNS VIP belongs to same subnet as the public network...
Public network subnets "192.168.1.0" match with the GNS VIP "192.168.1.0"
Checking if the GNS VIP is a valid address...
GNS VIP "192.168.1.59" resolves to a valid IP address
Checking the status of GNS VIP...
Checking if FDQN names for domain "grid4.example.com" are reachable
PRVF-5216 : The following GNS resolved IP addresses for "grac4-scan.grid4.example.com" are not reachable: "192.168.1.168"
PRKN-1035 : Host "192.168.1.168" is unreachable
-->
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
GNS resolved IP addresses are reachable
Checking status of GNS resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  grac42        yes                       yes                     
  grac41        no                        yes                     
GNS resource configuration check passed
Checking status of GNS VIP resource...
  Node          Running?                  Enabled?                
  ------------  ------------------------  ------------------------
  grac42        yes                       yes                     
  grac41        no                        yes                     

GNS VIP resource configuration check passed.

GNS integrity check passed
OCR detected on ASM. Running ACFS Integrity checks...

Starting check to see if ASM is running on all cluster nodes...
PRVF-5110 : ASM is not running on nodes: "grac41," 
--> Expected error as lower CRS stack is not completely up and running

Starting Disk Groups check to see if at least one Disk Group configured...
Disk Group Check passed. At least one Disk Group configured

Task ACFS Integrity check failed

Checking to make sure user "grid" is not in "root" group
  Node Name     Status                    Comment                 
  ------------  ------------------------  ------------------------
  grac42        passed                    does not exist          
  grac41        passed                    does not exist          
Result: User "grid" is not part of "root" group. Check passed

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status                  
  ------------------------------------  ------------------------
  grac42                                passed                  
  grac41                                failed                  
PRVF-9671 : CTSS on node "grac41" is not in ONLINE state, when checked with command "/u01/app/11204/grid/bin/crsctl stat resource ora.ctssd -init" 
--> Expected error as lower CRS stack is not completely up and running
Result: Check of CTSS resource passed on all nodes

Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed

Check CTSS state started...
Check: CTSS state
  Node Name                             State                   
  ------------------------------------  ------------------------
  grac42                                Observer                
CTSS is in Observer state. Switching over to clock synchronization checks using NTP

Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
Checking daemon liveness...
Check: Liveness for "ntpd"
  Node Name                             Running?                
  ------------------------------------  ------------------------
  grac42                                yes                     
Result: Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
Checking NTP daemon command line for slewing option "-x"
Check: NTP daemon command line
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac42                                yes                     
Result: 
NTP daemon slewing option check passed
Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
Check: NTP daemon's boot time configuration
  Node Name                             Slewing Option Set?     
  ------------------------------------  ------------------------
  grac42                                yes                     
Result: 
NTP daemon's boot time configuration check for slewing option passed
Checking whether NTP daemon or service is using UDP port 123 on all nodes
Check for NTP daemon or service using UDP port 123
  Node Name                             Port Open?              
  ------------------------------------  ------------------------
  grac42                                yes                     
NTP common Time Server Check started...
NTP Time Server ".LOCL." is common to all nodes on which the NTP daemon is running
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Checking on nodes "[grac42]"... 
Check: Clock time offset from NTP Time Server
Time Server: .LOCL. 
Time Offset Limit: 1000.0 msecs
  Node Name     Time Offset               Status                  
  ------------  ------------------------  ------------------------
  grac42        0.0                       passed                  
Time Server ".LOCL." has time offsets that are within permissible limits for nodes "[grac42]". 
Clock time offset check passed
Result: Clock synchronization check using Network Time Protocol(NTP) passed
PRVF-9652 : Cluster Time Synchronization Services check failed
--> Expected error as lower CRS stack is not completely up and running

Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.

Post-check for cluster services setup was unsuccessful. 
Checks did not pass for the following node(s):
    grac41
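
A long verbose report like the one above is easier to triage once the message codes are counted. The following is only a sketch: it assumes the report was first saved to a file, and the function name, the file path and the node list in the usage comment are hypothetical.

```shell
#!/bin/bash
# Count how often each Oracle message code (PRVF, PRVG, PRKN, PRCT, PRIF, CRS)
# occurs in a saved cluvfy report. The report path is passed as $1.
summarize_cluvfy_errors() {
    local report="$1"
    grep -oE '(PRVF|PRVG|PRKN|PRCT|PRIF|CRS)-[0-9]+' "$report" | sort | uniq -c
}

# Hypothetical usage (node list and file name are assumptions):
#   cluvfy stage -post crsinst -n grac41,grac42 -verbose > /tmp/cluvfy_post.log 2>&1
#   summarize_cluvfy_errors /tmp/cluvfy_post.log
```

Codes that show up only for the node with the stopped CRS stack (grac41 here) are the "expected" ones; anything else deserves a closer look.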

Verify your DHCP setup ( only if using GNS )

[root@gract1 Desktop]#  cluvfy comp dhcp -clustername gract -verbose
Checking if any DHCP server exists on the network...
PRVG-5723 : Network CRS resource is configured to use DHCP provided IP addresses
Verification of DHCP Check was unsuccessful on all the specified nodes.
--> If the network resource is ONLINE you are not allowed to run this command

DESCRIPTION:
Checks if DHCP server exists on the network and is capable of providing required number of IP addresses. 
This check also verifies the response time for the DHCP server. The checks are all done on the local node. 
For port values less than 1024 CVU needs to be run as root user. If -networks is specified and it contains 
a PUBLIC network then DHCP packets are sent on the public network. By default the network on which the host 
IP is specified is used. This check must not be done while default network CRS resource configured to use 
DHCP provided IP address is online.

In my case even stopping nodeapps didn't help.
Only after a full cluster shutdown did the command seem to query the DHCP server!

[root@gract1 Desktop]#  cluvfy comp dhcp -clustername gract -verbose
Verifying DHCP Check 
Checking if any DHCP server exists on the network...
Checking if network CRS resource is configured and online
Network CRS resource is offline or not configured. Proceeding with DHCP checks.
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
At least one DHCP server exists on the network and is listening on port 67
Checking if DHCP server has sufficient free IP addresses for all VIPs...
Sending DHCP "DISCOVER" packets for client ID "gract-scan1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.170/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-scan2-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.169/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan2-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.169/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-scan3-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.168/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-scan3-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.168/255.255.255.0, lease time: 21600
Sending DHCP "DISCOVER" packets for client ID "gract-gract1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.174/255.255.255.0, lease time: 21600
Sending DHCP "REQUEST" packets for client ID "gract-gract1-vip"
CRS-10009: DHCP server returned server: 192.168.1.50, loan address : 192.168.1.174/255.255.255.0, lease time: 21600
CRS-10012: released DHCP server lease for client ID gract-scan1-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-scan2-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-scan3-vip on port 67
CRS-10012: released DHCP server lease for client ID gract-gract1-vip on port 67
DHCP server was able to provide sufficient number of IP addresses
The DHCP server response time is within acceptable limits
Verification of DHCP Check was successful. 
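
Because the DHCP check refuses to run while the network resource is online, a small guard can save a wasted run. This is a minimal sketch under assumptions: it expects the NAME=/TARGET=/STATE= text that crsctl stat res prints, the resource name ora.net1.network is assumed, and the function name is made up for illustration.

```shell
#!/bin/bash
# Read "crsctl stat res <resource>" output from stdin and return 0 if a
# STATE= line reports ONLINE, 1 otherwise (also when CRS is down and no
# STATE= line is printed at all).
network_resource_online() {
    grep -q '^STATE=.*ONLINE'
}

# Hypothetical usage, run as root (resource name is an assumption):
#   if crsctl stat res ora.net1.network | network_resource_online; then
#       echo "network resource is ONLINE, cluvfy comp dhcp would report PRVG-5723"
#   else
#       cluvfy comp dhcp -clustername gract -verbose
#   fi
```

Note that in the case described above stopping nodeapps was not enough, so an OFFLINE state alone may not guarantee that the check runs.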

The /var/log/messages file on the nameserver shows the following: 
Jan 21 14:42:53 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: Wrote 6 leases to leases file.
Jan 21 14:42:55 ns1 dhcpd: DHCPREQUEST for 192.168.1.170 (192.168.1.50) from 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: DHCPACK on 192.168.1.170 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:55 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
Jan 21 14:42:56 ns1 dhcpd: DHCPOFFER on 192.168.1.169 to 00:00:00:00:00:00 via eth2
Jan 21 14:42:56 ns1 dhcpd: DHCPDISCOVER from 00:00:00:00:00:00 via eth2
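
To see at a glance which addresses the DHCP server handed out during the check, the dhcpd lines can be counted per message type and address. A sketch only: it assumes the exact syslog layout shown above (the tag "dhcpd:" in field 5), and the function name is hypothetical.

```shell
#!/bin/bash
# Count DHCPOFFER and DHCPACK lines per address in a syslog file whose
# dhcpd entries look like:
#   Jan 21 14:42:54 ns1 dhcpd: DHCPOFFER on 192.168.1.170 to ... via eth2
summarize_dhcpd() {
    awk '$5 == "dhcpd:" && ($6 == "DHCPOFFER" || $6 == "DHCPACK") {
             count[$6 " " $8]++
         }
         END { for (k in count) print count[k], k }' "$1" | sort
}

# Hypothetical usage on the nameserver:
#   summarize_dhcpd /var/log/messages
```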

Reference :

  • Common cluvfy errors and warnings

2. Common cluvfy errors and warnings including first debugging steps

PRIF-10, PRVG-1060, PRCT-1011  [ cluvfy  stage -pre crsinst ]

Current Configuration :

  • Your CRS stack doesn’t come up and you want to verify your CRS stack
  • You are running  cluvfy  stage -pre crsinst against an already installed CRS stack

ERROR       : PRVG-1060 : Failed to retrieve the network interface classification information from an existing CRS home at path "/u01/app/121/grid" on the local node
              PRCT-1011 : Failed to run "oifcfg". Detailed error: PRIF-10: failed to initialize the cluster registry
Command     :   cluvfy  stage -pre crsinst against an already installed CRS stack
Workaround 1: Try to start clusterware in exclusive mode 
               # crsctl start crs -excl 
                 Oracle High Availability Services is online 
                 CRS-4692: Cluster Ready Services is online in exclusive mode 
                 CRS-4529: Cluster Synchronization Services is online 
                 CRS-4533: Event Manager is online 
                $ bin/cluvfy  stage -pre crsinst -n gract1 
               Note: if you can start the CRS stack in exclusive mode, cluvfy  stage -post crsinst should work too 
                 $  cluvfy  stage -post crsinst -n gract1 
Workaround 2: Needs to be used if you cannot start the CRS stack in exclusive mode  
               If you cannot start up the CRS stack you may use the workaround from  
                  Bug 17505999 : CVU CHECKS FOR ACTIVEVERSION WHEN CRS STACK IS NOT UP. 
                  # mv /etc/oraInst.loc /etc/oraInst.loc_sav 
                  # mv /etc/oracle  /etc/oracle_sav 
                 
                $ bin/cluvfy  -version 
                   12.1.0.1.0 Build 112713x8664 
                Now the command below should work. As said before, always download the latest cluvfy version! 
                 $  bin/cluvfy  stage -pre crsinst -n gract1 
                 .. Check for /dev/shm mounted as temporary file system passed 
                  Pre-check for cluster services setup was successful.
 Reference :    Bug 17505999 : CVU CHECKS FOR ACTIVEVERSION WHEN CRS STACK IS NOT UP.

PRVF-0002 : Could not retrieve local nodename

Command    : $ ./bin/cluvfy -h
Error      : PRVF-0002 : Could not retrieve local nodename
Root cause : Nameserver down, or host not yet known in DNS 
             $   nslookup grac41   returns error
               Server:        192.135.82.44
               Address:    192.135.82.44#53
               ** server can't find grac41: NXDOMAIN
Fix         : Restart or configure DNS. nslookup must work in any case!
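
Before running cluvfy it can help to confirm that every node name resolves. The sketch below is an assumption-laden helper, not part of cluvfy: it uses getent hosts, which follows /etc/nsswitch.conf and therefore also accepts /etc/hosts entries; to test DNS alone, rerun the failing names with nslookup as shown above. The function name is hypothetical.

```shell
#!/bin/bash
# Print every host name from the argument list that cannot be resolved
# through the system resolver (getent hosts). Prints nothing if all resolve.
unresolved_hosts() {
    local host
    for host in "$@"; do
        getent hosts "$host" >/dev/null 2>&1 || echo "$host"
    done
}

# Hypothetical usage:
#   unresolved_hosts grac41 grac42
```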

PRVG-1013 : The path “/u01/app/11203/grid” does not exist or cannot be created

Command    : cluvfy stage -pre nodeadd -n grac3 -verbose
Error      : PRVG-1013 : The path "/u01/app/11203/grid" does not exist or cannot be created on the nodes to be added
             Shared resources check for node addition failed:
Logfile    : Check cluvfy log:  $GRID_HOME/cv/log/cvutrace.log.0
             [ 15025@grac1.example.com] [Worker 1] [ 2013-08-29 15:17:08.266 CEST ] [NativeSystem.isCmdScv:499]  isCmdScv: 
             cmd=[/usr/bin/ssh -o FallBackToRsh=no  -o PasswordAuthentication=no  -o StrictHostKeyChecking=yes  
             -o NumberOfPasswordPrompts=0  grac3 -n 
             /bin/sh -c "if [  -d /u01 -a -w /u01 ] ; then echo exists; fi"]
             ...
             [15025@grac1.example.com] [main] [ 2013-08-29 15:17:08.270 CEST ] [TaskNodeAddDelete.checkSharedPath:559]  
             PRVG-1013 : The path "/u01/app/11203/grid" does not exist or cannot be created on the nodes to be added
             [15025@grac1.example.com] [main] [ 2013-08-29 15:17:08.270 CEST ] [ResultSet.traceResultSet:359]
             Node Add/Delete ResultSet trace.
             Overall Status->VERIFICATION_FAILED
             grac3-->VERIFICATION_FAILED
Root cause:  The cluvfy command checks whether /u01 is a directory that is writable by the current user, and this test fails
             /bin/sh -c "if [  -d /u01 -a -w /u01 ] ; then echo exists; fi"
Code Fix     : Drop the -w argument and we get the required output
              $  /bin/sh -c "if [  -d /u01 -a /u01 ] ; then echo exists; fi"
               exists
Related BUG:
             Bug 13241453 : LNX64-12.1-CVU: "CLUVFY STAGE -POST NODEADD" COMMAND REPORTS PRVG-1013 ERROR
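
The same directory test can be reproduced by hand on the node to be added. The helper below is only a sketch: walking up to the nearest existing ancestor is my assumption for illustration (the trace above shows cluvfy testing /u01), and the function name is hypothetical.

```shell
#!/bin/bash
# Walk up from the given path to its nearest existing ancestor directory
# and report whether the current user is allowed to write there.
check_creatable() {
    local path="$1"
    while [ ! -d "$path" ] && [ "$path" != "/" ]; do
        path=$(dirname "$path")
    done
    if [ -w "$path" ]; then
        echo "writable: $path"
    else
        echo "not writable: $path"
        return 1
    fi
}

# Hypothetical usage as the grid user on the new node:
#   check_creatable /u01/app/11203/grid
```

If this reports "not writable" for a directory such as /u01, the grid user lacks write permission there, which is the condition the -w test in the trace checks for.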

PRVF-5229 : GNS VIP is active before Clusterware installation

Command    : $ ./bin/cluvfy comp gns -precrsinst -domain grid.example.com -vip 192.168.1.50 -verbose -n grac121
              Verifying GNS integrity 
              Checking GNS integrity...
              Checking if the GNS subdomain name is valid...
              The GNS subdomain name "grid.example.com" is a valid domain name
              Checking if the GNS VIP is a valid address...
              GNS VIP "192.168.1.50" resolves to a valid IP address
              Checking the status of GNS VIP...
Error       : Error PRVF-5229 : GNS VIP is active before Clusterware installation
              GNS integrity check passed
Fix         : If your clusterware is already installed and up and running ignore this error
              If this is a new install use an unused TCP/IP address for your GNS VIP ( note ping should fail ! )

PRVF-4007 : User equivalence check failed for user “oracle”

Command   : $ ./bin/cluvfy stage -pre crsinst -n grac1 
Error     : PRVF-4007 : User equivalence check failed for user "oracle" 
Fix       : Run  sshUserSetup.sh            
            $ ./sshUserSetup.sh -user grid -hosts "grac1 grac2"  -noPromptPassphrase            
            Verify SSH connectivity:            
            $ /usr/bin/ssh -x -l grid  grac1 date
            Tue Jul 16 12:14:17 CEST 2013
            $ /usr/bin/ssh -x -l grid  grac2 date
            Tue Jul 16 12:14:25 CEST 2013

PRVF-9992 : Group of device “/dev/oracleasm/disks/DATA1” did not match the expected group

Command    : $ ./bin/cluvfy stage -pre crsinst -n grac1 -asm -asmdev /dev/oracleasm/disks/DATA1
             Checking consistency of device group across all nodes... 
Error      : PRVF-9992 : Group of device "/dev/oracleasm/disks/DATA1" did not match the expected group. [Expected = "dba"; Found = "{asmadmin=[grac1]}"] 
Root cause : Cluvfy expects group "dba" by default and doesn't know that the ASM devices belong to a different group ( asmadmin )
Fix        : Run cluvfy with -asmgrp asmadmin to provide correct group mappings: 
             $ ./bin/cluvfy stage -pre crsinst -n grac1 -asm -asmdev /dev/oracleasm/disks/DATA1 -asmgrp asmadmin
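
The group that cluvfy reports can be cross-checked directly on the node. A minimal sketch, assuming GNU stat (stat -c %G); the function name is hypothetical and the device path and group in the usage comment are taken from the example above.

```shell
#!/bin/bash
# Compare the owning group of a device (or any file) with the expected group.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be examined.
check_device_group() {
    local dev="$1" expected="$2" found
    found=$(stat -c %G "$dev") || return 2
    if [ "$found" = "$expected" ]; then
        echo "passed: $dev group=$found"
    else
        echo "failed: $dev expected=$expected found=$found"
        return 1
    fi
}

# Hypothetical usage:
#   check_device_group /dev/oracleasm/disks/DATA1 asmadmin
```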

PRVF-9802 : Attempt to get udev info from node “grac1” failed

Command   : $ ./bin/cluvfy stage -pre crsinst -n grac1 -asm -asmdev /dev/oracleasm/disks/DATA1 
Error     : PRVF-9802 : Attempt to get udev info from node "grac1" failed
           UDev attributes check failed for ASM Disks
Bug       : Bug 12804811 : [11203-LIN64-110725] OUI PREREQUISITE CHECK FAILED IN OL6
Fix       : If using ASMLIB you can currently ignore this error
            

PRVF-7539 : User “grid” does not belong to group “dba”

Error       : PRVF-7539 - User "grid" does not belong to group "dba"
Command     : $  ./bin/cluvfy comp sys -p crs -n grac1
Fix         : Add the grid owner to the "dba" group
Note        : CVU found following errors with Clusterware setup : User "grid" does not 
              belong to group "dba" [ID 1505586.1]
            : Cluster Verification Utility (CLUVFY) FAQ [ID 316817.1]
Bug         :  Bug 12422324 : LNX64-112-CMT: HIT PRVF-7539 : GROUP "DBA" DOES NOT EXIST ON OUDA NODE ( Fixed : 11.2.0.4 )

PRVF-7617 : Node connectivity between “grac1 : 192.168.1.61” and “grac1 : 192.168.1.55” failed

Command     : $ ./bin/cluvfy comp nodecon -n grac1
Error       : PRVF-7617 : Node connectivity between "grac1 : 192.168.1.61" and "grac1 : 192.168.1.55" failed
Action 1    : Disable firewall / IP tables
             # service iptables stop 
             # chkconfig iptables off
             # iptables -F
             # service iptables status 
             If after a reboot the firewall is enabled again please              
Action 2    : Checking ssh connectivity 
              $ id
              uid=501(grid) gid=54321(oinstall) groups=54321(oinstall),504(asmadmin),506(asmdba),507(asmoper),54322(dba)
              $ ssh grac1 date 
                Sat Jul 27 13:42:19 CEST 2013
Fix         : It seems that cluvfy comp nodecon needs to be run with at least 2 nodes
              Working Command: $ ./bin/cluvfy comp nodecon -n grac1,grac2 
                -> Node connectivity check passed
              Failing Command: $ ./bin/cluvfy comp nodecon -n grac1
                -> Verification of node connectivity was unsuccessful. 
                   Checks did not pass for the following node(s):
               grac1 : 192.168.1.61
            :  Ignore this error if running with a single RAC Node  - Rerun later when both nodes are available 
            :  Verify that ping is working with all involved IP addresses

Action 3    : 2 or more network interfaces are using the same network address
              Test your node connectivity by running:
              $ /u01/app/11203/grid/bin/cluvfy comp nodecon -i eth1,eth2 -n grac31,grac32,grac33 -verbose

              Interface information for node "grac32"
              Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU   
              ------ --------------- --------------- --------------- --------------- ----------------- ------
              eth0   10.0.2.15       10.0.2.0        0.0.0.0         10.0.2.2        08:00:27:88:32:F3 1500  
              eth1   192.168.1.122   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:EB:39:F1 1500  
              eth3   192.168.1.209   192.168.1.0     0.0.0.0         10.0.2.2        08:00:27:69:AE:D2 1500  

              Verify current settings via ifconfig
              eth1     Link encap:Ethernet  HWaddr 08:00:27:5A:61:E3  
                       inet addr:192.168.1.121  Bcast:192.168.1.255  Mask:255.255.255.0
              eth3     Link encap:Ethernet  HWaddr 08:00:27:69:AE:D2  
                       inet addr:192.168.1.209  Bcast:192.168.1.255  Mask:255.255.255.0

              --> Both eth1 and eth3 are using the same network address 192.168.1 
Fix           : Set up your network devices and give eth3 an address in a different network such as 192.168.3 
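
Spotting two interfaces on one subnet is easier with a small filter over the "Interface information" table that cluvfy prints. This is a sketch under assumptions: the input is the table of a single node saved to a file, the column order is Name, IP Address, Subnet as shown above, and the function name is hypothetical.

```shell
#!/bin/bash
# Read rows of the cluvfy "Interface information" table (Name, IP Address,
# Subnet, ...) and print each subnet that is configured on more than one
# distinct interface name. Several IPs on the same interface are ignored.
shared_subnets() {
    awk '$2 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $3 ~ /^[0-9.]+$/ {
             key = $3 SUBSEP $1
             if (!(key in seen)) {
                 seen[key] = 1
                 names[$3] = (names[$3] == "" ? $1 : names[$3] "," $1)
                 n[$3]++
             }
         }
         END { for (s in n) if (n[s] > 1) print s ": " names[s] }' "$1"
}

# Hypothetical usage: paste the table for one node into a file first.
#   shared_subnets /tmp/grac32_interfaces.txt
```

For the grac32 table above this would flag subnet 192.168.1.0 as shared by eth1 and eth3.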

Action 4      : Intermittent PRVF-7617 error with cluvfy 11.2.0.3 ( cluvfy bug )
               $  /u01/app/11203/grid/bin/cluvfy -version
               11.2.0.3.0 Build 090311x8664
               $ /u01/app/11203/grid/bin/cluvfy comp nodecon -i eth1,eth2 -n grac31,grac32,grac33 -verbose
               --> Fails intermittently with the following error: 
               PRVF-7617 : Node connectivity between "grac31 : 192.168.1.121" and "grac33 : 192.168.1.220" failed

               $  /home/grid/cluvfy_121/bin/cluvfy -version
               12.1.0.1.0 Build 062813x8664
               $  /home/grid/cluvfy_121/bin/cluvfy comp nodecon -i eth1,eth2 -n grac31,grac32,grac33 -verbose
               --> Works for each run
Fix           : Always use the latest 12.1 cluvfy utility to test node connectivity 

References:  
               PRVF-7617: TCP connectivity check failed for subnet (Doc ID 1335136.1)
               Bug 16176086 - SOLX64-12.1-CVU:CVU REPORT NODE CONNECTIVITY CHECK FAIL FOR NICS ON SAME NODE 
               Bug 17043435 : EM 12C: SPORADIC INTERRUPTION WITHIN RAC-DEPLOYMENT AT THE STEP INSTALL/CLONE OR

PRVG-1172 : The IP address “192.168.122.1” is on multiple interfaces “virbr0” on nodes “grac42,grac41”

Command    :  $ ./bin/cluvfy stage -pre crsinst -asm -presence local -asmgrp asmadmin -asmdev /dev/oracleasm/disks/DATA1,/dev/oracleasm/disks/DATA2,/dev/oracleasm/disks/DATA3,/dev/oracleasm/disks/DATA4 -n grac41,grac42
Error      :  PRVG-1172 : The IP address "192.168.122.1" is on multiple interfaces "virbr0,virbr0" on nodes "grac42,grac41"
Root cause :  There are multiple networks ( eth0,eth1,eth2,virbr0  ) defined
Fix        :  Run cluvfy with  -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect -n grac41,grac42
Sample     :  $ ./bin/cluvfy stage -pre crsinst -asm -presence local -asmgrp asmadmin -asmdev /dev/oracleasm/disks/DATA1,/dev/oracleasm/disks/DATA2,/dev/oracleasm/disks/DATA3,/dev/oracleasm/disks/DATA4 -networks eth1:192.168.1.0:PUBLIC/eth2:192.168.2.0:cluster_interconnect -n grac41,grac42

Cluvfy Warnings:

PRVG-1101 : SCAN name “grac4-scan.grid4.example.com” failed to resolve  ( PRVF-4664 PRVF-4657 )

Warning:      PRVG-1101 : SCAN name "grac4-scan.grid4.example.com" failed to resolve  
Cause:        An attempt to resolve specified SCAN name to a list of IP addresses failed because SCAN could not be resolved in DNS or GNS using 'nslookup'.
Action:       Verify your GNS/SCAN setup using ping, nslookup and cluvfy
              $  ping -c 1  grac4-scan.grid4.example.com
              PING grac4-scan.grid4.example.com (192.168.1.168) 56(84) bytes of data.
              64 bytes from 192.168.1.168: icmp_seq=1 ttl=64 time=0.021 ms
              --- grac4-scan.grid4.example.com ping statistics ---
              1 packets transmitted, 1 received, 0% packet loss, time 1ms
               rtt min/avg/max/mdev = 0.021/0.021/0.021/0.000 ms

              $  ping -c 1  grac4-scan.grid4.example.com
              PING grac4-scan.grid4.example.com (192.168.1.170) 56(84) bytes of data.
              64 bytes from 192.168.1.170: icmp_seq=1 ttl=64 time=0.031 ms 
              --- grac4-scan.grid4.example.com ping statistics ---
              1 packets transmitted, 1 received, 0% packet loss, time 2ms
              rtt min/avg/max/mdev = 0.031/0.031/0.031/0.000 ms

             $  ping -c 1  grac4-scan.grid4.example.com
             PING grac4-scan.grid4.example.com (192.168.1.165) 56(84) bytes of data.
             64 bytes from 192.168.1.165: icmp_seq=1 ttl=64 time=0.143 ms
             --- grac4-scan.grid4.example.com ping statistics ---
             1 packets transmitted, 1 received, 0% packet loss, time 0ms
             rtt min/avg/max/mdev = 0.143/0.143/0.143/0.000 ms

             $ nslookup grac4-scan.grid4.example.com
             Server:        192.168.1.50
             Address:    192.168.1.50#53
             Non-authoritative answer:
             Name:    grac4-scan.grid4.example.com
             Address: 192.168.1.168
             Name:    grac4-scan.grid4.example.com
             Address: 192.168.1.165
             Name:    grac4-scan.grid4.example.com
             Address: 192.168.1.170

            $ $GRID_HOME/bin/cluvfy comp scan
            Verifying scan 
            Checking Single Client Access Name (SCAN)...
            Checking TCP connectivity to SCAN Listeners...
            TCP connectivity to SCAN Listeners exists on all cluster nodes
            Checking name resolution setup for "grac4-scan.grid4.example.com"...
            Checking integrity of name service switch configuration file "/etc/nsswitch.conf" ...
            Check for integrity of name service switch configuration file "/etc/nsswitch.conf" passed
            Verification of SCAN VIP and Listener setup passed
            Verification of scan was successful. 

 Fix:       As nslookup, ping and cluvfy all work as expected, you can ignore this warning

Reference:  PRVF-4664 PRVF-4657: Found inconsistent name resolution entries for SCAN name (Doc ID 887471.1)
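The nslookup check above can be reduced to a single number: how many distinct addresses the SCAN name resolves to (three in this setup). The function below is a small sketch that reads nslookup output from stdin; count_scan_addresses is a hypothetical helper name, and the parsing assumes the classic output format shown above.

```shell
# Hypothetical helper: count distinct answer addresses in nslookup output (stdin).
# The resolver's own line ("Address: 192.168.1.50#53") is skipped via the "#" filter.
count_scan_addresses() {
    awk '$1 == "Address:" && $2 !~ /#/ { seen[$2] = 1 }
         END { n = 0; for (a in seen) n++; print n }'
}

# Usage (not executed here):
#   nslookup grac4-scan.grid4.example.com | count_scan_addresses
```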

WARNING    : Could not find a suitable set of interfaces for the private interconnect

Root cause : The public interface ( 192.168.1.60 ) and the private interface ( 192.168.1.61 ) use the same network address
Fix        : Provide a separate network address ( 192.168.2.xx ) for the private interconnect
                  After the fix cluvfy reports :
                  Interfaces found on subnet "192.168.1.0" that are likely candidates for VIP are:
                  grac1 eth0:192.168.1.60
                  Interfaces found on subnet "192.168.2.0" that are likely candidates for a private interconnect are:
                  grac1 eth1:192.168.2.101

WARNING: Could not find a suitable set of interfaces for VIPs

WARNING: Could not find a suitable set of interfaces for VIPs
             Checking subnet mask consistency...
             Subnet mask consistency check passed for subnet "192.168.1.0".
             Subnet mask consistency check passed for subnet "192.168.2.0".
             Subnet mask consistency check passed.
Fix        : Ignore this warning 
Root Cause : Per BUG:4437727, cluvfy makes an incorrect assumption based on RFC 1918 that any IP address/subnet that 
            begins with any of the following octets is private and hence may not be fit for use as a VIP:
            172.16.x.x  through 172.31.x.x
            192.168.x.x
            10.x.x.x
            However, this assumption does not take into account that it is possible to use these IPs as Public IP's on an
            internal network  (or intranet).   Therefore, it is very common to use IP addresses in these ranges as 
            Public IP's and as Virtual IP(s), and this is a supported configuration.  
Reference:
Note:       CLUVFY Fails With Error: Could not find a suitable set of interfaces for VIPs or Private Interconnect [ID 338924.1]
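The three address blocks listed above can be tested with a few lines of shell. This only illustrates the RFC 1918 ranges and is not the check cluvfy performs internally; is_rfc1918 is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the IPv4 address lies in one of the RFC 1918 blocks
# 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
# The function body runs in a subshell so the IFS change stays local.
is_rfc1918() (
    IFS=.
    set -- $1            # split the address into its four octets
    case $1 in
        10)  true ;;
        172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;
        192) [ "$2" -eq 168 ] ;;
        *)   false ;;
    esac
)
```

For the addresses used in this post, is_rfc1918 192.168.1.60 succeeds, so a VIP warning on such a subnet is expected and can be ignored as described above.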

PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option “-x”

Error        :PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
Solution     :Change  /etc/sysconfig/ntpd
               # OPTIONS="-u ntp:ntp -p /var/run/ntpd.pid"
                to 
                OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
               Restart the NTP daemon
               [root@ract1 ~]#  service ntpd  restart
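Before rerunning cluvfy it can help to confirm that the active OPTIONS line really carries the -x flag. The sketch below assumes a RHEL/OEL style /etc/sysconfig/ntpd with a single OPTIONS= line; ntpd_has_slew is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if an uncommented OPTIONS= line in the given file
# contains the slewing option -x as a separate word.
ntpd_has_slew() {
    grep -E '^[[:space:]]*OPTIONS=' "$1" | grep -Eq '["[:space:]]-x(["[:space:]]|$)'
}

# Usage (not executed here):
#   ntpd_has_slew /etc/sysconfig/ntpd && echo "slewing enabled" || echo "-x is missing"
```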

PRVF-5217 : An error occurred while trying to look up IP address for “grac1cl.grid2.example.com”

WARNING:    PRVF-5217 : An error occurred while trying to look up IP address for "grac1cl.grid2.example.com"
Action    : Verify with dig and nslookup that the VIP address resolves:
            $  dig grac1cl-vip.grid2.example.com
             ; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.10.rc1.el6 <<>> grac1cl-vip.grid2.example.com
             ;; global options: +cmd
             ;; Got answer:
             ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23546
             ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 1
             ;; QUESTION SECTION:
             ;grac1cl-vip.grid2.example.com.    IN    A
             ;; ANSWER SECTION:
             grac1cl-vip.grid2.example.com. 120 IN    A    192.168.1.121
             ;; AUTHORITY SECTION:
             grid2.example.com.    3600    IN    NS    ns1.example.com.
             grid2.example.com.    3600    IN    NS    gns2.grid2.example.com.
            ;; ADDITIONAL SECTION:
            ns1.example.com.    3600    IN    A    192.168.1.50
           ;; Query time: 12 msec
           ;; SERVER: 192.168.1.50#53(192.168.1.50)
           ;; WHEN: Mon Aug 12 09:39:24 2013
           ;; MSG SIZE  rcvd: 116
          $  nslookup grac1cl-vip.grid2.example.com
           Server:        192.168.1.50
           Address:    192.168.1.50#53
           Non-authoritative answer:
           Name:    grac1cl-vip.grid2.example.com
           Address: 192.168.1.121
Fix:      Ignore this warning.
          The DNS server on this system has stripped the authoritative flag. As a result an
          UnknownHostException is thrown when CVU calls InetAddress.getAllByName(..). That's why cluvfy returns a WARNING.
Reference: Bug 12826689 : PRVF-5217 FROM CVU WHEN VALIDATING GNS 

Running cluvfy comp dns -server fails silently - Cluvfy logs show PRCZ-2090 error

The command runcluvfy.sh comp dns -server ... simply exits with SUCCESS, which is not what we expect. The command should start a local test DNS server and block until runcluvfy.sh comp dns -client -last has been executed.

[grid@ractw21 linuxx64_12201_grid_home]$ runcluvfy.sh comp dns -server -domain grid122.example.com -vipaddress 192.168.1.59/255.255.255.0/enp0s8 -verbose -method root
Enter "ROOT" password:

Verifying Task DNS configuration check ...
Waiting for DNS client requests...
Verifying Task DNS configuration check ...PASSED

Verification of DNS Check was successful. 

CVU operation performed:      DNS Check
Date:                         Apr 11, 2017 3:23:56 PM
CVU home:                     /media/sf_kits/Oracle/122/linuxx64_12201_grid_home/
User:                         grid

A review of the CVU traces shows that the cluvfy command fails with error PRCZ-2090:
PRCZ-2090 : failed to create host key repository from file "/home/grid/.ssh/known_hosts" to establish SSH connection to node "ractw21"
[main] [ 2017-04-14 17:38:09.204 CEST ] [ExecCommandNoUserEqImpl.runCmd:374]  Final CompositeOperationException: PRCZ-2009 : Failed to execute command "/media/sf_kits/Oracle/122/linuxx64_12201_grid_home//cv/admin/odnsdlite" as root within 0 seconds on nodes "ractw21"

Fix: log in as user grid via ssh to create the proper ssh environment
[grid@ractw21 linuxx64_12201_grid_home]$  ssh grid@ractw21.example.com

 

PRVF-5636 , PRVF-5637 : The DNS response time for an unreachable node exceeded “15000” ms

Problem 1: 
Command   : $ ./bin/cluvfy stage -pre crsinst -n grac1 -asm -asmdev /dev/oracleasm/disks/DATA1
Error     : PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: grac1
Root Cause: nslookup returns the wrong exit status
            # nslookup hugo.example.com
            Server:        192.168.1.50
            Address:    192.168.1.50#53
            ** server can't find hugo.example.com: NXDOMAIN
            #  echo $?
            1
            --> Note: the message "can't find hugo.example.com" is expected - but the status code is not
 Note:      PRVF-5637 : DNS response time could not be checked on following nodes [ID 1480242.1]
 Bug :      Bug 16038314 : PRVF-5637 : DNS RESPONSE TIME COULD NOT BE CHECKED ON FOLLOWING NODES

 Problem 2:
 Version   : 12.1.0.2
 Command   : $GRID_HOME/addnode/addnode.sh -silent "CLUSTER_NEW_NODES={gract3}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={auto}" "CLUSTER_NEW_NODE_ROLES={hub}"
 Error     : SEVERE: [FATAL] [INS-13013] Target environment does not meet some mandatory requirements.
             FINE: [Task.perform:594]
             sTaskResolvConfIntegrity:Task resolv.conf Integrity[STASKRESOLVCONFINTEGRITY]:TASK_SUMMARY:FAILED:CRITICAL:VERIFICATION_FAILED
             PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: gract1,gract3
Verify     : Run ping and nslookup against the SCAN address for a long time to check node connectivity
             $ ping -v gract-scan.grid12c.example.com
             $ nslookup gract-scan.grid12c.example.com
             Note: you may need to run the above commands for a long time until the error shows up
Root Cause : The above OS commands hung intermittently, which pointed to a firewall issue
Fix        : Disable firewall
Reference  : PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes (Doc ID 1356975.1) 
             PRVF-5637 : DNS response time could not be checked on following nodes (Doc ID 1480242.1)
             The 11.2 workaround of setting  $ export IGNORE_PREADDNODE_CHECKS=Y  did not help
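Because the hang is intermittent, it is easier to let a small loop repeat the lookup and report only the slow runs. The following is a sketch under a few assumptions: elapsed_seconds and watch_lookup are hypothetical helper names, timing uses whole seconds from date +%s, and the limit of 15 seconds in the usage line mirrors the 15000 ms from the error message.

```shell
# Hypothetical helper: run a command and print how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true     # the exit status is not of interest here
    end=$(date +%s)
    echo $(( end - start ))
}

# Hypothetical helper: repeat an nslookup and print every run at or above the limit.
# Usage: watch_lookup <name> <runs> <limit_seconds>
watch_lookup() {
    i=1
    while [ "$i" -le "$2" ]; do
        t=$(elapsed_seconds nslookup "$1")
        [ "$t" -ge "$3" ] && echo "run $i took ${t}s"
        i=$(( i + 1 ))
    done
}

# Usage (not executed here):
#   watch_lookup gract-scan.grid12c.example.com 1000 15
```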

PRVF-4037 : CRS is not installed on any of the nodes

Error     : PRVF-4037 : CRS is not installed on any of the nodes
            PRVF-5447 : Could not verify sharedness of Oracle Cluster Voting Disk configuration
Command   : $ cluvfy stage -pre crsinst -upgrade -n grac41,grac42,grac43 -rolling -src_crshome $GRID_HOME 
           -dest_crshome /u01/app/grid_new -dest_version 12.1.0.1.0  -fixup -fixupdir /tmp -verbose
Root Cause:  /u01/app/oraInventory/ContentsXML/inventory.xml was corrupted ( missing node_list for GRID HOME )
            <HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/app/11204/grid" TYPE="O" IDX="1" CRS="true"/>
            <HOME NAME="OraDb11g_home1" LOC="/u01/app/oracle/product/11204/racdb" TYPE="O" IDX="2">
              <NODE_LIST>
               <NODE NAME="grac41"/>
               <NODE NAME="grac42"/>
               <NODE NAME="grac43"/>
              ....
Fix       : Correct the entry in inventory.xml
            <HOME NAME="Ora11g_gridinfrahome1" LOC="/u01/app/11204/grid" TYPE="O" IDX="1" CRS="true">
               <NODE_LIST>
                  <NODE NAME="grac41"/>
                  <NODE NAME="grac42"/>
                  <NODE NAME="grac43"/>
               </NODE_LIST>
               ...

Reference : CRS is not installed on any of the nodes (Doc ID 1316815.1)
            CRS is not installed on any of the nodes. Inventory.xml is changed even when no problem with TMP files. (Doc ID 1352648.1)
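The corrupted entry above is recognizable because the HOME element with CRS="true" is self-closing ( ends with /> ) and therefore cannot contain a NODE_LIST. A simple grep can flag that pattern. This is a sketch only: crs_home_without_nodes is a hypothetical helper name, and the pattern assumes each HOME element sits on one line as in the sample above.

```shell
# Hypothetical helper: print self-closing HOME entries flagged CRS="true".
# Such an entry has no room for a NODE_LIST; exit status is 0 if one was found.
crs_home_without_nodes() {
    grep -E '<HOME [^>]*CRS="true"[^>]*/>' "$1"
}

# Usage (not executed here):
#   crs_home_without_nodes /u01/app/oraInventory/ContentsXML/inventory.xml
```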

avahi-daemon is running

Cluvfy report : 
     Checking daemon "avahi-daemon" is not configured and running
     Daemon not configured check failed for process "avahi-daemon"
     Check failed on nodes: 
        ract2,ract1
     Daemon not running check failed for process "avahi-daemon"
     Check failed on nodes: 
        ract2,ract1

Check for a running avahi-daemon
     $ ps -elf | grep avahi-daemon
     5 S avahi     4159     1  0  80   0 -  5838 poll_s Apr02 ?        00:00:00 avahi-daemon: running [ract1.local]
     1 S avahi     4160  4159  0  80   0 -  5806 unix_s Apr02 ?        00:00:00 avahi-daemon: chroot helper

Fix it ( run on all nodes ) :
      To shut it down, as root
      # /etc/init.d/avahi-daemon stop
      To disable it, as root:
      # /sbin/chkconfig  avahi-daemon off

Reference: 
    Cluster After Private Network Recovered if avahi Daemon is up and Running (Doc ID 1501093.1)

Reference data is not available for verifying prerequisites on this operating system distribution

Command    : ./bin/cluvfy stage -pre crsinst -upgrade -n gract3 -rolling -src_crshome $GRID_HOME 
                -dest_crshome /u01/app/12102/grid -dest_version 12.1.0.2.0 -verbose
Error      :  Reference data is not available for verifying prerequisites on this operating system distribution
              Verification cannot proceed
              Pre-check for cluster services setup was unsuccessful on all the nodes.
Root cause:  cluvfy runs rpm -qa | grep  release
             --> if this command fails, the above error is thrown
             Working Node 
             [root@gract1 log]# rpm -qa | grep  release
             oraclelinux-release-6Server-4.0.4.x86_64
             redhat-release-server-6Server-6.4.0.4.0.1.el6.x86_64
             oraclelinux-release-notes-6Server-9.x86_64
             Failing Node
             [root@gract1 log]#  rpm -qa | grep  release
             rpmdb: /var/lib/rpm/__db.003: No such file or directory
             error: db3 error(2) from dbenv->open: No such file or directory 
             ->  Due to space pressure /var/lib/rpm was partially deleted on a specific RAC node
 Fix        : Restore the RPM database from a remote RAC node or from backup
             [root@gract1 lib]# pwd
             /var/lib
             [root@gract1 lib]#  scp -r gract3:/var/lib/rpm .
             Verify RPM database
             [root@gract1 log]#   rpm -qa | grep  release
             oraclelinux-release-6Server-4.0.4.x86_64
             redhat-release-server-6Server-6.4.0.4.0.1.el6.x86_64
             oraclelinux-release-notes-6Server-9.x86_64
Related Notes:
             - Oracle Secure Enterprise Search 11.2.2.2 Installation Problem On RHEL 6 - [INS-75028] 
               Environment Does Not Meet Minimum Requirements: Unsupported OS Distribution (Doc ID 1568473.1)
             - RHEL6: 12c OUI INS-13001: CVU Fails: Reference data is not available for verifying prerequisites on 
               this operating system distribution (Doc ID 1567127.1)

Cluvfy Debug : PRVG-11049

Create a problem - shut down the cluster interconnect:
$ ifconfig eth1 down

Verify error with cluvfy
$ cluvfy comp nodecon -n all -i eth1
Verifying node connectivity 
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
ERROR: 
PRVG-11049 : Interface "eth1" does not exist on nodes "grac2"
...

Step 1 - check cvutrace.log.0 trace:
# grep PRVG /home/grid/cluvfy112/cv/log/cvutrace.log.0
[21684@grac1.example.com] [main] [ 2013-07-29 18:32:46.429 CEST ] [TaskNodeConnectivity.performSubnetExistanceCheck:1394]  Found Bad node(s): PRVG-11049 : Interface "eth1" does not exist on nodes "grac2"
PRVG-11049 : Interface "eth1" does not exist on nodes "grac2"
          ERRORMSG(grac2): PRVG-11049 : Interface "eth1" does not exist on nodes "grac2"

Step 2: Create a script and set trace level:  SRVM_TRACE_LEVEL=2
rm -rf /tmp/cvutrace
mkdir /tmp/cvutrace
export CV_TRACELOC=/tmp/cvutrace
export SRVM_TRACE=true
export SRVM_TRACE_LEVEL=2
./bin/cluvfy comp nodecon -n all -i eth1 -verbose
ls /tmp/cvutrace

Run script and check cluvfy trace file:
[32478@grac1.example.com] [main] [ 2013-07-29 19:08:23.125 CEST ] [TaskNodeConnectivity.performSubnetExistanceCheck:1367]  getting interface eth1 on node grac2
[32478@grac1.example.com] [main] [ 2013-07-29 19:08:23.126 CEST ] [TaskNodeConnectivity.performSubnetExistanceCheck:1374]  Node: grac2 has no 'eth1' interfaces!
[32478@grac1.example.com] [main] [ 2013-07-29 19:08:23.126 CEST ] [TaskNodeConnectivity.performSubnetExistanceCheck:1367]  getting interface eth1 on node grac1
[32478@grac1.example.com] [main] [ 2013-07-29 19:08:23.127 CEST ] [TaskNodeConnectivity.performSubnetExistanceCheck:1394]  Found Bad node(s): PRVG-11049 : Interface "eth1" does not exist on nodes "grac2"
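CVU trace files grow quickly at trace level 2, so a short summary of the error codes they contain is a useful first step. The function below is a sketch; list_cvu_codes is a hypothetical helper name, and the pattern simply assumes codes of the form PRxx-nnnn such as PRVG-11049, PRVF-9802 or PRCZ-2090.

```shell
# Hypothetical helper: list the Oracle message codes found in a trace file,
# most frequent first, each prefixed by its number of occurrences.
list_cvu_codes() {
    grep -oE 'PR[A-Z]{2}-[0-9]+' "$1" | sort | uniq -c | sort -rn
}

# Usage (not executed here):
#   list_cvu_codes /tmp/cvutrace/cvutrace.log.0
```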

Verify problem with ifconfig on grac2 ( eth1 is not up )
# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 08:00:27:8E:6D:24  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:413623 errors:0 dropped:0 overruns:0 frame:0
          TX packets:457739 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:226391378 (215.9 MiB)  TX bytes:300565159 (286.6 MiB)
          Interrupt:16 Base address:0xd240 
Fix : 
Restart eth1 and restart crs 
# ifconfig eth1 up
#  $GRID_HOME/bin/crsctl stop  crs -f
#  $GRID_HOME/bin/crsctl start  crs 

Debug PRVF-9802

According to the cluvfy log the following command is failing
$  /tmp/CVU_12.1.0.1.0_grid/exectask.sh -getudevinfo oracleasm/disks/DATA1
<CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP></SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT><SLOS_OTHERINFO>No UDEV rule found for device(s) specified</SLOS_OTHERINFO></CV_ERR>
<CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG><CV_CMDLOG>
<CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo oracleasm/disks/DATA1 </CV_INITCMD>
<CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
| awk '{if ("oracleasm/disks/DATA1" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | sed -e 's/://' -e 's/\.\*/\*/g'</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT></CV_CMDLOG><CV_ERES>0</CV_ERES>
--> No Output

Failing Command
$ /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
| awk '{if ("oracleasm/disks/DATA1" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | sed -e 's/://' -e 's/\.\*/\*/g'
Diagnostics : cluvfy scans the directory /etc/udev/rules.d/ for udev rules for the device oracleasm/disks/DATA1 - but can't find a rule for that device

Fix: set up udev rules.

After fixing the udev rules the above command works fine and cluvfy doesn't complain anymore 
$ /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/'
kvm @ /etc/udev/rules.d/80-kvm.rules: KERNEL=="kvm", GROUP="kvm", MODE="0666"
fuse @ /etc/udev/rules.d/99-fuse.rules: KERNEL=="fuse", MODE="0666",OWNER="root",GROUP="root"
Fix: setup udev rules .....
Verify: $ /tmp/CVU_12.1.0.1.0_grid/exectask.sh -getudevinfo  /dev/asmdisk1_udev_sdb1
<CV_VAL><USMDEV><USMDEV_LINE>/etc/udev/rules.d/99-oracle-asmdevices.rules KERNEL=="sdb1", NAME="asmdisk1_udev_sdb1", OWNER="grid", GROUP="asmadmin", MODE="0660"    
</USMDEV_LINE><USMDEV_NAME>sdb1</USMDEV_NAME><USMDEV_OWNER>grid</USMDEV_OWNER><USMDEV_GROUP>asmadmin</USMDEV_GROUP><USMDEV_PERMS>0660</USMDEV_PERMS></USMDEV></CV_VAL><CV_VRES>0</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG><CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo /dev/asmdisk1_udev_sdb1 </CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' | awk '{if ("/dev/asmdisk1_udev_sdb1" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | sed -e 's/://' -e 's/\.\*/\*/g'</CV_CMD><CV_CMDOUT> /etc/udev/rules.d/99-oracle-asmdevices.rules KERNEL=="sdb1", NAME="asmdisk1_udev_sdb1", OWNER="grid", GROUP="asmadmin", MODE="0660"    
</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT></CV_CMDLOG><CV_ERES>0</CV_ERES>

Debug and Fix  PRVG-13606 Error

  • Setup Chrony to avoid PRVG-13606 in a VirtualBox/RAC env

Reference:

  • Cluvfy Usage

 

 3. Debug Cluvfy error ERROR: PRVF-9802

ERROR: 
PRVF-9802 : Attempt to get udev information from node "hract21" failed
No UDEV rule found for device(s) specified


Checking: cv/log/cvutrace.log.0

          ERRORMSG(hract21): PRVF-9802 : Attempt to get udev information from node "hract21" failed
No UDEV rule found for device(s) specified

[Thread-757] [ 2015-01-29 15:56:44.157 CET ] [StreamReader.run:65]  OUTPUT><CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP>
</SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT>
<SLOS_OTHERINFO>No UDEV rule found for device(s) specified</SLOS_OTHERINFO>
</CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG>
<CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
</CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
  | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' 
  | sed -e 's/://' -e 's/\.\*/\*/g'</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
..
[Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:144]  runCommand: process returns 0
[Worker 3] [ 2015-01-29 15:56:44.157 CET ] [RuntimeExec.runCommand:161]  RunTimeExec: output>

Run the exectask from OS prompt :
[root@hract21 ~]# /tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
<CV_ERR><SLOS_LOC>CVU00310</SLOS_LOC><SLOS_OP></SLOS_OP><SLOS_CAT>OTHEROS</SLOS_CAT><SLOS_OTHERINFO>No UDEV rule found for device(s)
 specified</SLOS_OTHERINFO></CV_ERR><CV_VRES>1</CV_VRES><CV_LOG>Exectask:getudevinfo success</CV_LOG>
<CV_CMDLOG><CV_INITCMD>/tmp/CVU_12.1.0.1.0_grid/exectask -getudevinfo asmdisk1_10G,asmdisk2_10G,asmdisk3_10G,asmdisk4_10G 
</CV_INITCMD><CV_CMD>popen /etc/udev/udev.conf</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/permissions.d</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>opendir /etc/udev/rules.d</CV_CMD><CV_CMDOUT> Reading: /etc/udev/rules.d</CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>
<CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
 | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
 | awk '{if ("asmdisk1_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' 
 | sed -e 's/://' -e 's/\.\*/\*/g'
 </CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT><CV_CMD>popen /bin/grep KERNEL== /etc/udev/rules.d/*.rules 
 | grep GROUP | grep MODE | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/'
 | awk '{if ("asmdisk2_10G" ~ $1 ) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
 | sed -e 's/://' -e 's/\.\*/\*/g'
</CV_CMD><CV_CMDOUT></CV_CMDOUT><CV_CMDSTAT>0</CV_CMDSTAT>

Test the exectask in detail:
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE  
 | sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
 | awk '  {if ("asmdisk1_10G" ~ $1) print $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
--> Here awk returns nothing !

[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/' 
  |awk '  {  print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'   
 
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent", 
   RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid"

--> The sed script above puts the KERNEL value sd?1 into field $1 and @ into field $2.
    awk then uses $1 as the pattern in  if ("asmdisk1_10G" ~ $1) ...
    The string "asmdisk1_10G" appears in field $3 ( the NAME value ), but it does not match the pattern sd?1 in field $1, so nothing is printed !!
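The failing comparison can be reproduced in isolation. The sketch below repeats only the awk match from the command above, with the device name as the string and the KERNEL value as the regular expression; udev_key_matches is a hypothetical helper name and not part of exectask.

```shell
# Hypothetical helper: print "yes" if <device> matches the regular expression <kernel_key>,
# otherwise "no". This mirrors the test  if ("<device>" ~ $1)  in the awk step above.
udev_key_matches() {
    awk -v dev="$1" -v key="$2" 'BEGIN { if (dev ~ key) print "yes"; else print "no" }'
}

udev_key_matches asmdisk1_10G 'sd?1'               # KERNEL=="sd?1" rule from this system
udev_key_matches /dev/asmdisk1_udev_sdb1 'sdb1'    # KERNEL=="sdb1" rule from the working example
```

As a regular expression sd?1 only matches the text "s1" or "sd1", neither of which occurs in "asmdisk1_10G", so the first call prints no while the second prints yes.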
    
Potential Fix : With a modified search pattern we get a record back !
[root@hract21 rules.d]# cat /etc/udev/rules.d/*.rules | grep GROUP | grep MODE 
  |sed -e '/^#/d' -e 's/\*/.*/g' -e 's/\(.*\)KERNEL=="\([^\"]*\)\(.*\)/\2 @ \1 KERNEL=="\2\3/'  
  |awk  '  /asmdisk1_10G/ {  print $1, $2, $3,$4,$5,$6,$7,$8,$9,$10,$11,$12}'
sd?1 @ NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent",
 RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c", OWNER="grid", ..

--> It seems the way Oracle extracts UDEV data does not work on OEL 6, where UDEV records can look like this:
NAME="asmdisk1_10G", KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -d /dev/$parent",   
    RESULT=="1ATA_VBOX_HARDDISK_VBe7363848-cbf94b0c",  OWNER="grid", GROUP="asmadmin", MODE="0660"

As the ASM disk has the proper permissions I decided to ignore the warnings  
[root@hract21 rules.d]# ls -l  /dev/asm*
brw-rw---- 1 grid asmadmin 8, 17 Jan 29 09:33 /dev/asmdisk1_10G
brw-rw---- 1 grid asmadmin 8, 33 Jan 29 09:33 /dev/asmdisk2_10G
brw-rw---- 1 grid asmadmin 8, 49 Jan 29 09:33 /dev/asmdisk3_10G
brw-rw---- 1 grid asmadmin 8, 65 Jan 29 09:33 /dev/asmdisk4_10G
