Tuesday, January 26, 2021

OSWatcher

Install OSWatcher

  • Download OSWatcher from OTN (see MOS Note Doc ID 301137.1)
  • Untar the related TAR archive: tar xvf oswbb601.tar
  • Before starting, OSWatcher checks whether any running process has the string OSWatcher in its full command line
  • Don't install OSWatcher in a directory named OSWatcher, and don't start the tool via a full path that contains that string. Even an open gedit session ( $ gedit OSWatcher.dat ) will make the check believe OSWatcher is already running, and you won't be able to start OSWatcher
  • In short: ps -e | grep OSWatch should not return any results before starting OSWatcher (a short start sequence is sketched after the tar output below)
# tar xvf oswbb601.tar 
oswbb/
oswbb/src/
oswbb/src/tombody.gif
oswbb/src/Thumbs.db
oswbb/src/missing_graphic.gif
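
A minimal start/stop sequence might look like the sketch below. The script names startOSWbb.sh / stopOSWbb.sh come from the oswbb 6.x bundle, and the arguments (30-second snapshot interval, 48 hours of archive retention, gzip compression) are only example values - check the README shipped in the tar file if your version differs.

# Verify that nothing matching OSWatcher is running, then start the collector
$ ps -e | grep OSWatch                  # must return no rows
$ cd oswbb
$ nohup ./startOSWbb.sh 30 48 gzip &    # interval 30s, keep 48h of archives, compress with gzip

# Stop the collection again when it is no longer needed
$ ./stopOSWbb.sh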

 

Create file private.net for monitoring the cluster interconnect

Create the file private.net based on Exampleprivate.net - here is the Linux version:

#####################################################################
# This file contains examples of how to monitor private networks. To
# monitor your private networks create an executable file in this same
# directory named private.net. Use the example for your host os below.
# Make sure not to remove the last line in this file. Your file
# private.net MUST contain the rm lock.file line.
######################################################################
#Linux Example
######################################################################
echo "zzz ***"`date`
traceroute -r -F grac1int.example.com
traceroute -r -F grac2int.example.com
traceroute -r -F grac3int.example.com
######################################################################
rm locks/lock.file
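
As the comment in Exampleprivate.net states, private.net must be an executable file in the same directory, otherwise OSWatcher skips the private network collection. A quick sanity check (a sketch, assuming private.net was created in the oswbb top-level directory):

$ cd oswbb
$ chmod 744 private.net
$ traceroute -r -F grac1int.example.com     # manually verify one interconnect address resolves and answers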

Debugging a Node Eviction using OSWatcher

 

Overview

  • Note that OSWatcher does not run with real-time privileges (unlike CHM), so we may miss a lot of interesting records
  • The OSWatcher utilities ( vmstat, iostat, .. ) may not get scheduled if we have CPU / paging / swapping problems
  • Always check the OSWatcher vmstat file for missing records
  • If records are missing for the eviction time, we can only look at the period before the eviction
  • Always check CHM data, as it gives much more detail about the system status during the eviction time
  • Use OSWatcher's graphical tool to check for a high count of blocked processes

Create an OSWatcher Analyzer report

Locate OSWatcher files
% find . -name archive
./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive
./tds-2014-03-22/14387023_grac42.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac42/u01/app/11204/grid/oswbb/archive
./tds-2014-03-22/14386169_grac43.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac43/u01/app/11204/grid/oswbb/archive

Unzip OSWatcher archives
% gunzip -r ./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive
% gunzip -r ./tds-2014-03-22/14387023_grac42.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac42/u01/app/11204/grid/oswbb/archive
% gunzip -r ./tds-2014-03-22/14386169_grac43.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac43/u01/app/11204/grid/oswbb/archive

Create an analyzer file
java -jar /home/hhutzler/Tools/SupportBundle_v1_3_1/oswbb/oswbba.jar \
     -i ./tds-2014-03-22/14387025_grac41.tfa_Sat_Mar_22_15_46_29_CET_2014.zip/grac41/u01/app/11204/grid/oswbb/archive \
     -S grac41.txt -B Mar 22 9:00:00 2014 -E Mar 22 11:00:00 2014
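
If the same report is needed for all three nodes, the find / gunzip / oswbba steps above can be wrapped in a small loop. This is only a sketch: the oswbba.jar path and the -i/-S/-B/-E options are exactly the ones used above, while the sed expression that derives the node name is my own helper.

for a in $(find ./tds-2014-03-22 -type d -name archive)
do
    gunzip -r "$a"
    node=$(echo "$a" | sed 's/.*\.zip\/\([^/]*\)\/.*/\1/')    # -> grac41, grac42, grac43
    java -jar /home/hhutzler/Tools/SupportBundle_v1_3_1/oswbb/oswbba.jar \
         -i "$a" -S ${node}.txt -B Mar 22 9:00:00 2014 -E Mar 22 11:00:00 2014
done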

 

Does OSWatcher provide enough data to analyze the problem?

% grep zzz grac41.example.com_vmstat_14.03.22.0900.dat
....
zzz ***Sat Mar 22 09:57:04 CET 2014
zzz ***Sat Mar 22 09:57:34 CET 2014
zzz ***Sat Mar 22 09:58:09 CET 2014

% grep zzz grac41.example.com_vmstat_14.03.22.1000.dat
zzz ***Sat Mar 22 10:09:35 CET 2014
zzz ***Sat Mar 22 10:10:05 CET 2014
zzz ***Sat Mar 22 10:10:35 CET 2014
zzz ***Sat Mar 22 10:11:05 CET 2014
  • We don't have enough data during the eviction, so we may not be able to find the root cause of the problem
  • OSWatcher records are missing from 09:58:09 to 10:09:35 (a small gap-check sketch follows below)
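
A small helper can make such gaps visible automatically. The sketch below assumes GNU date and the standard zzz ***<timestamp> snapshot markers; the 60-second threshold is just an example (twice the default 30-second snapshot interval).

# Report gaps larger than 60 seconds between vmstat snapshots
$ grep zzz grac41.example.com_vmstat_14.03.22.0900.dat | sed 's/^zzz \*\*\*//' |
  while read ts
  do
      cur=$(date -d "$ts" +%s)                                 # snapshot timestamp as epoch seconds
      if [ -n "$prev" ] && [ $((cur - prev)) -gt 60 ]; then
          echo "gap of $((cur - prev)) seconds before: $ts"
      fi
      prev=$cur
  done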

 

Read and interpret OSWatcher Analyzer Data

Analyzing System Status

############################################################################
# Section 1: System Status
#
# This section lists the status of each major subsystem. Status values are:
# Critical: The subsystem requires immediate attention
# Warning:  The subsystem detailed findings should be reviewed
# OK:       The subsystem was found to be okay
# Unknown:  The status of the subsystem could not be determined
#
# Subsystem       Status
------------------------
CPU             CRITICAL
MEMORY          WARNING
I/O             WARNING
NET             WARNING

--> Need to review all subsystems

############################################################################
# Section 2.0: System Slowdown Summary Ordered By Impact
#
# This section lists the times when the OS started to slowdown. oswbba is
# able to measure this by looking at the timestamps in the individual files
# it collects. It compares the time between the snapshots and looks to see
# how this time differs from the expected timestamp which will be the oswbb
# Snapshot Freq value listed at the top of this file. Any slowdowns listed
# in this section will be ordered by the slowdown Secs column. The subsystem
# most likely responsible for the slowdown will be identified here.
#
SnapTime        Variance   Secs      Flags    SubSystem
-----------------------------------------------------------------
Sat Mar 22 09:56:33  1.5     45  0020-00-01   Memory
Sat Mar 22 09:55:48  1.3     39  2200-00-00   CPU
Sat Mar 22 09:55:09  1.1     35  2200-00-00   Memory
Sat Mar 22 09:58:09  1.1     35  2200-00-01   Memory

--> Both CPU and Memory problems are reported as root causes for the system slowdown

Report Summary
SnapTime        Variance   Secs      Flags   Cause(Most Likely)
-----------------------------------------------------------------
Sat Mar 22 09:58:09  1.1     35  2200-30-01   1: System paging memory
                                              2: Large Run Queue

>>>Looking for cause of problem 1: System paging memory
   Advise: The OS is paging memory.
   Reasons: 1. The system is under stress with respect to memory

>>>Looking for cause of problem 2: Large Run Queue
   Advise: Check why run queue is so large
   Reasons: 1. Possible login storm
            2. Possible mutex issue in database (Examine AWR)

--> The report above confirms that the CPU run queue is large and the system is paging

############################################################################
# Section 3: System General Findings
#
# This section lists all general findings that require attention. Each
# finding has a status along with a subsystem. Further advice may also be
# available regarding the finding.
#
CRITICAL: CPU Run Queue observed very high spikes.
  Advise: Check why run queue is so large.
  Check:  The number of processes for possible login storm
  Check:  AWR for possible mutex issue in database (Examine AWR)
CRITICAL: CPU Running in System Mode observed to be high.
  Advise: Check why large amount of cpu is running in kernel mode.
  Check:  Output of top command to see what processes are running and using kernel cpu
  Check:  If the system is undersized with respect to CPU capacity
WARNING: Memory high paging rate observed.
  Advise: The OS is low on free memory.
  Check:  The system is under stress with respect to memory
WARNING: Disk heavy utilization observed.
  Advise: Check disks to see why utilization is so high.
  Check:  Hot disk: I/O distribution should be evaluated
  Check:  The system is undersized with respect to I/O capacity
  Check:  AWR for SQL regression causing more I/O
WARNING: Disk high service time observed.
  Advise: Check disks to see why service time is so high.
  Check:  Hot disk: I/O distribution should be evaluated
  Check:  Disk may be defective
WARNING: Network UDP errors observed.
  Advise: UDP protocol only relevant for RAC. Ignore for Non-RAC
  Advise: Avoid any dropped packets in UDP protocol
  Check:  UDP socket receive buffer on the local machine too small
  Check:  The application not reading the data fast enough
  Check:  Section 7.3 below for more details
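
For the UDP warning, the netstat snapshots collected by OSWatcher (or a live netstat -su) show whether the receive errors are actually growing. The OSWatcher file name below follows the usual <host>_netstat_<timestamp>.dat naming and assumes your OSWatcher version collects netstat -s output - adjust it to your archive.

# Live check - the counters in the 'Udp:' block should not keep growing
$ netstat -su

# Or grep the OSWatcher netstat snapshots for the error counters
$ egrep 'zzz|packet receive errors' grac41.example.com_netstat_14.03.22.0900.dat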

Analyzing CPU data

############################################################################
# Section 4.1: CPU RUN QUEUE:
# Run queue should not exceed (Value/#CPU > 3) for any long period of time.
# Below lists the number of times (NUMBER) and percent of the number of times
# (PERCENT) that run queue was High (>3) or Very High (>6). Pay attention to
# high spanning multiple snaps as this represents the number of times run
# queue remained high in back to back snapshots
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>3)                                  12     5.61
Very High (>6)                              7     3.27
High spanning multiple snaps                3      1.4

The following snaps recorded very high run queue values:
SnapTime                      Value   Value/#CPU
------------------------------------------------
Sat Mar 22 09:55:09 UTC 2014     29           14
Sat Mar 22 09:55:48 UTC 2014     20           10
Sat Mar 22 09:57:04 UTC 2014    117           58
Sat Mar 22 09:58:09 UTC 2014     45           22

--> At 09:57:04, 58 processes per CPU are waiting - this is way too much

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher
# the cpu utilization the longer it will take processes to run.  Below lists
# the number of times (NUMBER) and percent of the number of times (PERCENT)
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>95%)                                 5     2.34
Very High (100%)                            4     1.87
High spanning multiple snaps                2     0.93

CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime
------------------------------
Sat Mar 22 09:55:09 UTC 2014
Sat Mar 22 09:55:48 UTC 2014
Sat Mar 22 09:58:09 UTC 2014

--> CPU utilization is too high before the node eviction occurs at 10:03. We can't say anything about
    CPU usage at eviction time, but it can be expected that CPU usage remained high during the
    missing OSWatcher monitoring records.

############################################################################
# Section 4.3: CPU UTILIZATION: PERCENT SYS
# CPU utilization running in SYSTEM mode should not be greater than 30% over
# long periods of time. The higher system cpu utilization the longer it will
# take processes to run. Pay attention to high spanning multiple snaps as it
# is important that cpu utilization not stay persistently high (>30%)
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                  28   100.00
High (>30%)                                 5    17.86
Very High (50%)                             2     7.14
High spanning multiple snaps                1     3.57

High values for SYSTEM mode (> 30%) could be related to:
- High paging/swapping activities
- High disk or network I/O
- Runaway processes issuing a lot of system calls

CPU UTILIZATION: The following snaps recorded very high percent
SnapTime                      Percent
-----------------------------------
Sat Mar 22 09:54:34 PDT 2014     53
Sat Mar 22 09:56:33 PDT 2014     59

CPU UTILIZATION: The following snaps recorded ROOT processes using high percent cpu:
SnapTime                          Pid   CPU   Command
-----------------------------------------------------
Sat Mar 22 09:47:33 UTC 2014     2867  94.8   mp_stress
Sat Mar 22 09:48:03 UTC 2014     3554  91.4   mp_stress
Sat Mar 22 09:48:37 UTC 2014     3554  42.8   mp_stress
Sat Mar 22 09:49:32 UTC 2014     4738  37.1   tfactl.pl
Sat Mar 22 09:49:32 UTC 2014     4946  35.1   tfactl.pl
Sat Mar 22 09:55:11 UTC 2014    14181 328.9   mp_stress
Sat Mar 22 09:55:59 UTC 2014    14181 104.6   mp_stress
Sat Mar 22 09:57:04 UTC 2014    16174 219.0   mp_stress
Sat Mar 22 09:57:34 UTC 2014    16805  52.4   tfactl.pl
Sat Mar 22 10:12:36 UTC 2014    28518  66.5   tfactl.pl

--> Process mp_stress is eating up our CPU
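
The run-queue spikes reported in Section 4.1 can be cross-checked against the raw vmstat snapshots. The awk one-liner below is a sketch assuming the standard Linux vmstat layout, where the first column of every data line is the run queue (r).

# Print the snapshot timestamp whenever the vmstat run queue exceeds 6
$ awk '/^zzz/ {ts=$0} /^ *[0-9]/ && $1 > 6 {print ts, " r=" $1}' \
      grac41.example.com_vmstat_14.03.22.0900.dat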

Analyzing Memory Usage

############################################################################
# Section 5.3: MEMORY PAGE IN
# Page in values should be 0 or low. High values (> 25) indicate memory is
# under pressure and may be precursor to swapping. Pay attention to high
# spanning multiple snaps as this value should not stay persistently high
#
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>25)                                 31    14.49
High spanning multiple snaps               19     8.88

The following snaps recorded very high page in rates:
SnapTime                      Value
-----------------------------------
Sat Mar 22 09:51:33 UTC 2014     32
Sat Mar 22 09:54:34 UTC 2014    312
Sat Mar 22 09:55:09 UTC 2014     32
Sat Mar 22 09:56:33 UTC 2014    624
Sat Mar 22 09:57:04 UTC 2014    352
Sat Mar 22 09:57:34 UTC 2014    664
Sat Mar 22 09:58:09 UTC 2014    128
Sat Mar 22 10:09:35 UTC 2014    292

--> Paging is too high in about 15% of our snapshots before the node eviction occurs at 10:03

####################################################################################################################################
# Section 5.5: Top 5 Memory Consuming Processes Beginning
# This section lists the top 5 memory consuming processes at the start of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem, only that it is a top consumer of memory.
SnapTime                             PID        USER    %CPU    %MEM         VSZ         RSS       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:00:52 UTC 2014        2566        root    0.40    6.20     1798816      273796   ../ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 09:00:52 UTC 2014       27215      oracle    0.00    4.30     1663316      187352  ora_dbw0_grac41
Sat Mar 22 09:00:52 UTC 2014       27131      oracle    0.50    3.90     1569328      171356  ora_lms0_grac41
Sat Mar 22 09:00:52 UTC 2014        5661        root    2.90    3.80      981288      168316  /u01/app/11204/grid/bin/ologgerd -M -d . /grac41
Sat Mar 22 09:00:52 UTC 2014       27221      oracle    0.00    3.20     1564988      143556  ora_smon_grac41

####################################################################################################################################
# Section 5.6: Top 5 Memory Consuming Processes Ending
# This section lists the top 5 memory consuming processes at the end of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem, only that it is a top consumer of memory.
SnapTime                             PID        USER    %CPU    %MEM         VSZ         RSS       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 10:59:49 UTC 2014        2566        root    0.40    4.70     1798816      207060  ../ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 10:59:49 UTC 2014        5661        root    3.00    3.90     1047852      170232  /u01/app/11204/grid/bin/ologgerd -M -d ../grac41
Sat Mar 22 10:59:49 UTC 2014       22565      oracle    0.00    3.10     1554224      135496  ora_mman_grac41
Sat Mar 22 10:59:49 UTC 2014        5283        grid    6.20    2.90     1128680      127744  /u01/app/11204/grid/bin/ocssd.bin
Sat Mar 22 10:59:49 UTC 2014       22578      oracle    0.00    2.60     1560896      114060  ora_smon_grac4

--> Be careful here: our top consumer mp_stress is not shown, because the process was started later and
    had already stopped before the end of the oswbba analysis period.
    Always check section 8 for process related results!
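
Swap and free-memory trends can also be read straight from the meminfo snapshots that OSWatcher collects. The file name below is an assumption following the usual OSWatcher naming scheme - adjust it to your archive.

# Watch MemFree and SwapFree shrinking across snapshots
$ egrep 'zzz|MemFree|SwapFree' grac41.example.com_meminfo_14.03.22.0900.dat | less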

Analyzing Disk IO

############################################################################
# Section 6: Disk Detailed Findings
# This section lists only those devices which have high percent busy, high service
# times or high wait times
#
############################################################################
# Section 6.1: Disk Percent Busy Findings
# (Only Devices With Percent Busy > 50% Reported)
#
DEVICE: sda PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                21     9.81
Very High (>95%)                           17     7.94
High spanning multiple snaps               14     6.54

The following snaps recorded high percent busy for device: sda
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      77.09
Sat Mar 22 09:49:32 UTC 2014       98.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014      100.0
Sat Mar 22 09:51:03 UTC 2014       99.5

DEVICE: dm-0 PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                17     7.94
Very High (>95%)                            9     4.21
High spanning multiple snaps                9     4.21

The following snaps recorded high percent busy for device: dm-0 ( our swap device )
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      67.09
Sat Mar 22 09:49:32 UTC 2014       98.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014       82.2
Sat Mar 22 09:51:03 UTC 2014       77.2

DEVICE: dm-1 PERCENT BUSY
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 214   100.00
High (>50%)                                17     7.94
Very High (>95%)                           16     7.48
High spanning multiple snaps               14     6.54

The following snaps recorded high percent busy for device: dm-1
SnapTime                           Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014      77.01
Sat Mar 22 09:49:32 UTC 2014      88.7
Sat Mar 22 09:50:02 UTC 2014      99.21
Sat Mar 22 09:50:32 UTC 2014      99.9
Sat Mar 22 09:51:03 UTC 2014      93.9

Here we need to know something about our partition layout. For details on how logical volumes like /dev/dm-0 are mapped to /dev/sdX disks/partitions, see:
http://www.hhutzler.de/blog/how-are-logical-volumes-like-devdm-0-mapped-to-devsdx-diskspartitions/

Map the logical volumes to their physical devices
# dmsetup ls --tree -o device
vg_oel64-lv_swap (252:1)
 +- (8:3)                        <-- Major, Minor number from /dev/sdX
 +- (8:2)
vg_oel64-lv_root (252:0)
 +- (8:2)

Check /dev/mapper
# ls -l /dev/mapper/vg*
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_swap -> ../dm-1

Match the major/minor numbers returned by the dmsetup output above
# ls -l /dev/sda2 /dev/sda3
brw-rw----. 1 root disk 8, 2 Mar 24 09:07 /dev/sda2
brw-rw----. 1 root disk 8, 3 Mar 24 09:07 /dev/sda3

--> The root partition and the swap partition point to the same physical disk /dev/sda -> I/O contention
    For our swap partition we see high BUSY rates > 90% around 09:50 -> increased paging/swapping
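
On recent systems lsblk shows the same LV-to-physical-disk mapping in a single command; this is just an optional cross-check of the dmsetup / ls output above.

# Optional cross-check: partitions and device-mapper children of /dev/sda with their major:minor numbers
$ lsblk -o NAME,MAJ:MIN,TYPE,MOUNTPOINT /dev/sda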

Analyzing Processes

############################################################################
# Section 8.2: PS for Processes With Status = D, T or W Ordered By Time
# In this section list all processes captured in the oswbb logs which have a
# status of D, T or W
#
SnapTime                             PID        USER         CPU    STATUS                 WCHAN  COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:49:32 PDT 2014        7573        grid         0.0         D                sleep_  asm_rbal_+ASM1
Sat Mar 22 09:49:32 PDT 2014       31115      oracle         0.0         D                sleep_  ora_cjq0_grac41
Sat Mar 22 09:49:32 PDT 2014       27487      oracle         0.0         D                sleep_  ora_lck0_grac41
Sat Mar 22 09:49:32 PDT 2014        4915        root         0.0         D                sleep_  /u01/app/11204/grid/bin/./crsctl.bin stat res procwatcher
Sat Mar 22 09:49:32 PDT 2014       27213      oracle         0.0         D                sleep_  ora_mman_grac41
Sat Mar 22 09:49:32 PDT 2014       23293      oracle         0.0         D                sleep_  ora_pz99_grac41a
...

--> A lot of processes are in disk wait (D) status.
    For 2.6 kernels this could be either an I/O problem or, more likely, a paging/swapping problem.

#######################################################################################
# Section 8.3: PS for (Processes with CPU > 0) When System Idle CPU < 30% Ordered By Time
# In this section list all processes captured in the oswbb logs with process cpu consumption
# > 0 and system idle cpu < 30%
#
SnapTime                        IDLE_CPU         PID        USER         CPU    STATUS  COMMAND
----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:11 UTC 2014         0.0       14181        root      328.90         S  mp_stress
Sat Mar 22 09:55:59 UTC 2014         0.0       14181        root      104.60         S  mp_stress
Sat Mar 22 09:57:04 UTC 2014         9.0       16174        root      219.00         S  mp_stress

--> Process mp_stress is taking a lot of CPU - there is no idle CPU from 09:55:11 on

#######################################################################################
# Section 8.4: Top VSZ Processes Increasing Memory Per Snapshot
# In this section list all changes in virtual memory allocations per process
#
SnapTime                             PID        USER    %CPU    %MEM         VSZ      CHANGE   %CHANGE       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014       14181        root  205.00   18.50     1090096     +630036    +136.94  ./mp_stress -t 4 -m 5 -p 50 -c 50
Sat Mar 22 09:56:33 UTC 2014       14181        root  165.00   22.40     1263176     +173080    +15.87   ./mp_stress -t 4 -m 5 -p 50 -c 50

--> Virtual memory for process ./mp_stress is increasing a lot and its CPU usage is also very high.
    Increased CPU and memory usage could be the root cause for a node eviction!

#######################################################################################
# Section 8.5: Top RSS Processes Increasing Memory Per Snapshot
# In this section list all changes in resident memory allocations per process
#
SnapTime                             PID        USER    %CPU    %MEM         RSS      CHANGE   %CHANGE       COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014       14181        root  205.00   18.50      805984     +630016    +358.02   ./mp_stress -t 4 -m 5 -p 50 -c 50
Sat Mar 22 09:56:33 UTC 2014       14181        root  165.00   22.40      977540     +171556    +21.28    ./mp_stress -t 4 -m 5 -p 50 -c 50

--> Resident memory for process ./mp_stress increases a lot.
    The problem could be either a memory leak or something like a connection storm.

 

Using grep to retrieve process priority from OSWatcher raw data

% egrep 'zzz|mp_stress|PRI' grac41.example.com_ps_14.03.22.0900.dat
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 09:56:33 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
root 14181 4270 90 165 22.4 1263176 977540 n_tty_ S 09:55:02 00:02:32 ./mp_stress -t 4 -m 5 -p 50 -c 50
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 09:57:04 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 09:57:34 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 10:00:17 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
root 16867 4270 90 108 68.3 4542216 2974540 futex_ S 09:57:38 00:03:44 ./mp_stress -t 4 -m 20 -p 50 -c 200
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND

==> The priority of 90 is quite high (as expected, since mp_stress runs as a real-time process).
    CPU usage is high, too, and memory usage explodes from 22.4% to 68.3%.
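
On a live system the scheduling class and real-time priority can be confirmed directly with ps; the format specifiers below are standard procps keywords, and the awk filter simply keeps the round-robin (RR) and FIFO (FF) real-time classes.

# List all processes running in a real-time scheduling class
$ ps -eo pid,class,rtprio,pri,comm | awk '$2 == "RR" || $2 == "FF"'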

 

Summary

  • Process mp_stress leaks memory, eats up all our CPU, and is very likely the root cause of the problem
  • The system is paging and a lot of processes are waiting on disk I/O
  • The CPU run queue is high – after a while most processes migrate to the blocked queue
  • CPU usage is high all the time
  • As all I/O is directed to a single physical disk, we see high disk service times and disk waits
  • From the provided OSWatcher data alone we can't pinpoint the root cause of the node eviction
  • The root cause could be CPU starvation, paging/swapping, or slow disk I/O
 
