Install OSWatcher
- Download OSWatcher from OTN ( see Note Doc ID 301137.1 )
- Untar the related TAR archive : tar xvf oswbb601.tar
- Before starting, OSWatcher checks for any process with the string OSWatcher in its full path.
- Don't install OSWatcher in a directory named OSWatcher, and use the full path to start the tool. Even a running gedit session ( $ gedit OSWatcher.dat ) makes OSWatcher believe it is already running, and it will refuse to start
- In short : ps -e | grep OSWatch should not return any results before starting OSWatcher
# tar xvf oswbb601.tar
oswbb/
oswbb/src/
oswbb/src/tombody.gif
oswbb/src/Thumbs.db
oswbb/src/missing_graphic.gif
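The pre-start check described above can be sketched as a small shell function. This is only an illustration: the function name oswatcher_can_start is made up here and is not part of OSWatcher. It reads `ps -e` output on stdin, so it can be tried against saved output as well.

```shell
# Hypothetical helper (not shipped with OSWatcher): reads `ps -e` output on
# stdin and reports whether any process name contains the string "OSWatch".
oswatcher_can_start() {
    # Matching lines are copied to stderr so the offending process is visible.
    if grep 'OSWatch' >&2; then
        echo "blocked"
        return 1
    fi
    echo "clear"
}
```

Typical use would be `ps -e | oswatcher_can_start` right before starting OSWatcher by its full path.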
Create file private.net for monitoring the Cluster interconnect
Create file private.net based on Exampleprivate.net - here is the Linux version
#####################################################################
# This file contains examples of how to monitor private networks. To
# monitor your private networks create an executable file in this same
# directory named private.net. Use the example for your host os below.
# Make sure not to remove the last line in this file. Your file
# private.net MUST contain the rm lock.file line.
######################################################################
#Linux Example
######################################################################
echo "zzz ***"`date`
traceroute -r -F grac1int.example.com
traceroute -r -F grac2int.example.com
traceroute -r -F grac3int.example.com
######################################################################
rm locks/lock.file
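For clusters with more nodes, the file above could be generated instead of edited by hand. The following is a minimal sketch; gen_private_net is a made-up helper name, and the host names passed to it are whatever your private interconnect names are.

```shell
# Hypothetical generator for private.net: prints the timestamp marker, one
# traceroute line per interconnect host, and the mandatory lock-file removal.
gen_private_net() {
    # Marker line, copied verbatim from the Linux example above
    printf '%s\n' 'echo "zzz ***"`date`'
    for host in "$@"; do
        printf 'traceroute -r -F %s\n' "$host"
    done
    # The example file states that this must remain the last line
    printf '%s\n' 'rm locks/lock.file'
}
```

Usage: `gen_private_net grac1int.example.com grac2int.example.com grac3int.example.com > private.net` followed by `chmod +x private.net`, since the file has to be executable.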
Debugging a Node Eviction using OSWatcher
Overview
- Note: OSWatcher is not running with RT privileges ( unlike CHM ), so we may miss a lot of interesting records
- OSWatcher utils ( vmstat, iostat, .. ) may not get scheduled if we have CPU / paging / swapping problems
- Always check the OSWatcher vmstat file for missing records
- If records are missing for the eviction time, we can only look at the period before the eviction
- Always check CHM data, as it gives much more detail about the system status during the eviction time
- Use the graphical tool of OSWatcher to check for a high count of blocked processes
Create an OSWatcher Analyzer report
Does OSWatcher provide enough data to analyze the problem ?
% grep zzz grac41.example.com_vmstat_14.03.22.0900.dat
....
zzz ***Sat Mar 22 09:57:04 CET 2014
zzz ***Sat Mar 22 09:57:34 CET 2014
zzz ***Sat Mar 22 09:58:09 CET 2014
% grep zzz grac41.example.com_vmstat_14.03.22.1000.dat
zzz ***Sat Mar 22 10:09:35 CET 2014
zzz ***Sat Mar 22 10:10:05 CET 2014
zzz ***Sat Mar 22 10:10:35 CET 2014
zzz ***Sat Mar 22 10:11:05 CET 2014
- We don't have enough data during the eviction, so we may not be able to find the root cause of the problem
- OSWatcher records are missing from 09:58:09 to 10:09:35
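Spotting such holes by eye gets tedious for long archives. A minimal sketch of an automated check follows; the function name find_snapshot_gaps and the threshold argument are assumptions made for this example. It only looks at the HH:MM:SS field of the zzz lines and assumes at most one midnight rollover between two consecutive snapshots.

```shell
# Hypothetical helper: reads OSWatcher .dat content on stdin and prints every
# pair of consecutive "zzz" timestamps that are more than $1 seconds apart.
find_snapshot_gaps() {
    awk -v max="$1" '
        /^zzz / {
            # Field 5 is HH:MM:SS in lines like: zzz ***Sat Mar 22 09:57:04 CET 2014
            split($5, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) {
                gap = now - prev
                if (gap < 0) gap += 86400   # assume a single midnight rollover
                if (gap > max) printf "%s -> %s gap %d s\n", prev_ts, $5, gap
            }
            prev = now; prev_ts = $5; seen = 1
        }'
}
```

Usage: `cat grac41.example.com_vmstat_14.03.22.0900.dat grac41.example.com_vmstat_14.03.22.1000.dat | find_snapshot_gaps 60`. For the timestamps shown above this reports a single gap of 686 seconds between 09:58:09 and 10:09:35.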
Read and interpret OSWatcher Analyzer Data
Analyzing System Status
############################################################################
# Section 1: System Status
#
# This section lists the status of each major subsystem. Status values are:
# Critical: The subsystem requires immediate attention
# Warning:  The subsystem detailed findings should be reviewed
# OK:       The subsystem was found to be okay
# Unknown:  The status of the subsystem could not be determined
#
#
Subsystem       Status
------------------------
CPU             CRITICAL
MEMORY          WARNING
I/O             WARNING
NET             WARNING
--> Need to review all subsystems

############################################################################
# Section 2.0: System Slowdown Summary Ordered By Impact
#
# This section lists the times when the OS started to slowdown. oswbba is
# able to measure this by looking at the timestamps in the individual files
# it collects. It compares the time between the snapshots and looks to see
# how this time differs from the expected timestamp which will be the oswbb
# $ Snapshot Freq value listed at the top of this file. Any slowdowns listed
# in this section will be ordered by the slowdown Secs column.The subsystem
# most likely responsible for the slowdown will be identified here.
#
SnapTime              Variance  Secs  Flags       SubSystem
-----------------------------------------------------------------
Sat Mar 22 09:56:33   1.5       45    0020-00-01  Memory
Sat Mar 22 09:55:48   1.3       39    2200-00-00  CPU
Sat Mar 22 09:55:09   1.1       35    2200-00-00  Memory
Sat Mar 22 09:58:09   1.1       35    2200-00-01  Memory
--> Both CPU and Memory problems are reported as root cause for the system slowdown

Report Summary
SnapTime              Variance  Secs  Flags       Cause(Most Likely)
-----------------------------------------------------------------
Sat Mar 22 09:58:09   1.1       35    2200-30-01  1: System paging memory
                                                  2: Large Run Queue

>>>Looking for cause of problem 1: System paging memory
   Advise: The OS is paging memory.
   Reasons: 1. The system is under stress with respect to memory
>>>Looking for cause of problem 2: Large Run Queue
   Advise: Check why run queue is so large
   Reasons: 1. Possible login storm
            2. Possible mutex issue in database (Examine AWR)
--> The above reports confirm that the CPU run queue is large and the system is paging

############################################################################
# Section 3: System General Findings
#
# This section lists all general findings that require attention. Each
# finding has a status along with a subsystem. Further advice may also
# available regarding the finding.
#
CRITICAL: CPU Run Queue observed very high spikes.
   Advise: Check why run queue is so large.
   Check: The number of processes for possible login storm
   Check: AWR for possible mutex issue in database (Examine AWR)
CRITICAL: CPU Running in System Mode observed to be high.
   Advise: Check why large amount of cpu is running in kernel mode.
   Check: Output of top command top to see what processes are running and using kernel cpu
   Check: If the system is undersized with respect to CPU capacity
WARNING: Memory high paging rate observed.
   Advise: The OS is low on free memory.
   Check: The system is under stress with respect to memory
WARNING : Disk heavy utilization observed.
   Advise: Check disks to see why utilization is so high.
   Check: Hot disk: I/O distribution should be evaluated
   Check: The system is undersized with respect to I/O capacity
   Check: AWR for SQL regression causing more I/O
WARNING : Disk high service time observed.
   Advise: Check disks to see why service time is so high.
   Check: Hot disk: I/O distribution should be evaluated
   Check: Disk may be defective
WARNING : Network UDP errors observed.
   Advise: UDP protocol only relevant for RAC. Ignore for Non-RAC
   Advise: Avoid any dropped packets in UDP protocol
   Check: UDP socket receive buffer on the local machine too small
   Check: The application not reading the data fast enough
   Check: Section 7.3 below for more details
Analyzing CPU data
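Section 4.1 of the analyzer report normalizes the run queue by the number of CPUs ( Value/#CPU ) and flags values above 3 as High and above 6 as Very High. The arithmetic can be sketched as follows. The function name classify_run_queue is made up, and the use of integer division is an assumption inferred from the report's Value/#CPU column ( e.g. 29 shown as 14 ), which also suggests this test system has 2 CPUs.

```shell
# Hypothetical helper: $1 = run queue length (vmstat "r" column), $2 = number
# of CPUs. Thresholds (>3 High, >6 Very High) are the ones quoted in Section 4.1.
classify_run_queue() {
    per_cpu=$(( $1 / $2 ))     # integer division (assumption, see lead-in)
    if [ "$per_cpu" -gt 6 ]; then
        echo "VERY_HIGH $per_cpu"
    elif [ "$per_cpu" -gt 3 ]; then
        echo "HIGH $per_cpu"
    else
        echo "OK $per_cpu"
    fi
}
```

For example, `classify_run_queue 117 2` prints `VERY_HIGH 58`, which matches the 09:57:04 snapshot in the report.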
############################################################################
# Section 4.1: CPU RUN QUEUE:
# Run queue should not exceed (Value/#CPU > 3) for any long period of time.
# Below lists the number of times (NUMBER) and percent of the number of times
# (PERCENT) that run queue was High (>3) or Very High (>6). Pay attention to
# high spanning multiple snaps as this represents the number of times run
# queue remained high in back to back snapshots
#
                                      NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive               214   100.00
High (>3)                                12     5.61
Very High (>6)                            7     3.27
High spanning multiple snaps              3     1.4

The following snaps recorded very high run queue values:
SnapTime                        Value   Value/#CPU
------------------------------------------------
Sat Mar 22 09:55:09 UTC 2014       29           14
Sat Mar 22 09:55:48 UTC 2014       20           10
Sat Mar 22 09:57:04 UTC 2014      117           58
Sat Mar 22 09:58:09 UTC 2014       45           22
--> At 09:57:04, 58 processes per CPU are waiting - this is way too much

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher
# the cpu utilization the longer it will take processes to run. Below lists
# the number of times (NUMBER) and percent of the number of times (PERCENT)
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
                                      NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive               214   100.00
High (>95%)                               5     2.34
Very High (100%)                          4     1.87
High spanning multiple snaps              2     0.93

CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime
------------------------------
Sat Mar 22 09:55:09 UTC 2014
Sat Mar 22 09:55:48 UTC 2014
Sat Mar 22 09:58:09 UTC 2014
--> CPU utilization is too high before the Node Eviction occurs at 10:03
    We can't say anything about CPU usage at eviction time, but it can be expected
    that CPU usage remains high during the period of missing OSWatcher records

############################################################################
# Section 4.3:CPU UTILIZATION: PERCENT SYS
# CPU utilization running in SYSTEM mode should not be greater than 30% over
# long periods of time. The higher system cpu utilization the longer it will
# take processes to run. Pay attention to high spanning multiple snaps as it
# is important that cpu utilization not stay persistently high (>30%)
#
                                      NUMBER  PERCENT
Snaps captured in archive                28   100.00
High (>30%)                               5    17.86
Very High (50%)                           2     7.14
High spanning multiple snaps              1     3.57

High values for SYSTEM mode ( > 30% ) could be related to
 - High paging/swapping activity
 - High disk or network I/O
 - Runaway processes issuing a lot of system calls

CPU UTILIZATION: The following snaps recorded very high percent
SnapTime                      Percent
-----------------------------------
Sat Mar 22 09:54:34 PDT 2014       53
Sat Mar 22 09:56:33 PDT 2014       59

CPU UTILIZATION: The following snaps recorded ROOT processes using high percent cpu:
SnapTime                        Pid     CPU    Command
-----------------------------------------------------
Sat Mar 22 09:47:33 UTC 2014    2867    94.8   mp_stress
Sat Mar 22 09:48:03 UTC 2014    3554    91.4   mp_stress
Sat Mar 22 09:48:37 UTC 2014    3554    42.8   mp_stress
Sat Mar 22 09:49:32 UTC 2014    4738    37.1   tfactl.pl
Sat Mar 22 09:49:32 UTC 2014    4946    35.1   tfactl.pl
Sat Mar 22 09:55:11 UTC 2014   14181   328.9   mp_stress
Sat Mar 22 09:55:59 UTC 2014   14181   104.6   mp_stress
Sat Mar 22 09:57:04 UTC 2014   16174   219.0   mp_stress
Sat Mar 22 09:57:34 UTC 2014   16805    52.4   tfactl.pl
Sat Mar 22 10:12:36 UTC 2014   28518    66.5   tfactl.pl
-> Process mp_stress is eating up our CPU
Analyzing Memory Usage
############################################################################
# Section 5.3: MEMORY PAGE IN
# Page in values should be 0 or low. High values (> 25) indicate memory is
# under pressure and may be precursor to swapping. Pay attention to high
# spanning multiple snaps as this value should not stay persistently high
#
                                      NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive               214   100.00
High (>25)                               31    14.49
High spanning multiple snaps             19     8.88

The following snaps recorded very high page in rates:
SnapTime                      Value
-----------------------------------
Sat Mar 22 09:51:33 UTC 2014     32
Sat Mar 22 09:54:34 UTC 2014    312
Sat Mar 22 09:55:09 UTC 2014     32
Sat Mar 22 09:56:33 UTC 2014    624
Sat Mar 22 09:57:04 UTC 2014    352
Sat Mar 22 09:57:34 UTC 2014    664
Sat Mar 22 09:58:09 UTC 2014    128
Sat Mar 22 10:09:35 UTC 2014    292
-> Paging is too high for 15 % of our snapshots before the Node Eviction occurs at 10:03

############################################################################
# Section 5.5: Top 5 Memory Consuming Processes Beginning
# This section list the top 5 memory consuming processes at the start of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem only that it is a top consumer of memory.
SnapTime                       PID    USER    %CPU  %MEM  VSZ      RSS     COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:00:52 UTC 2014    2566  root    0.40  6.20  1798816  273796  ../ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 09:00:52 UTC 2014   27215  oracle  0.00  4.30  1663316  187352  ora_dbw0_grac41
Sat Mar 22 09:00:52 UTC 2014   27131  oracle  0.50  3.90  1569328  171356  ora_lms0_grac41
Sat Mar 22 09:00:52 UTC 2014    5661  root    2.90  3.80   981288  168316  /u01/app/11204/grid/bin/ologgerd -M -d ../grac41
Sat Mar 22 09:00:52 UTC 2014   27221  oracle  0.00  3.20  1564988  143556  ora_smon_grac41

############################################################################
# Section 5.6: Top 5 Memory Consuming Processes Ending
# This section list the top 5 memory consuming processes at the end of the oswbba analysis. There will always be a top 5 process list.
# A process listed here does not imply this process is a problem only that it is a top consumer of memory.
SnapTime                       PID    USER    %CPU  %MEM  VSZ      RSS     COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 10:59:49 UTC 2014    2566  root    0.40  4.70  1798816  207060  ../ojdbc6.jar oracle.rat.tfa.TFAMain ../grac41/tfa_home
Sat Mar 22 10:59:49 UTC 2014    5661  root    3.00  3.90  1047852  170232  /u01/app/11204/grid/bin/ologgerd -M -d ../grac41
Sat Mar 22 10:59:49 UTC 2014   22565  oracle  0.00  3.10  1554224  135496  ora_mman_grac41
Sat Mar 22 10:59:49 UTC 2014    5283  grid    6.20  2.90  1128680  127744  /u01/app/11204/grid/bin/ocssd.bin
Sat Mar 22 10:59:49 UTC 2014   22578  oracle  0.00  2.60  1560896  114060  ora_smon_grac4
--> Be careful here: our top consumer process mp_stress is not shown, as it was started later
    and stopped before the end of the oswbba analysis period
    Always check Section 8 for process related results !
Analyzing Disk IO
############################################################################
# Section 6: Disk Detailed Findings
# This section list only those device which have high percent busy, high service
# times or high wait times
#
############################################################################
# Section 6.1: Disk Percent Busy Findings
# (Only Devices With Percent Busy > 50% Reported)
#
DEVICE: sda PERCENT BUSY
                                      NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive               214   100.00
High (>50%)                              21     9.81
Very High (>95%)                         17     7.94
High spanning multiple snaps             14     6.54

The following snaps recorded high percent busy for device: sda
SnapTime                      Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014  77.09
Sat Mar 22 09:49:32 UTC 2014  98.7
Sat Mar 22 09:50:02 UTC 2014  99.21
Sat Mar 22 09:50:32 UTC 2014  100.0
Sat Mar 22 09:51:03 UTC 2014  99.5

DEVICE: dm-0 PERCENT BUSY
                                      NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive               214   100.00
High (>50%)                              17     7.94
Very High (>95%)                          9     4.21
High spanning multiple snaps              9     4.21

The following snaps recorded high percent busy for device: dm-0 ( our root device )
SnapTime                      Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014  67.09
Sat Mar 22 09:49:32 UTC 2014  98.7
Sat Mar 22 09:50:02 UTC 2014  99.21
Sat Mar 22 09:50:32 UTC 2014  82.2
Sat Mar 22 09:51:03 UTC 2014  77.2

DEVICE: dm-1 PERCENT BUSY
                                      NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive               214   100.00
High (>50%)                              17     7.94
Very High (>95%)                         16     7.48
High spanning multiple snaps             14     6.54

The following snaps recorded high percent busy for device: dm-1 ( our swap device )
SnapTime                      Value
-------------------------------------------
Sat Mar 22 09:48:36 UTC 2014  77.01
Sat Mar 22 09:49:32 UTC 2014  88.7
Sat Mar 22 09:50:02 UTC 2014  99.21
Sat Mar 22 09:50:32 UTC 2014  99.9
Sat Mar 22 09:51:03 UTC 2014  93.9

Here we need to know something about our partition layout.
For details see: How are logical volumes like /dev/dm-0 mapped to /dev/sdx disks/partitions ?
( http://www.hhutzler.de/blog/how-are-logical-volumes-like-devdm-0-mapped-to-devsdx-diskspartitions/ )

# dmsetup ls --tree -o device
vg_oel64-lv_swap (252:1)
 +- (8:3)      <-- Major, Minor number from /dev/sdX
 +- (8:2)
vg_oel64-lv_root (252:0)
 +- (8:2)

Check /dev/mapper
# ls -l /dev/mapper/vg*
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_root -> ../dm-0
lrwxrwxrwx. 1 root root 7 Mar 24 09:07 /dev/mapper/vg_oel64-lv_swap -> ../dm-1

Match the major/minor numbers returned by the dmsetup output above
# ls -l /dev/sda2 /dev/sda3
brw-rw----. 1 root disk 8, 2 Mar 24 09:07 /dev/sda2
brw-rw----. 1 root disk 8, 3 Mar 24 09:07 /dev/sda3
--> Root partition and swap partition are located on the same physical disk /dev/sda -> I/O contention
    For our swap partition we see high BUSY rates > 90 % around 09:50 -> increased paging/swapping
Analyzing Processes
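Section 8.2 of the analyzer report lists processes in state D, T or W. The same information can be pulled straight from the raw OSWatcher ps files, which helps when no analyzer report is at hand. A minimal sketch, assuming the ps column layout used by the raw files in this post ( USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S ... ), where the state is the 10th column. The function name list_d_state is made up for this example.

```shell
# Hypothetical helper: reads an OSWatcher ps .dat file on stdin and prints
# snapshot time, PID, user and wait channel of every process in state D.
list_d_state() {
    awk '
        /^zzz /     { ts = $5; next }          # remember the snapshot time
        $10 == "D"  { print ts, $2, $1, $9 }   # state column is an assumption
    '
}
```

Usage: `list_d_state < grac41.example.com_ps_14.03.22.0900.dat`. Many D-state lines within one snapshot point to the I/O or paging pressure discussed in this post.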
############################################################################
# Section 8.2: PS for Processes With Status = D, T or W Ordered By Time
# In this section list all processes captured in the oswbb logs which have a
# status of D, T or W
#
SnapTime                       PID    USER    CPU  STATUS  WCHAN   COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:49:32 PDT 2014    7573  grid    0.0  D       sleep_  asm_rbal_+ASM1
Sat Mar 22 09:49:32 PDT 2014   31115  oracle  0.0  D       sleep_  ora_cjq0_grac41
Sat Mar 22 09:49:32 PDT 2014   27487  oracle  0.0  D       sleep_  ora_lck0_grac41
Sat Mar 22 09:49:32 PDT 2014    4915  root    0.0  D       sleep_  /u01/app/11204/grid/bin/./crsctl.bin stat res procwatcher
Sat Mar 22 09:49:32 PDT 2014   27213  oracle  0.0  D       sleep_  ora_mman_grac41
Sat Mar 22 09:49:32 PDT 2014   23293  oracle  0.0  D       sleep_  ora_pz99_grac41
...
--> A lot of processes are in disk wait status ( D )
    For 2.6 kernels this could be either an I/O problem or, more likely, a paging/swapping problem

#######################################################################################
# Section 8.3: PS for (Processes with CPU > 0) When System Idle CPU < 30% Ordered By Time
# In this section list all processes captured in the oswbb logs with process cpu consumption
# > 0 and system idle cpu < 30%
#
SnapTime                      IDLE_CPU  PID    USER  CPU     STATUS  COMMAND
----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:11 UTC 2014  0.0       14181  root  328.90  S       mp_stress
Sat Mar 22 09:55:59 UTC 2014  0.0       14181  root  104.60  S       mp_stress
Sat Mar 22 09:57:04 UTC 2014  9.0       16174  root  219.00  S       mp_stress
-> Process mp_stress is taking a lot of CPU - there is no IDLE CPU from 09:55:11 on

#######################################################################################
# Section 8.4: Top VSZ Processes Increasing Memory Per Snapshot
# In this section list all changes in virtual memory allocations per process
#
SnapTime                      PID    USER  %CPU    %MEM   VSZ      CHANGE   %CHANGE  COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014  14181  root  205.00  18.50  1090096  +630036  +136.94  ./mp_stress -t 4 -m 5 -p 50 -c 50
Sat Mar 22 09:56:33 UTC 2014  14181  root  165.00  22.40  1263176  +173080  +15.87   ./mp_stress -t 4 -m 5 -p 50 -c 50
--> Virtual memory for process ./mp_stress is increasing a lot, and CPU usage is also very high
    Increased CPU usage and memory usage could be the root cause for a Node Eviction !

#######################################################################################
# Section 8.5: Top RSS Processes Increasing Memory Per Snapshot
# In this section list all changes in resident memory allocations per process
#
SnapTime                      PID    USER  %CPU    %MEM   RSS     CHANGE   %CHANGE  COMMAND
-----------------------------------------------------------------------------------------------------------------------------------
Sat Mar 22 09:55:59 UTC 2014  14181  root  205.00  18.50  805984  +630016  +358.02  ./mp_stress -t 4 -m 5 -p 50 -c 50
Sat Mar 22 09:56:33 UTC 2014  14181  root  165.00  22.40  977540  +171556  +21.28   ./mp_stress -t 4 -m 5 -p 50 -c 50
--> Resident memory for process ./mp_stress increases a lot
    The problem could be either a memory leak or something like a connection storm
Using grep to retrieve process priority from OSWatcher raw data
% egrep 'zzz|mp_stress|PRI' grac41.example.com_ps_14.03.22.0900.dat
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 09:56:33 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
root 14181 4270 90 165 22.4 1263176 977540 n_tty_ S 09:55:02 00:02:32 ./mp_stress -t 4 -m 5 -p 50 -c 50
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 09:57:04 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 09:57:34 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
zzz ***Sat Mar 22 10:00:17 CET 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
root 16867 4270 90 108 68.3 4542216 2974540 futex_ S 09:57:38 00:03:44 ./mp_stress -t 4 -m 20 -p 50 -c 200
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
==> Priority is quite high: 90 ( expected, as mp_stress is a RT process )
    CPU usage is high too
    Memory usage explodes from 22 % to 68 %
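The egrep output is easier to read when each match is paired with its snapshot time. A minimal sketch follows; the function name proc_mem_trend is made up, and the column positions ( PID = 2, %CPU = 5, %MEM = 6 ) are taken from the ps header shown above.

```shell
# Hypothetical helper: reads an OSWatcher ps .dat file on stdin and prints
# "time PID %CPU %MEM" for every line matching the regular expression in $1.
proc_mem_trend() {
    awk -v pat="$1" '
        /^zzz /   { ts = $5; next }          # remember the snapshot time
        $0 ~ pat  { print ts, $2, $5, $6 }   # columns as in the ps header above
    '
}
```

Usage: `proc_mem_trend mp_stress < grac41.example.com_ps_14.03.22.0900.dat`. For the two mp_stress lines above it prints `09:56:33 14181 165 22.4` and `10:00:17 16867 108 68.3`, which shows the jump in %MEM directly.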
Summary
- Process mp_stress leaks memory, eats up all our CPU and is very likely the trigger of the problem
- The system is paging and a lot of processes are waiting on disk I/O
- The CPU run queue is high; after a while most processes migrate to the blocked queue
- CPU usage is high all the time
- As all I/O goes to a single physical disk, we see high disk service times and disk waits
- From the provided OSWatcher data we can't pinpoint the root cause of the Node Eviction
- Root cause could be any of : CPU starvation, paging/swapping or slow disk I/O