Monday, July 23, 2012

Rebuilding RPMDB (RPM Database) - open rpm file handles

As the output below shows, cron.daily has been launching a process since May 9th that never finishes executing. If you are troubleshooting remotely, the files in the sosreport help.

[root@BUTWIS01 ~]# ps -ef|grep rpm
root 1225 6420 0 Jul12 ? 00:00:00 /bin/sh /etc/cron.daily/rpm
root 1233 1225 0 Jul12 ? 00:00:00 /usr/lib/rpm/rpmq -q --all --qf %{name}-%{version}-%{release}.%{arch}.rpm\n
root 1353 6255 0 Jun05 ? 00:00:00 /bin/sh /etc/cron.daily/rpm
root 1359 1353 0 Jun05 ? 00:00:00 /usr/lib/rpm/rpmq -q --all --qf %{name}-%{version}-%{release}.%{arch}.rpm\n
root 1997 6451 0 May21 ? 00:00:00 /bin/sh /etc/cron.daily/rpm
root 2000 1997 0 May21 ? 00:00:00 /usr/lib/rpm/rpmq -q --all --qf %{name}-%{version}-%{release}.%{arch}.rpm\n
......
......
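
Since ps -ef prints only a date for older processes, it can help to see the exact start time and elapsed run time of each stuck job. The following is a minimal sketch and was not part of the original session; filter_procs is a hypothetical helper name, and its default pattern is simply the cron script path seen in the output above.

```shell
# Hypothetical helper: read ps output on stdin and keep the header line
# plus every line that mentions the pattern (default: the cron.daily job).
# The pattern is passed through the environment rather than on awk's
# command line, so the filter itself never shows up as a match.
filter_procs() {
    PAT="${1:-/etc/cron.daily/rpm}" awk 'NR == 1 || index($0, ENVIRON["PAT"]) > 0'
}

# Usage (lstart = full start date and time, etime = elapsed time):
#   ps -eo pid,ppid,lstart,etime,args | filter_procs
#   ps -eo pid,ppid,lstart,etime,args | filter_procs rpmq
```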



The problem appears to have begun on a specific date (let's say May 9th): that is the earliest log entry, and /var/log/rpmpkgs is a 0-byte file created on May 10. Unfortunately, the logs on the server do not seem to go back nearly that far, so it may not be possible to determine what happened. Does the server have any 'messages' files in /var/log other than messages and messages.1?

One thing we can see is that each of the temporary files created by the cron job still has an open file handle:

sort 1360 0 1 unknown /var/log/rpmpkgs.bvNeC1355 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 2001 0 1 unknown /var/log/rpmpkgs.AqCXY1999 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 2145 0 1 unknown /var/log/rpmpkgs.CMQBk2143 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 2580 0 1 unknown /var/log/rpmpkgs.unPLs2578 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 3485 0 1 unknown /var/log/rpmpkgs.LbxJe3483 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 4017 0 1 unknown /var/log/rpmpkgs.AiNLk4009 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 6523 0 1 unknown /var/log/rpmpkgs.BEVAC6518 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 6584 0 1 unknown /var/log/rpmpkgs.NfVjU6582 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
sort 7071 0 1 unknown /var/log/rpmpkgs.blYnX7062 lstat: Resource temporarily unavailable) (stat: Resource temporarily unavailable)
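
To pull just the PID and file name out of a listing like the one above, a small filter is enough. This is a sketch under the assumption that the input has the same column layout as the lines shown here (command, then PID, then further columns with the path somewhere after them); rpmpkgs_holders is a hypothetical helper name.

```shell
# Hypothetical helper: read lsof-style lines on stdin and print
# "PID path" for each line that references a /var/log/rpmpkgs.* temp file.
# Field 2 is taken to be the PID, as in the listing above.
rpmpkgs_holders() {
    awk '{
        for (i = 3; i <= NF; i++)
            if ($i ~ /^\/var\/log\/rpmpkgs\./) { print $2, $i; break }
    }'
}

# Usage on a live system (or feed it the lsof output saved in the sosreport):
#   lsof 2>/dev/null | rpmpkgs_holders
```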

ls -hl /var/log

The fact that all the files still exist in /var/log/messages* suggests this isn't an issue with your storage or filesystem. Unfortunately, the logs still don't go back far enough to tell us what might have happened. The next thing I'd like to do is collect some data about what the stuck processes are doing. To that end, please run:

strace -Tttfvo /tmp/strace.out -p15223

Let that run for 15 seconds or so, then press Ctrl+C and send us /tmp/strace.out.
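
Once /tmp/strace.out exists, a quick way to see where a process is stuck is to list the system calls that started but never returned. The sketch below assumes the file was written with -f and -o as in the command above, so every line begins with a PID; pending_calls is a hypothetical helper name.

```shell
# Hypothetical helper: print, for each traced PID, the last call that was
# marked "<unfinished ...>" and never followed by a matching "resumed" line.
pending_calls() {
    awk '
        /<unfinished \.\.\.>/ { pending[$1] = $0 }
        /<\.\.\. .* resumed>/ { delete pending[$1] }
        END { for (pid in pending) print pending[pid] }
    ' "${1:-/tmp/strace.out}"
}

# Usage:
#   pending_calls /tmp/strace.out
```

A process whose only pending call is a futex(..., FUTEX_WAIT, ...) with no timeout is waiting on a lock that nothing is releasing.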



The strace output shows that the rpm process is currently stalled in a futex wait:

15223 14:48:53.284527 futex(0x2ae242d4a6cc, FUTEX_WAIT, 1, NULL <unfinished ...>

This usually indicates a problem within the rpmdb, so perform the following steps to rebuild it.

1. Capture the current status.
# cd /var/lib/rpm
# /usr/lib/rpm/rpmdb_stat -CA > /tmp/rpmdb.out

2. Kill the rpm processes by running "killall -9 rpm". (killall matches exact process names, so the rpmq processes in the ps output above need "killall -9 rpmq" as well.)

3. Back up the __db.* files, then rebuild the database.
# mv /var/lib/rpm/__db.* /tmp
# rpm --rebuilddb

Running "rpm -qa" or "yum check-update" should confirm whether the rpmdb is back in a working state. Read /tmp/rpmdb.out alongside the results of the steps above for more insight.
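
The capture, backup and rebuild steps can be wrapped in one function so they always run in the same order. This is only a sketch: rebuild_rpmdb and its two optional arguments (database directory and backup directory) are hypothetical, it assumes the stuck rpm processes have already been killed, and it uses rpm's --dbpath option so the directory can be overridden.

```shell
# Hypothetical wrapper around the rebuild steps. Run as root.
#   $1 = rpm database directory (default /var/lib/rpm)
#   $2 = where to put rpmdb.out and the old __db.* files (default /tmp)
rebuild_rpmdb() {
    dbdir=${1:-/var/lib/rpm}
    backup=${2:-/tmp}

    # Capture the current state, if rpmdb_stat is present on this system.
    if [ -x /usr/lib/rpm/rpmdb_stat ]; then
        (cd "$dbdir" && /usr/lib/rpm/rpmdb_stat -CA) > "$backup/rpmdb.out" 2>&1 ||
            echo "rpmdb_stat failed, continuing" >&2
    fi

    # Move the __db.* files out of the way instead of deleting them.
    for f in "$dbdir"/__db.*; do
        if [ -e "$f" ]; then
            mv "$f" "$backup"/ || return 1
        fi
    done

    # Rebuild, then run a full query as a smoke test.
    rpm --dbpath "$dbdir" --rebuilddb || return 1
    rpm --dbpath "$dbdir" -qa > /dev/null
}

# Usage:
#   rebuild_rpmdb && echo "rpmdb answers queries again"
```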

Sunday, July 22, 2012

How-To: Kill a Process Using the 'pidof' Command


If a process hangs and you want to easily kill it, type in a console:

kill -9 $(pidof process_name)

Replace process_name with the name of a currently running process. For example, to kill rpm you would issue the following:

kill -9 $(pidof rpm)

Or awk, as another example:

kill -9 $(pidof awk)

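Or, to kill only one of several matching processes, pass -s (single shot) so that pidof returns a single PID:

kill -9 $(pidof -s awk)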

pidof finds the process IDs (PIDs) of every running program with the given name. The shell replaces everything between $( and ) with pidof's output, so kill receives those PIDs and each matching process is killed.
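
One catch with the one-liner: if no process matches, pidof prints nothing and kill is called without a PID, which only produces a usage error. A minimal sketch of a guarded version follows; kill_by_name is a hypothetical helper name, not a standard command.

```shell
# Hypothetical helper: kill -9 every process with the given name,
# or report cleanly when pidof finds none.
kill_by_name() {
    name=$1
    pids=$(pidof "$name") || {
        echo "no running process named '$name'" >&2
        return 1
    }
    # $pids is left unquoted on purpose: pidof may return several PIDs
    # separated by spaces, and kill needs them as separate arguments.
    kill -9 $pids
}

# Usage:
#   kill_by_name rpmq
```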

Wednesday, July 11, 2012

Disk Drive - C.P.U - performance measurement on Linux server

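(Tools covered: iostat, sar [CPU], mpstat [CPU usage info], nfsiostat and nfsstat [NFS share stats], lpstat [CUPS printer stats], vmstat [virtual memory stats])

iostat, sar and mpstat come from the sysstat package, so you need it installed. vmstat, nfsstat and lpstat ship with other packages (procps, nfs-utils and cups).
# yum install sysstat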
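
Before collecting numbers it is worth confirming which of these tools are actually present. The following is a small sketch; check_tools is a hypothetical helper name and the tool list is just the one from this post.

```shell
# Hypothetical helper: report where each named tool lives on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not installed"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_tools iostat sar mpstat vmstat nfsstat lpstat hdparm
```

Typical first invocations once the tools are in place are "iostat -x 5 3", "mpstat -P ALL 5 3", "sar -u 5 3" and "vmstat 5 3", each sampling every 5 seconds, 3 times.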




CURRENT SERVER STATS
=================================
[root@ssssl-prime ~]# hdparm -T /dev/sda1
/dev/sda1:
Timing cached reads: 9152 MB in 2.00 seconds = 4581.28 MB/sec

[root@ssssl-prime ~]# hdparm -t /dev/sda1
/dev/sda1:
Timing buffered disk reads: 100 MB in 3.03 seconds = 33.04 MB/sec

[root@ssssl-prime ~]# hdparm -t /dev/sda2
/dev/sda2:
Timing buffered disk reads: 112 MB in 3.04 seconds = 36.85 MB/sec

[root@ssssl-prime ~]# hdparm -t /dev/sda
/dev/sda:
Timing buffered disk reads: 116 MB in 3.00 seconds = 38.62 MB/sec

[root@ssssl-prime ~]# hdparm -t /dev/sda3
/dev/sda3:
Timing buffered disk reads: 108 MB in 3.03 seconds = 35.64 MB/sec


[root@bu 5.8 ~]# hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads: 26204 MB in 2.00 seconds = 13126.82 MB/sec
Timing buffered disk reads: 1312 MB in 3.00 seconds = 437.20 MB/sec

[root@bu 6.2 ~]# hdparm -Tt /dev/sda

/dev/sda:
Timing cached reads: 14174 MB in 2.00 seconds = 7095.22 MB/sec
Timing buffered disk reads: 912 MB in 3.00 seconds = 303.88 MB/sec
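
hdparm readings vary from run to run, so it is more reliable to repeat the test a few times on an otherwise idle system and average the results. A minimal sketch, assuming the output format shown above; avg_buffered is a hypothetical helper name.

```shell
# Hypothetical helper: read hdparm -t output on stdin and print the mean
# of the "buffered disk reads" rates. The rate is the second-to-last
# field on each matching line (the last field is the unit, "MB/sec").
avg_buffered() {
    awk '/Timing buffered disk reads/ { sum += $(NF - 1); n++ }
         END { if (n) printf "%.2f MB/sec over %d runs\n", sum / n, n }'
}

# Usage:
#   for i in 1 2 3; do hdparm -t /dev/sda; done | avg_buffered
```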