
Shell Tools for Monitoring Purposes

CPU Monitoring with top

top is the classic tool for monitoring the CPU behavior of your processes at a relatively fine granularity. As a user you are allowed to log in (with ssh) to nodes where your jobs are running. Remember to log out afterwards.

An example is

top -u <username>

Specifying the username limits the view to your own processes.
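For logging or scripting, top can also be run non-interactively. The following is a minimal sketch, assuming the procps version of top that is common on Linux; the function name snapshot_top is made up for this example and is not part of any cluster tooling.

```shell
# Hypothetical helper (not part of the original page): print one
# non-interactive snapshot of the processes owned by a user.
#   -b    batch mode, plain text output instead of the interactive screen
#   -n 1  exit after a single iteration
#   -u    show only processes of the given user
snapshot_top() {
    user="${1:-$(id -un)}"      # default: the user running the script
    top -b -n 1 -u "$user"
}
```

Calling snapshot_top without arguments on a node where your job runs prints the current state of your own processes once and returns, so the output can be redirected to a file.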

Virtual Memory Statistics

The vmstat command displays statistics on virtual memory, kernel threads, disks, system processes, I/O blocks, interrupts, CPU activity and much more. This is a good example page.
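A typical way to use vmstat is to sample repeatedly at a fixed interval. The sketch below assumes the procps vmstat found on most Linux systems; the function name watch_vmstat and its default values are chosen for illustration only.

```shell
# Hypothetical helper: print <count> vmstat reports, one every <interval> seconds.
# Note: the first report shows averages since boot, the following ones
# cover the preceding interval.
watch_vmstat() {
    interval="${1:-5}"          # seconds between reports (assumed default)
    count="${2:-3}"             # number of reports (assumed default)
    vmstat "$interval" "$count"
}
```

For example, watch_vmstat 2 10 prints ten reports two seconds apart and then exits.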

Listing Open Files

The lsof command lists processes and their open files. The list includes disk files, network sockets, pipes, devices and processes.

One example would be

$ lsof | head
COMMAND    PID      USER   FD      TYPE     DEVICE  SIZE/OFF       NODE NAME
init         1      root  cwd      DIR      253,0      4096          2 /
init         1      root  rtd      DIR      253,0      4096          2 /
init         1      root  txt      REG      253,0    145180     147164 /sbin/init
init         1      root  mem      REG      253,0   1889704     190149 /lib/libc-2.12.so
init         1      root   0u      CHR        1,3       0t0       3764 /dev/null
init         1      root   1u      CHR        1,3       0t0       3764 /dev/null
init         1      root   2u      CHR        1,3       0t0       3764 /dev/null
init         1      root   3r     FIFO        0,8       0t0       8449 pipe
init         1      root   4w     FIFO        0,8       0t0       8449 pipe
init         1      root   5r      DIR       0,10         0          1 inotify
init         1      root   6r      DIR       0,10         0          1 inotify
init         1      root   7u     unix 0xc1513880       0t0       8450 socket
init         1      root  DEL      REG        8,2               2621484 /lib64/librt-2.12.so

Here FD stands for 'file descriptor'. Some of its values are:

cwd current working directory
rtd root directory
txt program text (code and data)
mem memory-mapped file

Entries in the FD column such as 1u are actual file descriptor numbers, followed by a letter (u, r or w) that gives the access mode:

r for read access.
w for write access.
u for read and write access.

TYPE is the type of the file. Some of its values are:

DIR Directory
REG Regular file
CHR Character special file.
FIFO First In First Out special file (pipe)

To list all open files of a particular user together with all network connections, type:

lsof -u <username> -i
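By default lsof combines several selection options with OR, which is why the command above shows the user's files plus all network connections. The -a option combines them with AND instead. The sketch below shows this, together with a small summary of the TYPE column; both function names are made up for this example, and the summary assumes the column layout of the listing above, with TYPE as the fifth field.

```shell
# Hypothetical helper: list only the network connections of one user.
# -a ANDs the -u and -i selections (without -a they are ORed).
user_net_files() {
    user="${1:-$(id -un)}"
    lsof -a -u "$user" -i
}

# Hypothetical helper: read lsof output on stdin and count entries per TYPE.
# Assumes TYPE is the 5th whitespace-separated field, as in the listing above;
# lsof versions that print extra columns would need a different field number.
count_by_type() {
    awk 'NR > 1 { n[$5]++ } END { for (t in n) print n[t], t }' | sort -rn
}
```

A possible use is lsof -u <username> | count_by_type, which prints one line per file type with the most frequent type first.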

IO Statistics

Collecting I/O statistics is somewhat intricate in conjunction with parallel file systems. If you need detailed I/O statistics for the parallel file system, please do not hesitate to contact the HPC team.

However, iostat is a simple tool that collects and shows input and output statistics for storage devices. It is often used to trace performance issues of storage devices, including local and remote disks. It is particularly useful if your job requires local scratch storage and you need to monitor your application working on it.

Invoke

iostat -d

to obtain these statistics.
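To watch a device while a job is running, iostat can repeat its report at an interval. This is a minimal sketch, assuming the iostat from the sysstat package; the function name watch_iostat and its defaults are illustrative only.

```shell
# Hypothetical helper: print <count> device reports, one every <interval> seconds.
#   -d  device utilization report
#   -x  extended statistics (sysstat iostat)
# Note: the first report shows averages since boot.
watch_iostat() {
    interval="${1:-5}"          # seconds between reports (assumed default)
    count="${2:-3}"             # number of reports (assumed default)
    iostat -d -x "$interval" "$count"
}
```

For example, watch_iostat 10 6 reports on all devices every ten seconds for one minute.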

shell_based_monitoring.1449146635.txt.gz · Last modified: 2015/12/03 13:43 by meesters