development:monitoring

Revision history: created 2019/10/18 10:44 by meesters; current revision 2019/10/18 10:46 by meesters.

Without further ado, here is the link to our Ganglia page: [[http://m2monitor/ganglia/|Ganglia entry point for the Mogon clusters]].

===== Shell Tools for Monitoring Purposes =====

==== CPU Monitoring with (h)top ====

''top'' is the classic tool for monitoring the CPU usage of your processes at a fairly fine granularity. As a user, you are allowed to log in (with ''ssh'') to nodes on which your jobs are running. **Remember to log out afterwards.**

For example:
<code bash>
top -u <username>
</code>

Specifying the username restricts the view to your own processes. ''htop'', where installed, offers a more interactive view and accepts the same ''-u <username>'' option.
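
For a quick check without keeping an interactive session open, ''top'' can also run non-interactively. The sketch below wraps this in a shell function. The name ''top_snapshot'' is only illustrative, and the options assume the procps-ng ''top'' found on most Linux systems. The remote call in the last comment uses ''<nodename>'' as a placeholder for a node running one of your jobs.

<code bash>
# top_snapshot is a hypothetical helper name, not an installed command.
# -b  : batch mode (plain text output, suitable for pipes and log files)
# -n 1: exit after a single update
# -u  : show only processes of the given user (default: yourself)
top_snapshot() {
    top -b -n 1 -u "${1:-$(id -un)}"
}

# Example: the summary header plus the first few of your processes
top_snapshot | head -n 15

# On a compute node running one of your jobs (<nodename> is a placeholder),
# a one-shot remote call avoids leaving a login session open:
#   ssh <nodename> 'top -b -n 1 -u "$USER"' | head -n 15
</code>

Batch mode output can also be appended to a file at intervals if you want a coarse record of your job's CPU usage over time.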

==== Virtual Memory Statistics ====

The ''vmstat'' command displays statistics on virtual memory, kernel threads, disks, system processes, I/O blocks, interrupts, CPU activity and more. A good introduction with examples can be found on [[http://www.tecmint.com/linux-performance-monitoring-with-vmstat-and-iostat-commands/|this example page]].
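
A common way to use ''vmstat'' is to let it print a few reports at a fixed interval while your job runs. The following is a minimal sketch assuming the procps-ng ''vmstat''; the function name ''vm_sample'' and its defaults are illustrative only.

<code bash>
# vm_sample is a hypothetical helper name, not an installed command.
# Arguments: delay in seconds (default 5) and number of reports (default 3).
# -S M prints the memory columns in megabytes (1048576 bytes).
# Note: the first report gives averages since the last reboot (the procs and
# memory columns are always instantaneous); later reports cover one delay.
vm_sample() {
    vmstat -S M "${1:-5}" "${2:-3}"
}

# Example: three reports, two seconds apart
vm_sample 2 3
</code>

Watching the ''si''/''so'' (swap in/out) columns is a quick way to see whether a job runs out of physical memory.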

==== Listing Open Files ====

The ''lsof'' command lists processes together with the files they have open. The list includes disk files, directories, network sockets, pipes and devices.

For example:
<code bash>
$ lsof | head
COMMAND    PID      USER   FD      TYPE     DEVICE  SIZE/OFF       NODE NAME
init         1      root  cwd       DIR      253,0      4096          2 /
init         1      root  rtd       DIR      253,0      4096          2 /
init         1      root  txt       REG      253,0    145180     147164 /sbin/init
init         1      root  mem       REG      253,0   1889704     190149 /lib/libc-2.12.so
init         1      root   0u       CHR        1,3       0t0       3764 /dev/null
init         1      root   1u       CHR        1,3       0t0       3764 /dev/null
init         1      root   2u       CHR        1,3       0t0       3764 /dev/null
init         1      root   3r      FIFO        0,8       0t0       8449 pipe
init         1      root   4w      FIFO        0,8       0t0       8449 pipe
init         1      root   5r       DIR       0,10         0          1 inotify
init         1      root   6r       DIR       0,10         0          1 inotify
init         1      root   7u      unix 0xc1513880       0t0       8450 socket
init         1      root  DEL       REG        8,2              2621484 /lib64/librt-2.12.so
</code>

''FD'' stands for "file descriptor". Besides numbered descriptors, the column contains values such as:

^ FD  ^ Meaning                            ^
| cwd | current working directory          |
| rtd | root directory                     |
| txt | program text (code and data)       |
| mem | memory-mapped file                 |
| DEL | deleted memory-mapped file (Linux) |

Numbered entries such as ''1u'' are actual file descriptors; the trailing letter gives the access mode:

^ Mode ^ Access         ^
| r    | read           |
| w    | write          |
| u    | read and write |

The ''TYPE'' column identifies the kind of file:

^ TYPE ^ Meaning                                 ^
| DIR  | directory                               |
| REG  | regular file                            |
| CHR  | character special file                  |
| FIFO | FIFO (First In, First Out), e.g. a pipe |
| unix | UNIX domain socket                      |

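<code bash>
# all open files of <username>, plus all network connections (lsof ORs selections by default):
lsof -u <username> -i
# only the network connections of <username> (-a ANDs the selections):
lsof -a -u <username> -i
</code>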
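
To see which of your programs hold the most open-file entries, the plain ''lsof'' output can be aggregated with ''awk''. This is a minimal sketch; the function name ''lsof_summary'' is hypothetical. Because ''lsof'' prints one line per entry (including ''cwd'', ''txt'' and ''mem'' lines), the counts are open-file entries rather than strictly file descriptors.

<code bash>
# lsof_summary is a hypothetical helper name, not an installed command.
# Counts lsof entries per COMMAND (first column) for one user
# (default: yourself), largest first. lsof warnings on stderr are discarded.
lsof_summary() {
    lsof -u "${1:-$(id -un)}" 2>/dev/null |
        awk 'NR > 1 { count[$1]++ } END { for (cmd in count) print count[cmd], cmd }' |
        sort -rn
}

# Example: the ten commands holding the most entries
lsof_summary | head -n 10
</code>

To inspect a single process instead, ''lsof -p <pid>'' lists only the files of that process.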

==== IO Statistics ====

Gathering I/O statistics is somewhat intricate on parallel file systems. If you need detailed I/O statistics for the parallel file system, please do not hesitate to contact the HPC team.

However, ''iostat'' is a simple tool that collects and displays input/output statistics for storage devices. It is often used to track down storage performance issues on local as well as remote disks. It is particularly useful if your job requires local scratch storage and you need to monitor how your application uses it. Note that ''iostat'' reports figures per device for the whole node, not per process.

To get a device utilization report, invoke:
<code bash>
iostat -d
</code>
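
To follow local disk activity while a job runs, ''iostat'' can print repeated reports at a fixed interval. The sketch below assumes the ''iostat'' from the sysstat package in a release that supports ''-y''; the function name ''io_sample'' and its defaults are illustrative only.

<code bash>
# io_sample is a hypothetical helper name, not an installed command.
# -d: device utilization report only
# -k: throughput in kilobytes per second
# -y: omit the first report, which covers the time since boot
# Arguments: interval in seconds (default 5) and number of reports (default 3).
io_sample() {
    iostat -d -k -y "${1:-5}" "${2:-3}"
}

# Example: three reports, two seconds apart
io_sample 2 3
</code>

Adding ''-x'' prints extended statistics, such as the device utilization (''%util''), which helps to judge whether a local disk is saturated.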
development/monitoring.txt · Last modified: 2019/10/18 10:46 by meesters