start:development:analysis_and_optimization [2020/04/16 19:30] (current)
jrutte02 created
====== Performance Analysis and Optimization Tools ======

The HPC group provides a variety of profiling, performance measurement and optimization tools:

===== Software Choices =====

==== ''perf'' ====

[[https://perf.wiki.kernel.org/index.php/Main_Page|''perf'']] is a command-line performance analysis tool for Linux.

<callout type="info" icon="true">
As ''perf'' ships with the Linux kernel, there is no module to load. You can use it at any time (interactively, inside or outside of jobs).
</callout>

=== Documentation ===

This wiki does not aim to reproduce the [[https://perf.wiki.kernel.org/index.php/Main_Page|excellent ''perf'' documentation]]; instead, we refer to it and [[https://perf.wiki.kernel.org/index.php/Tutorial|in particular to the ''perf'' tutorial]].

==== OpenSpeedShop ====

[[http://www.openspeedshop.org/wp/|OpenSpeedShop]]: further information will follow soon.

==== Scalasca ====

[[http://www.scalasca.org/|Scalasca]] is provided as modules((As it is mainly of interest to developers, it is currently available on MOGON II only.)):

''perf/Scalasca''

If you require assistance or updates, please contact us.

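A typical Scalasca workflow has three steps: instrument the code with the Score-P compiler wrapper, run it under ''scalasca -analyze'', and inspect the result with ''scalasca -examine''. The sketch below writes the first two steps into a Slurm job script. It assumes that the ''perf/Scalasca'' module listed above also provides Score-P and an MPI compiler wrapper; the source file ''my_app.c'', the task count and the time limit are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: generate a Slurm job script for a Scalasca runtime-summary measurement.
# my_app.c, the task count and the time limit are hypothetical placeholders.
set -eu

cat > scalasca_job.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=scalasca_my_app
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --exclusive
#SBATCH --time=00:30:00

# Assumption: this module also makes Score-P (scorep) and mpicc available.
module load perf/Scalasca

# 1. Instrument: compile through the Score-P compiler wrapper.
scorep mpicc -g -O2 -o my_app my_app.c

# 2. Measure: run under the Scalasca analyzer (creates a scorep_*_sum directory).
scalasca -analyze srun ./my_app
EOF

echo "wrote scalasca_job.sh"
```

Submit the generated script with ''sbatch scalasca_job.sh''. After the job has finished, running ''scalasca -examine'' on the resulting ''scorep_my_app_*_sum'' directory should open the report in the Cube browser, while ''scalasca -examine -s'' generates a textual score report instead.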
==== Intel's Advisor ====
<callout type="warning" title="WIP" icon="true">
MOGON-specific documentation for this tool has not been written yet. Please do not hesitate to contact us for advice or recommendations.
</callout>

==== Intel's Inspector ====

<callout type="warning" title="WIP" icon="true">
MOGON-specific documentation for this tool has not been written yet. Please do not hesitate to contact us for advice or recommendations.
</callout>

==== Intel's VTune ====

Intel VTune Amplifier is a powerful serial and parallel profiler that collects performance statistics of your code. VTune can profile code written in C, C++, C#, Fortran, Java, and assembly. It is designed for shared-memory machines, so code using MPI and/or OpenMP can be profiled as long as it runs on a single node.

<accordion>
<panel collapsed="false" title="Using Intel's Vtune Software on MOGON">
=== Initial Setup ===

  - Set up the VTune environment by loading the VTune module: ''module load tools/VTune''
  - Build your application as usual, but enable compiler debug symbols by adding the ''-g'' option to the ''icc'', ''gcc'', ''mpicc'', ''ifort'', etc. command line; this enables source-level profiling. Keep your release optimization flags (e.g. ''-O3 -xAVX''), so that the profile reflects the optimized code and your effort goes into regions the compiler could not optimize.

=== Serial Usage with the GUI ===

{{:profiling:vtune_screenshot.png?direct&400 |VTune Screenshot}}

  - Do not use this approach for runs longer than a few minutes; instead, submit the run to the scheduler and view the results in the GUI (see the section below).
  - After loading the VTune module, start the GUI from the command line: ''amplxe-gui''.
  - If this is the first time you run VTune, click "New Project":
    * Give your project a name and choose a location to store the analysis output.
    * Choose the application that you built in the initial setup stage (e.g. ''~/sample_code.exe'').
    * Choose a working directory.
  - Choose "New Analysis", then "Basic Hotspots". Start with Basic Hotspots and move on to more advanced analyses only if necessary.
  - Click the "Start" button on the right-hand side.
    * Your application will now run in the background while VTune collects data. How long this takes depends on your application; VTune should not add a noticeable amount of overhead.
    * To stop your application prematurely, click "Stop" on the right-hand side. This stops both the program and the data collection, but VTune still displays the partial results.
  - VTune will then finalize the results and display a summary page.
    * Provided the application was compiled with ''-g'', the "Top Hotspots" list should point out the most time-consuming functions/subroutines of your program.
    * The CPU usage histogram is not meaningful for serial codes; you can safely ignore it.
  - You can see more information in the bottom-up or top-down tree views.
    * Double-clicking a line opens the application's source code and shows the CPU usage per source line.
    * This points you to the areas your optimization efforts should focus on.

=== Serial/Parallel Usage Through Scheduler ===

The following steps describe how to run your program with the VTune collector on a compute node and then finalize and visualize the results in the GUI on the head node.

  - Build your application with debug symbols (''-g'') and release optimization flags, as described under Initial Setup.
  - Load the VTune module as described above.
  - Start an [[:slurm_submit#allocation_with_salloc|interactive job on a whole node]]. Always reserve whole nodes when profiling or benchmarking.
  - Start the application with the VTune collector as described above.

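For runs that you do not want to watch interactively, the same collection can also be submitted as a batch job using the VTune command-line collector ''amplxe-cl''. The sketch below writes such a job script; the partition, account, time limit and ''sample_code.exe'' are hypothetical placeholders, and the ''hotspots'' analysis type corresponds to "Basic Hotspots" in the GUI.

```shell
#!/usr/bin/env bash
# Sketch: generate a Slurm job script that runs the VTune hotspots collector
# on one whole node. Partition, account, time limit and the executable name
# are hypothetical placeholders; adapt them to your project.
set -eu

cat > vtune_job.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=vtune_hotspots
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=01:00:00
#SBATCH --partition=YOUR_PARTITION
#SBATCH --account=YOUR_ACCOUNT

module load tools/VTune

# Collect hotspots into ./vtune_results; everything after "--" is the
# command line of the profiled application.
amplxe-cl -collect hotspots -result-dir vtune_results -- ./sample_code.exe
EOF

echo "wrote vtune_job.sh"
```

Submit the generated script with ''sbatch vtune_job.sh''. After the job has finished, the results can be opened on the head node with ''amplxe-gui vtune_results'', or summarized as text with ''amplxe-cl -report hotspots -result-dir vtune_results''.
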
=== Getting Help ===

If you have specific questions regarding the use of VTune on MOGON, please visit us at the [[training_and_outreach:workshop|HPC-Workshop (as announced on the wiki start page)]].


----

This page was modeled on [[http://www.princeton.edu/researchcomputing/faq/profiling-with-intel-vtun/|the Princeton Research Computing Facility page]].

</panel>
</accordion>
  • start/development/analysis_and_optimization.txt
  • Last modified: 2020/04/16 19:30
  • by jrutte02