Introduction
Intel® VTune™ Amplifier XE 2015 now includes extensive capabilities for analyzing OpenMP applications. This article will step through this analysis on an Intel® Xeon Phi™ coprocessor.
Compiling and running the application
The application we will be using is one of the samples included in VTune Amplifier. It is located in /opt/intel/vtune_amplifier_xe_2015/samples/en/C++/matrix_vtune_amp_xe.tgz. To build the application on Linux*:
- Source the environment for the Intel® Compiler you are using, for example:
- source /opt/intel/composer_xe_2015/compilervars.sh intel64
- Untar the sample in a directory where you have write permission:
- tar xvzf matrix_vtune_amp_xe.tgz
- By default the sample does not use OpenMP, so you will need to modify the Makefile:
- cd matrix/linux
- Edit the Makefile: comment out the default PARAMODEL line and uncomment the OpenMP PARAMODEL line.
- Build the application to run natively on the Xeon Phi coprocessor:
- cd matrix/linux
- make mic
- The make command from step #4 will create matrix.mic, a native executable for the Xeon Phi coprocessor, and copy it to mic0:/tmp.
- Verify that the libiomp5.so library is available on your Xeon Phi coprocessor.
- It should be in either /tmp or lib64.
- Run the application
- /tmp/matrix.mic
Addr of buf1 = 0x7fec2b054010
Offs of buf1 = 0x7fec2b054180
Addr of buf2 = 0x7fec23fd3010
Offs of buf2 = 0x7fec23fd31c0
Addr of buf3 = 0x7fec1cf52010
Offs of buf3 = 0x7fec1cf52100
Addr of buf4 = 0x7fec15ed1010
Offs of buf4 = 0x7fec15ed1140
Threads #: 240 OpenMP threads
Matrix size: 3840
Using multiply kernel: multiply1
Freq = 1.090908 GHz
Execution time = 23.866 seconds
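To make concrete what an OpenMP-parallel multiply kernel looks like, here is a minimal sketch. It is not the sample's actual source: the function name `multiply_omp`, the flat row-major `std::vector` layout, and the `double` element type are all assumptions chosen for illustration. The sketch only shows the general pattern of splitting the outer loop across threads.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel, not the sample's actual source: multiplies two
// n x n row-major matrices and accumulates the product into c.
void multiply_omp(std::size_t n,
                  const std::vector<double>& a,
                  const std::vector<double>& b,
                  std::vector<double>& c)
{
    // Each thread receives a share of the rows of c, so no two threads
    // write to the same element. If the compiler is not invoked with its
    // OpenMP option, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(n); ++i) {
        const std::size_t row = static_cast<std::size_t>(i) * n;
        for (std::size_t j = 0; j < n; ++j) {
            double sum = c[row + j];
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[row + k] * b[k * n + j];
            }
            c[row + j] = sum;
        }
    }
}
```

With 240 threads and a matrix size of 3840, as in the run above, an even split of the outer loop would give each thread 16 rows.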
Running the application using VTune Amplifier
- Source /opt/intel/vtune_amplifier_xe_2015/amplxe-vars.sh
- Start the VTune Amplifier GUI
- amplxe-gui
- Create a VTune Amplifier project
- Click on "New Analysis"
- Click on "Advanced Hotspots"
- Click "Start"
- VTune Amplifier will launch the application and then finalize the result
Analyzing an OpenMP application using VTune Amplifier
- After your analysis has completed, VTune Amplifier will show a summary view of your collected run.
- Two new sections have been added to the summary for OpenMP:
- OpenMP Analysis. Collection Time
- Parallel Region Time
- Note the serial time, and investigate whether that serial work could be done in parallel.
- Estimated Ideal Time
- Potential Gain
- Several factors can affect the potential gain, such as load balance, work scheduling, and whether the region has enough work.
- Top OpenMP Regions by Potential Gain
- This section lists the OpenMP regions. Investigate each region in order of potential gain.
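The relationship between a region's elapsed time, its estimated ideal time, and its potential gain can be illustrated with a simplified model. The sketch below is an assumption made for illustration, not VTune Amplifier's actual formula: it treats the region's elapsed time as the busy time of the slowest thread, the ideal time as the total work divided evenly across threads, and the gain as the difference. The names `RegionEstimate` and `estimate_region` are hypothetical.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Simplified load-balance model (an assumption, not VTune Amplifier's formula).
struct RegionEstimate {
    double elapsed;  // wall-clock time of the region: the slowest thread
    double ideal;    // time if the same work were spread evenly over all threads
    double gain;     // elapsed - ideal: time lost to load imbalance
};

// busy_seconds holds one entry per thread: the time that thread spent
// doing useful work inside the parallel region.
RegionEstimate estimate_region(const std::vector<double>& busy_seconds)
{
    RegionEstimate r{0.0, 0.0, 0.0};
    if (busy_seconds.empty()) {
        return r;
    }
    r.elapsed = *std::max_element(busy_seconds.begin(), busy_seconds.end());
    const double total =
        std::accumulate(busy_seconds.begin(), busy_seconds.end(), 0.0);
    r.ideal = total / static_cast<double>(busy_seconds.size());
    r.gain = r.elapsed - r.ideal;
    return r;
}
```

For example, four threads with busy times of 4, 2, 2, and 4 seconds give an elapsed time of 4 seconds, an ideal time of 3 seconds, and a gain of 1 second under this model. A region with a large gain relative to its elapsed time is a candidate for a different work schedule or a finer-grained division of work.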
Bottom-Up Analysis
Click the Bottom-up tab to see a list of the top OpenMP regions.
Summary
VTune Amplifier has significantly improved its analysis of OpenMP applications. It now lists the top OpenMP regions in your program and estimates the performance gain you could achieve by optimizing them. It also shows specifically how much time your program runs serially versus in parallel, which is one of the most critical insights for improving the performance of a parallel application.