
How to analyze OpenMP* applications using Intel® VTune™ Amplifier XE 2015


 

Introduction

 

Intel® VTune™ Amplifier XE 2015 now includes extensive capabilities for analyzing OpenMP applications. This article will step through this analysis on an Intel® Xeon Phi™ coprocessor.

 

Compiling and running the application

 

The application we will be using is one of the samples included in VTune Amplifier. It is located in /opt/intel/vtune_amplifier_xe_2015/samples/en/C++/matrix_vtune_amp_xe.tgz. To build the application on Linux*:

  1. Source the environment for the Intel® Compiler you are using, for example:
    1. source /opt/intel/composer_xe_2015/compilervars.sh intel64
  2. Untar the sample in a directory where you have write permission
    1. tar xvzf matrix_vtune_amp_xe.tgz
  3. By default the sample does not use OpenMP, so you will need to modify the Makefile
    1. cd matrix/linux
    2. Edit the Makefile: comment out the default PARAMODEL and uncomment the OpenMP PARAMODEL.
  4. Build the application to run natively on the Xeon Phi coprocessor
    1. cd matrix/linux
    2. make mic
  5. The make command from step 4 creates a native Xeon Phi coprocessor executable, matrix.mic, and copies it to mic0:/tmp.
  6. Verify that the libiomp5.so library is available on your Xeon Phi coprocessor.
    1. It should be in either /tmp or lib64.
  7. Run the application
    1. /tmp/matrix.mic

    Addr of buf1 = 0x7fec2b054010
    Offs of buf1 = 0x7fec2b054180
    Addr of buf2 = 0x7fec23fd3010
    Offs of buf2 = 0x7fec23fd31c0
    Addr of buf3 = 0x7fec1cf52010
    Offs of buf3 = 0x7fec1cf52100
    Addr of buf4 = 0x7fec15ed1010
    Offs of buf4 = 0x7fec15ed1140
    Threads #: 240 OpenMP threads
    Matrix size: 3840
    Using multiply kernel: multiply1
    Freq = 1.090908 GHz
    Execution time = 23.866 seconds
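
To make it concrete what kind of code is being profiled, here is a minimal sketch of an OpenMP-parallel matrix multiply. It is not the source of the sample's multiply1 kernel; the function name multiply_openmp and the row-major std::vector layout are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the VTune sample): multiplies two
// n x n row-major matrices a and b and returns the product.
std::vector<double> multiply_openmp(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    int n) {
    assert(n >= 0);
    const std::size_t N = static_cast<std::size_t>(n);
    assert(a.size() == N * N);
    assert(b.size() == N * N);
    std::vector<double> c(N * N, 0.0);

    // Each iteration of the outer loop writes a distinct row of c, so the
    // rows can be divided among OpenMP threads without synchronization.
    // If OpenMP is not enabled at compile time, the pragma is ignored and
    // the loop simply runs serially.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const std::size_t row = static_cast<std::size_t>(i) * N;
        for (std::size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < N; ++k) {
                sum += a[row + k] * b[k * N + j];
            }
            c[row + j] = sum;
        }
    }
    return c;
}
```

The pragma only takes effect when the code is built with the compiler's OpenMP option (for example -fopenmp with GCC). The number of threads used at run time can be set with the standard OMP_NUM_THREADS environment variable.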

 

Running the application using VTune Amplifier

 

  1. Source /opt/intel/vtune_amplifier_xe_2015/amplxe-vars.sh
  2. Start the VTune Amplifier GUI
    1. amplxe-gui
  3. Create a VTune Amplifier project
    1. File->New->Project
    2. Specify a project name and select Intel Xeon Phi coprocessor native as the target
    3. Click OK
  4. Click on "New Analysis"
  5. Click on "Advanced Hotspots"
  6. Click "Start"
  7. VTune Amplifier will launch the application and then finalize the result

Analyzing an OpenMP application using VTune Amplifier

  1. After your analysis has completed, VTune Amplifier will show a summary view of your collected run.
  2. Two new sections have been added to the summary for OpenMP
    1. OpenMP Analysis. Collection Time
      1. Parallel Region Time
        1. Note the serial time, and investigate whether the work done serially could be done in parallel.
      2. Estimated Ideal Time
      3. Potential Gain
        1. Several factors can affect the potential gain, such as load balance, work scheduling, and whether the region has enough work to keep all threads busy.
    2. Top OpenMP Regions by Potential Gain
      1. This section lists the OpenMP regions. Investigate each region in order of potential gain.
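
One way to build intuition for how these numbers relate is the simplified model sketched below. It assumes that a region's elapsed time is set by its slowest thread and that the ideal time is the total work spread evenly over all threads with no runtime overhead. VTune Amplifier's actual calculation may differ; the names RegionEstimate and estimate_region are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical summary of one parallel region under the simplified model.
struct RegionEstimate {
    double elapsed;         // time of the slowest thread, in seconds
    double ideal;           // total busy time divided evenly among threads
    double potential_gain;  // elapsed - ideal
};

// busy_seconds holds the time each thread spent doing useful work in the
// region. Assumption: the region ends when the slowest thread finishes.
RegionEstimate estimate_region(const std::vector<double>& busy_seconds) {
    assert(!busy_seconds.empty());
    const double total =
        std::accumulate(busy_seconds.begin(), busy_seconds.end(), 0.0);
    const double elapsed =
        *std::max_element(busy_seconds.begin(), busy_seconds.end());
    const double ideal = total / static_cast<double>(busy_seconds.size());
    return RegionEstimate{elapsed, ideal, elapsed - ideal};
}
```

For example, with four threads that are busy for 4, 2, 1, and 1 seconds, the region takes 4 seconds, the ideal time is 2 seconds, and the potential gain is 2 seconds. In this model the whole gain comes from load imbalance.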

Bottom-Up Analysis

Click on the Bottom-up tab and you can see a list of top OpenMP regions.

 

 

Summary

VTune Amplifier has significantly improved its analysis of OpenMP applications. It now lists the top OpenMP regions in your program and estimates the performance gain you could achieve by optimizing them. It also shows how much time your program spends running serially versus in parallel, which is one of the most critical insights for improving the performance of a parallel application.
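
The weight of serial time on a many-thread device can be illustrated with Amdahl's law, which this article does not discuss but which gives a standard upper bound on speedup. The sketch below is an illustration only; the function name amdahl_speedup is hypothetical, and the model ignores overhead and load imbalance.

```cpp
#include <cassert>

// Upper bound on speedup when a fixed fraction of the run time stays
// serial and the rest scales perfectly across the given number of threads.
double amdahl_speedup(double serial_fraction, int threads) {
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    assert(threads >= 1);
    const double parallel_fraction = 1.0 - serial_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / threads);
}
```

Under this model, a program that is 5% serial cannot run more than about 18.5 times faster on 240 threads than on one thread, so even a small amount of serial time is worth investigating.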

 

 
