Questions tagged [vtune]

Use this tag to ask questions about Intel® VTune™ Profiler, an advanced performance profiler for finding and fixing performance bottlenecks across CPU, GPU, and FPGA systems.

140 questions
65
votes
2 answers

Performance difference between Windows and Linux using Intel compiler: looking at the assembly

I am running a program on both Windows and Linux (x86-64). It has been compiled with the same compiler (Intel Parallel Studio XE 2017) with the same options, and the Windows version is 3 times faster than the Linux one. The culprit is a call to…
InsideLoop
  • 5,563
  • 2
  • 22
  • 47
15
votes
1 answer

Pthread Mutex: pthread_mutex_unlock() consumes lots of time

I wrote a multithreaded program with pthreads, using the producer-consumer model. When I profiled it with the Intel VTune profiler, I found that the producer and consumer spend a lot of time in pthread_mutex_unlock. I don't understand why this…
lei_z
  • 899
  • 9
  • 26
15
votes
6 answers

Using C/Intel assembly, what is the fastest way to test if a 128-byte memory block contains all zeros?

Continuing on from my first question, I am trying to optimize a memory hotspot found via VTune profiling a 64-bit C program. In particular, I'd like to find the fastest way to test if a 128-byte block of memory contains all zeros. You may assume any…
14
votes
4 answers

How to profile time spent in memory access in C/C++ applications?

The total time spent by a function in an application can be broadly divided into two components: time spent on actual computation (Tcomp) and time spent on memory accesses (Tmem). Typically, profilers provide an estimate of the total time spent by a…
Imran
  • 583
  • 3
  • 21
9
votes
2 answers

What might cause the same SSE code to run a few times slower in the same function?

Edit 3: The images are links to the full-size versions. Sorry for the pictures-of-text, but the graphs would be hard to copy/paste into a text table. I have the following VTune profile for a program compiled with icc --std=c++14 -qopenmp -axS -O3…
iksemyonov
  • 3,923
  • 1
  • 19
  • 37
8
votes
4 answers

Profiling help required

I have a profiling issue - imagine I have the following code... void main() { well_written_function(); badly_written_function(); } void well_written_function() { for (a small number) { highly_optimised_subroutine(); …
Mick
  • 7,929
  • 20
  • 73
  • 162
7
votes
3 answers

Is VTune Worth Considering for Delphi?

Running through all the questions on profiling tools, I was surprised to discover VTune by Intel, which I hadn't heard of before. At $700, it is even more expensive than AQTime. But before I make the decision to put down the big bucks for AQTime, has…
lkessler
  • 19,414
  • 31
  • 125
  • 196
7
votes
1 answer

VTune Profiler giving Error: "The Data Cannot be displayed, there is no viewpoint available for data"

I want to optimize my code, which is written in C++ on the Linux platform. For that I am using the Intel VTune Performance Analyzer profiler. When I identify hotspots, it successfully runs the binary executable whose path I have specified and then it…
Jasdeep Singh Arora
  • 513
  • 2
  • 9
  • 29
6
votes
2 answers

Optimizing SSE-code

I'm currently developing a C-module for a Java-application that needs some performance improvements (see Improving performance of network coding-encoding for a background). I've tried to optimize the code using SSE-intrinsics and it executes…
Yrlec
  • 3,223
  • 6
  • 35
  • 72
6
votes
2 answers

MKL Performance on Intel Phi

I have a routine that performs a few MKL calls on small matrices (50-100 x 1000 elements) to fit a model, which I then call for different models. In pseudo-code: double doModelFit(int model, ...) { ... while( !done ) { cblas_dgemm(...); …
Andrew
  • 847
  • 7
  • 18
6
votes
1 answer

Hotspot in a for loop

I am trying to optimize this code. static lvh_distance levenshtein_distance( const std::string & s1, const std::string & s2 ) { const size_t len1 = s1.size(), len2 = s2.size(); std::vector col( len2+1 ), prevCol( len2+1 ); …
qdii
  • 11,387
  • 7
  • 54
  • 107
6
votes
3 answers

VTune report: "Outside any known module"

I am using Intel(R) VTune(TM) Amplifier XE 2013 Update 5 (build 274450) to collect hotspots for my Linux application, but the report says that "[Outside any known module]" consumes most of the time, so I want to get more information about the unknown module. When…
4
votes
1 answer

When profiling, most of the time is spent in nvoglv64.dll. What should I deduce?

I am profiling a C++ application with Intel VTune Amplifier. Most of the time seems to be spent in nvoglv64.dll, more precisely in DrvPresentBuffers and/or KeSynchronizeExecution. Note that I have an NVIDIA GeForce graphics card. I am new to the…
Palmira
  • 111
  • 1
  • 10
4
votes
1 answer

What is _kmp_fork_barrier and how to see if there is load imbalance?

I'm using Intel VTune Amplifier to see how my parallel application scales. Notice that I don't use any explicit lock mechanism. It scales pretty well on my 4-core laptop (considering that there are portions of the algorithm that can't be…
4
votes
0 answers

How to monitor the utilization of cores on Xeon Phi at 10Hz?

I've been trying to measure/monitor the utilization of all 60 cores on a Xeon Phi (Knights Corner, in-order processors) at a relatively high frequency, say, at least every 0.1 s, i.e. 10 Hz. I tried the latest PAPI library. But it only…
thierry
  • 217
  • 1
  • 12