Use this tag to ask questions about Intel® VTune™ Profiler, an advanced performance profiler for finding and optimizing performance bottlenecks across CPU, GPU, and FPGA systems.
Questions tagged [vtune]
140 questions
65 votes, 2 answers
Performance difference between Windows and Linux using Intel compiler: looking at the assembly
I am running a program on both Windows and Linux (x86-64). It has been compiled with the same compiler (Intel Parallel Studio XE 2017) with the same options, and the Windows version is 3 times faster than the Linux one. The culprit is a call to…
InsideLoop
15 votes, 1 answer
Pthread Mutex: pthread_mutex_unlock() consumes lots of time
I wrote a multithreaded program with pthreads, using the producer-consumer model.
When I profile my program with Intel VTune Profiler, I find that the producer and consumer spend a lot of time in pthread_mutex_unlock. I don't understand why this…
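One likely explanation, offered as a hedged sketch rather than a diagnosis: with glibc, an uncontended `pthread_mutex_unlock` is typically just an atomic operation in user space, but when other threads are blocked on the mutex it has to make a `futex` system call to wake one of them, so heavy unlock time often points to lock contention. A common mitigation is to keep the critical section as short as possible and let idle threads sleep on condition variables. The bounded queue below is a minimal, hypothetical example of that pattern; `bounded_queue`, `bq_push`, `bq_pop` and the capacity of 64 are illustrative assumptions, not the asker's code.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed-capacity queue of ints, for illustration only. */
#define BQ_CAPACITY 64

typedef struct {
    int items[BQ_CAPACITY];
    size_t head;               /* index of the next item to pop */
    size_t count;              /* number of items currently stored */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} bounded_queue;

void bq_init(bounded_queue *q) {
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void bq_destroy(bounded_queue *q) {
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->not_empty);
    pthread_cond_destroy(&q->not_full);
}

/* Blocks while the queue is full; the lock is held only for the index update. */
void bq_push(bounded_queue *q, int value) {
    pthread_mutex_lock(&q->lock);
    while (q->count == BQ_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % BQ_CAPACITY] = value;
    q->count++;
    pthread_mutex_unlock(&q->lock);
    /* Signal after unlocking so the woken consumer does not immediately block on the mutex. */
    pthread_cond_signal(&q->not_empty);
}

/* Blocks while the queue is empty. */
int bq_pop(bounded_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int value = q->items[q->head];
    q->head = (q->head + 1) % BQ_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    pthread_cond_signal(&q->not_full);
    return value;
}
```

Signalling after unlocking is permitted by POSIX, and because every waiter re-checks its condition in a `while` loop, no wake-up is lost. Moving several items per lock acquisition (batching) is another way to cut lock traffic, at the cost of some latency.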
lei_z
15 votes, 6 answers
Using C/Intel assembly, what is the fastest way to test if a 128-byte memory block contains all zeros?
Following on from my first question, I am trying to optimize a memory hotspot that VTune found when profiling a 64-bit C program.
In particular, I'd like to find the fastest way to test if a 128-byte block of memory contains all zeros. You may assume any…
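A portable baseline, as a minimal sketch: OR the block together as sixteen 64-bit words and test once at the end, so there is a single branch instead of one per word. `block128_is_zero` is a hypothetical name. Using `memcpy` for the loads avoids alignment and strict-aliasing problems, and GCC and Clang normally compile each 8-byte `memcpy` into a plain load (often vectorized at -O2 or higher). A hand-written SSE version could combine `_mm_or_si128` with the SSE4.1 `_mm_testz_si128` instead; whether either beats this depends on the data and should be measured.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true if all 128 bytes starting at p are zero.
   The block is OR-reduced as sixteen 64-bit words, with one test at the end. */
bool block128_is_zero(const void *p) {
    const unsigned char *bytes = p;
    uint64_t acc = 0;
    for (int i = 0; i < 16; i++) {
        uint64_t word;
        memcpy(&word, bytes + 8 * i, sizeof word);  /* alignment-safe load */
        acc |= word;
    }
    return acc == 0;
}
```

The trade-off is that it never exits early: if most blocks in practice contain a non-zero byte near the start, an early-exit loop may be faster.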
eyepopslikeamosquito
14 votes, 4 answers
How to profile time spent in memory access in C/C++ applications?
Total time spent by a function in an application can be broadly divided into two components:
- Time spent on actual computation (Tcomp)
- Time spent on memory accesses (Tmem)
Typically profilers provide an estimate of the total time spent by a…
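The split in the question can be written as below. Treat it as an idealization: if $T_{\text{comp}}$ and $T_{\text{mem}}$ denote the time each part would take on its own, out-of-order execution lets memory stalls overlap with independent computation, so the measured total lies between the larger of the two and their sum.

$$
\max\left(T_{\text{comp}},\, T_{\text{mem}}\right) \;\le\; T_{\text{total}} \;\le\; T_{\text{comp}} + T_{\text{mem}}
$$

Because of this overlap, profilers tend to report memory-related stall metrics (such as VTune's Memory Bound metric) rather than a clean, additive $T_{\text{mem}}$.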
Imran
9 votes, 2 answers
What might cause the same SSE code to run a few times slower in the same function?
Edit 3: The images are links to the full-size versions. Sorry for the pictures-of-text, but the graphs would be hard to copy/paste into a text table.
I have the following VTune profile for a program compiled with icc --std=c++14 -qopenmp -axS -O3…
iksemyonov
8 votes, 4 answers
Profiling help required
I have a profiling issue. Imagine I have the following code:
```
void main()
{
    well_written_function();
    badly_written_function();
}
void well_written_function()
{
    for (a small number)
    {
        highly_optimised_subroutine();
…
```
Mick
7 votes, 3 answers
Is VTune Worth Considering for Delphi?
While going through all the questions on profiling tools, I was surprised to discover Intel's VTune, which I hadn't heard of before. At $700, it is even more expensive than AQTime.
But before I make the decision to put down the big bucks for AQTime, has…
lkessler
7 votes, 1 answer
VTune Profiler gives error: "The Data Cannot be displayed, there is no viewpoint available for data"
I want to optimize my C++ code on a Linux platform. For that, I am using the Intel VTune Performance Analyzer profiler. When I run a Hotspots analysis, it successfully runs the binary executable whose path I have specified, and then it…
Jasdeep Singh Arora
6 votes, 2 answers
Optimizing SSE code
I'm currently developing a C module for a Java application that needs some performance improvements (see "Improving performance of network coding-encoding" for background). I've tried to optimize the code using SSE intrinsics and it executes…
Yrlec
6 votes, 2 answers
MKL Performance on Intel Phi
I have a routine that performs a few MKL calls on small matrices (50-100 x 1000 elements) to fit a model, which I then call for different models. In pseudo-code:
```
double doModelFit(int model, ...) {
    ...
    while( !done ) {
        cblas_dgemm(...);
…
```
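When checking results from MKL on small matrices, a plain reference with the same semantics as the row-major, no-transpose case of `cblas_dgemm` can help, i.e. C ← αAB + βC with A of size M×K, B of size K×N and C of size M×N. The sketch below is a hypothetical helper (`ref_dgemm_rowmajor` is not an MKL function) and makes no attempt to be fast; the corresponding real call would be `cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc)`.

```c
#include <stddef.h>

/* Hypothetical reference GEMM, row-major, no transposes:
   C[i][j] = alpha * sum_k A[i][k] * B[k][j] + beta * C[i][j].
   lda, ldb and ldc are row strides (leading dimensions), as in CBLAS row-major. */
void ref_dgemm_rowmajor(int M, int N, int K, double alpha,
                        const double *A, int lda,
                        const double *B, int ldb,
                        double beta, double *C, int ldc) {
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < K; k++)
                sum += A[(size_t)i * lda + k] * B[(size_t)k * ldb + j];
            double *c = &C[(size_t)i * ldc + j];
            /* As in reference BLAS, C is not read when beta is zero. */
            *c = alpha * sum + (beta == 0.0 ? 0.0 : beta * *c);
        }
    }
}
```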
Andrew
6 votes, 1 answer
Hotspot in a for loop
I am trying to optimize this code.
```
static
lvh_distance levenshtein_distance( const std::string & s1, const std::string & s2 )
{
    const size_t len1 = s1.size(), len2 = s2.size();
    std::vector col( len2+1 ), prevCol( len2+1 );
…
```
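The excerpt uses the standard two-column dynamic-programming formulation (`col` and `prevCol`). For reference, here is a minimal C sketch of that same technique with unit costs for insertion, deletion and substitution; `levenshtein` is a hypothetical name and this is not the asker's code. It swaps the two row pointers after each row instead of copying one row into the other.

```c
#include <stdlib.h>
#include <string.h>

/* Two-row dynamic-programming Levenshtein distance (unit costs).
   prev holds row i-1 of the DP table and cur holds row i, so memory is O(len2).
   Returns (size_t)-1 if allocation fails. */
size_t levenshtein(const char *s1, const char *s2) {
    size_t len1 = strlen(s1), len2 = strlen(s2);
    size_t *prev = malloc((len2 + 1) * sizeof *prev);
    size_t *cur = malloc((len2 + 1) * sizeof *cur);
    if (!prev || !cur) { free(prev); free(cur); return (size_t)-1; }

    for (size_t j = 0; j <= len2; j++)
        prev[j] = j;                        /* distance from the empty prefix of s1 */

    for (size_t i = 1; i <= len1; i++) {
        cur[0] = i;                         /* distance to the empty prefix of s2 */
        for (size_t j = 1; j <= len2; j++) {
            size_t del = prev[j] + 1;
            size_t ins = cur[j - 1] + 1;    /* depends on this row: loop-carried */
            size_t sub = prev[j - 1] + (s1[i - 1] != s2[j - 1]);
            size_t best = del < ins ? del : ins;
            cur[j] = best < sub ? best : sub;
        }
        size_t *tmp = prev; prev = cur; cur = tmp;  /* swap rows instead of copying */
    }
    size_t result = prev[len2];             /* last computed row is now in prev */
    free(prev);
    free(cur);
    return result;
}
```

In the inner loop each `cur[j]` depends on `cur[j - 1]` from the same row. That loop-carried dependency limits how much a compiler can vectorize the loop, which is one plausible reason it shows up as a hotspot.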
qdii
6 votes, 3 answers
VTune reports "[Outside any known module]"
I am using Intel(R) VTune(TM) Amplifier XE 2013 Update 5 (build 274450) to collect hotspots for my Linux application, but the report says that "[Outside any known module]" consumes most of the time, so I want to get more information about this unknown module.
When…
Caukie Relsis
4 votes, 1 answer
When profiling, most of the time is spent in nvoglv64.dll. What should I deduce?
I am profiling a C++ application with Intel VTune Amplifier. Most of the time seems to be spent in nvoglv64.dll, more precisely in DrvPresentBuffers and/or KeSynchronizeExecution. Note that I have an NVIDIA GeForce graphics card.
I am new to the…
Palmira
4 votes, 1 answer
What is _kmp_fork_barrier and how to see if there is load imbalance?
I'm using Intel VTune Amplifier to see how my parallel application scales.
Note that I don't use any explicit locking mechanism.
It scales pretty well on my 4-core laptop (considering that there are portions of the algorithm that can't be…
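In the Intel/LLVM OpenMP runtime, `__kmp_fork_barrier` is where worker threads wait to be released into the next parallel region, so time there is commonly read as workers idling while the master thread runs serial code; waiting at the end of a region (`__kmp_join_barrier`) is the more usual sign of load imbalance inside a loop. If iterations have very different costs, a static schedule can leave some threads idle at the end of the loop, and `schedule(dynamic)` is one common remedy. The sketch below uses a hypothetical triangular workload (iteration `i` does `i` inner steps), and `triangular_work` is an illustrative name. Compiled without `-qopenmp` or `-fopenmp`, the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical uneven workload: iteration i does i inner steps, so a
   static split of 0..n-1 would give the thread with the highest indices
   the most work. schedule(dynamic, 1) hands out iterations one at a time. */
long long triangular_work(int n) {
    long long total = 0;
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (int i = 0; i < n; i++) {
        long long partial = 0;
        for (int j = 0; j < i; j++)
            partial += j;
        total += partial;
    }
    return total;
}
```

Dynamic scheduling adds a small per-chunk overhead, so for cheap iterations a larger chunk (for example `schedule(dynamic, 16)`) or `schedule(guided)` may be the better compromise.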
cplusplusuberalles
4 votes, 0 answers
How to monitor the utilization of cores on Xeon Phi at 10Hz?
I've been trying to measure/monitor the utilization of all 60 cores on a Xeon Phi (Knights Corner, in-order processors) at a relatively high frequency, say at least every 0.1 s, i.e. 10 Hz.
I tried the latest PAPI library. But it only…
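One lightweight alternative to hardware counters, sketched here under the assumption that the card's Linux (uOS) exposes the usual `/proc/stat`: sample the per-CPU tick counters every 100 ms and compare consecutive snapshots. On Knights Corner each core has four hardware threads, so each line corresponds to a logical CPU rather than a core. `cpu_ticks`, `parse_cpu_line` and `busy_fraction` are hypothetical helpers. Note the resolution limit: the counters are in USER_HZ units, typically 100 per second, so a 0.1 s interval covers only about 10 ticks per logical CPU and each sample is correspondingly coarse.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Cumulative tick counters from one "cpuN ..." line of /proc/stat. */
typedef struct {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
} cpu_ticks;

/* Parses a per-CPU line; returns the CPU number, or -1 for any other line. */
int parse_cpu_line(const char *line, cpu_ticks *t) {
    /* Skip the aggregate "cpu" line and non-CPU lines. */
    if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
        return -1;
    int cpu = -1;
    memset(t, 0, sizeof *t);   /* fields missing on older kernels stay zero */
    int n = sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu",
                   &cpu, &t->user, &t->nice, &t->system, &t->idle,
                   &t->iowait, &t->irq, &t->softirq, &t->steal);
    return n >= 5 ? cpu : -1;  /* need at least user, nice, system, idle */
}

static unsigned long long total_ticks(const cpu_ticks *t) {
    return t->user + t->nice + t->system + t->idle + t->iowait
         + t->irq + t->softirq + t->steal;
}

/* Fraction of ticks between two snapshots that were neither idle nor iowait. */
double busy_fraction(const cpu_ticks *before, const cpu_ticks *after) {
    unsigned long long t0 = total_ticks(before), t1 = total_ticks(after);
    if (t1 <= t0)
        return 0.0;            /* no ticks elapsed */
    unsigned long long dt = t1 - t0;
    unsigned long long idle0 = before->idle + before->iowait;
    unsigned long long idle1 = after->idle + after->iowait;
    unsigned long long didle = idle1 > idle0 ? idle1 - idle0 : 0;
    if (didle > dt)
        didle = dt;
    return (double)(dt - didle) / (double)dt;
}
```

A sampling loop would read `/proc/stat` with `fopen`/`fgets`, call `parse_cpu_line` on each line, sleep for 100 ms (for example with `nanosleep`), and pass the previous and current snapshot of each CPU to `busy_fraction`.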
thierry