Questions tagged [cpu-architecture]

The hardware architecture and ISA (x86, x86_64, ARM, ...) and the micro-architectural implementation of a CPU or microcontroller.

Use this tag for questions about features, bugs, and details of the inner workings of specific CPU architectures.

Don't use this tag if you have no reason to believe your issue is related to the CPU architecture.

2598 questions
25552
votes
26 answers

Why is processing a sorted array faster than processing an unsorted array?

Here is a piece of C++ code that shows some very peculiar behavior. For some strange reason, sorting the data miraculously makes the code almost six times faster: #include #include #include int main() { //…
GManNickG
  • 459,504
  • 50
  • 465
  • 534
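The kind of loop this question is about can be sketched as follows. This is an illustrative sketch, not the question's exact code: `conditional_sum`, `make_data`, `sorted_copy`, and the threshold parameter are hypothetical names chosen here. The `if` inside the loop is a data-dependent branch, so its predictability changes with the order of the data while the result does not.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum only the elements at or above `threshold`.
// The `if` is a data-dependent branch: on sorted input its outcome
// changes once, on shuffled input it changes unpredictably.
inline std::int64_t conditional_sum(const std::vector<int>& data, int threshold) {
    std::int64_t sum = 0;
    for (int v : data) {
        if (v >= threshold) {
            sum += v;
        }
    }
    return sum;
}

// Hypothetical helper: deterministic pseudo-random values in [0, 256),
// generated with a small linear congruential generator.
inline std::vector<int> make_data(std::size_t n) {
    std::vector<int> data(n);
    std::uint32_t state = 12345u;
    for (int& v : data) {
        state = state * 1664525u + 1013904223u;
        v = static_cast<int>((state >> 24) & 0xFFu);
    }
    return data;
}

// Hypothetical helper: return a sorted copy, leaving the input untouched.
inline std::vector<int> sorted_copy(std::vector<int> data) {
    std::sort(data.begin(), data.end());
    return data;
}
```

Calling `conditional_sum(make_data(n), 128)` and `conditional_sum(sorted_copy(make_data(n)), 128)` returns the same sum. Whether the two calls differ in run time depends on the CPU and on whether the compiler keeps the branch or replaces it with branchless code.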
667
votes
4 answers

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

How can the theoretical peak performance of 4 floating point operations (double precision) per cycle be achieved on a modern x86-64 Intel CPU? As far as I understand it takes three cycles for an SSE add and five cycles for a mul to complete on most…
user1059432
  • 7,158
  • 3
  • 17
  • 16
332
votes
4 answers

Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs

I've been racking my brain for a week trying to complete this assignment and I'm hoping someone here can lead me toward the right path. Let me start with the instructor's instructions: Your assignment is the opposite of our first lab assignment,…
Cowmoogun
  • 2,367
  • 4
  • 10
  • 17
258
votes
3 answers

What is a retpoline and how does it work?

In order to mitigate against kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel will be compiled with a new option, -mindirect-branch=thunk-extern introduced to gcc to perform indirect calls through a so-called…
BeeOnRope
  • 51,419
  • 13
  • 149
  • 309
220
votes
3 answers

What is the purpose of the "Prefer 32-bit" setting in Visual Studio and how does it actually work?

It is unclear to me how the compiler will automatically know to compile for 64-bit when it needs to. How does it know when it can confidently target 32-bit? I am mainly curious about how the compiler knows which architecture to target when…
Aaron
  • 9,226
  • 13
  • 35
  • 53
218
votes
7 answers

Difference between core and processor

What is the difference between a core and a processor? I've already looked for it on Google, but I only find definitions of multi-core and multi-processor, which don't match what I am looking for.
Saad Achemlal
  • 2,737
  • 5
  • 13
  • 16
193
votes
3 answers

What Every Programmer Should Know About Memory?

I am wondering how much of Ulrich Drepper's What Every Programmer Should Know About Memory from 2007 is still valid. Also I could not find a newer version than 1.0 or an errata. (Also in PDF form on Ulrich Drepper's own site:…
Framester
  • 27,123
  • 44
  • 121
  • 183
183
votes
10 answers

What is the difference between Trap and Interrupt?

What is the difference between Trap and Interrupt? If the terminology is different for different systems, then what do they mean on x86?
David
  • 2,830
  • 6
  • 22
  • 29
158
votes
2 answers

What is the difference between sjlj vs dwarf vs seh?

I can't find enough information to decide which compiler I should use to compile my project. There are several programs on different computers simulating a process. On Linux, I'm using GCC. Everything is great. I can optimize code, it compiles fast…
sorush-r
  • 9,347
  • 14
  • 77
  • 160
155
votes
12 answers

Why is a boolean 1 byte and not 1 bit of size?

In C++, why is a boolean 1 byte and not 1 bit in size? Why aren't there types like 4-bit or 2-bit integers? I miss these when writing an emulator for a CPU.
Asm
  • 1,561
  • 2
  • 10
  • 4
151
votes
1 answer

Why is processing an unsorted array the same speed as processing a sorted array with modern x86-64 clang?

I discovered this popular ~9-year-old SO question and decided to double-check its outcomes. So, I have AMD Ryzen 9 5950X, clang++ 10 and Linux, I copy-pasted code from the question and here is what I got: Sorted - 0.549702s: ~/d/so_sorting_faster$…
DimanNe
  • 1,418
  • 2
  • 10
  • 14
120
votes
16 answers

Are there any smart cases of runtime code modification?

Can you think of any legitimate (smart) uses for runtime code modification (a program modifying its own code at runtime)? Modern operating systems seem to frown upon programs that do this since this technique has been used by viruses to avoid…
114
votes
6 answers

What is the "FS"/"GS" register intended for?

So I know what the following registers and their uses are supposed to be: CS = Code Segment (used for IP) DS = Data Segment (used for MOV) ES = Destination Segment (used for MOVS, etc.) SS = Stack Segment (used for SP) But what are the following…
user541686
  • 189,354
  • 112
  • 476
  • 821
111
votes
10 answers

Why is x86 ugly? Why is it considered inferior when compared to others?

I've been reading some SO archives and encountered statements against the x86 architecture. Why do we need different CPU architecture for server & mini/mainframe & mixed-core? says "PC architecture is a mess, any OS developer would tell you…
claws
  • 47,010
  • 55
  • 140
  • 185
109
votes
5 answers

Write-back vs Write-Through caching?

My understanding is that the main difference between the two methods is that in the "write-through" method data is written to the main memory through the cache immediately, while in "write-back" data is written at a "later time". We still need to wait…
Naftaly
  • 11,192
  • 6
  • 28
  • 43
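The distinction described in this question can be illustrated with a toy model. This is a minimal sketch under loud assumptions: `ToyCache`, `Policy`, and `count_memory_writes` are hypothetical names, the cache has unlimited capacity, and a flush to memory happens only on an explicit `evict`. Real caches work on fixed-size lines and have replacement policies that this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical toy model, for illustration only.
enum class Policy { WriteThrough, WriteBack };

class ToyCache {
    struct Line {
        int value = 0;
        bool dirty = false;  // true if the cache holds data memory has not seen
    };

    Policy policy_;
    std::unordered_map<std::uint32_t, Line> lines_;
    std::unordered_map<std::uint32_t, int> memory_;
    std::size_t memory_writes_ = 0;

public:
    explicit ToyCache(Policy p) : policy_(p) {}

    // Store a value. Write-through updates memory on every store;
    // write-back only marks the cached line dirty.
    void write(std::uint32_t addr, int value) {
        Line& line = lines_[addr];
        line.value = value;
        if (policy_ == Policy::WriteThrough) {
            memory_[addr] = value;
            ++memory_writes_;
        } else {
            line.dirty = true;
        }
    }

    // Drop a line from the cache. Under write-back, a dirty line is
    // written to memory at this point (the "later time").
    void evict(std::uint32_t addr) {
        auto it = lines_.find(addr);
        if (it == lines_.end()) return;
        if (it->second.dirty) {
            memory_[addr] = it->second.value;
            ++memory_writes_;
        }
        lines_.erase(it);
    }

    std::size_t memory_writes() const { return memory_writes_; }
};

// Store to the same address `stores` times, evict once, and report
// how many writes reached memory.
inline std::size_t count_memory_writes(Policy p, int stores) {
    assert(stores >= 0);
    ToyCache cache(p);
    for (int i = 0; i < stores; ++i) {
        cache.write(0x40u, i);
    }
    cache.evict(0x40u);
    return cache.memory_writes();
}
```

In this model, repeated stores to one address each reach memory under `Policy::WriteThrough`, while under `Policy::WriteBack` they are coalesced into a single write at eviction.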