8

Possible Duplicate:
When is assembler faster than C?

Hello,

This is purely a theory question: given "infinite" time to write a trivial program, and advanced knowledge of both C and Assembly, is it really better to do something in Assembly? Is "performance" lost when compiling C into Assembly (and then into machine code)?

By performance I mean: do modern C compilers do a bad job at certain tasks that programming directly in Assembly would speed up?

Thank you.

Manux
  • Essentially the same as [When is assembler faster than C?](http://stackoverflow.com/questions/577554/when-is-assembler-faster-than-c). – Matthew Flaschen Jul 22 '10 at 23:27

11 Answers

17

A modern C compiler can do a better job than hand-written assembly in many cases, because keeping track of which operations can overlap and which will block others is so complex that it can only reasonably be done by a computer.

Mark Ransom
  • @Mark: I don't disagree with the sentiment of this answer, but why can't an assembler make the same optimizations regarding instruction scheduling as a C compiler? – indiv Jul 22 '10 at 23:33
  • @indiv: They can, in theory. In practice, it can be insanely difficult, particularly for RISC and RISC-like CPUs (which, these days, is essentially all of them). – Steven Sudit Jul 22 '10 at 23:35
  • Yes, you can do anything that the C compiler can do, it's just that *you* have to do it. Have fun =) – Ed S. Jul 22 '10 at 23:39
  • @Steven Out of curiosity, do you have a link with statistics about how many RISC CPUs there are? It was my understanding that most desktops/laptops are, for the most part, some variant of x86 and thus CISC – baudtack Jul 22 '10 at 23:51
  • I couldn't find any info on the current state of assemblers (probably it's pretty bleak), but I did find a paper on Intel's IA-64 Assembly Assistant that would optimize instruction scheduling (page 8). The paper also discusses some limitations of optimizing assembly vs C if anyone is interested. http://download.intel.com/technology/itj/q41999/pdf/assemble.pdf. And @Ed, I was referring to optimizations that can be done by the assembler when translating assembly to machine code. – indiv Jul 23 '10 at 00:17
  • @docgnome: Even supposed CISC chips, like the x86, have adapted many RISC techniques. For example, it used to be faster to use complex instructions to move bytes en-masse (MOVSW, etc) but now it's faster to use RISC-like load/store techniques. – Steven Sudit Jul 23 '10 at 00:24
  • @Steven Sudit: That was true on the Pentium, but by the time of the Pentium Pro `rep movsw` was faster again. Those instructions are still being improved - eg see here: http://lkml.org/lkml/2009/11/6/66 – caf Jul 23 '10 at 00:37
  • @caf: Thank you for the interesting link. Looks like things have come around in at least that regard. I suspect my point about the difficulty of hand-scheduling ops to take full advantage of pipelining still stands, though. – Steven Sudit Jul 23 '10 at 00:46
  • @caf Do you know how much of this is because of modern x86 and x86-64 chips being actually implemented as mostly RISC architectures underneath the covers, with a lot of the complex instructions implemented in microcode in terms of those basic instructions? – mtraceur Oct 01 '18 at 09:30
  • +1, and to add emphasis to this: A modern compiler generally knows what's most efficient for thousands of different CPUs, including ones implementing the same instruction set. That compiler also knows how to convert conventional, easier-for-humans-to-understand C idioms into each specific CPU's most efficient code, taking into account things like cache sizes, pipeline depths, etc. Compilers don't always get this perfectly or thoroughly, but they contain within them deep optimization knowledge and can tap computing power to do in seconds optimizations that human brains need hours to verify. – mtraceur Oct 01 '18 at 09:35
14

C is not inefficient compared to anything. C is a language, and we don't describe languages in terms of efficiency. We compare programs in terms of efficiency. C doesn't write programs; programmers write programs.

Assembly gives you immense flexibility compared with C, at the cost of programming time. If you are a guru C programmer and a guru Assembly programmer, then chances are you might be able to squeeze some more juice out of Assembly for any given program, but the price for that is virtually certain to be prohibitive.

Most of us aren't gurus in either of these languages. For most of us, handing the responsibility of performance tuning to a C compiler is a double win: you get the accumulated wisdom of a number of Assembly gurus, the people who wrote the C compiler, along with an immense amount of time on your hands to further correct and enhance your C program. You also get portability as a bonus.

wilhelmtell
  • +1. I think it's also worth adding that one does not simply become a guru with *assembly* by itself. One becomes a guru with what assembly performs best for a program *on a given CPU model*, which varies not just by workload but also by "transparent" CPU details like cache size and cache line size, branch predictor performance, pipeline depth, and all sorts of other details that change "invisibly" below the instruction set itself. Compilers bring that per-CPU knowledge to all so long as one contributor takes the time to add support for it. A human guru has to learn it personally for each one. – mtraceur Oct 01 '18 at 09:39
9

This question seems to stem from the misconception that higher performance is automatically better. There is too much to be gained from a higher-level perspective to make assembly the better choice in the general case. Even if performance is your primary concern, compilers usually do a better job of creating efficient assembly than you could by writing it yourself. They have a much broader "understanding" of all of your source code than you could possibly hold in your head, and many of their optimizations come precisely from NOT emitting well-structured, human-readable assembly.

Obviously there are exceptions. If you need to access hardware directly, including special processing features of CPUs (e.g. SSE), then assembly is the way to go. However, in that case, you're probably better off using a library that addresses your general problem more directly (e.g. numerics packages).

But you should only worry about things like this if you have a concrete, specific need for the increased performance and you can show that your assembly actually IS faster. Concrete, specific needs include: noticed and measured performance problems, embedded systems where performance is a fundamental design concern, etc.

Cogwheel
  • I agree with your point. On a lighter note - **"all of your source code than you could possibly hold in your head"** - and then they say this about human memory: http://www.effective-mind-control.com/human-memory-capacity.html – Praveen S Jul 23 '10 at 09:13
  • I see your point, however I meant that statement from a slightly different angle. It's not so much a matter of memory as it is an intuitive awareness of all the different interactions among the various systems. Think L1 cache instead of Flash memory. :) – Cogwheel Jul 23 '10 at 15:04
5

Unless you are an assembly expert and/or taking advantage of advanced opcodes not utilized by the compiler, the C compiler will likely win.

Try it for fun ;-)

A more realistic approach is often to let the C compiler do its bit, then profile and, if needed, tweak specific sections -- many compilers can dump some sort of low-level IL (or even "assembly").
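For example (a minimal sketch; the file name, function, and exact flags are just illustrative), you can ask GCC to emit the assembly it generates for a hot function and read it directly:

```c
/* sum.c - a trivial function whose generated assembly we want to inspect.
 * Emit the assembly:   gcc -O2 -S sum.c          (writes sum.s)
 * Or disassemble:      gcc -O2 -c sum.c && objdump -d sum.o
 */
#include <stddef.h>

long sum(const long *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];   /* at -O2/-O3 the compiler will often unroll or vectorize this loop */
    return s;
}
```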

  • You can compile the C and look at the Assembly Language output in the debugger. This lets you tweak the C and repeat the process until you've gotten the compiler to generate the code you want. – Steven Sudit Jul 22 '10 at 23:22
  • You can also generate assembly from arbitrary object code with objdump. Compiler support is not necessary. – nmichaels Jul 23 '10 at 10:31
5

Use C for most tasks, and write inline assembly code for specific ones (for example, to take advantage of SSE, MMX, ...)
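A minimal sketch of what that mix can look like with GCC-style inline assembly (x86 only; the function name is just illustrative, and everything around the one special instruction stays ordinary C):

```c
#include <stdint.h>

/* Read the x86 time-stamp counter with a single inline-assembly instruction;
 * RDTSC places the result in EDX:EAX, hence the "=a" and "=d" constraints. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```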

yassin
  • Agreed. A friend and I were diddling around with square rooty things the other day, and they were able to write some assembly to take advantage of the XMM intrinsics: it blew the compiled code out of the water. – Paul Nathan Jul 22 '10 at 23:37
  • This was asked as a theoretical question, not a practical one. – Zorf Jul 23 '10 at 00:57
3

Ignoring how much time it would take to write the code, and assuming you have all the knowledge required to do any task most efficiently in both languages, assembly code will, by definition, always be able to at least match the code generated by a C compiler: the compiler has to produce assembly code for the same task, and it cannot optimize everything. Anything the C compiler writes, you could (in theory) also write, and unlike the compiler, you can sometimes take a shortcut because you know more about the situation than can be expressed in C code.

However, that doesn't mean compilers do a bad job or that the generated code is too slow; just that it's slower than it could be. The difference may be no more than a few microseconds, but it can still be slower.

What you have to remember is that some optimizations performed by a compiler are very complex: aggressive optimization tends to produce very unreadable assembly code, and if you did those optimizations by hand, the code would become much harder to reason about. That's why you'd normally write the program in C (or some other language) first, profile it to find the problem areas, and then hand-optimize those pieces of code until they reach an acceptable speed - because the cost of writing everything in assembly is much higher, while often providing little or no benefit.

Michael Madsen
3

It depends. C compilers for Intel processors do a pretty good job nowadays. I wasn't so impressed by compilers for ARM - I could easily write an assembly version of an inner loop that performed twice as fast. You typically don't need assembly on x86 machines. If you want to gain direct access to SSE instructions, look into compiler intrinsics!
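For instance, a minimal sketch using the SSE intrinsics from `<xmmintrin.h>` (the function name and the alignment assumption are illustrative); the compiler chooses registers and schedules the instructions for you:

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics */

/* Add two float arrays four elements at a time via the ADDPS instruction.
 * Assumes n is a multiple of 4 and the pointers are 16-byte aligned. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);
        __m128 vb = _mm_load_ps(&b[i]);
        _mm_store_ps(&dst[i], _mm_add_ps(va, vb));
    }
}
```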

darklon
2

Actually, C might be faster than hand-written assembly in many cases, since compilers apply optimizations to your code. Even so, the performance difference (if any) is usually negligible.

I would focus more on readability & maintainability of the code base, as well as whether what you are trying to do is supported in C. In many cases, assembly will allow you to do more low-level things that C simply cannot do. For example, with assembly you can take advantage of MMX or SSE instructions directly.

So in the end, focus on what you want to accomplish. Remember - assembly language code is terrible to maintain. Use it only when you have no other choice.

dacris
2

No, compilers do not do a bad job at all. The amount of optimization that can be squeezed out by using assembly is insignificant for most programs.

That amount depends on how you define 'modern C compiler'. A brand new compiler (for a chip that has just reached the market) may have a large number of inefficiencies that will get ironed out over time. Just compile some simple programs (the string.h functions, for example) and analyze what each line of code does. You may be surprised at some of the wasteful things an immature C compiler does, and you can often spot the inefficiency with a simple read-through of the code. A mature, well-tested, thoroughly optimized compiler (think x86) will do a great job of generating assembly, though a new one will still do a decent job.

In no case can C do a better job than assembly. You could just benchmark the two, and if your assembly was slower, compile with -S and submit the resulting assembly, and you're guaranteed a tie. C is compiled to assembly, which has a 1:1 correspondence with the machine code. The computer can't do anything that assembly can't express, assuming that the complete instruction set is published.

In some cases, C is not expressive enough to be fully optimized. A programmer may know something about the nature of the data that simply cannot be expressed in C in such a way that the compiler can take advantage of this knowledge. Certainly, C is expressive and close to the metal, and is very good for optimization, but complete optimization is not always possible.

A compiler can't define 'performance' the way a human can. I understand that you said trivial programs, but even in the simplest (useful) algorithms there will be a tradeoff between size and speed. The compiler can't make that tradeoff at a finer-grained level than the -Os/-O[1-3] flags allow, but a human can know what 'best' means in the context of the purpose of a program.

Some architecture-dependent assembly instructions can't be expressed in C. This is where asm() statements come in. Sometimes these are not for optimization at all, but simply because there is no way to express in C that this line must use, say, an atomic test-and-set operation, or that we want to issue an SVC interrupt with the encoded parameter X.
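As a hypothetical sketch of the kind of thing meant here (the function name is made up), a test-and-set lock built on the x86 XCHG instruction via GCC-style inline assembly:

```c
/* Acquire a simple spinlock: XCHG with a memory operand is atomic with
 * respect to other processors. Returns the previous value; 0 means we
 * got the lock. */
static inline int test_and_set(volatile int *lock)
{
    int old = 1;
    __asm__ volatile ("xchgl %0, %1"
                      : "+r"(old), "+m"(*lock)
                      :
                      : "memory");
    return old;
}
```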

The above points notwithstanding, C is orders of magnitude more efficient to program in and to master. If performance is important, analysis of the assembly will be necessary, and optimizations will probably be found, but the investment of developer time and effort is rarely worthwhile for complex programs on a PC. For very simple programs which must be as fast as absolutely possible (like an RTOS), or which have severe memory constraints (like an ATtiny with 1 KB of Flash (non-writable) memory and 64 bytes of RAM), assembly may be the only way to go.

Kevin Vermeer
  • Not all assembly has a 1:1 match with the machine code - I have been working with a CPU that has a 'high level assembly' that the assembler takes. – Paul Nathan Jul 23 '10 at 15:42
  • @Paul I understand that you can use a high level assembly language if you want, on many processors including x86 - but I think that this is more properly a programming language in itself, not assembly as the question indicated. Is there a 'low level assembly' available for your processor? Even if there is no such assembler provided by the manufacturer, the output of your current assembler is just encoded low-level assembly. – Kevin Vermeer Jul 23 '10 at 17:03
  • It does translate into a low-level assembly - but that's not really supported for users. Most of the high-level stuff involves collapsing similar instructions into a single instruction with easier syntax. It doesn't have macro assembly type stuff. – Paul Nathan Jul 23 '10 at 17:29
1

Given infinite time and an extremely deep understanding of how a modern CPU works, you can actually write the "perfect" program (i.e. the best performance possible on that machine), but you will have to consider, for every instruction in your program, how the CPU behaves in that context, pipelining- and caching-related optimizations, and many, many other things. A compiler is built to generate the best assembly code possible. You will rarely understand the assembly code a modern compiler generates because it tends to be really extreme. At times compilers fail in this task because they can't always foresee what's happening. Generally they do a great job, but they sometimes fail...

To sum up: knowing C and Assembly is absolutely not enough to do a better job than a compiler in 99.99% of cases, and considering that programming something in C can be 10,000 times faster than writing the same program in assembly, a nicer way to spend your time is optimizing what the compiler got wrong in the remaining 0.01%, not reinventing the wheel.

Critic47
0

This depends on the compiler you use; it is not a property of C or any other language. Theoretically it's possible to build a compiler with such a sophisticated AI that it can compile Prolog to more efficient machine language than GCC can produce from C.

This depends 100% on the compiler and 0% on C.

What does matter is that C was designed as a language for which it is easy to write an optimizing compiler from C to assembly, where assembly means the instructions of a von Neumann machine. It depends on the target; some languages, like Prolog, would probably be easier to map onto hypothetical 'reduction machines'.

But, given that assembly is the target language of your C compiler (you could technically compile C to Brainfuck or to Haskell; there is no theoretical difference), then:

  • It is possible to write the optimally fast program in that assembly itself (duh)
  • It is possible to write a C compiler which in every instance produces the optimal assembly. That is to say, there exists a function from every C program to the most efficient way to get the same I/O behaviour in assembly, and this function is computable, albeit perhaps not deterministically.
  • This is also possible with every other programming language in the world.
Zorf