229

Is the x86 Architecture specially designed to work with a keyboard while ARM expects to be mobile? What are the key differences between the two?

Acumenus
user1922878
  • 39
    Unless the x86 has a ps/2 port I don't know about, it's no more built for keyboards than a pair of dirty underwear :-) – paxdiablo Feb 10 '13 at 03:43
  • 7
    I think **keyboard** is referring to a typical PC role as opposed to the physical device. – artless noise Feb 11 '13 at 14:28
  • 26
    The x86 was not designed; it evolved on an island, with a strange bird that ate everything that tried to prey on it. It now looks stranger than a duck-billed platypus, and would not do well if a ship-full of new animals came along. – ctrl-alt-delor Dec 13 '14 at 20:00
  • 6
    @richard - sadly, this happens to be the most historically accurate description of x86 I've ever seen. It says quite a lot about the industry. – Leeor Jan 04 '15 at 18:48
  • 6
    @Leeor Sorry, I made a small mistake in my comment: I said that the bird ate predators of the x86, whereas it did not eat them, it sat on them. It is also worthy of note that the soft feathers of the bird were so very very very tidy. – ctrl-alt-delor Jan 04 '15 at 20:14
  • 1
    @richard Can you explain your analogy please? I'm not too well versed on CPUs. I'm guessing your first sentence is saying something along the lines of squashing competition. And the second line is about how the CPU evolved or progressed roughly, as in just a bunch of unpolished updates. Is my understanding correct? – Abdul Aug 18 '15 at 18:06
  • 1
    How can this question be so upvoted? It certainly does not show any research effort. I think I should ask "What is the difference between C and Python?". That would be a great question for sure. – LennyB Aug 22 '15 at 06:55
  • Consider looking at some others like MIPS (used in Silicon Graphics workstations, and the Nintendo 64), PowerPC (used in mid-era Macs and the PlayStation), 68k (early Macs, Amiga), SPARC (Sun), Alpha (DEC), et al. These were all superior to the x86, but lost the marketing battle, though some ended up as GPUs, doing the hard stuff for the x86 (GPUs are fast because of more parallelism (more cores etc.), special-purpose hardware, and not being an x86). – ctrl-alt-delor Aug 22 '15 at 22:47
  • Voting to close as too broad. – Ciro Santilli新疆棉花TRUMP BAN BAD Sep 20 '15 at 15:42
  • 3
    The difference between x86 and ARM is not CISC vs RISC. The x86 is not a good example of CISC; there are many examples of CISC that have a lot in common with the ARM. E.g. the 680x0: both have uniform instruction sets, both have a flat 32-bit address space, both have multiple general-purpose registers (almost, for the 680x0, which has two types of register: 8 data, 8 address), both have a single instruction set (actually ARM has at least 3, but not for backwards-compatibility reasons). – ctrl-alt-delor Jul 28 '16 at 16:46
  • This [link](http://www.androidauthority.com/arm-vs-x86-key-differences-explained-568718/) has all the information for ARM and x86. It also contains details about 64 bit implementation of both architectures. – Harish Gyanani Nov 03 '16 at 13:40

5 Answers

368

ARM is a RISC (Reduced Instruction Set Computing) architecture while x86 is a CISC (Complex Instruction Set Computing) one.

The core difference in this aspect is that ARM instructions operate only on registers, with a few instructions for loading and saving data from/to memory, while x86 can operate directly on memory as well. Up until v8, ARM was a native 32-bit architecture, favoring four-byte operations over others.
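For instance (a minimal sketch in the style of the examples below; the register choices and NASM-flavored x86 syntax are assumptions), incrementing a value in memory is a single read-modify-write instruction on x86:

add dword [ebx], 1 /* increment the 32-bit value at the address in ebx */

while pre-v8 ARM has to go through a register:

ldr r1, [r0]       /* load the value at the address in r0 */
add r1, r1, #1     /* modify it in a register             */
str r1, [r0]       /* store it back                       */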

So ARM is a simpler architecture, leading to a small silicon area and lots of power-saving features, while x86 has become a power beast in terms of both power consumption and production.

Regarding the question "Is the x86 Architecture specially designed to work with a keyboard while ARM expects to be mobile?": x86 isn't specially designed to work with a keyboard, nor is ARM designed for mobile. However, again because of the core architectural choices, x86 also has instructions to work directly with IO while ARM does not. That said, with specialized IO buses like USB, the need for such features is also disappearing.
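As a sketch of that difference (port 0x60 is the legacy PC keyboard controller's data port; the ARM-side address is a hypothetical memory-mapped device register), x86 has dedicated port-IO instructions:

in  al, 0x60       /* read a byte from IO port 0x60 into al */

while ARM reads a device register with an ordinary load from a memory-mapped address:

ldr r1, [r0]       /* r0 holds the memory-mapped register's address */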

If you need a document to quote, this is what the Cortex-A Series Programmer's Guide (4.0) says about the differences between RISC and CISC architectures:

An ARM processor is a Reduced Instruction Set Computer (RISC) processor.

Complex Instruction Set Computer (CISC) processors, like the x86, have a rich instruction set capable of doing complex things with a single instruction. Such processors often have significant amounts of internal logic that decode machine instructions to sequences of internal operations (microcode).

RISC architectures, in contrast, have a smaller number of more general purpose instructions, that might be executed with significantly fewer transistors, making the silicon cheaper and more power efficient. Like other RISC architectures, ARM cores have a large number of general-purpose registers and many instructions execute in a single cycle. It has simple addressing modes, where all load/store addresses can be determined from register contents and instruction fields.

ARM also provides a paper titled Architectures, Processors, and Devices Development Article describing how those terms apply to their business.

An example comparing instruction set architectures:

For example, if you needed some sort of bytewise memory-comparison block in your application (generated by the compiler, skipping details), this is how it might look on x86:

repe cmpsb         /* repeat while equal compare string bytewise */

while on ARM the shortest form might look like this (without error checking etc.):

top:
ldrb r2, [r0], #1  /* load a byte from the address in r0 into r2, then increment r0  */
ldrb r3, [r1], #1  /* load a byte from the address in r1 into r3, then increment r1  */
subs r2, r3, r2    /* subtract r2 from r3, put the result into r2, and set the flags */
beq  top           /* branch(/jump) back to top if the result was zero (bytes equal) */

which should give you a hint on how RISC and CISC instruction sets differ in complexity.

auselen
  • 10
    ARMv8-A has a 64-bit architecture called AArch64. – remmy Nov 23 '13 at 20:58
  • 13
    Although the x86 has some very powerful instructions, the ARM can still beat it in a fight (if both have the same clock speed). This is partly because the ARM has a good set of registers, whereas the x86 spends half of its time moving data in and out of its limited set of registers (this is less true of x86-64, as it has more registers). And partly because the ARM's simplicity leaves room for a bigger cache, and having all instructions conditional makes cache misses fewer. And ARM's move-multiple instruction (the only non-RISC instruction) allows it to move data quickly. – ctrl-alt-delor Jan 04 '15 at 20:24
  • 6
    I could write ARM code that is faster, though bigger, by using more registers. If I look at this implementation, the x86 takes 5+9×N clocks and the ARM takes 4×N clocks (both figures are for no cache misses). The x86 scores better for instruction bytes on this example: x86 = 2 bytes, ARM = 16 bytes. ARM scores much better on this metric in more realistic tests, e.g. on exiting the loop r2 will have information on whether the strings are equal / which is bigger, and so will the condition codes. The ARM can run other instructions before checking condition codes. The ARM does not have to branch when checking condition codes. – ctrl-alt-delor Jan 04 '15 at 20:56
  • ARM isn't pure RISC, like say MIPS, due to features like predication and the Thumb instruction set – StanOverflow Jul 23 '15 at 13:25
  • Yes, Thumb is not a RISC instruction set, but you will typically use one instruction set (Thumb or ARM). But what do you mean by predication? – ctrl-alt-delor Sep 03 '15 at 19:09
  • The document quoted says "ARM cores have a large number of general-purpose registers and many instructions execute in a single cycle." I understand that an ARM instruction pipeline will handle Fetch, Decode, and Execute at the same time (while one instruction executes, the next decodes, and the next fetches) as outlined here infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0084f/… But other than that technicality, how would multiple instructions execute per cycle? Obviously with multiple cores, but is the implication here that a single core executes multiple instructions simultaneously? – MeatFlavourDev Dec 16 '15 at 14:08
  • 3
    @JeremyFelix It looks like this http://stackoverflow.com/questions/13106297/is-arm-cortex-a8-pipeline-13-stage-or-14-stage There are different pipes for different type of instructions, even there are duplicated ones. CPU divides instructions into micro instructions and those can run in parallel among pipeline. – auselen Dec 17 '15 at 12:31
  • 3
    You say “while x86 can operate directly on memory as well.” However, for the x86 (pre x86-64), it had so few registers that there was no “as well”; you had to store everything in memory; about ½ of the instructions in a program were just to move things about. Whereas in ARM very few instructions are needed to move data about. – ctrl-alt-delor Feb 17 '16 at 09:48
  • 2
    The special IO instructions were added when the x86 was an 8-bit processor, with only 64k = 2¹⁶ bytes of addressable memory. There needed to be a way to do IO without using up memory addresses. Now that the x86-64 has 2⁴⁸ ≈ 1000 trillion bytes of address space (this can be expanded in the future up to 2⁶⁴), there is no need for IO instructions. – ctrl-alt-delor Feb 17 '16 at 09:55
  • @richard thanks for the notes! – auselen Feb 17 '16 at 20:05
  • So if the ARM program needs more instructions, does it need more RAM? – Suici Doga May 16 '16 at 05:34
  • I think the subs in the arm code should be cmp, with no destination. – ctrl-alt-delor Sep 05 '16 at 15:48
  • @SuiciDoga Yes, in the snippet of code above the ARM will use more RAM to store the code. However, you will note that the code is useless on its own. You need code before it to load the registers. On the ARM the optimiser can fold in previous code (the values that get put in registers were probably used recently, so are probably already in a register; all registers are equivalent (except PC/R15), so there is probably no need to do anything). And afterwards, to react to the result, the ARM can do other things first (as a lot of stuff will not overwrite the result, unless the programmer/compiler wants it to). – ctrl-alt-delor Sep 05 '16 at 15:56
  • 1
    @richard It's a bit more complicated than you make it sound. The x86 instruction set has relatively few registers (more in x86-64), *but* the microprocessors tend to have *hundreds*, with register renaming used to optimise. Also, while ARM instructions *are* larger, it's a three argument design (vs two for x86), so sometimes x86 needs more instructions than ARM (counterintuitively, as ARM is RISC). (Oh, and "predication" refers to the condition-code-on-any-instruction feature, removed in AArch64; there's also the ability to shift for free on most data operations. Also makes for shorter code.) – alastair Nov 02 '16 at 09:35
  • @alastair I am not sure what you mean by “but the microprocessors tend to have hundreds”, but yes. Also, on what you say about it being counterintuitive: RISC vs CISC in theory has nothing to do with x86 (x86 is nowhere near a well-designed CISC. The 68000 is CISC, and has lots of registers, a near-uniform instruction set, and some 3-argument instructions, e.g. a1 ← a2 + d1 (lea)). – ctrl-alt-delor Nov 05 '16 at 13:04
  • God dude, that's crazy! – AO_ Jul 07 '17 at 10:04
  • Isn't x86 RISC since P6 (Pentium Pro), with an inner translator from CISC to RISC? – JCKödel Apr 03 '21 at 16:27
109

Neither has anything specific to keyboard or mobile, other than the fact that for years ARM has had a pretty substantial advantage in terms of power consumption, which made it attractive for all sorts of battery operated devices.

As far as the actual differences: ARM has more registers, supported predication for most instructions long before Intel added it, and has long incorporated all sorts of techniques (call them "tricks", if you prefer) to save power almost everywhere it could.
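A minimal sketch of predication (pre-AArch64 ARM syntax; register choices are arbitrary): ARM can attach a condition to almost any instruction, where x86 traditionally needs a branch around the work (its later cmov covers only moves):

cmp   r0, #0       /* compare r0 with zero, setting the flags          */
addne r1, r1, #1   /* add 1 to r1 only if r0 was non-zero -- no branch */

versus the x86 equivalent:

cmp eax, 0         /* compare eax with zero           */
je  skip           /* jump over the add if it is zero */
add ebx, 1
skip: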

There's also a considerable difference in how the two encode instructions. Intel uses a fairly complex variable-length encoding in which an instruction can occupy anywhere from 1 up to 15 bytes. This allows programs to be quite small, but makes instruction decoding relatively difficult (as in: decoding instructions fast in parallel is more like a complete nightmare).
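For instance (a sketch; the byte counts below are for 32-bit protected mode), three ordinary instructions span very different lengths, and a decoder can't know where the next instruction starts until it has largely decoded the current one:

ret                            /* 1 byte:  C3                   */
mov eax, 1                     /* 5 bytes: B8 01 00 00 00       */
mov dword [eax+4], 0x12345678  /* 7 bytes: C7 40 04 78 56 34 12 */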

ARM has two different instruction encoding modes: ARM and THUMB. In ARM mode, you get access to all instructions, and the encoding is extremely simple and fast to decode. Unfortunately, ARM-mode code tends to be fairly large, so it's fairly common for a program to occupy around twice as much memory as Intel code would. Thumb mode attempts to mitigate that. It still uses quite a regular instruction encoding, but reduces most instructions from 32 bits to 16 bits, such as by reducing the number of registers, eliminating predication from most instructions, and reducing the range of branches. At least in my experience, this still doesn't usually give quite as dense a coding as x86 code can get, but it's fairly close, and decoding is still fairly simple and straightforward. Lower code density means you generally need at least a little more memory and (generally more seriously) a larger cache to get equivalent performance.
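As a sketch of the size difference (encodings as given in the ARMv7 reference manual; a deliberately trivial example), the same add occupies 32 bits in ARM mode but only 16 in Thumb:

adds r0, r0, #1    /* ARM mode:   32-bit encoding 0xE2900001 */
adds r0, #1        /* Thumb mode: 16-bit encoding 0x3001     */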

At one time, Intel put a lot more emphasis on speed than on power consumption. They started emphasizing power consumption primarily in the context of laptops. For laptops, their typical power goal was on the order of 6 watts for a fairly small laptop. More recently (much more recently) they've started to target mobile devices (phones, tablets, etc.). For this market, they're looking at a couple of watts or so at most. They seem to be doing pretty well at that, though their approach has been substantially different from ARM's, emphasizing fabrication technology where ARM has mostly emphasized micro-architecture (not surprising, considering that ARM sells designs, and leaves fabrication to others).

Depending on the situation, a CPU's energy consumption is often more important than its power consumption, though. At least as I'm using the terms, power consumption refers to power usage on a (more or less) instantaneous basis. Energy consumption, however, normalizes for speed, so if (for example) CPU A consumes 1 watt for 2 seconds to do a job, and CPU B consumes 2 watts for 1 second to do the same job, both CPUs consume the same total amount of energy (two watt-seconds) to do that job--but with CPU B, you get results twice as fast.

ARM processors tend to do very well in terms of power consumption. So if you need something that needs a processor's "presence" almost constantly, but isn't really doing much work, they can work out pretty well. For example, if you're doing video conferencing, you gather a few milliseconds of data, compress it, send it, receive data from others, decompress it, play it back, and repeat. Even a really fast processor can't spend much time sleeping, so for tasks like this, ARM does really well.

Intel's processors (especially their Atom processors, which are actually intended for low power applications) are extremely competitive in terms of energy consumption. While they're running close to their full speed, they will consume more power than most ARM processors--but they also finish work quickly, so they can go back to sleep sooner. As a result, they can combine good battery life with good performance.

So, when comparing the two, you have to be careful about what you measure, to be sure that it reflects what you honestly care about. ARM does very well at power consumption, but depending on the situation you may easily care more about energy consumption than instantaneous power consumption.

Jerry Coffin
  • 2
    Is that why RISC needs more RAM, whereas CISC has an emphasis on smaller code size and uses less RAM overall than RISC? – Waqar Naeem Sep 21 '19 at 06:19
  • 1
    Thumb mode (variable length allowing short encodings) isn't a *difference*; that's how x86 always works (but moreso, with instruction length varying from 1 to 15 bytes, and much harder to decode than Thumb2). ARM mode (fixed width encoding with 3-operand non-destructive instructions) is the difference from x86! – Peter Cordes Jun 23 '20 at 06:33
  • 1
    *Having a lot faster processor isn't a big help* - video conferencing might be a better example: low latency means you can't just do a burst of decoding into a decent-sized buffer and go back into a deep or medium-level sleep state. "Race to sleep" is a key concept in energy consumption for a fixed amount of computation, given that modern CPUs can save significant power when fully idle (clock stopped, or even powering down parts of the core. Or in deeper sleeps, also caches after write-back.) ... and that's the point you make in the next paragraph, of course. >.< – Peter Cordes Jun 23 '20 at 06:38
  • @PeterCordes: Thumb Mode encoding isn't much like x86 encoding. Although it's not *quite* as regular as ARM encoding, it's still pretty much fixed-format. The density increase is largely from eliminating bits that are simply rarely used in ARM encoding. For example, virtually all ARM instructions are conditional, but conditions are only used a fairly small percentage of the time (so most non-branch THUMB instructions are unconditional). – Jerry Coffin Jun 23 '20 at 06:42
  • @PeterCordes: You're right: video conferencing is a better example--I've edited that in. Thank you. – Jerry Coffin Jun 23 '20 at 06:46
  • Yes, of course there are differences if you look at the details. My point was that you bring it up in a paragraph that's mostly advantages for ARM (both as an ISA and the existing microarchitectures). That makes it sound like "x86 doesn't have a thumb mode" is a point in favour of ARM. But in fact the x86 machine code format is always a size-optimized variable-length byte-stream (inherited from 8086 where code density was a key consideration). – Peter Cordes Jun 23 '20 at 06:55
  • Of course common x86-64 / SIMD insns could be shorter if redone from scratch now, dropping some rarely-used 1-byte opcodes so it's somewhat fair to say that x86-64 machine code isn't as compact as it could be. The existence of Thumb shows that ARM has been tuned for low-end / embedded, but you could argue that the existence of ARM mode shows that ARM has a mode specifically for being easy to decode. (Although yes, of course Thumb2 is much easier to decode than modern x86-64; Thumb2 having been designed recently enough that parallel decode was a consideration, and with SIMD already existing) – Peter Cordes Jun 23 '20 at 06:57
  • Although if you were going to break backwards compat with x86-64 machine code, making it x86-like at all is not particularly necessary or useful. e.g. Agner Fog's blue-sky [ForwardCom paper architecture](https://www.forwardcom.info/) is an attempt to combine variable-length encodings (fusion of RISC and CISC ideas, like ARM) for good density with SIMD features somewhat like ARM SVE to let you make machine code that will transparently take advantage of wider SIMD vectors on future hardware. Or on a less ambitious front, drop a lot of x86 partial-flags crap like rotate and shift flag semantics – Peter Cordes Jun 23 '20 at 07:02
  • @PeterCordes: Ah, sorry--I misunderstood what you were trying to get at. I thought you were pushing toward the idea that Thumb was variable length, so instruction encoding was similar between the two (which, to be honest, surprised me enough that I probably should have realized it had to be wrong, and you were getting at something else). But I've added more detail about the differences in instruction encoding. – Jerry Coffin Jun 23 '20 at 08:28
  • @PeterCordes: And by the way: I want to register a serious protest against even suggesting the notion of dropping the auxiliary carry flag. I'm quite certain I used it once no more than 2 decades ago, at very most...well, okay, maybe 3 decades ago now that I think about it, but I do clearly remember having used it at least once, anyway. :-) Seriously, if I were going to break backward compatibility, I'd probably do something more like the Alpha, and eliminate the flags register completely. – Jerry Coffin Jun 23 '20 at 08:32
  • @JerryCoffin: Oh, well I was pointing out that [Thumb 2](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344c/Beiiegaf.html) is variable length ([ARMv7T2](//stackoverflow.com/q/28669905)), with a mix of 32-bit and 16-bit instructions. (Much easier to decode than x86; one bit in a fixed position signals 16 vs. 32 IIRC, but still a step in the x86 direction away from pure RISC (which ARM never was). *That* was my real point.) Original Thumb (ARMv4) was fixed-width 16-bit (with `bl` actually being a 16-bit setup / jump pair, so in practice still mixed 16 / 32-bit). – Peter Cordes Jun 23 '20 at 09:01
  • Disagree with dropping FLAGS, unless you provide some different mechanism for efficient extended-precision add / mul (e.g. for RSA). Implementing a full-adder (carry in and out) without `adc` sucks if you want to use the full register width. (Using 2^30 chunks in 32-bit registers does allow carry in/out without two branches or conditionals, though.) – Peter Cordes Jun 23 '20 at 09:03
44

This adds to Jerry Coffin's first paragraph, i.e., that the ARM design gives lower power consumption.

The company ARM only licenses the CPU technology; they don't make physical chips. This allows other companies to add various peripheral technologies, typically called an SoC or system-on-chip, whether the device is a tablet, a cell phone, or an in-car entertainment system. This allows chip vendors to tailor the rest of the chip to a particular application. This has additional benefits,

  1. Lower board cost
  2. Lower power (note1)
  3. Easier manufacture
  4. Smaller form factor

ARM supports SoC vendors with AMBA, allowing SoC implementers to purchase off-the-shelf 3rd-party modules, like Ethernet, memory, and interrupt controllers. Some other CPU platforms support this, like MIPS, but MIPS is not as power-conscious.

All of these are beneficial to a handheld/battery-operated design. Some are just good all around. As well, ARM has a history of battery-operated devices: the Apple Newton, the Psion Organisers. The PDA software infrastructure was leveraged by some companies to create smartphone-type devices, although more success was had by those who re-invented the GUI for use with a smartphone.

The rise of open-source toolsets and operating systems also facilitated the various SoC chips. A closed organization would have issues trying to support all the various devices available for the ARM. The two most popular cellular platforms, Android and OSX/iOS, are based on Linux and on the FreeBSD, Mach, and NetBSD OSs. Open source helps SoC vendors provide software support for their chipsets.

Hopefully, why x86 is used for the keyboard is self-evident: it has the software, and more importantly people trained to use that software. Netwinder is one ARM system that was originally designed for the keyboard. Also, manufacturers are currently looking at ARM64 for the server market; power/heat is a concern at 24/7 data centers.

So I would say that the ecosystem that grows around these chips is as important as features like low power consumption. ARM has been striving for low-power, higher-performance computing for some time (since the mid-to-late 1980s), and they have a lot of people on board.

Note 1: Multiple chips need bus drivers to inter-communicate at known voltages and drive strengths. Also, typically separate chips need support capacitors and other power components, which can be shared in an SoC system.

artless noise
29

The ARM is like an Italian sports car:

  • A well-balanced, well-tuned engine. Gives good acceleration and top speed.
  • An excellent chassis, brakes, and suspension. Can stop quickly, can corner without slowing down.

The x86 is like an American muscle car:

  • Big engine, big fuel pump. Gives excellent top speed and acceleration, but uses a lot of fuel.
  • Dreadful brakes: you need to put an appointment in your diary if you want to slow down.
  • Terrible steering: you have to slow down to corner.

In summary: the x86 is based on a design from 1974 and is good in a straight line (but uses a lot of fuel). The ARM uses little fuel and does not slow down for corners (branches).


Metaphor over, here are some real differences.

  • ARM has more registers.
  • ARM has few special-purpose registers; the x86's registers are all special-purpose (so there is less moving stuff around on ARM).
  • ARM has few memory-access instructions: only load/store to and from registers.
  • ARM is internally a Harvard architecture by design.
  • ARM is simple and fast.
  • ARM instructions are architecturally single-cycle (except load/store multiple).
  • ARM instructions often do more than one thing (in a single cycle) – see the sketch after this list.
  • Where more than one ARM instruction is needed, such as for the x86's looping store and auto-increment, the ARM still does it in fewer clock cycles.
  • ARM has more conditional instructions.
  • ARM's branch predictor is trivially simple (if unconditional or backwards then assume branch, else assume not-branch), and performs better than the very, very, very complex one in the x86 (there is not enough space here to explain it, not that I could).
  • ARM has a simple, consistent instruction set (you could compile by hand, and learn the instruction set quickly).
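A sketch of the "more than one thing" point (plain ARM assembly; register choices are arbitrary): a single ARM instruction can fold a shift into an arithmetic operation, or combine a load with a pointer update:

add r0, r1, r2, lsl #2  /* r0 = r1 + (r2 << 2): shift and add in one instruction */
ldr r3, [r4], #4        /* load from the address in r4, then advance r4 by 4     */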
ctrl-alt-delor
  • OK, frequency means speed, pipeline length means steering, fuel consumption - obvious, but what are acceleration and braking? And how about nitro? – LennyB Aug 22 '15 at 07:05
  • @LennyB No: Clock speed is fuel consumption (fuel consumption is also fuel consumption). Steering relates also to branch prediction and conditional instructions. The ARM has a very simple branch predictor (predict not-branching if conditionally branching forward, else predict branch). With this simple predictor, it still outperforms the x86's branch predictor. This is mainly because of the conditional instructions: it can run the if and else parts, but with one half having no effect, therefore there is no need to branch. – ctrl-alt-delor Aug 22 '15 at 22:40
  • 10
    This analogy breaks down at the fact that Italian sports cars break down every chance they get while ARM CPUs don’t, and that while it could be easily done, you can’t actually *buy* a single ARM CPU that can do desktop CPU speeds, let alone socketed ones and mainboards to put them in. :) – Evi1M4chine Jan 13 '16 at 21:48
  • 2
    Performance-wise it competes directly with some of the biggest/fastest Xeon processors (e.g. E5-2690 v3), but at lower power and cost. https://www.quora.com/How-do-high-end-ARM-processors-compare-to-Intel-processors-in-terms-of-performance – ctrl-alt-delor Jan 14 '16 at 21:39
  • 2
    For massively parallel workloads like databases and I/O servers, sure. For single-threaded performance, nobody's designed an ARM core anywhere near as big as x86. No reason they couldn't, just nobody has. The "x86 tax" on power and die area is not that large compared to the amount of silicon used for the out-of-order machinery in high-power CPU cores. There are certainly warts in x86, but RISC has a code-density disadvantage (which doesn't usually matter much, but it still matters). This gets argued repeatedly on http://realworldtech.com/ forums. – Peter Cordes Feb 24 '16 at 05:19
  • [Agner Fog has an interesting proposal for an ISA that attempts to take the best of both worlds](http://www.agner.org/optimize/blog/read.php?i=421). He's suggesting variable-length instructions, but only in chunks of 32b, and with a simple and fast way to determine instruction length. This gives you code-density advantages of complex addressing modes and immediate constants in variable-length instructions, without being horrible to decode. He also has some interesting ideas for extensible SIMD, since that's something else x86 has done badly. – Peter Cordes Feb 24 '16 at 05:23
  • @PeterCordes if you have lots of registers then you do not need immediate addressing mode, as you can use register offset. – ctrl-alt-delor Feb 26 '16 at 18:28
  • 2
    @richard: There's a lot of stuff you don't "need", but that increases code density. The trick is balancing decode complexity against code size / number of instructions. Increasing the width of an out-of-order core is extremely expensive in power consumption, so packing more work into each instruction is valuable. A small increase in decode complexity is much cheaper. Modern x86 CPUs already manage to decode x86 quickly. (Not quite quickly enough to keep a 4-wide OOO core fed from the decoders instead of uop-cache or loop buffer, and of course at a high power cost.) – Peter Cordes Feb 26 '16 at 21:53
  • New Skylake CPUs are almost as power-efficient as ARM and are used in tablets like the Surface Pro 4 (the previous version has 4 hours of battery while the Skylake version has 8 hours) – Suici Doga Mar 21 '16 at 03:14
  • 1
    How close is almost? And is it on the same manufacturing technology? You compare an unspecified x86 with a Skylake x86, but do not say how much better the ARM is, just that it still is. And using older fabrication plants as well. – ctrl-alt-delor Mar 22 '16 at 20:46
  • 4
    @Evi1M4chine, it also breaks down at the fact that an Italian sports car is hugely expensive, while an American muscle car is relatively cheap. And the muscle car is what it is because it is simple, while something like a Ferrari is very, very complicated. Quite the opposite of CISC vs. RISC. – Lorenzo Dematté Jul 28 '16 at 15:29
  • 1
    @SuiciDoga: Intel is well-known to lie and cheat at these figures. Have you taken the consumption of any northbridge-/southbridge-type chips into account? Because with anything “Atom”, last time I checked, they just put all the power-hungry parts into the NB, so the CPU looks good on paper. and when looking at the mainboard, the only chip with a fan was actually the NB! My past experience with Intel had been that they behave like politicians. – Evi1M4chine Jul 29 '16 at 22:33
  • @Evi1M4chine Some new tablets like the Surface Pro 4 (Core m3 model) don't have fans – Suici Doga Jul 30 '16 at 01:25
  • Even older laptops don't have northbridge (or southbridge) fans (a corner of my laptop gets warm and there is a chip there without a heatsink). So why would new tablets need fans for those? – Suici Doga Sep 05 '16 at 14:23
  • @SuiciDoga Evi1M4chine seems to be writing about how Intel moved power-hungry stuff to the bridge, therefore this may have happened after the (even) older devices that you mention. And yes, the new Core Ms are nice; I have one, and it (the whole laptop) runs nice and cold, yet it has good speed and virtualisation. This is because of good chip design, not a good instruction set: low voltage ( P ∝ V² ), large-ish cache, … – ctrl-alt-delor Sep 05 '16 at 15:40
  • 2
    @PeterCordes CISC vs RISC, is not same as x86 vs Arm: 68000 is CISC, but has more in common with Arm than x86. – ctrl-alt-delor Feb 10 '17 at 17:06
  • Agreed. I haven't looked at 68k for a long time, and IDK if there's room to add SIMD opcodes without making it as nasty to decode as x86. As far as what I was arguing, 68k has a lot of the same code-density advantages as x86 (e.g. post-increment addressing modes to save instructions). It's much less ugly than x86, so it's more like ARM in that respect, but that's not what this discussion is about. – Peter Cordes Feb 14 '17 at 19:15
  • Are you arguing that high-performance 68k can be implemented with short pipelines like ARM? (which is what allows the low branching penalty). That's kind of a bogus argument, since the long pipelines of x86 CPUs are partly because they're clocked higher than ARM CPUs. A 4GHz ARM with comparable throughput to a modern x86 would need a long pipeline, too. Perhaps not as high as Intel Skylake, but probably comparable to Skylake running from the decoded-uop cache instead of the legacy decoders (~5 stages shorter). – Peter Cordes Feb 14 '17 at 19:17
  • @Peter Cordes If the ARM uses register addressing, that is 32 bits; if the x86 uses immediate addressing, that is ≥ 40 bits. Therefore ARM wins on code density for this. It is not true that all CISCs have higher code density than all RISCs. E.g. x86 has high code density, due to poor design. As for the 68k, I would imagine that it has better code density than x86, as it has lots of registers; the x86 spends ½ of its instructions moving stuff between memory and registers. – ctrl-alt-delor Mar 10 '17 at 19:05
  • 1
    x86 was not "designed in the 60s". Read https://en.wikipedia.org/wiki/X86 some time. – Jeff Hammond Jan 06 '18 at 15:55
  • @Jeff Sorry, 1974 (or at least backward-compatible with it). – ctrl-alt-delor Jan 06 '18 at 16:20
  • 8086 was released in 1978, but in any case, unless you are running in 16-bit real mode, you are not using a 1970s design. You may also want to learn about the microarchitectural implementation, which leverages RISC ideas (uops). – Jeff Hammond Jan 06 '18 at 19:21
  • 1
    @Jeff: You're missing the point that 32-bit mode uses mostly the same machine-encoding and instruction semantics as 16-bit mode, just with a different default operand-size and address-size. 32-bit mode actually made x86 *harder* to decode by adding more variable-length stuff (like a SIB byte in addressing modes). AMD64 also made minimal changes so it could share decoder (and execution unit) transistors between modes, again losing the opportunity to "clean up" x86. e.g. shifts still leave FLAGS unmodified when count=0, which is why `shl eax, cl` costs 3 uops on Skylake. – Peter Cordes Jan 07 '18 at 05:23
  • @Jeff: in summary, yes much of x86's legacy baggage dates back to 8086 and 80386. See also [Stop the instruction set war](http://www.agner.org/optimize/blog/read.php?i=25) on Agner Fog's blog: further extensions to the ISA were often short-sighted and made later extensions even more awkward. – Peter Cordes Jan 07 '18 at 05:26
  • @PeterCordes The 68000, ARM, and I think the x86 have post-increment. In fact, ARM has pre/post-increment/decrement. And ARM does it in one clock cycle (even on the oldest 3-stage-pipeline models). ARM, 68000, MIPS, SPARC, Power, Alpha, etc. all save instructions by having many multi-purpose registers and a clean architecture. They don't waste time moving stuff between registers. – ctrl-alt-delor Jan 07 '18 at 15:50
  • 1
    ARM branch prediction isn't better than x86 in general. That's a crazy claim. Most x86 chips use longer pipelines and thus require very good branch prediction for good performance, so they use algorithms like TAGE with a fairly large BTB. https://danluu.com/branch-prediction/. Modern ARM CPUs mostly use something more sophisticated than static prediction, too. Simple static prediction (backward = predict taken) isn't better than what x86 does! If it was, x86 would just do that >.<. And ARM chips need better branch prediction if they are clocked as high as x86, with more pipeline stages. – Peter Cordes Dec 31 '18 at 03:35
  • @PeterCordes You are correct, static prediction is terrible. The point is that, because of better architecture, the ARM has fewer pipeline flushes, even with this predictor. (And yes, they did improve on it.) – ctrl-alt-delor Dec 31 '18 at 15:25
  • 1
    In 2019 Fujitsu will bring out a fast ARM processor. "_Fujitsu A64FX is the new fastest Arm processor in the world, built on 7nm it has 2.7 TFLOPS performance per chip suitable for high-end HPC and AI, they aim to create with it the world’s fastest supercomputer with it by 2021._" - https://insidehpc.com/2019/01/video-fujitsu-post-k-supercomputer-feature-worlds-fastest-arm-processor/ – Thorbjørn Ravn Andersen Apr 02 '19 at 06:33
17

The ARM architecture was originally designed for Acorn personal computers (see the Acorn Archimedes, circa 1987, and the RiscPC), which were just as much keyboard-based personal computers as x86-based IBM PC models were. Only later were ARM implementations primarily targeted at the mobile and embedded market segment.

Originally, simple RISC CPUs of roughly equivalent performance could be designed by much smaller engineering teams (see Berkeley RISC) than those working on the x86 development at Intel.

But, nowadays, the fastest ARM chips have very complex multi-issue out-of-order instruction dispatch units designed by large engineering teams, and x86 cores may have something like a RISC core fed by an instruction translation unit.

So, any current differences between the two architectures are more related to the specific market needs of the product niches that the development teams are targeting. (Random opinion: ARM probably makes more in license fees from embedded applications that tend to be far more power and cost constrained. And Intel needs to maintain a performance edge in PCs and servers for their profit margins. Thus you see differing implementation optimizations.)

hotpaw2
  • There are still massive architectural differences. However, Intel has done a wonderful job and invested a shedload of money to make their poorly architected CPU run very well (one wonders what could have been done if all this effort had been put into a well-architected CPU). – ctrl-alt-delor Dec 29 '18 at 16:41