
For simplicity, imagine this scenario: we have a 2-bit computer with a pair of 2-bit registers called r1 and r2, which only works with immediate addressing.

Let's say the bit sequence 00 means add to our CPU, 01 means move data to r1, and 10 means move data to r2.

So there is an assembly language for this computer and an assembler, where a sample program would be written like:

mov r1,1
mov r2,2
add r1,r2

When I assemble this code to the native language, the resulting file will be something like:

0101 1010 0001

The 12 bits above are the native code for:

Put decimal 1 in r1, put decimal 2 in r2, add them and store the result in r1.

So this is basically how compiled code works, right?
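
To make my mental model concrete, here is a minimal sketch in Java of how I imagine a simulator for this toy CPU executing that native code (everything beyond the encoding I described above - e.g. that add ignores its operand bits and the result wraps to 2 bits - is my own assumption):

// TinyCpu.java - a toy simulator for the hypothetical 2-bit machine above.
// Encoding per my description: 00 = add; 01 = mov r1, imm; 10 = mov r2, imm.
// That add ignores its 2-bit operand field is my assumption.
public class TinyCpu {
    public static void main(String[] args) {
        int[] program = {0b0101, 0b1010, 0b0001}; // mov r1,1; mov r2,2; add r1,r2
        int r1 = 0, r2 = 0;
        for (int insn : program) {
            int opcode  = (insn >> 2) & 0b11; // upper two bits select the operation
            int operand = insn & 0b11;        // lower two bits are the immediate
            switch (opcode) {
                case 0b01: r1 = operand; break;          // mov r1, imm
                case 0b10: r2 = operand; break;          // mov r2, imm
                case 0b00: r1 = (r1 + r2) & 0b11; break; // add; result wraps to 2 bits
            }
        }
        System.out.println("r1 = " + r1); // prints r1 = 3
    }
}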

Let's say someone implements a JVM for this architecture. In Java, I would write code like:

int x = 1 + 2;

How exactly will the JVM interpret this code? I mean, eventually the same bit pattern must be passed to the CPU, mustn't it? All CPUs have a set of instructions that they can understand and execute, and those are, after all, just bits. Let's say the compiled Java bytecode looks something like this:

1111 1100 1001

or whatever. Does it mean that interpreting changes this code to 0101 1010 0001 when executing? If it does, then it is already native code, so why is it said that the JIT only kicks in after the code has run a number of times? If it does not convert it exactly to 0101 1010 0001, then what does it do? How does it make the CPU do the addition?

Maybe there are some mistakes in my assumptions.

I know interpreting is slow and compiled code is faster but not portable, and that a virtual machine "interprets" code, but how? I am looking for how exactly/technically interpreting is done. Pointers (such as books or web pages) are welcome instead of answers as well.

Koray Tugay

4 Answers


The CPU architecture you describe is unfortunately too restricted to make this really clear with all the intermediate steps. Instead, I will write pseudo-C and pseudo-x86 assembler, hopefully in a way that is clear without your needing to be terribly familiar with C or x86.

The compiled JVM bytecode might look something like this:

ldc 0 # push the first constant (== 1)
ldc 1 # push the second constant (== 2)
iadd # pop two integers and push their sum
istore_0 # pop result and store in local variable

The interpreter has (a binary encoding of) these instructions in an array, and an index referring to the current instruction. It also has an array of constants, a memory region used as a stack, and another for local variables. Then the interpreter loop looks like this:

while (true) {
    switch(instructions[pc]) {
    case LDC:
        sp += 1; // make space for constant
        stack[sp] = constants[instructions[pc+1]];
        pc += 2; // two-byte instruction
        break;
    case IADD:
        stack[sp-1] += stack[sp]; // add to first operand
        sp -= 1; // pop other operand
        pc += 1; // one-byte instruction
        break;
    case ISTORE_0:
        locals[0] = stack[sp];
        sp -= 1; // pop
        pc += 1; // one-byte instruction
        break;
    // ... other cases ...
    }
}

This C code is compiled into machine code and run. As you can see, it's highly dynamic: it inspects each bytecode instruction every time that instruction is executed, and all values go through the stack (i.e. RAM).
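
For instance, hand-tracing this loop over the four bytecodes above (my notation; assume the constant pool is {1, 2}, sp starts at -1, and the stack grows upward):

pc=0: ldc 0     -> stack: [1]                sp=0
pc=2: ldc 1     -> stack: [1, 2]             sp=1
pc=4: iadd      -> stack: [3]                sp=0
pc=5: istore_0  -> locals[0] = 3, stack: []  sp=-1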

While the actual addition itself probably happens in a register, the code surrounding the addition is rather different from what a Java-to-machine code compiler would emit. Here's an excerpt from what a C compiler might turn the above into (pseudo-x86):

.ldc:
incl %esi # increment the variable pc, first half of pc += 2;
movb %ebx, program(%esi) # load the constant-pool index (the byte after the opcode)
movl %eax, constants(,%ebx,4) # load constant from pool
incl %edi # increment sp
movl stack(,%edi,4), %eax # write constant onto stack
incl %esi # other half of pc += 2
jmp .EndOfSwitch

.iadd:
movl %eax, stack(,%edi,4) # load second operand from top of stack
decl %edi # sp -= 1;
addl stack(,%edi,4), %eax # add into the first operand's stack slot
incl %esi # pc += 1;
jmp .EndOfSwitch

You can see that the operands for the addition come from memory instead of being hardcoded, even though for the purposes of the Java program they are constant. That's because for the interpreter, they are not constant. The interpreter is compiled once and then must be able to execute all sorts of programs, without generating specialized code.

The purpose of the JIT compiler is to do just that: generate specialized code. A JIT can analyze the ways the stack is used to transfer data, the actual values of various constants in the program, and the sequence of calculations performed, to generate code that does the same thing more efficiently. In our example program, it would allocate local variable 0 to a register, replace the access to the constant table with moving constants into registers (movl %eax, $1), and redirect the stack accesses to the right machine registers. Ignoring a few more optimizations (copy propagation, constant folding and dead code elimination) that would normally be done, it might end up with code like this:

movl %ebx, $1 # ldc 0
movl %ecx, $2 # ldc 1
movl %eax, %ebx # (1/2) iadd
addl %eax, %ecx # (2/2) iadd
# no istore_0, local variable 0 == %eax, so we're done
  • Can we say that in your example JIT has punched in for adding the values but the storing is still interpreted? Great answer btw, thanks. – Koray Tugay Jan 25 '15 at 11:05
  • @KorayTugay I wouldn't say "the storing is still interpreted". The location of those stores has changed, the way in which the stores happen has changed, and the JIT very clearly understood which store affected which piece of memory. The register shuffling is slightly sub-optimal (after further optimization, the first instruction would use `eax` instead of `ebx` and the third instruction would be removed) but it's very clearly compiled. –  Jan 25 '15 at 11:12
  • @Koray Tugay: no, actually, storing means modifying the heap. So interpreter and compiled code may have an entirely different way of handling local variables and stack as long as they agree about the heap. For this simple example, if HotSpot kicks in, it will find out that the heap is never modified and the result is not returned, so it will remove the entire computation. The times when compiled code reflected the associated bytecode 1:1 were twenty years ago… – Holger Jan 26 '15 at 09:49
  • @Holger well I was trying to keep the question simple really – Koray Tugay Jan 26 '15 at 17:37

One of the important steps in Java is that the compiler first translates the .java code into a .class file, which contains the Java bytecode. This is useful, as you can take .class files and run them on any machine that understands this intermediate language, by translating them on the spot line by line or chunk by chunk. This is one of the most important functions of the Java compiler + interpreter combination.

You can directly compile Java source code to native binary, but this negates the idea of writing the original code once and being able to run it anywhere. This is because the compiled native binary code will only run on the same hardware/OS architecture that it was compiled for; if you want to run it on another architecture, you have to recompile the source on that one. With compilation to the intermediate-level bytecode, you don't need to drag around the source code, only the bytecode. The trade-off is that you now need a JVM that can interpret and run the bytecode. As such, compiling to the intermediate-level bytecode, which the interpreter then runs, is an integral part of the process.

As for the actual real-time running of code: yes, the JVM will eventually interpret/run some binary code that may or may not be identical to natively compiled code, and in a one-line example they may seem superficially the same. But the interpreter typically doesn't precompile everything; it goes through the bytecode and translates it to binary line by line or chunk by chunk. There are pros and cons to this (compared to natively compiled code, e.g. from C and C++ compilers) and lots of resources online to read up further on. See my answer here, or this, or this one.
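
To make that intermediate step concrete, here is a minimal sketch (the file name is mine; javac and javap are the compiler and disassembler that ship with the JDK): compile once, and the same .class file runs on any JVM.

// Add.java - compile once with:  javac Add.java
// The resulting Add.class runs unchanged on any JVM (x86, ARM, Windows, Linux, ...).
// `javap -c Add` disassembles the portable bytecode that the JVM then interprets.
public class Add {
    public static void main(String[] args) {
        int x = 1 + 2; // note: javac already folds this constant expression to 3
        System.out.println(x);
    }
}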

Martin Dinov
  • "You can directly compile Java source code to native binary, but this negates the idea of writing the original code once and being able to run it anywhere." How? – Koray Tugay Jan 25 '15 at 10:32
  • If you directly compile for a specific architecture, you can only run the compiled code on that architecture. To run the code on another architecture, you'd have to recompile for that one. Maybe I should clarify this in the answer. – Martin Dinov Jan 25 '15 at 10:33
  • Thanks for detailed answer but what I am actually asking is still not clear. You are saying: "As such, compiling to the intermediate-level bytecode, which the interpreter then runs, is an integral part of the process." This is what I am trying to learn. What kind of process? What is done? Can you give an example with this imaginary architecture? – Koray Tugay Jan 25 '15 at 10:39
  • The details will depend on the exact JVM, if some kind of JIT is implemented (which will always be the case nowadays), etc. Typically, the interpreter in JVM reads bytecode line by line, running that line directly. How? By modifying the JVM's internal state (remember that the JVM is a type of virtual machine). Internally, the JVM may call on a JIT compiler to compile bytecode to native code which it then stores for the future and can run faster. In any case, the details are JVM-implementation specific. – Martin Dinov Jan 25 '15 at 10:46

Not all computers have the same instruction set. Java bytecode is a kind of Esperanto - an artificial language to improve communication. The Java VM translates the universal Java bytecode to the instruction set of the computer it runs on.

So how does the JIT figure in here? The main purpose of the JIT compiler is optimization. There are often different ways to translate a certain piece of bytecode into the target machine code, and the most performant translation is often non-obvious because it might depend on the data. There are also limits to how far a program can analyze an algorithm without executing it - the halting problem is a well-known such limitation, but not the only one. So what the JIT compiler does is try different possible translations and measure how fast they execute on the real-world data the program processes. This is why it takes a number of executions until the JIT compiler has found the best translation.
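
You can watch this happen on a HotSpot JVM. Here is a minimal sketch (the class and method names are mine; -XX:+PrintCompilation is a real HotSpot flag): run it with java -XX:+PrintCompilation HotLoop and the JVM reports compiling work only after the interpreter has already executed it many times (the exact thresholds vary by JVM).

// HotLoop.java - run with:  java -XX:+PrintCompilation HotLoop
// HotSpot interprets work() at first; once the call count crosses an
// internal threshold, the method is flagged as hot and JIT-compiled.
public class HotLoop {
    static int work(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += work(i, i + 1); // the hot call site
        }
        System.out.println(sum);
    }
}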

Philipp
  • So basically, every time the code is compiled to native code for the given architecture. But once the fastest version is found, then JIT compiles it for one final time? – Koray Tugay Jan 25 '15 at 10:31
  • @KorayTugay it already has a compiled version, so why would it compile it again? – Philipp Jan 25 '15 at 10:32
  • Well you say "So it takes a number of executions until the JIT compiler found the perfect translation.". And execution means bytecode to native code isn't it, which is called compiling? – Koray Tugay Jan 25 '15 at 10:37
  • Compiling code will have some CPU overhead, so to avoid compiling everything, the JVM looks for frequent code paths, identifies them as hot spots and only compiles those. To do that, a method has to be executed 10000 times on a server JVM and 1500 times on a client JVM - not sure about the exact number on the client VM, it's somewhere around that value – Arkantos Jan 27 '15 at 16:51
  • Also, now that the JVM knows how the code is most likely to execute, it does some decent optimizations, like resolving monomorphic calls statically, dispatching bimorphic calls with a branch condition, method inlining, constant folding, escape analysis - basically all the advantages a compiler has over an interpreter, plus some cool run-time optimizations :) – Arkantos Jan 27 '15 at 17:08

Simplifying, an interpreter is an infinite loop with a giant switch inside. It reads Java bytecode (or some internal representation) and emulates a CPU executing it. This way the real CPU executes the interpreter code, which emulates the virtual CPU. This is painfully slow. A single virtual instruction adding two numbers requires three function calls and many other operations; a single virtual instruction takes a couple of real instructions to execute. It is also less memory efficient, as you have both a real and an emulated stack, registers and instruction pointer.

while(true) {
    Operation op = methodByteCode.get(instructionPointer);
    switch(op) {
        case ADD:
            stack.pushInt(stack.popInt() + stack.popInt());
            instructionPointer++;
            break;
        case STORE:
            memory.set(stack.popInt(), stack.popInt());
            instructionPointer++;
            break;
        ...
    }
}

When some method has been interpreted multiple times, the JIT compiler kicks in. It reads all the virtual instructions and generates one or more native instructions that do the same thing. Here I'm generating a string of textual assembly, which would then require an additional assembly-to-native-binary conversion.

for(Operation op : methodByteCode) {
    switch(op) {
        case ADD:
            compiledCode += "popi r1\n";
            compiledCode += "popi r2\n";
            compiledCode += "addi r1, r2, r3\n";
            compiledCode += "pushi r3\n";
            break;
        case STORE:
            compiledCode += "popi r1\n";
            compiledCode += "storei r1\n";
            break;
        ...
    }
}

After the native code is generated, the JVM will copy it somewhere, mark that region as executable, and instruct the interpreter to invoke it instead of interpreting the bytecode the next time this method is invoked. A single virtual instruction might still take more than one native instruction, but this will be nearly as fast as ahead-of-time compilation to native code (as in C or C++). Compiling is usually much slower than interpreting, but it has to be done only once and only for chosen methods.

Piotr Praszmo
  • But how does it emulate? – Koray Tugay Jan 26 '15 at 17:37
  • Look at the first code snippet. Instead of creating the CPU in silicon using logic gates and flip-flops, it's done in a high-level programming language using control structures and variables. – Piotr Praszmo Jan 26 '15 at 17:49
  • Based on your first sentence I was going to upvote, but then I came across 'requires three function calls', which is simply untrue. You're just assuming that the stack operations are function calls. In a real interpreter they wouldn't be. Interpretive overhead need only consist of the fetch and dispatch cycle. Everything else is or should be the same. – user207421 Jan 27 '15 at 11:16