Thanks to a recent talk by Heroku’s Joe Kutner, I got inspired to learn how to program on the JVM. I’m sure you’re thinking I’ve been doing just that for years with Java, Groovy, and Scala. However, I’ve yet to write bytecode directly. Joe’s talk covered the basics of how the JVM works under the hood of our languages. He introduced the JVM architecture and tools for reading and writing our own bytecode. I have been writing a Lisp compiler lately to explore this subject, and I’ve decided it is time to kick off a blog series on the topic of JVM bytecode.
What is bytecode? It’s what lives in the *.class files emitted by your compiler. It is a low-level binary format, much like native machine code, except that the JVM itself is another layer of software between your bytecode and the metal of your machine. The point to emphasize is that to view the bytecode directly, you’ve gotta use a hex editor, and raw bytes are quite difficult to read. What we want is something a little higher-level than the bytes, much like assembly language. There is no standard assembly language for the JVM, tho; the language compilers go straight to the bytes. Fortunately the JDK includes javap, which prints the bytecode of a class file in a human-readable format. Invoke javap -v -c <classname> in the root directory of your class files to view it.
For example, compile a typical Hello World program in Java, then view the javap output. This Java class…
    public class Hello {
      public static void main(String[] args) {
        System.out.println("Hello Java!");
      }
    }
…produces the following javap output…
    >javap -v -c Hello
    Compiled from "Hello.java"
    public class Hello extends java.lang.Object
      SourceFile: "Hello.java"
      minor version: 0
      major version: 50
      Constant pool:
    const #1 = Method #6.#15; // java/lang/Object."<init>":()V
    const #2 = Field #16.#17; // java/lang/System.out:Ljava/io/PrintStream;
    const #3 = String #18; // Hello Java!
    const #4 = Method #19.#20; // java/io/PrintStream.println:(Ljava/lang/String;)V
    const #5 = class #21; // Hello
    const #6 = class #22; // java/lang/Object
    const #7 = Asciz <init>;
    const #8 = Asciz ()V;
    const #9 = Asciz Code;
    const #10 = Asciz LineNumberTable;
    const #11 = Asciz main;
    const #12 = Asciz ([Ljava/lang/String;)V;
    const #13 = Asciz SourceFile;
    const #14 = Asciz Hello.java;
    const #15 = NameAndType #7:#8;// "<init>":()V
    const #16 = class #23; // java/lang/System
    const #17 = NameAndType #24:#25;// out:Ljava/io/PrintStream;
    const #18 = Asciz Hello Java!;
    const #19 = class #26; // java/io/PrintStream
    const #20 = NameAndType #27:#28;// println:(Ljava/lang/String;)V
    const #21 = Asciz Hello;
    const #22 = Asciz java/lang/Object;
    const #23 = Asciz java/lang/System;
    const #24 = Asciz out;
    const #25 = Asciz Ljava/io/PrintStream;;
    const #26 = Asciz java/io/PrintStream;
    const #27 = Asciz println;
    const #28 = Asciz (Ljava/lang/String;)V;

    {
    public Hello();
      Code:
       Stack=1, Locals=1, Args_size=1
       0: aload_0
       1: invokespecial #1; //Method java/lang/Object."<init>":()V
       4: return
      LineNumberTable:
       line 1: 0

    public static void main(java.lang.String[]);
      Code:
       Stack=2, Locals=1, Args_size=1
       0: getstatic #2; //Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc #3; //String Hello Java!
       5: invokevirtual #4; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return
      LineNumberTable:
       line 3: 0
       line 4: 8

    }
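The minor version and major version lines near the top of the javap output are not computed by javap; they sit in the first eight bytes of the class file, right after the magic number 0xCAFEBABE. Here is a minimal sketch of reading just that header by hand, roughly what you would be squinting at in a hex editor. The class name ClassHeaderSketch and the method readHeader are made up for illustration, and the sketch ignores everything after the header (constant pool, methods, and so on).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; it reads only the 8-byte class file header.
class ClassHeaderSketch {

    // A class file starts with: u4 magic, u2 minor_version, u2 major_version (big-endian).
    static int[] readHeader(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // ByteBuffer is big-endian by default
        int magic = buf.getInt();
        int minor = buf.getShort() & 0xFFFF; // treat the u2 as unsigned
        int major = buf.getShort() & 0xFFFF;
        return new int[] { magic, minor, major };
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassHeaderSketch <path to .class file>");
            return;
        }
        int[] header = readHeader(Files.readAllBytes(Paths.get(args[0])));
        System.out.printf("magic=%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

Pointed at the Hello.class that produced the output above, this should report the same minor version 0 and major version 50 that javap shows (major version 50 corresponds to Java 6).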
OK, that javap output is definitely a lot of information. I’m not going to attempt to digest all of it in this blog post. Frankly, I don’t understand it all yet, but we can begin making progress with a few observations.
First, take note of the constant pool. If you have a fair amount of experience on the JVM, there is a good chance you are already aware that any String literals in your code are stored as constants for the entirety of the program. We can find our "Hello Java!" in this table as const #3 and const #18. It is split this way because #3 is a String entry that holds only a reference (the #18 printed next to it), while #18 is the Asciz (UTF-8) entry that holds the actual characters.
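One visible consequence of String literals living in the constant pool is that identical literals end up as the same String object at run time. Here is a minimal sketch of that behavior using only the standard library. The class name LiteralPoolSketch and its method names are made up for illustration, and the reference comparisons with == are deliberate.

```java
// Hypothetical demo class for illustration; not part of the Hello example.
class LiteralPoolSketch {

    // Two identical literals resolve to the same interned String object.
    static boolean literalsShareReference() {
        String a = "Hello Java!";
        String b = "Hello Java!";
        return a == b; // reference comparison on purpose
    }

    // A String created at run time is a distinct object from the literal.
    static boolean runtimeCopySharesReference() {
        String literal = "Hello Java!";
        String copy = new String(literal);
        return literal == copy;
    }

    // intern() maps the run-time copy back to the pooled literal.
    static boolean internedCopySharesReference() {
        String literal = "Hello Java!";
        String copy = new String(literal);
        return literal == copy.intern();
    }

    public static void main(String[] args) {
        System.out.println(literalsShareReference());      // true
        System.out.println(runtimeCopySharesReference());  // false
        System.out.println(internedCopySharesReference()); // true
    }
}
```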
Next, notice the code below the constant pool. These are the bytecode mnemonics of the program. Let’s focus on the main method, where we see four things happen:
- Get the static field out from java.lang.System.
- Do something cryptic with our "Hello Java!" string.
- Invoke the method println on java.io.PrintStream with an argument of type java.lang.String.
- Finally, we return from the method.
Notice that the invokevirtual instruction has neither System.out nor "Hello Java!" specified. That is because the previous two steps put those values on a stack. The JVM is a stack-architecture machine. Every operation we have here (except return) operates on this stack. First we push the target object, namely System.out, onto the stack. Then we load the argument string onto the stack from the run-time constant pool with ldc. Finally we invoke the appropriate method on the target object. This invocation pops those two items off the stack and performs the method.
Don’t confuse this stack with the call stack that you are familiar with. Each frame of the call stack has its own copy of the stack I am referring to (the JVM specification calls it the operand stack), and the frame uses it as a playground where it pushes and pops values to perform all of the duties of the running program.
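The push and pop sequence for main can be mimicked in plain Java with an explicit stack. This is only a toy model to make the order of operations concrete, not how the JVM is implemented. The class name OperandStackSketch and the method run are made up for illustration, and a Deque created inside the method stands in for the per-frame stack.

```java
import java.io.PrintStream;
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the stack used by Hello.main; names here are hypothetical.
class OperandStackSketch {

    // Mimics getstatic, ldc, and invokevirtual against an explicit stack.
    // Returns the stack depth after the call so we can see it is empty again.
    static int run(PrintStream out) {
        Deque<Object> stack = new ArrayDeque<>(); // one stack per call, like one per frame

        stack.push(out);           // getstatic #2: push the target object (System.out in Hello)
        stack.push("Hello Java!"); // ldc #3: push the string constant

        // invokevirtual #4: pop the argument first, then the target, then call the method.
        String argument = (String) stack.pop();
        PrintStream target = (PrintStream) stack.pop();
        target.println(argument);

        return stack.size();       // println returns void, so nothing is pushed back
    }

    public static void main(String[] args) {
        int depth = run(System.out);
        System.out.println("stack depth after call: " + depth);
    }
}
```

The stack never holds more than two values here, which lines up with the Stack=2 that javap reports for main.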
The main takeaways are (1) javap for viewing bytecode, (2) the constant pool, and (3) the stack architecture. Stay tuned for the next installment, where I will introduce some tooling to assist us in writing our own Hello World straight to JVM instructions.
Leave a reply below, or send me a tweet.
