There was a long-lived myth that claimed “Java is slow”. Over the years, we’ve seen Java get faster and take over the enterprise and, eventually, the mobile space. Running the desktop HotSpot JVM is certainly overkill for mobile devices, which brings the need for optimizations that make the VM lighter and more efficient on devices with little memory and CPU power.
Android is a Linux-based OS built on a 2.6.x kernel, stripped down to handle the tasks a mobile device needs. It uses native open source C libraries that have powered Linux machines for years. All the basic OS operations, such as I/O and memory management, are handled by the stripped-down Linux kernel. On top of it all, the application layer is exposed as Java APIs. These Java APIs internally use JNI calls into the native libraries of the operating system. The benefit: your application code remains platform independent no matter which processor architecture (x86, ARM, etc.) it runs on. As Google plans to take Android from mobiles to TVs, tablets, cars, and what not, you won’t have to worry about porting your application (apart from screen resolutions).
The Dalvik VM was developed to provide an efficient Java runtime for devices with little memory. It is smart about memory management and can work on phones with as little as 32 MB of memory (or even less with older Android releases).
Dalvik doesn’t run Sun’s Java bytecode; instead it runs DEX files, a.k.a. Dalvik executables. Java bytecode is recompiled into DEX every time the ADT plugin (installed in Eclipse) detects a change in the .class files. The DEX file then runs in Dalvik with just-in-time (JIT) compilation. Why just-in-time?
JIT has a lot of merits that make it a clear winner for VM-based languages. A normally compiled application ships as a pre-compiled binary, whereas an interpreted language translates code dynamically as execution is requested. JIT is a hybrid of both.
Why JIT Is Faster: In Depth
Much of the “heavy lifting” of parsing the original source code and performing basic optimization is handled at compile time, prior to deployment. As a result, compiling from bytecode to machine code is much faster than compiling from source.
When the bytecode is run, the just-in-time compiler comes into action and compiles some or all of it into native machine code for better performance. This is selective: it can be done per file, per function, or even on an arbitrary code segment, and the code can be compiled just as it is about to be executed.
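To make the "selective" part concrete, here is a toy sketch of a per-method call counter that switches from a slow path to a fast path once the method becomes hot. Everything in it is hypothetical (the class name, the threshold of 3, the two stand-in methods); it only illustrates the idea and is not how Dalvik or any real VM is implemented.

```java
public class HotMethodSketch {
    // Hypothetical threshold; real VMs choose and tune their own.
    static final int HOT_THRESHOLD = 3;

    private int calls = 0;
    private boolean compiled = false;

    // Slow path: stands in for interpreting bytecode step by step.
    // Multiplies by repeated addition, so b must be non-negative.
    private int interpret(int a, int b) {
        int result = 0;
        for (int i = 0; i < b; i++) {
            result += a;
        }
        return result;
    }

    // Fast path: stands in for cached native code.
    private int runCompiled(int a, int b) {
        return a * b;
    }

    public int invoke(int a, int b) {
        if (compiled) {
            return runCompiled(a, b);
        }
        calls++;
        if (calls >= HOT_THRESHOLD) {
            compiled = true; // later calls take the fast path
        }
        return interpret(a, b);
    }

    public boolean isCompiled() {
        return compiled;
    }

    public static void main(String[] args) {
        HotMethodSketch method = new HotMethodSketch();
        for (int i = 1; i <= 5; i++) {
            int r = method.invoke(6, 7);
            System.out.println("call " + i + " -> " + r
                    + ", compiled=" + method.isCompiled());
        }
    }
}
```

Code that is called once or twice never pays the compilation cost, while code that runs often does.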
Let’s assume that a pre-compiled binary runs on all x86 machines. This binary is then run on two different machines: an Intel processor with MMX on Windows, and an AMD processor with SSE2 or SSSE3 capability on Linux. The x86 binary is not optimized to run better on one than on the other. The purpose of the binary is to be “compatible” with the x86 architecture rather than to perform well on a particular processor and platform, and that’s what makes the difference.
At runtime, a JIT can perform a large number of optimizations that use the hardware efficiently to give the best performance. Primarily there are four kinds of optimizations that we are interested in:
- Instruction set optimizations
- Resource Reuse
- Runtime Analysis & Optimizations
- Superior Memory Management
1. Instruction set optimizations: With a JIT active on a platform, the translation from bytecode to machine code (the instruction set) can be tailored to the processor it actually runs on. For example, take an application that does lots of mathematical operations, one of them being a multiply followed by an add. The processor might offer a single instruction that performs “multiply and add”; a JIT that is aware of it can use it, thereby reducing the clock cycles needed for that task.
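As a rough illustration of the multiply-and-add pattern, here is a small integer dot product in Java. The class and method names are made up for this example. Whether a given JIT actually emits a combined multiply-accumulate instruction for this loop is an assumption; it depends on the VM and on the processor.

```java
public class MultiplyAddSketch {
    // Each loop iteration is one multiply followed by one add,
    // the pattern a multiply-accumulate instruction is built for.
    static int dot(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {4, 5, 6};
        System.out.println(dot(a, b)); // 1*4 + 2*5 + 3*6 = 32
    }
}
```

The source stays the same on every device; only the machine code generated at runtime differs.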
2. Resource Reuse: Resource reuse can be done in multiple ways. It can be something as simple as reusing String objects, or as involved as dynamically compiling code to reuse an existing I/O stream or connection instead of creating a new one, when possible.
In a JIT, translation occurs continuously, and translated code is cached to minimize performance degradation. Apart from performance, JIT also offers other advantages over statically compiled code, such as handling of late-bound data types and the ability to enforce security guarantees.
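The simplest form of reuse mentioned above, reusing String objects, can be seen in plain Java. Note that this particular reuse comes from the runtime’s string pool rather than from the JIT compiler itself; the class and helper names below are made up for this sketch.

```java
public class StringReuseSketch {
    // True only when both references point to the very same object.
    static boolean sameObject(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "dalvik";             // literal, taken from the string pool
        String b = "dalvik";             // same literal, same pooled object
        String c = new String("dalvik"); // explicitly allocates a new object

        System.out.println(sameObject(a, b));          // true
        System.out.println(sameObject(a, c));          // false
        System.out.println(sameObject(a, c.intern())); // true
    }
}
```

Equal literals share one object, while `new String(...)` forces a fresh allocation until `intern()` maps it back to the pooled copy.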
3. Runtime Analysis & Optimizations
The system is able to collect statistics about how the program is actually running in the environment it is in, and it can rearrange and recompile for optimum performance. This is complex to implement. Java has done it nicely, but the Dalvik VM in Android 2.2 Froyo is way ahead in leveraging performance boosts.
The system can do global code optimizations (in most cases, inlining of library functions) without losing the advantages of dynamic linking, and without the overheads inherent to static compilers and linkers.
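Here is a hand-written picture of what inlining means. The first method is the loop as a developer would write it; the second is roughly what the same loop looks like once the small accessor calls are inlined. The names are hypothetical, and a real JIT does this on machine code, not on Java source.

```java
public class InliningSketch {
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int getX() { return x; }
        int getY() { return y; }
    }

    // As written: two method calls per element.
    static long sumAsWritten(Point[] points) {
        long sum = 0;
        for (Point p : points) {
            sum += p.getX() + p.getY();
        }
        return sum;
    }

    // After inlining (illustrative): the calls are replaced by field reads.
    static long sumAfterInlining(Point[] points) {
        long sum = 0;
        for (Point p : points) {
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[] points = { new Point(1, 2), new Point(3, 4) };
        System.out.println(sumAsWritten(points));     // 10
        System.out.println(sumAfterInlining(points)); // 10
    }
}
```

Both versions compute the same result; the inlined one simply skips the call overhead inside the loop.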
4. Superior Memory Management
A bytecode system can more easily rearrange memory for better cache utilization. Apart from better cache performance, memory allocation is less fragmented and more reusable.
The Android JIT is designed to speed up the execution of code that is heavy on mathematical computation. For example, a typical OpenGL game or graphics app can perform a large number of integer and floating-point calculations, which can slow to a crawl under a VM without a JIT.
Dalvik takes advantage of the JIT, which can boost performance while consuming as little as 100 KB of RAM. For each Android process, the JIT will typically use only another 100 KB or so of RAM. On the current generation of Android phones, users won’t even notice this additional memory usage.
Performance Varies, But Android Dalvik Rocks
Many previous JIT implementations react slowly, delivering performance improvements only after a long warm up period.
This delay is due to the time taken to load and compile the bytecode, and is called the “startup time delay”. Evidently, the more optimization a JIT performs, the better the code it generates, but the longer the initial delay. A JIT compiler therefore has to make a trade-off between compilation time and the quality of the code it hopes to generate. However, it seems that much of the startup time is sometimes due to I/O-bound operations rather than JIT compilation (for example, the rt.jar class data file for the Java Virtual Machine is 40 MB, and the JVM must seek a lot of data in this huge file). Dalvik does this much more efficiently: being a lightweight, stripped-down counterpart of the desktop JVM, it loads only a selective, stripped-down set of runtime libraries.
In certain implementations the warm-up time can be extreme: minutes or even hours before the code is fully up to speed. The Dalvik JIT, by contrast, reacts quickly: seconds after you hit the app icon of your favorite game, you are already benefiting from JIT performance improvements.
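If you want to see warm-up for yourself, a rough way is to time the same piece of work over several rounds and watch how the numbers change. This is a minimal sketch, not a proper benchmark: the workload, the round count, and the class name are all made up, and the timings you get will depend entirely on your VM and device.

```java
public class WarmupSketch {
    // Arbitrary arithmetic-heavy workload, just to give the JIT something hot.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            checksum += work(1_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us");
        }
        // Printing the checksum keeps the work from being optimized away.
        System.out.println("checksum " + checksum);
    }
}
```

On a VM with a JIT, the early rounds are often slower than the later ones; how big the gap is, and how quickly it closes, is exactly the warm-up behavior discussed above.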
I remember the time when Android 2.1 was getting ready for release: the JIT was all in there, ready for showtime. But for some unknown reason it was disabled, just a switch away from a 2x – 4x performance boost. The question is why it wasn’t unveiled at that moment. I could be wrong, but since it was in good-enough shape, it seemed more like a marketing strategy.
Whatever the case, the performance improvement you actually observe depends on the workload. In gaming you can achieve anywhere between a 2x and 5x performance boost; by contrast, lightweight apps would not see more than a 10% difference.
The Dalvik VM, though first built in a closed system, has all the richness of open source. It has evolved a lot since it was open sourced, nourishing Android to the peak of the smartphone industry.
I love the iPhone for what it has done to the smartphone industry. But I agree even more with what Vic Gundotra said at Google I/O:
“If Google didn’t act, we face a draconian future. One man, one company, one device would control our future, If you believe in openness and choice, welcome to Android.”
Update: Google I/O Video for Android 2.2 Froyo Dalvik JIT now available: