I'd like to open this blog with a simple but extremely crude optimisation method: counting bytecode.
Presumably you are aware that Java (the language) compiles to Java bytecode. If you've never seen it, a human-readable form of the bytecode looks something like this:
0: ICONST_0 1: ISTORE 1 4: IINC 1 1 6: ILOAD 1 7: IFNE 11 9: ALOAD 0 10: INVOKESPECIAL run()V 11: RETURN
If you take a bit of time, you can learn to read this code. It is not recommended to write bytecode manually though. :)
Now, take a simple routine in an application that would be execute often, like firing an event. By using freely available Java tools we can look closely at its bytecode and get some ideas on how to optimise it...
class Alarm {
private Runnable[] listeners;
private int count;
public void fireAlarm() {
for (int i = 0; i < count; i++)
listeners[i].run();
}
};
0: iconst_0 get the constant 0
1: istore_1 assign it to “i”
2: goto 19 jump to the comparison
5: aload_0 get the value of “this”
6: getfield #27; get the value of “listeners”
9: iload_1 get the value of “i”
10: aaload get the value of “listeners[i]”
11: invokeinterface #41,1; call run()
16: iinc 1, 1 increment “i”
19: iload_1 get the value of “i”
20: aload_0 get the value of “this”
21: getfield #23; get the value of “count”
24: if_icmplt 5 perform comparison and loop
27: return
Looking at the above bytecode, we can see that there are 10 operations performed in every loop iteration. No matter how clever the JVM becomes, it can be worthwhile to reduce the number of operations within oft-executed loops.
In this case, we could try moving all those aload_0 and related operations outside of the loop. This would shorten our loop by 2 bytecode operations, though it does make the entire method longer...
class Alarm {
private Runnable[] listeners;
private int count;
public void fireAlarm() {
Runnable[] listeners = this.listeners;
int count = this.count;
for (int i = 0; i < count; i++)
listeners[i].run();
}
};
0: aload_0 get value of “this”
1: getfield #27; get value of “listeners”
4: astore_1 assign to local “listeners”
5: aload_0 get value of “this”
6: getfield #23; get value of “count”
9: istore_2 assign to local “count”
10: iconst_0 get the constant 0
11: istore_3 assign to “i”
12: goto 26 jump to the comparison
15: aload_1 get value of local “listeners”
16: iload_3 get value of “i”
17: aaload get value of “listeners[i]”
18: invokeinterface #41,1; call run()
23: iinc 3, 1 increment value of “i”
26: iload_3 get value of “i”
27: iload_2 get value of local “count”
28: if_icmplt 15 perform comparison and loop
31: return
Would you believe that such a small change can improve the performance of this routine by up to 20 percent? I kid you not. Simply forcing member variables to be copied on to the stack can significantly improve performance.
Of course, the listeners themselves will consume most of the execution time, so you will not see an immediate 20% improvement in performance from this small change. However, perhaps other pieces of code could be altered to execute more inefficiently... Indeed, the main lessons to take from this are:
- Think about the bytecode behind your Java code. If you can think of ways to shorten frequently-execude bytecode, you're probably on the right track. This applies for all code, not just listeners and loops!
- Test the performance of your code, and derive 'best practices' for your coding. In the above example, we learned that 'changing member variables to local variables' can provide a performance improvement. Use it in the future!
Over the next few weeks, I will post some of the techniques I have learned to improve performance of my own Java code. Indeed, there are ways to make this event firing loop execute much faster...
No comments:
Post a Comment