Trace-based Just-in-Time Type Specialization for Dynamic Languages

Andreas Gal∗+, Brendan Eich∗, Mike Shaver∗, David Anderson∗, David Mandelin∗, Mohammad R. Haghighat$, Blake Kaplan∗, Graydon Hoare∗, Boris Zbarsky∗, Jason Orendorff∗, Jesse Ruderman∗, Edwin Smith#, Rick Reitmaier#, Michael Bebenita+, Mason Chang+#, Michael Franz+

∗ Mozilla Corporation
{gal,brendan,shaver,danderson,dmandelin,mrbkap,graydon,bz,jorendorff,jruderman}@mozilla.com

# Adobe Corporation
{edwsmith,rreitmai}@adobe.com

$ Intel Corporation
{mohammad.r.haghighat}@intel.com

+ University of California, Irvine
{mbebenit,changm,franz}@uci.edu
Abstract

Dynamic languages such as JavaScript are more difficult to compile than statically typed ones. Since no concrete type information is available, traditional compilers need to emit generic code that can handle all possible type combinations at runtime. We present an alternative compilation technique for dynamically-typed languages that identifies frequently executed loop traces at run-time and then generates machine code on the fly that is specialized for the actual dynamic types occurring on each path through the loop. Our method provides cheap inter-procedural type specialization, and an elegant and efficient way of incrementally compiling lazily discovered alternative paths through nested loops. We have implemented a dynamic compiler for JavaScript based on our technique and we have measured speedups of 10x and more for certain benchmark programs.
Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors — Incremental compilers, code generation.

General Terms Design, Experimentation, Measurement, Performance.

Keywords JavaScript, just-in-time compilation, trace trees.
1. Introduction

Dynamic languages such as JavaScript, Python, and Ruby are popular since they are expressive, accessible to non-experts, and make deployment as easy as distributing a source file. They are used for small scripts as well as for complex applications. JavaScript, for example, is the de facto standard for client-side web programming
and is used for the application logic of browser-based productivity applications such as Google Mail, Google Docs, and Zimbra Collaboration Suite. In this domain, in order to provide a fluid user experience and enable a new generation of applications, virtual machines must provide a low startup time and high performance.
Compilers for statically typed languages rely on type information to generate efficient machine code. In a dynamically typed programming language such as JavaScript, the types of expressions may vary at runtime. This means that the compiler can no longer easily transform operations into machine instructions that operate on one specific type. Without exact type information, the compiler must emit slower generalized machine code that can deal with all potential type combinations. While compile-time static type inference might be able to gather type information to generate optimized machine code, traditional static analysis is very expensive and hence not well suited for the highly interactive environment of a web browser.
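For instance (a minimal illustrative example, not taken from the paper), the meaning of a single JavaScript operator depends on the runtime types of its operands:

    function add(a, b) { return a + b; }   // hypothetical helper

    add(1, 2);      // 3: integer addition
    add(1.5, 2);    // 3.5: floating-point addition
    add("1", 2);    // "12": string concatenation

A generic compiler must emit code that checks the operand types and handles every case; a type-specialized compiler can emit a single integer add, provided it can verify (or guard) that the operands really are integers.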
We present a trace-based compilation technique for dynamic languages that reconciles speed of compilation with excellent performance of the generated machine code. Our system uses a mixed-mode execution approach: the system starts running JavaScript in a fast-starting bytecode interpreter. As the program runs, the system identifies hot (frequently executed) bytecode sequences, records them, and compiles them to fast native code. We call such a sequence of instructions a trace.
Unlike method-based dynamic compilers, our dynamic compiler operates at the granularity of individual loops. This design choice is based on the expectation that programs spend most of their time in hot loops. Even in dynamically typed languages, we expect hot loops to be mostly type-stable, meaning that the types of values are invariant (12). For example, we would expect loop counters that start as integers to remain integers for all iterations. When both of these expectations hold, a trace-based compiler can cover the program execution with a small number of type-specialized, efficiently compiled traces.
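As a hypothetical illustration (not from the paper), the first loop below is type-stable while the second is not:

    // Type-stable: i and sum are integers on every iteration.
    var sum = 0;
    for (var i = 0; i < 1000; ++i)
        sum += i;

    // Not type-stable: x starts as an integer but becomes a
    // double after the first division, so a second trace with
    // different type assumptions would be needed.
    var x = 1000;
    while (x > 1)
        x = x / 3;

A trace compiled for the first loop with integer-typed values remains valid for all iterations; the second loop exercises two different typings of the same path.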
Each compiled trace covers one path through the program with one mapping of values to types. When the VM executes a compiled trace, it cannot guarantee that the same path will be followed or that the same types will occur in subsequent loop iterations. Hence, recording and compiling a trace speculates that the path and typing will be exactly as they were during recording for subsequent iterations of the loop.
Every compiled trace contains all the guards (checks) required to validate the speculation. If one of the guards fails (if control flow is different, or a value of a different type is generated), the trace exits. If an exit becomes hot, the VM can record a branch trace starting at the exit to cover the new path. In this way, the VM records a trace tree covering all the hot paths through the loop.
Nested loops can be difficult to optimize for tracing VMs. In a naïve implementation, inner loops would become hot first, and the VM would start tracing there. When the inner loop exits, the VM would detect that a different branch was taken. The VM would try to record a branch trace, and find that the trace reaches not the inner loop header, but the outer loop header. At this point, the VM could continue tracing until it reaches the inner loop header again, thus tracing the outer loop inside a trace tree for the inner loop. But this requires tracing a copy of the outer loop for every side exit and type combination in the inner loop. In essence, this is a form of unintended tail duplication, which can easily overflow the code cache. Alternatively, the VM could simply stop tracing, and give up on ever tracing outer loops.
We solve the nested loop problem by recording nested trace trees. Our system traces the inner loop exactly as the naïve version. The system stops extending the inner tree when it reaches an outer loop, but then it starts a new trace at the outer loop header. When the outer loop reaches the inner loop header, the system tries to call the trace tree for the inner loop. If the call succeeds, the VM records the call to the inner tree as part of the outer trace and finishes the outer trace as normal. In this way, our system can trace any number of loops nested to any depth without causing excessive tail duplication.
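For a concrete picture (a hypothetical example in the spirit of Figure 1), consider summing a two-dimensional array:

    var rows = [[1, 2, 3], [4, 5], [6]];     // hypothetical input
    var total = 0;
    for (var r = 0; r < rows.length; ++r) {  // outer loop
        var row = rows[r];
        for (var c = 0; c < row.length; ++c) // inner loop, becomes hot first
            total += row[c];
    }

Under the scheme described above, the inner loop is compiled as its own trace tree; when the outer loop later becomes hot, its trace records a call to the inner loop's tree instead of re-tracing the inner loop for every outer-loop path and type combination.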
These techniques allow a VM to dynamically translate a program to nested, type-specialized trace trees. Because traces can cross function call boundaries, our techniques also achieve the effects of inlining. Because traces have no internal control-flow joins, they can be optimized in linear time by a simple compiler (10). Thus, our tracing VM efficiently performs the same kind of optimizations that would require interprocedural analysis in a static optimization setting. This makes tracing an attractive and effective tool for type-specializing even complex, call-rich code.
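As an illustration (hypothetical example), a function called from a hot loop is recorded as part of the loop's trace, so its body is specialized and effectively inlined:

    function scale(v, f) { return v * f; }   // hypothetical callee

    var data = [1, 2, 3, 4];
    var out = [];
    for (var i = 0; i < data.length; ++i)
        out[i] = scale(data[i], 10);         // call recorded inside the trace

Because the trace crosses the call boundary, the multiplication inside scale is compiled directly into the loop's native code, specialized to numeric operands, without a separate interprocedural analysis.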
We implemented these techniques for an existing JavaScript interpreter, SpiderMonkey. We call the resulting tracing VM TraceMonkey. TraceMonkey supports all the JavaScript features of SpiderMonkey, with a 2x-20x speedup for traceable programs.

This paper makes the following contributions:
• We explain an algorithm for dynamically forming trace trees to cover a program, representing nested loops as nested trace trees.
• We explain how to speculatively generate efficient type-specialized code for traces from dynamic language programs.
• We validate our tracing techniques in an implementation based on the SpiderMonkey JavaScript interpreter, achieving 2x-20x speedups on many programs.
The remainder of this paper is organized as follows. Section 3 gives a general overview of the trace-tree-based compilation we use to capture and compile frequently executed code regions. In Section 4 we describe our approach of covering nested loops using a number of individual trace trees. In Section 5 we describe the speculative type specialization approach we use to generate efficient machine code from recorded bytecode traces. Our implementation of a dynamic type-specializing compiler for JavaScript is described in Section 6.
for (var i = 2; i < 100; ++i) {
    if (!primes[i])
        continue;
    for (var k = i + i; k < 100; k += i)
        primes[k] = false;
}

Figure 1. Sample program: sieve of Eratosthenes. primes is initialized to an array of 100 true values on entry to this code snippet.
[Figure 2: state-machine diagram. States: Interpret Bytecodes; Monitor; Record LIR Trace; Compile LIR Trace; Enter Compiled Trace; Execute Compiled Trace; Leave Compiled Trace. Transition labels: loop edge; hot loop/exit; abort recording; finish at loop header; cold/blacklisted loop/exit; compiled trace ready; loop edge with same types; side exit to existing trace; side exit, no existing trace. Legend: Overhead, Interpreting, Native.]

Figure 2. State machine describing the major activities of TraceMonkey and the conditions that cause transitions to a new activity. In the dark box, TM executes JS as compiled traces. In the light gray boxes, TM executes JS in the standard interpreter. White boxes are overhead. Thus, to maximize performance, we need to maximize time spent in the darkest box and minimize time spent in the white boxes. The best case is a loop where the types at the loop edge are the same as the types on entry; then TM can stay in native code until the loop is done.
In Section 7 we evaluate our dynamic compiler on a set of industry benchmarks. Related work is discussed in Section 8. The paper ends with conclusions in Section 9, and an outlook on future work is presented in Section 10.
2. Overview: Example Tracing Run

This section provides an overview of our system by describing how TraceMonkey executes an example program. The example program, shown in Figure 1, computes the prime numbers up to 100 with nested loops. The narrative should be read along with Figure 2, which describes the activities TraceMonkey performs and when it transitions between them.
TraceMonkey always begins executing a program in the bytecode interpreter. Every loop back edge is a potential trace point. When the interpreter crosses a loop edge, TraceMonkey invokes the trace monitor, which may decide to record or execute a native trace. At the start of execution, there are no compiled traces yet, so the trace monitor counts the number of times each loop back edge is executed until a loop becomes hot, currently after 2 crossings. Note that, because of the way our loops are compiled, the loop edge is crossed before entering the loop, so the second crossing occurs immediately after the first iteration.
Here is the sequence of events broken down by outer loop iteration: