Playing with ruby's new JIT: MJIT
Just-in-time is an illusion. Albert Einstein
This week, Takashi Kokubun (@k0kubun) merged the first implementation of a JIT compiler in MRI ruby.
@k0kubun: "I've just committed the initial JIT compiler for Ruby. It's not still so fast yet (especially it's performing badly with Rails for now), but we have much time to improve it until Ruby 2.6 (or 3.0) release. github.com/ruby/ruby/commit/ed935a…"
(3:27 AM, 4 Feb 2018)
As the commit explains, this is still early days for JIT in MRI ruby. It’s not yet ready to make Rails faster, and it’s slower right now than some of the earlier prototypes, but it’s here.
I’m really excited about this. Ruby has an (only partly deserved) reputation for being slow. A JIT has the potential to solve this.
Despite its experimental state, I couldn’t wait to give it a go.
I cloned down MRI trunk, built it, installed it to my ~/.rubies directory, and then switched to the new version.
I tested it out with my Advent of Code 2017 day 15 solution (spoilers below).
This should be an ideal candidate for ruby’s JIT. It’s not calling out to other expensive methods. It’s just doing math and should be mostly bound by interpreter speed.
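A minimal sketch of that kind of solution, assuming the standard part 1 rules of the puzzle (generator factors 16807 and 48271, modulus 2147483647, compare the lowest 16 bits, 40 million pairs). The starting values 65 and 8921 are the puzzle's own example inputs. This is an illustration, not necessarily the author's exact code. Note that the generator steps run inside a block passed to #times and #count.

```ruby
# Advent of Code 2017 day 15, part 1 (sketch): two generators each
# multiply their previous value by a factor and take the remainder
# mod 2147483647; count the pairs whose lowest 16 bits match.
def calculate(a, b, n = 40_000_000)
  n.times.count do
    a = a * 16807 % 2147483647   # generator A
    b = b * 48271 % 2147483647   # generator B
    (a & 0xffff) == (b & 0xffff) # compare the low 16 bits
  end
end

# Using the puzzle's example inputs, only the third of the first five pairs matches.
p calculate(65, 8921, 5) # => 1
```

Timing the full call, calculate(65, 8921), once under plain ruby and once under ruby --jit is the kind of comparison made next.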
It works! Just by enabling JIT, the run went from 8.3 seconds to 6.1 seconds, roughly a 1.36× speedup.
MJIT internals
I’m not sure, but I believe MJIT’s approach is somewhat unconventional. From the comments in mjit.c:
We utilize widely used C compilers (GCC and LLVM Clang) to implement MJIT. We feed them a C code generated from ISEQ. The industrial C compilers are slower than regular JIT engines. Generated code performance of the used C compilers has a higher priority over the compilation speed.
MJIT takes a block of ruby’s YARV bytecode and converts it into what is basically an inlined version of the C code it would have run when interpreting it.
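You can look at the YARV bytecode MJIT starts from by asking MRI to disassemble a block with RubyVM::InstructionSequence. This is a minimal sketch using a hypothetical one-step lambda; the exact instruction names and listing format vary between Ruby versions.

```ruby
# One step of generator A, as a standalone lambda (illustrative only).
step = ->(a) { a * 16807 % 2147483647 }

# Disassemble the lambda body into its YARV instructions
# (expect entries such as putobject, opt_mult and opt_mod).
listing = RubyVM::InstructionSequence.of(step).disasm
puts listing
```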
In some ways, this is the same as what other JITs do: they compile bytecode into machine code at runtime. I don’t know of another JIT which so directly shells out to an off-the-shelf C compiler.
I think I like it.
It works, for one thing. It feels nice and UNIX-y. Best of all, it makes the JIT very inspectable.
Let’s see exactly what it’s doing.
MJIT has taken our “hot” block, the inner loop of our calculate function, and
- Converted it to C
- Written that C to /tmp/_ruby_mjitp18966u0.c
- Used GCC (or clang) to compile that to /tmp/_ruby_mjitp18966u0.so
- Dynamically loaded that shared library to run it
Side note: The JIT run took 105% CPU, even though the ruby code is single-threaded. A second thread runs the JIT compiler. That thread, and the GCC process it spawned, account for the extra 5%, which ran on a different CPU core. Neat!
Let’s take a look at the generated C code (full file as gist):
The comments correspond to the YARV instructions being compiled.
Each instruction manipulates the stack, program counter, and the rest of ruby’s VM in exactly the way the interpreter, vm_exec_core, would have.
The section I’ve included here gets all the way through the first multiplication:
- Get the local variable a, which is pushed on the stack
- Push our multiplicand, 16807, on the stack (represented by its object_id, 0x834f: MRI tags small integers as 2n + 1, and 2 × 16807 + 1 = 33615 = 0x834f)
- Call vm_opt_mult with the 2 values on the stack.
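The difference between interpreting those three instructions and running MJIT-style generated code can be shown with a toy stack machine. This is a hypothetical sketch written in Ruby rather than C: PROGRAM, interpret and compile are illustrative names, and the instruction encoding is invented, not YARV's. The interpreter dispatches on every instruction each time it runs; the "compiler" pastes one inlined snippet per instruction into straight-line source once, loosely mirroring how the generated C inlines each instruction's body.

```ruby
# Invented three-instruction program, loosely modelled on the YARV steps above.
PROGRAM = [
  [:getlocal, :a],     # push local variable a
  [:putobject, 16807], # push the multiplicand
  [:opt_mult]          # pop two values, push their product
].freeze

# Interpreter: a dispatch loop that re-examines every instruction on each run.
def interpret(program, locals)
  stack = []
  program.each do |op, arg|
    case op
    when :getlocal  then stack.push(locals.fetch(arg))
    when :putobject then stack.push(arg)
    when :opt_mult
      rhs = stack.pop
      lhs = stack.pop
      stack.push(lhs * rhs)
    else raise ArgumentError, "unknown instruction #{op}"
    end
  end
  stack.pop
end

# "Compiler": emit straight-line Ruby source, one snippet per instruction,
# then turn it into a callable once. No dispatch happens at call time.
def compile(program)
  body = program.map do |op, arg|
    case op
    when :getlocal  then "stack.push(locals.fetch(#{arg.inspect}))  # getlocal"
    when :putobject then "stack.push(#{arg.inspect})  # putobject"
    when :opt_mult  then "rhs = stack.pop; lhs = stack.pop; stack.push(lhs * rhs)  # opt_mult"
    else raise ArgumentError, "unknown instruction #{op}"
    end
  end
  eval(["->(locals) {", "stack = []", *body, "stack.pop", "}"].join("\n"))
end

compiled = compile(PROGRAM)
p interpret(PROGRAM, { a: 65 }) # => 1092455
p compiled.call({ a: 65 })      # => 1092455
```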
One great thing about this approach is just how good C compilers are.
The compiler will do its best to inline the functions from ruby’s internals (like vm_get_ep and vm_opt_mult) being called here.
It will avoid assignments to the stack and other memory locations if it can infer that they aren’t needed, or if it can just assign the final written value.
So it should do a reasonable job even with this simple implementation.
Changing the ruby code to JIT better?
It’s way too early to change any ruby code to be more JIT friendly. MJIT could look totally different in a few months.
But as long as it’s just for fun, I think I will indulge.
With our original ruby code, #times and #count are written in C and are separate method calls, so they aren’t being optimized together with our JIT’d inner block.
By writing less idiomatic ruby, we can JIT the outside of the loop as well.
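A sketch of what the less idiomatic version might look like, assuming the same puzzle parameters as before. It replaces #times and #count with a plain while loop so the whole loop is Ruby code inside one method. This is an illustration, not the author's exact code.

```ruby
# Same computation, but the loop itself is Ruby code in this method,
# so a JIT compiling calculate sees the whole loop at once.
def calculate(a, b, n = 40_000_000)
  count = 0
  i = 0
  while i < n
    a = a * 16807 % 2147483647
    b = b * 48271 % 2147483647
    count += 1 if (a & 0xffff) == (b & 0xffff)
    i += 1
  end
  count
end

p calculate(65, 8921, 5) # => 1
```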
Without JIT:
With JIT:
What happened?
Not only is this slower than the non-JIT version, it’s slower than our original code, even though the while loop is faster when not using JIT.
MJIT didn’t know to optimize this method. It’s using a pretty naive heuristic to determine what methods to optimize (which could change in the future).
MJIT optimizes methods which are called more than 5 times, but we’re only calling calculate twice. Before, it knew to optimize the inner block because it was being called millions of times.
It would eventually figure this out in a long-running process like a web server. I suspect even 5 times is quite low and should eventually be raised.
As long as this is just for fun, we can cheat and telegraph to MJIT that we would like it very much if it compiled our method.
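A hypothetical sketch of a call-count heuristic like the one described, and of the cheat: a few cheap warm-up calls push calculate past the threshold before the real call. CallCounter and record_call are invented names; only the "more than 5 calls" rule comes from the text, and this is not MJIT's code.

```ruby
# A method becomes a compilation candidate once it has been called more
# than CALL_THRESHOLD times.
class CallCounter
  CALL_THRESHOLD = 5

  attr_reader :queued

  def initialize
    @counts = Hash.new(0)
    @queued = []
  end

  # Record one call; enqueue the method the first time it crosses the threshold.
  def record_call(name)
    @counts[name] += 1
    @queued << name if @counts[name] == CALL_THRESHOLD + 1
  end
end

# Original script: the inner block is hot, calculate is called only twice.
plain = CallCounter.new
1_000.times { plain.record_call(:inner_block) }
2.times { plain.record_call(:calculate) }
p plain.queued # => [:inner_block]

# With the cheat: five cheap warm-up calls, then the real call.
cheat = CallCounter.new
5.times { cheat.record_call(:calculate) } # warm-up calls
cheat.record_call(:calculate)             # sixth call crosses the threshold
p cheat.queued # => [:calculate]
```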
The next problem is that MJIT is asynchronous and runs in another thread. It will start just-in-time compiling after calculate has run the first 5 times, but won’t be finished before the first real call to calculate.
It’s not just-in-time enough! It’s just-too-late!
There’s a command line option, --jit-wait, to work around that.
It makes MJIT synchronous, finishing that compilation before we move on.
In most cases it will hurt performance rather than help, but for this simple script it does exactly what we need.
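A hypothetical sketch of why an asynchronous compiler can be "just too late", and what waiting changes. BackgroundCompiler, request and the sleep standing in for the C compiler are all invented for illustration; this models the behavior described above, not MJIT's threading code.

```ruby
# Compilation requests go to a background thread, so the caller keeps
# running (in the interpreter) while compilation happens. wait: true
# mimics the effect described for --jit-wait: block until it is done.
class BackgroundCompiler
  def initialize
    @queue = Queue.new
    @compiled = {}
    @lock = Mutex.new
    @worker = Thread.new do
      while (job = @queue.pop)         # a nil job ends the loop
        name, done = job
        sleep 0.05                     # stand-in for running the C compiler
        @lock.synchronize { @compiled[name] = true }
        done << true                   # signal completion to any waiter
      end
    end
  end

  def compiled?(name)
    @lock.synchronize { @compiled.key?(name) }
  end

  # Queue `name` for compilation; optionally block until it has finished.
  def request(name, wait: false)
    done = Queue.new
    @queue << [name, done]
    done.pop if wait
  end

  def shutdown
    @queue << nil
    @worker.join
  end
end

compiler = BackgroundCompiler.new
compiler.request(:calculate)
p compiler.compiled?(:calculate) # most likely false: still compiling
compiler.request(:other, wait: true)
p compiler.compiled?(:other)     # true: we waited for it
compiler.shutdown
```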
It’s way too soon to do this for real code, but I hope it’s a first glimpse of what performance in ruby is going to look like.
I’m really excited to see where MJIT goes in the future.
Further reading
- ruby
- Takashi Kokubun’s RubyConf 2017 talk (and slides)
- Takashi Kokubun’s YARV-MJIT, the prototype for the merged implementation (with some features still outstanding?)
- Vladimir N. Makarov’s MJIT, the inspiration for YARV-MJIT