News

Posted almost 17 years ago by Charles Oliver Nutter
Compilers are hard. But not as hard as people would have you believe.

I've committed an update that installs a CallAdapter for every compiled call site. CallAdapter is basically a small object that stores the following:

- method name
- method index
- call type (normal, functional, variable)

...as well as providing overloaded call() implementations for 1, 2, 3, n arguments and block or no block. The basic goal with this class is to provide a call adapter (heh) that makes calling a Ruby method in compiled code as similar to (and as simple as) calling any Java method.

The end result is that while compiled class init is a bit larger (it needs to load adapters for all call sites), compiled method size has dropped substantially; in compiling bench_method_dispatch.rb, the two main tests went from roughly 4000 and 3500 bytes of code down to 1500 and 1000 bytes. And simpler code means HotSpot has a better chance to optimize.

Here are the latest numbers for the bench_method_dispatch_only test, which just measures the time to call a Ruby-implemented method a bunch of times:

```
Test interpreted: 100k loops calling self's foo 100 times
2.383000 0.000000 2.383000 ( 2.383000)
2.691000 0.000000 2.691000 ( 2.691000)
1.775000 0.000000 1.775000 ( 1.775000)
1.812000 0.000000 1.812000 ( 1.812000)
1.789000 0.000000 1.789000 ( 1.789000)
1.776000 0.000000 1.776000 ( 1.777000)
1.809000 0.000000 1.809000 ( 1.809000)
1.779000 0.000000 1.779000 ( 1.781000)
1.784000 0.000000 1.784000 ( 1.784000)
1.830000 0.000000 1.830000 ( 1.830000)
```

And Ruby 1.8.6 for reference:

```
Test interpreted: 100k loops calling self's foo 100 times
2.160000 0.000000 2.160000 ( 2.188087)
2.220000 0.010000 2.230000 ( 2.237414)
2.230000 0.010000 2.240000 ( 2.248185)
2.180000 0.010000 2.190000 ( 2.218540)
2.240000 0.010000 2.250000 ( 2.259535)
2.220000 0.010000 2.230000 ( 2.241170)
2.150000 0.010000 2.160000 ( 2.178414)
2.240000 0.010000 2.250000 ( 2.259772)
2.260000 0.000000 2.260000 ( 2.285141)
2.230000 0.010000 2.240000 ( 2.252396)
```

Note that these are JIT numbers rather than fully precompiled numbers, so this is 100% real-world safe. Fully precompiled is just a bit faster, since there's no interpreted step or DefaultMethod wrapper to go through.

I have also made a lot of progress on adapting the compiler to create stack-based methods when possible. Basically, this involved inspecting the code for anything that would require access to local variables outside the body of the call: things like eval, closures, etc. At the moment it works well and passes all tests, but I know methods similar to gsub, which modify $~ or $_, are not working right. It's disabled at the moment, pending more work, but here are the method dispatch numbers with stack-based method compilation enabled:

```
Test interpreted: 100k loops calling self's foo 100 times
1.735000 0.000000 1.735000 ( 1.738000)
1.902000 0.000000 1.902000 ( 1.902000)
1.078000 0.000000 1.078000 ( 1.078000)
1.076000 0.000000 1.076000 ( 1.076000)
1.077000 0.000000 1.077000 ( 1.077000)
1.086000 0.000000 1.086000 ( 1.086000)
1.077000 0.000000 1.077000 ( 1.077000)
1.084000 0.000000 1.084000 ( 1.084000)
1.090000 0.000000 1.090000 ( 1.090000)
1.083000 0.000000 1.083000 ( 1.083000)
```

It seems very promising work. I hope I'll be able to turn it on soon.

Oh, and for those who always need a fib fix, here's fib with both optimizations turned on:

```
~ $ jruby -J-server bench_fib_recursive.rb
1.258000 0.000000 1.258000 ( 1.258000)
0.990000 0.000000 0.990000 ( 0.989000)
0.925000 0.000000 0.925000 ( 0.926000)
0.927000 0.000000 0.927000 ( 0.928000)
0.924000 0.000000 0.924000 ( 0.925000)
0.923000 0.000000 0.923000 ( 0.923000)
0.927000 0.000000 0.927000 ( 0.926000)
0.928000 0.000000 0.928000 ( 0.929000)
```

And MRI:

```
~ $ ruby bench_fib_recursive.rb
1.760000 0.010000 1.770000 ( 1.775660)
1.760000 0.010000 1.770000 ( 1.776360)
1.760000 0.000000 1.760000 ( 1.778413)
1.760000 0.010000 1.770000 ( 1.776767)
1.760000 0.010000 1.770000 ( 1.777361)
1.760000 0.000000 1.760000 ( 1.782798)
1.770000 0.010000 1.780000 ( 1.794562)
1.760000 0.010000 1.770000 ( 1.777396)
```

These numbers went down a bit because the call adapter is currently just generic code, and generic code that calls lots of different methods causes HotSpot to stumble a bit. The next step for the compiler is to generate custom call adapters for each call site that handle arity correctly (avoiding IRubyObject[] all the time) and call directly to the most likely target methods.
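For readers curious what a call adapter looks like, here is a plain-Ruby sketch of the idea (names and details are illustrative only; the real CallAdapter is Java code inside JRuby, and its fast paths do not go through send):

```ruby
# Hypothetical Ruby sketch of JRuby's CallAdapter concept: a small
# per-call-site object storing the method name, a numeric index for
# fast dispatch, and the call type, with one call path per arity so
# common calls never pay for packing arguments into an array.
class CallAdapter
  attr_reader :method_name, :method_index, :call_type

  def initialize(method_name, method_index, call_type)
    @method_name  = method_name
    @method_index = method_index  # used for table-based fast dispatch
    @call_type    = call_type     # :normal, :functional, or :variable
  end

  # Overloaded call() implementations, one per arity.
  def call0(receiver)
    receiver.send(@method_name)
  end

  def call1(receiver, arg1)
    receiver.send(@method_name, arg1)
  end

  def call2(receiver, arg1, arg2)
    receiver.send(@method_name, arg1, arg2)
  end

  # Generic fallback for n arguments (JRuby also has block variants).
  def call_n(receiver, *args)
    receiver.send(@method_name, *args)
  end
end

adapter = CallAdapter.new(:upcase, 0, :normal)
adapter.call0("foo")  # => "FOO"
```

The point of the per-arity methods is that compiled code with a known argument count can pick the matching path and skip array allocation entirely.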
Posted almost 17 years ago by Ola Bini
When I get bored with JRuby, I tend to go looking either at other languages or at other language implementations. This happened a few days ago, and the result is what I will document here. Begin by creating a file called fib.rb:

```ruby
def fib(n)
  if n < 2
    n
  else
    fib(n - 2) + fib(n - 1)
  end
end

p fib(15)
```

The next part requires that you have a recent version of Rubinius installed:

```
rbx compile fib.rb
```

This will generate fib.rbc. Next, take a recent JRuby version and run:

```
jruby -R fib.rbc
```

And presto, you should see 610 printed quite soon. This is JRuby executing Rubinius bytecode. I was quite happy about how easy it was to get this far with the functionality. Of course, JRuby doesn't support most bytecodes yet, only those needed to execute this small example and similar things. We are also using JRuby's internals for this, which means that Rubinius MethodContext and such are not available.

Another interesting note is that running the iterative fib algorithm like this with -J-server is actually 30% faster than MRI.

This approach is fun, and I have some other similar ideas I really want to look at. The best part about it, though, is that I got the chance to look at the internals of Rubinius. I hope to have more time for it eventually. Another thing I really want to do some day is implement a Jubinius, which would be a full port of the Rubinius runtime, possibly excluding Subtend. I think it could be very nice to have the Smalltalk core of Rubinius working together with Java. Of course, I don't have any time for that, so we'll see what happens in a year or two. =) Maybe someone else will do it.
Posted almost 17 years ago by Ola Bini
After my last post I got several comments about evil.rb. Of course I had evil.rb in mind when doing some of this, but I also forgot to describe the two most evil methods of the JRuby module: runtime and reference. The runtime method will return the currently executing JRuby runtime as a Java Integration wrapper, meaning you can get access to almost anything you want with it. For example, if you want to take a look at the global CacheMap (used to cache method instances):

```ruby
require 'jruby'
JRuby::runtime.cache_map
```

Whoops. And that's just the beginning. Are you interested in investigating the current call frame or activation frame (DynamicScope in JRuby)?

```ruby
require 'jruby'
p JRuby::runtime.current_context.current_frame
a = 1
p JRuby::runtime.current_context.current_scope
```

Of course, you can call all accessible (and some inaccessible) methods on these objects, just as if you were working with them from Java. Use the APIs and take a look. You can change things without problem.

And that also brings us to one of the easiest examples from evil.rb: changing the frozen flag on a Ruby object. Well, with the reference method, that's easy:

```ruby
require 'jruby'
str = "foo"
str.freeze
puts str.frozen?
JRuby::reference(str).setFrozen(false)
puts str.frozen?
```

JRuby::reference will return the same object sent in, wrapped in a Java Integration layer, meaning that you can inspect and modify it to your heart's content. In this way, you can get at the internals of JRuby in the same way you can use evil.rb for MRI. And I guess these features should mainly be used for looking at and learning about the internals of JRuby.

So, have fun and don't be evil (overtly).
Posted almost 17 years ago by Charles Oliver Nutter
One of the most attractive aspects of Ruby is the fact that it has relatively few sacred keywords. In most cases, things you'd expect to be keywords are actually methods, and you can wrap or hook their behavior to create amazing potential.

One perfect example of this is require. Because require is just a method, you can define your own version that wraps its behavior. This is exactly how RubyGems does its magic... rather than immediately calling the default require, it can modify load paths based on your installed gems, allowing for a dynamically-expanding load path and the pluggability we've all come to know and love.

But not all such keyword-like methods are so well behaved. Many methods make runtime changes that are otherwise impossible to do from normal Ruby code. Most of these are on Kernel. I propose that several of these methods should actually be keywords.

Update: Evan Phoenix of Rubinius (EngineYard), Wayne Kelly of Ruby.NET (Queensland University), and John Lam of IronRuby (Microsoft) have voiced their agreement on this interesting ruby-core mailing list thread. Have you shared your thoughts?

Justifying Keywords

There are a number of (in my opinion, very strong) justifications for this:

- Many Kernel methods manipulate runtime state in ways no other methods can. For example: local_variables requires access to the caller's variable scope; block_given? requires access to the block/iter stacks (in MRI code); eval requires access to just about everything having to do with a call; and there are others, see below.
- Because many of these methods manipulate normally-inaccessible runtime state, it is not possible to implement them in Ruby code. Therefore, even if someone wanted to override them (the primary reason for them to be methods), they could not duplicate their behavior in the overridden version. Overriding only destroys their utility.
- These methods are exactly the ones that complicate optimizing Ruby in all implementations, including Ruby 1.9, Rubinius, JRuby, Ruby.NET, and others. They confound a compiler's efforts to optimize calls by always leaving open questions about the behavior of a method. Will it need access to a heap-allocated scope? Will it save off a binding or the current call frame? There's no way to know for sure, since they're methods.

In short, there appears to be no good reason to keep them as methods, and many reasons to make them keywords. What follows is a short list of such methods and why they ought to be keywords:

- *eval: requires implicit access to the caller's binding
- block_given?/iterator?: requires access to block/iter information
- local_variables: requires access to the caller's scope
- public/private/protected: requires access to the current frame's visibility

There may be others, but these are definitely the biggest offenders. The three points above were used to compile this list, but my criteria for a keyword could be the following more straightforward points. A feature should be implemented as (or converted to) a keyword if it fits either of the following criteria:

- It manipulates runtime state in ways impossible from user-created code
- It can't be implemented in user-created code, and therefore could not reasonably be overridden or hooked to provide additional behavior

As an alternative, if modifications could be made to ensure these methods were not overridable, Ruby implementations could safely treat them as keywords; searching for calls to "eval" in a given context would be guaranteed to mean an eval would take place in that context.

What do we gain from doing all this?

I can at least give a JRuby perspective; I expect others can give theirs. In JRuby, we could greatly optimize method invocations if, for example, we knew we could just use Java's local variables (on Java's stack) rather than always heap-allocating a scoping structure. We could also avoid allocating a frame or binding when they are not needed, just allowing Java's call frame to be "enough" for us. We can already detect if there are closures in a given context, which helps us learn that a heap-allocated scope will be necessary, but we can't safely detect eval, block_given?, etc. As a result of these methods-that-would-be-keywords, we're forced to set up and tear down every method in the most expensive manner.

Other implementations, including Ruby 1.9/2.0 and Rubinius, would probably be able to make similar optimizations if we could calculate ahead of time whether these keyword operations would occur.

For what it's worth, I may go ahead and implement JRuby's compiler to treat these methods as keywords, only falling back on the "method" behavior when we detect in the rest of the system that the keyword has been overridden. But that situation is far from ideal... we'd like to see all implementations adopt this behavior and so benefit equally.

As an example, here's an early demonstration of the performance change in our old friend fib() when we can know ahead of time whether any of these keywords are called (fib calls none of them). This example shows the performance today and the performance when we can safely just use Java local variables and scoping constructs. We could additionally omit heap-allocated frames for each call, giving a further boost. I've included Ruby 1.8.6 to provide a reference value.

Current JRuby:

```
~ $ jruby -J-server bench_fib_recursive.rb
1.323000 0.000000 1.323000 ( 1.323000)
1.118000 0.000000 1.118000 ( 1.119000)
1.055000 0.000000 1.055000 ( 1.056000)
1.054000 0.000000 1.054000 ( 1.054000)
1.055000 0.000000 1.055000 ( 1.054000)
1.055000 0.000000 1.055000 ( 1.055000)
1.055000 0.000000 1.055000 ( 1.055000)
1.049000 0.000000 1.049000 ( 1.049000)
~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.901000 0.000000 3.901000 ( 3.901000)
4.468000 0.000000 4.468000 ( 4.468000)
2.446000 0.000000 2.446000 ( 2.446000)
2.400000 0.000000 2.400000 ( 2.400000)
2.423000 0.000000 2.423000 ( 2.423000)
2.397000 0.000000 2.397000 ( 2.397000)
2.399000 0.000000 2.399000 ( 2.399000)
2.401000 0.000000 2.401000 ( 2.401000)
2.427000 0.000000 2.427000 ( 2.428000)
2.403000 0.000000 2.403000 ( 2.403000)
```

Using Java's local variables instead of a heap-allocated scope:

```
~ $ jruby -J-server bench_fib_recursive.rb
2.360000 0.000000 2.360000 ( 2.360000)
0.818000 0.000000 0.818000 ( 0.818000)
0.775000 0.000000 0.775000 ( 0.775000)
0.773000 0.000000 0.773000 ( 0.773000)
0.799000 0.000000 0.799000 ( 0.799000)
0.771000 0.000000 0.771000 ( 0.771000)
0.776000 0.000000 0.776000 ( 0.776000)
0.770000 0.000000 0.770000 ( 0.769000)
~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.100000 0.000000 3.100000 ( 3.100000)
3.487000 0.000000 3.487000 ( 3.487000)
1.705000 0.000000 1.705000 ( 1.706000)
1.684000 0.000000 1.684000 ( 1.684000)
1.678000 0.000000 1.678000 ( 1.678000)
1.683000 0.000000 1.683000 ( 1.683000)
1.679000 0.000000 1.679000 ( 1.679000)
1.679000 0.000000 1.679000 ( 1.679000)
1.681000 0.000000 1.681000 ( 1.681000)
1.679000 0.000000 1.679000 ( 1.679000)
```

Ruby 1.8.6:

```
~ $ ruby bench_fib_recursive.rb
1.760000 0.010000 1.770000 ( 1.775304)
1.750000 0.000000 1.750000 ( 1.770101)
1.760000 0.010000 1.770000 ( 1.768833)
1.750000 0.010000 1.760000 ( 1.782908)
1.750000 0.010000 1.760000 ( 1.774193)
1.750000 0.000000 1.750000 ( 1.766951)
1.750000 0.010000 1.760000 ( 1.777814)
1.750000 0.010000 1.760000 ( 1.782449)
~ $ ruby bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
2.240000 0.000000 2.240000 ( 2.268611)
2.160000 0.010000 2.170000 ( 2.187729)
2.280000 0.010000 2.290000 ( 2.292342)
2.210000 0.010000 2.220000 ( 2.250331)
2.190000 0.010000 2.200000 ( 2.210965)
2.230000 0.000000 2.230000 ( 2.260737)
2.240000 0.010000 2.250000 ( 2.256210)
2.150000 0.010000 2.160000 ( 2.173298)
2.250000 0.010000 2.260000 ( 2.271438)
2.160000 0.000000 2.160000 ( 2.183670)
```

What do you think? Is it worth it?
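The require-wrapping trick mentioned at the top of this post can be sketched in a few lines. This is a simplified stand-in, not RubyGems' actual implementation (the real hook also rescues LoadError and activates gems before retrying):

```ruby
# Minimal sketch of wrapping Kernel#require: alias the original
# method, then define a wrapper that delegates to it. Because
# require is just a method, callers never notice the indirection.
module Kernel
  alias_method :original_require, :require

  def require(name)
    # A real wrapper (like RubyGems') could adjust $LOAD_PATH here
    # before delegating to the original implementation.
    original_require(name)
  end
end

require 'set'  # goes through the wrapper, then the real require
Set.new([1, 2]).include?(2)  # => true
```

This is exactly the kind of hook that works for require but, as argued above, cannot work for eval or block_given?, since their behavior can't be reproduced from plain Ruby.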
Posted almost 17 years ago by Charles Oliver Nutter
In JRuby, we have a number of things we "decorate" the Java stack with for Ruby execution purposes. Put simply, we pass a bunch of extra context on the call stack for most method calls. At its most descriptive, making a method call passes the following along:

- a ThreadContext object, for accessing JRuby call frames and variable scopes
- the receiver object
- the metaclass for the receiver object
- the name of the method
- a numeric index for the method, used for a fast dispatch mechanism
- an array of arguments to the method
- the type of call being performed (functional, normal, or variable)
- any block/closure being passed to the method

Additionally, there are a few places where we also pass the calling object, to use for visibility checks.

The problem arises when compiling Ruby code into Java bytecode. The case I'm looking at involves one of our benchmarks, where a local variable is accessed and "to_i" is invoked on it a large number of times:

```ruby
puts Benchmark.measure {
  a = 5
  i = 0
  while i < 1000000
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i; a.to_i
    i += 1
  end
}
```

(That's 100 accesses and calls in a 1-million loop.)

The block being passed to Benchmark.measure gets compiled into its own Java method on the resulting class, called something like closure0. This gets further bound into a CompiledBlock adapter, which is what's eventually called when the block gets invoked. Unfortunately, all the additional context and overhead required in the compiled Ruby code seems to be causing trouble for HotSpot.

In this case, the pieces causing the most trouble are obviously the "a.to_i" bits. I'll break that down.

"a" is a local variable in the same lexical scope, so we go to a local variable in closure0 that holds an array of local variable values:

```
aload(SCOPE_INDEX)
ldc([index of a])
aaload
```

But for Ruby purposes we must also make sure a Java null is replaced with "nil", so we have an actual Ruby object:

```
dup
ifnonnull(ok)
pop
aload(NIL_INDEX)  # immediately stored when method is invoked
label(ok)
```

So every variable access is at least seven bytecodes, since we need to access variables from an object that can be shared with contained closures.

Then there's the to_i call. This is where it starts to get a little ugly. to_i is basically a "toInteger" method, and in this case, calling against a Ruby Fixnum, it doesn't do anything but return "self". So it's a no-arg noop for the most part. The resulting bytecode to do the call ends up being uncomfortably long:

```
# assumes we already have the receiver, a Fixnum, on the stack
dup                                  # dup receiver
invokevirtual "getMetaClass"
invokevirtual "getDispatcher"        # a fast switch-based dispatcher
swap                                 # dispatcher under receiver
aload(THREADCONTEXT)
swap                                 # threadcontext under receiver
dup                                  # dup receiver again
invokevirtual "getMetaClass"         # for call purposes
ldc(methodName)
ldc(methodIndex)
getstatic(IRubyObject.EMPTY_ARRAY)   # no args
ldc(call type)
getstatic(Block.NULL_BLOCK)          # no closure
invokevirtual "Dispatcher.callMethod..."
```

So we're looking at roughly 15 operations to do a single no-arg call. If we were processing argument lists, it would obviously be more, especially since all argument lists eventually get stuffed into an IRubyObject[]. Summed up, this means:

100 a.to_i calls * (7 + 15 ops) = 2200 ops

That's 2200 operations to do 100 variable accesses and calls, where in Java code it would be more like 200 ops (aload + invokevirtual). An order of magnitude more work is being done.

The closure above, when run through my current compiler, generates a Java method of something like 4000 bytes. That may not sound like a lot, but it seems to be hitting a limit in HotSpot that prevents it being JITed quickly (or sometimes, at all). And the size and complexity of this closure are certainly reasonable, if not common, in Ruby code.

There are a few questions that come out of this, and I'm looking for more ideas too:

- How bad is it to be generating large Java methods, and how much impact does it have on HotSpot's ability to optimize?
- This code obviously isn't optimal (two calls to getMetaClass, for example), but the size of the callMethod signature means even optimal code will still have a lot of argument loading to do. Any ideas on how to get around this in a general way? I'm thinking my only real chance is to find simpler signatures to invoke, such as arity-specific ones (so there's no requirement for an array of args), avoiding passing context that usually isn't needed (an object knows its metaclass already), and reverting back to a ThreadLocal to get the ThreadContext (though that was a big bottleneck for us before...).
- Is the naive approach of breaking methods in two when possible "good enough"?

It should be noted that HotSpot does eventually JIT this code, and it's then substantially faster than the current general release of Ruby 1.8. But I'm worried about the complexity of the bytecode and am actively looking for ways to simplify.
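The seven-bytecode variable access described above can be modeled in a few lines of Ruby. This is a hypothetical stand-in for the bytecode, with JAVA_NULL playing the role of an unset Java array slot and the names invented for illustration:

```ruby
# Ruby model of JRuby's compiled local-variable read: locals live in
# an array shared with contained closures, and an unset (Java null)
# slot must be normalized to Ruby's nil object before use.
JAVA_NULL = Object.new  # sentinel standing in for a Java null

def read_local(scope, index, nil_object)
  value = scope[index]                          # aload; ldc; aaload
  value.equal?(JAVA_NULL) ? nil_object : value  # dup/ifnonnull/pop/aload(NIL)
end

scope = [JAVA_NULL, 42]
read_local(scope, 0, nil)  # => nil  (unset slot normalized)
read_local(scope, 1, nil)  # => 42
```

The extra branch on every read is exactly the overhead a stack-based (Java-local-variable) compilation strategy would eliminate when no closure shares the scope.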
Posted almost 17 years ago by Ola Bini
I have spent a few hours adding some useful features these last days. Nothing extraordinary, but things that might come in handy at one point or another. The problem with these features is that they are totally JRuby-specific. That means you could probably implement them for MRI, but no one has done it. That means that if you want to use them, beware. Further, they exploit a few tricks in the JRuby implementation, meaning they can't be implemented in pure Ruby.

So, that was the disclaimer; now on to the fun stuff!

Breaking encapsulation (even more)

As you know, in Ruby everything is accessible in some form or another, and you can do almost everything with the metaprogramming facilities. Well, except for one small detail, which I found out while working on the AR-JDBC database drivers. We have some code there which needs to be separate for each database, and it just so happens that core ActiveRecord has already implemented it in a very good way. So, what do we do? Mix them in and remove the methods we don't want? No, because ActiveRecord adapters are classes, not modules, and you can't mix in classes. There is no way to get hold of a method and add it to an unrelated class or module. Except if you're on JRuby, of course:

```ruby
require 'jruby/ext'

class A
  def foo
    puts "A#foo"
  end
  def bar
    puts "A#bar"
  end
end

class B; end
class C; end

b = B.new
b.steal_method A, :foo
b.foo
B.new.foo rescue nil  # will raise NoMethodError
C.steal_methods A, :foo, :bar
C.new.foo
C.new.bar
```

Of course, using this should be avoided at all costs. But it's interesting that such a powerful thing can be implemented in about 15 lines of Java code.

Introspection

JRuby parses Ruby code into an Abstract Syntax Tree. For a while now, the JRuby module has allowed you to parse a string and get the AST representation by executing:

```ruby
require 'jruby'
JRuby.parse "puts 'hello'", 'filename.rb', false
```

This returns the Java AST representation directly, using the Java Integration features. That is old. What is new is that I have added pretty inspecting, a nice YAML format, and some navigation features which make it very easy to see exactly how the AST looks. Just do an inspect or to_yaml on an AST node and you will get the relevant information.

That is interesting. But what is even nicer is the ability to combine arbitrary pieces of the AST (as long as they make sense together) and also run them:

```ruby
require 'jruby'
ast_one = JRuby::ast_for("n = 1; n*(n + 3)*(n + 2)")
ast_two = JRuby::ast_for("n = 42; n*(n + 1)*(n + 2)")
p (ast_one.first.first + ast_two.first[1]).run
p (ast_two.first.first + ast_one.first[1]).run
```

As you can see, I take two fragments from different code, add them together, and run them. You can also see that I'm using an alias for parse here, called ast_for. That makes much more sense when using the second parse feature, which we already know from ParseTree:

```ruby
require 'jruby'
JRuby::ast_for do
  puts "Hello"
end
```

Well, I guess that's all I wanted to show right now. These last small things I've added because I believe they will be highly useful for debugging JRuby code. I also have some more ideas that I want to implement. I'll keep you posted.
Posted almost 17 years ago by Ola Bini
Among all the features of Ruby that JRuby supports, I would say that two take the number one place as being really inconvenient. Threads are one; making the native threading of Java match the green threading semantics of Ruby is not fun, and it's not even possible for all edge cases. But that argument has been made several times by both me and Charles.

ObjectSpace, now, that is another story. The problems with OS are many. But first, let's take a quick look at the most common usage of OS: iterating over classes:

```ruby
ObjectSpace::each_object(Class) do |c|
  p c if c < Test::Unit::TestCase
end
```

This code is totally obvious; we iterate over all instances of Class in the system, and print an inspected version of each one that is a subclass of Test::Unit::TestCase.

Before we take a closer look at this example, let's talk quickly about how MRI and JRuby implement this functionality. In fact, having this functionality in MRI is dead easy. It's actually very simple, and there are no performance problems when it's not used. The trick is that MRI just walks the heap when iterating over ObjectSpace. Since MRI can inspect the heap and stack without problems, nothing special needs to be done to support this behavior. (Note that this can never be safe when using a real threading system.)

So, the other side of the story: how does JRuby implement it? Well, JRuby can't inspect the heap, of course. So we need to keep a WeakReference to each instance of RubyObject ever created in the system. This is gross. We pay a huge penalty for managing all this stuff. Many of the larger performance benefits we have found in the last year have revolved around having internal objects be smarter and not put themselves into ObjectSpace until necessary. One of my latest optimizations of regexp matching was simply to make MatchData lazy, so it only goes into OS when someone actually uses it. RDoc runs about 40% faster when ObjectSpace is turned off in JRuby.

So, is it worth it? In real life, when do you need the functionality of ObjectSpace? I've seen two places that use it in code I use every day. First, Rails uses it to find generators, and second, Test::Unit uses it to find instances of TestCase. But the fun thing is this: the above code is almost exactly what they do; they iterate over all classes in the system, checking whether they inherit from a specific base class. Isn't that a rather gross implementation? Shouldn't it be possible to do something better? Um, yes:

```ruby
module SubclassTracking
  def self.extended(klazz)
    (class << klazz; self; end).send :attr_accessor, :subclasses
    (class << klazz; self; end).send :define_method, :inherited do |clzz|
      klazz.subclasses << clzz
      super
    end
    klazz.subclasses = []
  end
end

# Where Test::Unit::TestCase is defined:
Test::Unit::TestCase.extend SubclassTracking

# Load all other classes...

# To find all subclasses and test them:
Test::Unit::TestCase.subclasses
```

I would say that this code solves the problem more elegantly and usefully than ObjectSpace. There is no performance degradation from it, and it will only affect subclasses of the class you are interested in. What's the best benefit of this? You can use the -O flag when running JRuby, and your tests and the rest of your code will run much faster and use less memory.

As a sidenote: I'm putting together a patch based on this for both Test::Unit and Rails. ObjectSpace is unnecessary for real code, and the vision for JRuby is that you will explicitly have to turn it on to use it, instead of the other way around.

Anyone have any real-world examples of things you need to do with ObjectSpace?
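For experimenting with the pattern above, here is a self-contained variant that puts the inherited hook directly on a base class rather than using a generic extension module. The class names are illustrative:

```ruby
# Subclass tracking via the inherited hook, no ObjectSpace needed:
# every time a subclass (direct or indirect) is defined, Ruby calls
# the inherited hook, and we record the new class.
class TestBase
  def self.subclasses
    @subclasses ||= []
  end

  def self.inherited(subclass)
    TestBase.subclasses << subclass
    super
  end
end

class FooTest < TestBase; end
class BarTest < TestBase; end

TestBase.subclasses  # => [FooTest, BarTest]
```

Because the hook is inherited by the subclasses' singleton classes, grandchildren are recorded too, which matches what the ObjectSpace scan would have found.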
Posted almost 17 years ago by Charles Oliver Nutter
I must apologize to my readers. I have been remiss in my blogging duties. I will be posting updates on the various events of the past month or so along with updates on JRuby progress and future events very soon. But for now, a technical divergence ... [More] after a night of hacking.--I finally understand what we should be going for in our compiled code, and how we can really kick JRuby into the next level of performance.The JVM, at least in HotSpot, gets a lot of its performance from its ability to inline code at runtime, and ultimately compile a method plus its inlined calls as a whole down to machine code. The benefit in doing this is the ability to do compiler optimizations across a much larger call path, essentially compiling all the logic for a method and its calls (and possibly their calls, ad infinatum) into a single optimized segment of machine code.HotSpot is able to do this in a two main ways:If it's obvious there's only ever one implementation of a given signature on a given type hierarchyIf it can determine at runtime that one (or a few) implementations are the only ones ever being calledThe first one allows code to be optimized fairly quickly, because HotSpot can discover early on that there's only one implementation. In general, if there's a single implementation of a given signature, it will get inlined pretty quickly.The second one is trickier. HotSpot tracks the actual types being called against for the various calls, and eventually can come up with a best guess at the method or methods to inline. It also can include a slow path for the rare future cases where the receiver does not match the target types, and it can deoptimize later to back down optimizations when situations change, such as when a new class is loaded into the system.So in the end, inlining is one of the most powerful optimizations. 
Unfortunately in JRuby (and most other dynamic language implementations on the JVM), we're making inlining difficult or impossible in the most performance-sensitive areas. I believe this is a large part of our performance woes.Consider that all method calls against any object must pass through an implementation of IRubyObject.callMethod. There's not too many callMethod implementations, and actually now there's only one implementation of each specific signature. So callMethod gets inlined pretty fast.Consider also that almost all method calls within callMethod are to very specific methods and will also be inlined quickly. So callMethod is looking pretty good so far.Now we look at the last step in callMethod...DynamicMethod.call. DynamicMethod is the top-level type for all our method objects in the system. The call method has numerous implementations, all of them different. And no one implementation stands out as the most frequently called. So we're already complicating matters for HotSpot, even though we know (based on the incoming method name) exactly the piece of code we *want* to call.Let's continue on, assuming HotSpot is smart enough to work around our half-dozen or so DynamicMethod.call implementations.DefaultMethod is the DynamicMethod implementation for interpreted Ruby code, so it calls directly into the evaluator. So at that point, DefaultMethod.call will inline the evaluator code and that looks pretty good. But there's also the JIT located in DefaultMethod. It generates a JVM bytecode version of the Ruby code and from then on DefaultMethod calls that. Now that's certainly a good thing on one hand, since we've eliminate the interpreter, but on the other hand we've essentially made it impossible for HotSpot to inline that generated code. Why? Because we generate a Java method for every JITable Ruby method. Hundreds, and eventually thousands of possible implementations. Making a decision to inline any of them into DefaultMethod.call is basically impossible. 
We've broken the chain.

To make matters worse, we also have the set of Java-wrapping DynamicMethod implementations: *CallbackMethod (used for binding Java code to Ruby method names) and CompiledMethod (used in AOT-compiled code). The CallbackMethods all wrap another piece of generated code that implements Callback and calls the Java method in question. So we generate nice little wrappers for all the pre-existing methods we want to call, but we also make it impossible for the *CallbackMethod.call implementations to inline any of those calls. Broken chain again.

CompiledMethod is slightly better in this regard, since there's a new CompiledMethod subclass for every AOT-compiled Ruby method, but we still have a single implementation of DynamicMethod.call that all of those subclasses share in common. To make matters worse, even if we had separate DynamicMethod.call implementations, that might actually *hurt* our ability to inline code way back in IRubyObject.callMethod, since we've now added N possible DynamicMethod.call implementations to the system. And the chain gets broken even earlier.

So the bottom line here is that in order to continue improving performance, we need to do everything possible to move the call site and the call target closer together. There are a couple of standard ways to do it:

1. Hard-coded special-case code for specific situations, much like YARV does for simple ops (+, -, <, >, etc.). In these cases, the compiler would check that the target implements an appropriate type before doing a direct call to the operation in question. In Fixnum's case, we'd first confirm the receiver is a RubyFixnum, and then invoke e.g. RubyFixnum.plus directly. That skips all the chain breakage and allows the compiled code to inline RubyFixnum.plus straight into the call site.

2. Dynamically generated method adapters that can be swapped out and that learn from previous calls to make direct invocations earlier in the chain.
Basically, this would involve preparing call site caches that point at call adapters. Initially, the call adapters would be of some generic type that can use the slow path. But as more and more calls come in, more and more of the call sites would be replaced with specialized implementations that invoke the appropriate target code directly, allowing HotSpot a direct line from call site to call target.

The second version is obviously the ultimate goal, and essentially would mimic what the state-of-the-art JITs do (i.e. this is how HotSpot works under the covers). The first version is easily testable with some simple hackery.

I created a small patch that includes a trivial, unsafe change to the compiler to make Fixnum#+, Fixnum#-, and Fixnum#< direct calls when possible. They're unsafe because they don't check to see if any of those operations have been overridden...but of course you'd have to be a mad fool to override them anyway.

To demonstrate a bit of the potential performance gains, here are some numbers for JRuby trunk with and without the patch. Note that Fixnum#+, Fixnum#-, and Fixnum#< are all already STI methods, which does a lot to speed up their invocation (STI uses a table of switch values to bypass dynamic method lookup).
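The patch's approach can be sketched roughly like this (hypothetical stand-in classes, not JRuby's real RubyFixnum or the actual patch): guard on the concrete type, then make a direct, statically typed call that HotSpot can inline, falling back to normal dynamic dispatch when the guard fails. As noted above, a production-safe version would also need to check that the operation hasn't been overridden.

```java
// Hypothetical sketch of the "hard-coded special case" idea.
// These are stand-in classes, not JRuby's real IRubyObject/RubyFixnum.
interface IRubyObject {
    IRubyObject callMethod(String name, IRubyObject arg);
}

class RubyFixnum implements IRubyObject {
    final long value;
    RubyFixnum(long value) { this.value = value; }

    // Direct, statically typed fast path that HotSpot can inline.
    RubyFixnum plus(RubyFixnum other) { return new RubyFixnum(value + other.value); }

    public IRubyObject callMethod(String name, IRubyObject arg) {
        // Stands in for the full dynamic dispatch chain.
        if (name.equals("+")) return plus((RubyFixnum) arg);
        throw new UnsupportedOperationException(name);
    }
}

public class GuardedCall {
    // What the compiler would emit for `a + b` at a call site.
    // Unsafe in the same way as the patch: no check for an overridden #+.
    static IRubyObject plusSite(IRubyObject a, IRubyObject b) {
        if (a instanceof RubyFixnum && b instanceof RubyFixnum) {
            // Guard passed: monomorphic, inlinable direct call.
            return ((RubyFixnum) a).plus((RubyFixnum) b);
        }
        // Guard failed: fall back to full dynamic dispatch.
        return a.callMethod("+", b);
    }

    public static void main(String[] args) {
        RubyFixnum r = (RubyFixnum) plusSite(new RubyFixnum(2), new RubyFixnum(3));
        System.out.println(r.value);
    }
}
```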
But this simple change of compiling direct calls completely blows the STI performance out of the water, and that's without similar direct calls to the fib_ruby method itself.

test/bench/bench_fib_recursive.rb

JRuby trunk without patch:
1.675000 0.000000 1.675000 ( 1.675000)
1.244000 0.000000 1.244000 ( 1.244000)
1.183000 0.000000 1.183000 ( 1.183000)
1.173000 0.000000 1.173000 ( 1.173000)
1.171000 0.000000 1.171000 ( 1.170000)
1.178000 0.000000 1.178000 ( 1.178000)
1.170000 0.000000 1.170000 ( 1.170000)
1.169000 0.000000 1.169000 ( 1.169000)

JRuby trunk with patch:
1.133000 0.000000 1.133000 ( 1.133000)
0.922000 0.000000 0.922000 ( 0.922000)
0.865000 0.000000 0.865000 ( 0.865000)
0.862000 0.000000 0.862000 ( 0.863000)
0.859000 0.000000 0.859000 ( 0.859000)
0.859000 0.000000 0.859000 ( 0.859000)
0.864000 0.000000 0.864000 ( 0.863000)
0.859000 0.000000 0.859000 ( 0.860000)

Ruby 1.8.6:
1.750000 0.010000 1.760000 ( 1.760206)
1.760000 0.000000 1.760000 ( 1.764561)
1.760000 0.000000 1.760000 ( 1.762009)
1.750000 0.010000 1.760000 ( 1.760286)
1.760000 0.000000 1.760000 ( 1.759367)
1.750000 0.000000 1.750000 ( 1.761763)
1.760000 0.010000 1.770000 ( 1.798113)
1.760000 0.000000 1.760000 ( 1.760355)

That's an improvement of over 25%, with about 20 lines of code. It would be even higher with a dynamic adapter for the fib_ruby call. And we can take this further: modify our Java integration code to do direct calls to Java types, modify compiled code to adapt to methods as they are redefined or added to the system, and so on and so forth. There's a ton of potential here.

I will continue working along this path.
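The "dynamic adapter" idea amounts to an inline cache at each call site. A minimal, hypothetical sketch (none of these names are JRuby's): the site starts on a generic slow path, and after the first call installs a specialized adapter guarded by the receiver's class, giving HotSpot a direct line from call site to call target as long as the guard holds.

```java
// Hypothetical sketch of a self-specializing call-site cache
// (a monomorphic inline cache). Not JRuby's actual CallAdapter code.
public class CallSiteCache {
    interface Adapter { Object call(Object self); }

    static class Site {
        private Adapter adapter = this::slowPath;  // start generic
        private Class<?> cachedType;

        Object call(Object self) { return adapter.call(self); }

        private Object slowPath(Object self) {
            // Full lookup, then install a specialized adapter guarded
            // by the receiver's class.
            cachedType = self.getClass();
            adapter = target -> {
                if (target.getClass() == cachedType) {
                    return fastTarget(target);  // direct, inlinable call
                }
                return slowPath(target);        // guard failed: relearn
            };
            return fastTarget(self);
        }
    }

    // Stands in for the real method implementation the site resolves to.
    static Object fastTarget(Object self) {
        return "called on " + self.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Site site = new Site();
        System.out.println(site.call("a string"));  // slow path, installs cache
        System.out.println(site.call("again"));     // hits the cached fast path
    }
}
```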
Posted almost 17 years ago
I jokingly called my hermit-like visit to my cabin up north a JRuby Summit of One, based on Ola's thought of my hosting a JRuby Summit up at my cabin. I wanted to get away after JRuby's 1.0 release and do some more deep-diving into our internals. I am nearing the end of this adventure, and I decided to capture some of the thoughts and ideas I grappled with. Many of these ideas have been mulled over by Charlie and me in the past, and hopefully this will add value to our future improvements.

Interpreter Internal Ideas (I^3) Brain Dump

1. Combine Frame and DynamicScope

One issue here is that Frames do not cross threads while DynamicScopes can. Here is a simple snippet showing this:

def foo
  a = 1
  Thread.new { a }
end

This is probably not a show-stopper, since frame arguments are basically part of DynamicScopes already. In fact, we already do an extra arraycopy to fill frame arguments into the DynamicScope. This even led to a bug where we would update the DynamicScope version but not the Frame version of an argument; super would then not have the updated parameter value (Ola added some duct tape to fix this).

A Frame may also have more than the expected number of arguments. This is not a show-stopper either, since a new Frame/DynamicScope for a call which has the wrong number of arguments will not progress to the point of executing (and therefore will not get populated with local variables at all). It does require us to wait to allocate the var size until EvaluationState.setupArgs.

Here is a high-level breakdown of variables:

TC.vars: [combined1][combined2]...[combinedn]
Combined1: [bounded_extras][frame localvars]
Bounded_Extras: $_, $~ (others?)
Frame LocalVars: farg1, farg2, ..., fargn, lvarm, lvarm+1, ..., lvarm+o

We need a way of referring to this stored structure, and since multiple threads need to access various vars (see the example above), we need to deal with that (e.g. how do we reference this var storage?).
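The variable breakdown above could be realized as something like the following sketch (a purely hypothetical layout and naming, not a patch): one slot array per activation, with statically known offsets for the special globals, the frame arguments, and the ordinary local variables, so each argument lives in exactly one place.

```java
// Hypothetical sketch of a combined Frame/DynamicScope slot layout.
// Offsets are fixed at construction, so compiled code could address
// arguments and locals directly with no extra arraycopy.
public class CombinedScope {
    private final Object[] slots;
    private final int argBase;    // index of the first frame argument
    private final int localBase;  // index of the first ordinary local

    CombinedScope(int specials, int args, int locals) {
        this.argBase = specials;
        this.localBase = specials + args;
        this.slots = new Object[specials + args + locals];
    }

    // A single store serves both "frame argument" and "scope variable"
    // reads, which is what removes the stale-argument bug with super.
    void setArg(int i, Object v)   { slots[argBase + i] = v; }
    Object getArg(int i)           { return slots[argBase + i]; }
    void setLocal(int i, Object v) { slots[localBase + i] = v; }
    Object getLocal(int i)         { return slots[localBase + i]; }
}
```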
2. Pre-allocation and reuse

This is mostly a patch Charlie created, and it creates less object churn. An ivory-tower person would say this is not a good strategy, since this is what the GC and VM should be doing for us (we are optimizing something that is under the purview of the JVM). A pragmatic person would measure things under different versions of the VM and use it if it pays off. I vote for pragmatism. Pre-allocation is also orthogonal to anything else, since it is just a user-managed object pool.

The extra complexity it creates does not seem like a big deal, other than understanding what objects hold references to Frames and creating an invalidation mechanism. The first swipe at a patch for this seems to have this issue.

3. Early allocation of artificial frames (e.g. our Frame object)

MRI and JRuby both have a variety of structures in ThreadContext and also a number of parameter lists in the call chain (e.g. the native environment's frame). The params in the call chain are really the way to go, since that is not emulating an artificial frame but using the native one. On the other hand, we already have an artificial frame, and we use a variety of structures and getters/setters to allocate a frame much too late in the call chain. So what did I just say?

1. We should try to eliminate Frame.
2. If we have a Frame, we should be allocating it much earlier.

I think focusing on 2 is a reasonable step at this point, because it will remove quite a bit of overhead (extra structures, getters/setters) and, more importantly, simplify the internals. At some point in the future we may be able to accomplish 1 (though eval and super may make this tough to accomplish).

Another interesting property of Frames in Ruby is that method invocations only need the current Frame and one previous Frame. The previous Frame is only needed for a few things like zsuper, block_given?/iterator?, and method_missing. Frames are also needed for dumps (the compiler does not need these).
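A minimal sketch of the pre-allocation idea in item 2, assuming frames never cross threads (hypothetical names, not Charlie's actual patch): a per-thread free list of Frame objects that get re-initialized instead of re-allocated. The clear() on release is exactly the kind of invalidation concern mentioned above, since a pooled frame must not keep old references alive.

```java
// Hypothetical sketch of a user-managed Frame pool. Not the real patch.
import java.util.ArrayDeque;

public class FramePool {
    static final class Frame {
        String methodName;
        Object self;
        void init(String methodName, Object self) {
            this.methodName = methodName;
            this.self = self;
        }
        void clear() { methodName = null; self = null; }
    }

    // Per-thread pool: no synchronization needed, because Frames
    // (unlike DynamicScopes) do not cross threads.
    private static final ThreadLocal<ArrayDeque<Frame>> POOL =
        ThreadLocal.withInitial(ArrayDeque::new);

    static Frame acquire(String name, Object self) {
        Frame f = POOL.get().pollFirst();
        if (f == null) f = new Frame();  // pool empty: fall back to allocation
        f.init(name, self);
        return f;
    }

    static void release(Frame f) {
        f.clear();                       // drop references so the GC can work
        POOL.get().addFirst(f);
    }
}
```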
4. Simplification of CRef

A cref is the live instance of a module which is rooted to its lexical representation. CRefs get created at any Class, SClass, or Module. They are stored in a spaghetti structure in which each points at a parent module (lexically versus by containment; note that these are not always the same, see the Module.nesting example below). Currently, we set/get from a structure in ThreadContext, and we do it more often than only at parse time. This is wasteful. If we tuck it into a lexical structure (like StaticContext), then we can access it there and eliminate a portion of the sets in ThreadContext's current cref structure. StaticContext is not quite perfect, since more StaticContexts exist than classes (crefs change), so you may over time be in a position where you ask for a cref in a static structure and then have to dive down until you find one which has the cref (I mitigate this by backfilling all intermediate StaticContexts with the proper value so later requests do not need this dive).

A second portion of this work is that modules will just have a parent which is a RubyModule. Currently, it is a cref. It only needs to be a single RubyModule, since this attribute should really only represent live containment versus lexical containment (see Module.nesting below).

Here is an example of lexical scoping of CRefs:

class A
  def boo
    Module.nesting
  end
end

a = A.new
p a.boo

class C
  class ::A
    def gar
      Module.nesting
    end
  end
end

p a.gar
Posted almost 17 years ago by Ola Bini
It has been two long days; not because I've been going to sessions all day long, but because I've reworked my presentations quite heavily. But now both the BOF and the TS are finished, and I think they went well. I had to keep the level to introductory Ruby, JRuby, and Rails material, though, since most developers here didn't seem to know what is possible with these technologies. But it's been great; I've gotten good feedback and had some really interesting conversations with lots of people.

We have been doing the town each night, and I've found that I like Barcelona very much. Except for the food: this country doesn't seem to be good for vegetarians at all. Very annoying. I'm going for beer and wine instead of food the rest of the week. =)

One day left, though, and it's bound to be nice. Martin and I are both on a developers' panel about the state of programming languages in 2020; I have no idea what to say, and I'm thinking about just ad-libbing it. I know my own position on these questions fairly well, and the current Yegge debate has made my opinions even more explicit. But now it's time to see the town again.