Posted almost 18 years ago by Ola Bini
I have been involved in several discussions about programming languages lately, and people have assumed that since I spend lots of time in the Ruby world and with dynamic languages in general, I don't like static typing. Many people in the dynamic language communities definitely express opinions that sound like they dislike static typing quite a lot.

I'm trying not to sound defensive here, but I feel the need to clarify my position on the whole discussion. Partly because I think that people are being extremely dogmatic and short-sighted by having an attitude like that.

Most of my time I spend coding in Java and in Ruby. My personal preference is for languages such as Common Lisp and Io, but there is no real chance to use them in my day-to-day work. Ruby neatly fits the purpose of a dynamic language that is close to Lisp for my taste. And I'm involved in JRuby because I believe that there is great worth in the Java platform, but also that many Java programmers would benefit from staying less in the Java language.

I have done my time with Haskell, ML, Scala and several other quite statically typed languages. In general, the reason I don't speak that much about those languages is that I have less exposure to them in my day-to-day life.

But this is the thing. I don't dislike static typing. Absolutely not. It's extremely useful. In many circumstances it gives me things that I really can't have in a dynamic language.

Interesting thought: Smalltalk is generally called a dynamic language with very late binding. There are no static type tags and no type inference happening. The only type checking that happens will happen at runtime. In this regard, Smalltalk is exactly like Ruby. The main difference is that when you're working with Smalltalk, it is _always_ runtime. Because of the image based system, the type checking actually happens when you do the programming. There is no real difference between coding time and runtime. Interestingly, this means that Smalltalk tools and environments have most of the same features as a static programming language, while still being dynamic. So a runtime based image system in a dynamic, late bound programming language will actually give you many of the benefits of static typing at compile time.

So the main takeaway is that I really believe that static typing is extremely important. It's very, very useful, but not in all circumstances. The fact that we reach for a statically typed programming language by default is something we really need to work on though, because it's not always the right choice. I'm going to say this in even stronger words: in most cases a statically typed language is a premature optimization that gets very much in the way of productivity. That doesn't mean you shouldn't choose a statically typed language when it's the best solution for the problem. But this should be a conscious choice, not happen by fiat just because Java is one of the dominant languages right now. And if you need a statically typed language, make sure that you choose one that doesn't revel in unnecessary type tags. (Java, C#, C++, I'm looking at you.) My current choice for a static language is Scala - it strikes a good balance in most cases.

A statically typed language with type inference will give you some of the same benefits as a good dynamic language, but definitely not all of them. In particular, you get different benefits and a larger degree of flexibility from a dynamic language that can't be achieved in a static language. Neal Ford and others have been talking about the distinction between dynamic and static typing as being incorrect. The real question is between essence and ceremony. Java is a ceremonious language because it needs you to do several dances to the rain gods to declare even the simplest form of method. In an essential language you will say what you need to say, but nothing else. This is one of the reasons dynamic languages and type inferenced static languages sometimes look quite alike - it's the absence of ceremony that people react to.
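To make the essence/ceremony distinction concrete, here's a minimal sketch (my own illustration, not Neal's): the entirety of an addition method in an essential language:

    # Says what it needs to say, and nothing else:
    def add(a, b)
      a + b
    end

The Java version of the same thing needs a wrapping class, visibility and static modifiers, and three type declarations before it can express the same one-line idea - that's the ceremony.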
That doesn't mean any essential language can be replaced by another. And with regards to ceremony - don't use a ceremonious language at all. Please. There is no reason to, and there are many alternatives that are better.

My three-level architecture of layers should probably be updated to say that the stable layer should be an essential, statically typed language. The soft/dynamic layer should almost always be a strongly typed, dynamic, essential language, and the DSL layer stays the same.

Basically, I believe that you should be extremely pragmatic with regards to both static and dynamic typing. They are tools that solve different problems. But our industry today has a tendency to be very dogmatic about these issues, and that's the real danger, I think. I'm happy to see language-oriented programming and polyglot programming get more traction, because they improve a programmer's pragmatic sensibilities.
Posted almost 18 years ago by Ola Bini
So I've finally made the time to extract JvYAMLb from JRuby. That means that JvYAML is mildly deprecated. Of course, since JvYAMLb only uses ByteLists, it might not be the best solution for everyone.

If you're interested in downloading the 0.1 release, you can do so at the Google Code download site: http://code.google.com/p/jvyamlb/downloads/list.
Posted almost 18 years ago by Ola Bini
Wow. JRuby 1.1 is out! Go get it! (And incidentally, wouldn't JRuby be the perfect way for Google to support Ruby in Google App Engine?)

I'll be doing a talk tonight at .NETAkademien about Ruby and Rails. It's mostly an introduction. You can find more info here.

On Thursday I'll speak about JRuby at Developer Summit in Stockholm. I'll be in the Dynamic Languages track, and you can read the abstract and more information here.

And finally, next Monday I'll talk about JRuby at the Stockholm Ruby User Group. You can find out more information about that event here.

Overall, it feels like the conference season is starting up again. I'll be presenting at JavaOne in a month too.

I've been extremely quiet here lately. It's been a bit slow in JRuby land, actually, and my focus has been elsewhere. Hopefully I'll be able to start posting some really cool stuff soon - I have something up my sleeve that should be possible to talk about soon.
Posted almost 18 years ago by Charles Oliver Nutter
I'm very tired. I've been in the EU since last week Wednesday, first at Euruko in Prague, then getting slides ready for Scotland on Rails, and finally preparing a release announcement with Tom today. It's been a very long 11 days. So here's some quick hits for you.

EuRuKo FTW - It seems like EuRuKo was the conference to be at. We had three choices over the past two weeks: EuRuKo, last weekend in Prague; RubyFools A and B, in Copenhagen and Oslo earlier this week; and Scotland on Rails in Edinburgh yesterday and today. We made earlier commitments to EuRuKo and Scotland and opted not to kill ourselves by making extra trips to Copenhagen and Oslo. But from the sounds of it, we did pretty well. EuRuKo had better than 300 people, and it was a great conference...felt very much like my first RubyConf in 2004. Scotland had a hundred-something people, and I heard from a few other folks that RubyFools was about the same. And although my opinion of Prague was only slightly improved this trip, EuRuKo turned out to be an excellent conference.

Beer is dangerously cheap in Prague - I'm not sure why I didn't learn my lesson the first time.

Scotland is beautiful - Maybe I like rainy, green places, but I thought Edinburgh was a beautiful city and Scotland was a beautiful country. I wish I'd had time to go hiking on the crags or take a day trip to Loch Ness. I'll definitely be back.

Rails developers like more than Rails - We went out on a limb with our 90-minute Scotland on Rails talk and also presented NetBeans Ruby/Rails support, performance/threading demos, Ruby-Processing graphics eye-candy, the classic IRB/Swing demonstration, and a healthy series of slides debunking most of the great myths about Java. And the attendees loved it. Sure, we had lots of people really excited about the GlassFish Gem and WAR-file deployment, but we had at least as many excited about using Swing in Ruby, distributing all-in-one application bundles with JRuby, compiling code for fun and profit (and a perhaps disturbing number of people wanting compilation solely for obfuscation purposes), and general NetBeans Ruby/Rails features. It was a far more receptive audience than I'd expected, and the talk felt just great.

I need to come back to the UK soon - I like it here, and there's tons of people interested in JRuby. Gotta find a good excuse to come back soon.

And of course, we got JRuby 1.1 out today (download), after a good 9-10 months since JRuby 1.0. Happy times! Now, back to work...JRuby 1.1.1 soon, additional maintenance releases as needed, and super-happy-ultra JRuby 1.2/2.0 work coming up.

I sleep now.
Posted almost 18 years ago by Charles Oliver Nutter
One of the big reasons people have been interested in JRuby lately is that it is currently the only Ruby implementation that can both host multiple instances in a single process and run threads *actually* in parallel. Ruby 1.8 is entirely green-threaded. Rubinius has support for multiple parallel in-process VMs, but each one is still green-threaded. Ruby 1.9 uses native threads, but they're almost always blocking on a giant lock. IronRuby uses native CLR threads, but can't host multiple instances in a single process. JRuby FTW.

Of course true parallel execution is not without its pitfalls, and we stumbled across one this week. No, it wasn't a lock contention issue or data corruption or anything you're thinking of. It was smaller, sneakier, and much lower-level.

Have a look at this code:

    public class Trouble {
        public static int i = 1;
        public static int firedCount = 0;
        public static int totalSize = 1000000000;
        public static int maxLoops = 5;

        public static void main(String[] args) {
            final int threadCount = Integer.parseInt(args[0]);
            Thread threads[] = new Thread[threadCount];
            try {
                for (int loops = maxLoops; loops > 0; loops--) {
                    firedCount = 0;
                    long start = System.currentTimeMillis();
                    for (int j = 0; j < threadCount; j++)
                        threads[j] = new Thread() {
                            public void run() {
                                int total = totalSize / threadCount;
                                for (int k = 0; k < total; k++) {
                                    if (((i += 1) & 0xFF) == 0) doNothing();
                                }
                            }
                        };
                    for (int j = 0; j < threadCount; j++) {
                        threads[j].start();
                    }
                    for (int j = 0; j < threadCount; j++) {
                        threads[j].join();
                    }
                    System.out.println("time: " + (System.currentTimeMillis() - start));
                    System.out.println("fired: " + firedCount);
                }
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(1);
            }
        }

        public static void doNothing() {
            firedCount++;
        }
    }

It looks pretty straightforward, right? Given a number on the command line, it fires up that many threads, divides the overall loop count by the number of threads, and starts them all running. Each thread increments the same static field 'i', and whenever i % 256 == 0, they call doNothing, which increments a hit counter. Simple!

This is obviously a pretty dumb microbenchmark, but it mirrors a similar piece of code in JRuby. Because Ruby 1.8 supports various "unsafe" threading operations, we must also support them. But since we can't kill a target thread or raise an exception arbitrarily in a target thread, we must periodically checkpoint. Every thread checks whether one of those events has fired at various points during execution. The full list is described on the RubySpec wiki page on Ruby threading, but the bit we're interested in is the check that happens every 256 method calls. In JRuby, this was implemented with a static counter, and every time a call was made the counter was incremented and checked.

So because JRuby has the ability to *actually* run in parallel, a group at Sun has been working on some microbenchmarks to test our threaded scalability. The benchmark is fairly simple: given an array of strings, reverse each string and replace it in the array. When running with n threads, divide the array into n segments to parallelize the work. We'd naturally expect that performance would improve as threads were added, up to the number of cores in the system.
But we discovered a problem we never expected to see: threaded scaling completely tanked.

A Surprising Effect

Here's the result of running the Java benchmark on Java 5 with one and then two threads:

    ~/NetBeansProjects/jruby ➔ java -server Trouble 1
    time: 3924
    fired: 3906250
    time: 3945
    fired: 3906250
    time: 1841
    fired: 3906250
    time: 1882
    fired: 3906250
    time: 1896
    fired: 3906250

    ~/NetBeansProjects/jruby ➔ java -server Trouble 2
    time: 3243
    fired: 4090645
    time: 3245
    fired: 4100505
    time: 1173
    fired: 3906049
    time: 1233
    fired: 3906188
    time: 1173
    fired: 3906134

This is on my Core Duo MacBook Pro. With one thread, overall CPU usage is at around 60-65%. Spinning up the second thread consumes the remaining 35-40%, and so we see an improvement in the overall execution time. Yay parallel execution.

But now have a look at the Java 6 numbers on the same machine:

    ~/NetBeansProjects/jruby ➔ java -server Trouble 1
    time: 1772
    fired: 3906250
    time: 1973
    fired: 3906250
    time: 2748
    fired: 3906250
    time: 2114
    fired: 3906250
    time: 2294
    fired: 3906250

    ~/NetBeansProjects/jruby ➔ java -server Trouble 2
    time: 3402
    fired: 3848648
    time: 3805
    fired: 3885471
    time: 4145
    fired: 3866850
    time: 4140
    fired: 3839130
    time: 3658
    fired: 3880202

Woah, what the hell happened here? Not only is overall performance worse, the two-thread run is significantly slower than the single thread. And we saw the exact same effect in JRuby: two threads running a given benchmark performed markedly worse than a single thread...exactly the opposite effect we'd like to see.

Caching 101

Source: CPU Cache article on Wikipedia

Accessing main memory during execution is slow...so slow that all modern processors have multiple levels of cache in which to cache memory locations that are used repeatedly. There's cache on the processor itself, cache in fast memory chips located near the processor, and so on. The cache, in whatever form, is a smaller, faster memory store. Within the cache, data is organized into cache lines, segments of data that map to specific locations in main memory. So if your processor uses cache lines of 16 bytes, the cache will hold some number of 16-byte entries for the most recently or most frequently used data from memory.

Writes to memory also benefit from caching. Whether a cache is write-through or not, most writes do not immediately go to main memory, for the same performance reasons: main memory is slow. So if you have a single processor accessing a small range of memory, rapidly reading and writing data, it may operate almost entirely from the cache. To help this process along, most compilers will try to organize memory locations likely to be accessed temporally near each other such that they'll be loaded into nearby memory locations; the more you can jam into each cache line, the more likely you'll make good use of available cache memory.

Of course, at some point the cache becomes stale, and writes must be flushed and reads must be refreshed. This can happen (for example) when there's a cache miss, and the program accesses a memory location not already cached. In this case, one of the other cache entries must be evicted to make room for the new one. It will also happen naturally as the program progresses or as new programs and threads are scheduled to execute...eventually the cache must be flushed simply to ensure that main memory has been updated and all lines of execution see the change.

But there's another point at which the cache or (more likely) an individual cache line might be flushed.

Cache Line Ping-Pong

In a multiprocessor/multicore machine, you will usually have two or more threads actually running in parallel. Depending on the architecture, each core usually has its own cache, allowing the two threads to run independently without influencing each other's work. Of course, things start to get interesting when you have multiple threads accessing the *same* memory locations, as we have in this case.

This brings us to the reason for Java 6's poor performance on the benchmark above. John Rose helped me out by disassembling the resulting native code for this benchmark on Java 5 and Java 6. It turned out that while Java 5 located the two static fields in different quadwords (16 bytes), Java 6 located both fields in the same quadword. This optimization allows better cache utilization...since the two fields are immediately next to each other in memory, they're almost certain to share a cache line. But it also has a different effect.

Cache line ping-pong happens when two threads executing against different caches need to read and write the same memory locations. Because the two threads need to see each other's changes, the cache line ends up bouncing back and forth between the two caches, generally causing a flush or refresh from main memory in the process.

Main memory is slow, remember? Because of the cache line ping-pong, adding threads for this benchmark actually caused things to slow down. Every new thread increased the amount of back-and-forth between caches and main memory, up to the number of cores...at which point performance remained about constant for additional threads (because the amount of ping-pong was no worse).

It's interesting to note that single-thread performance was also considerably worse on Java 6. I don't yet have a good explanation for this.

The Fix

For the Java benchmark above, given what we now know about cache lines and such, a simple fix that Vladimir Sizikov found was to move one of the two statics to a different class. Java 6 only aligned the two fields within a single quadword when they were on the same class; moving one elsewhere presumably results in their living on different cache lines, and the amount of ping-ponging is reduced.

In JRuby, where we had only a single counter but a lot of other stuff happening around it, the answer was much clearer: we split the single global counter into several per-thread counters, and the effect was entirely erased. So the string-reversal benchmark started to scale nearly linearly with additional threads, while Ruby 1.8, Ruby 1.9, and Rubinius showed a slight performance degradation as threads were added. JRuby FTW.
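As a rough illustration of the shape of that fix (the real change lives in JRuby's Java internals; this is just a conceptual Ruby sketch, with an illustrative poll_events hook):

    # Each thread bumps its own counter, so the hot increment never touches
    # memory that another thread is also writing.
    def checkpoint_if_needed
      count = (Thread.current[:call_count] || 0) + 1
      Thread.current[:call_count] = count
      poll_events if (count & 0xFF) == 0   # check for thread events every 256 calls
    end

No shared counter, no shared cache line, no ping-pong.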
The Moral

I suppose there are a few things we can learn from this:

Don't use statics - Gilad Bracha probably said it best in his article "Cutting out Static": "Static variables are bad for concurrency. Of course, any shared state is bad for concurrency, but static state is one more subtle time bomb that can catch you by surprise." Touché, Gilad.

Don't share state unless you absolutely have to - This is perhaps a lesson we all believe we know, but until you're bitten by something sneaky like this you may not realize what bad habits you have.

One man's optimization is another man's deoptimization - Java 6 is significantly faster than Java 5, presumably because of optimizations like this. But every optimization has a tradeoff, and this one caught me by surprise.
Posted almost 18 years ago by Charles Oliver Nutter
Late last week I wired up a quick proof-of-concept C back-end for Duby, to show that it is possible. Duby's design represents all types as symbols, which allows the type representation to vary across platforms. So in fib(), the fixnum type is represented as ":fixnum" throughout, and it can be mapped to whatever the backend decides fixnums should be. This means that, in general, code you write in Duby will type infer (for some definition of infer) and produce an AST suitable for compilation to any backend or type system. And with the appropriate plugins, like the "math" typer from my previous Duby post, you can do cool things like represent integer math as either primitive math operations or method calls against object types. But the source code remains the same.
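As a sketch of what symbolic typing buys you (hypothetical tables, not Duby's actual API), the same :fixnum symbol can map to different concrete types depending on which backend you hand the AST to:

    # One symbolic type, two backends:
    C_TYPES   = { :fixnum => "int", :boolean => "int" }
    JVM_TYPES = { :fixnum => "int", :boolean => "boolean" }

    def c_decl(name, type)
      "#{C_TYPES.fetch(type)} #{name}"   # e.g. c_decl("n", :fixnum) => "int n"
    end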
I also wanted to blunt some criticism I received during the first Duby prototype. It represented types as symbols, but symbols nearly identical to their eventual Java types. In order for Duby to be generally useful (in, for example, Rubinius), platform-agnostic symbolic typing is a must.

So here's the code for my little C backend in its current state. It's pretty crude at the moment, but you can get the general idea. There's obviously work to do: it doesn't correctly insert "return" keywords, doesn't insert end-of-line semicolons everywhere appropriate (and often inserts them in totally inappropriate places), and in general doesn't produce entirely valid C code, but those are mostly minor details; the important takeaway here is that the typing and general Duby AST representation can easily be compiled this way.

Ok then! Given a simple script:

    def fib(n)
      {n => :fixnum}
      if n < 2
        n
      else
        fib(n - 1) + fib(n - 2)
      end
    end

    def go
      fib(35)
    end

We can run the C "compiler" thus:

    jruby -rduby/plugin/math lib/ruby/site_ruby/1.8/duby/c_compiler.rb fib.rb

And we get a fib.rb.c file as a result:

    int fib(int n) {;
    if (n < 2) {n} else {fib(n - 1) + fib(n - 2)};
    };
    int go() {fib(35)};

As I said, it's not the prettiest thing in the world, but I think it demonstrates that Duby should easily be able to target pretty much any backend platform you like. Fun stuff!
Posted almost 18 years ago by Charles Oliver Nutter
Akira Tanaka just did a lightning talk on IO.copy_stream, a new method added to Ruby 1.9 last night that does a fast stream transfer all in C code, avoiding the intermediate strings. It's a good idea, one I hope takes hold for other IO operations that could use some native lovin'.

It's so good, in fact, I took five minutes out of my day to produce a crude implementation of it in JRuby (and note, I have not looked at Akira's implementation, so I'm sure it handles cases my version does not):

    @JRubyMethod(name = "copy_stream", meta = true, compat = RUBY1_9)
    public static IRubyObject copy_stream(
            IRubyObject recv, IRubyObject stream1, IRubyObject stream2) throws IOException {
        RubyIO io1 = (RubyIO)stream1;
        RubyIO io2 = (RubyIO)stream2;

        ChannelDescriptor d1 = io1.openFile.getMainStream().getDescriptor();
        if (!d1.isSeekable()) {
            throw recv.getRuntime().newTypeError("only supports file-to-file copy");
        }
        ChannelDescriptor d2 = io2.openFile.getMainStream().getDescriptor();
        if (!d2.isSeekable()) {
            throw recv.getRuntime().newTypeError("only supports file-to-file copy");
        }

        FileChannel f1 = (FileChannel)d1.getChannel();
        FileChannel f2 = (FileChannel)d2.getChannel();

        long size = f1.size();
        f1.transferTo(f2.position(), size, f2);

        return recv.getRuntime().newFixnum(size);
    }

It's primitive, but it does the trick. NIO streams unfortunately only provide this nice "transferTo" method for FileChannel, though there are other ways you can do fast copies from stream to stream using buffers. But this was a nice simple way to wire it up for now.
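For the non-file cases, the fallback is the familiar chunked read/write loop; a minimal Ruby sketch of the idea (not JRuby's implementation - a native version would reuse a single buffer rather than allocating intermediate strings):

    def buffered_copy(src, dst, bufsize = 16 * 1024)
      total = 0
      while chunk = src.read(bufsize)   # read returns nil at EOF
        dst.write(chunk)
        total += chunk.length
      end
      total
    end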
So, let's see numbers.

    require 'benchmark'

    Benchmark.bmbm do |bm|
      bm.report("Control") do
        10000.times do
          File.open(__FILE__) do |file|
            File.open(__FILE__ + ".tmp", 'w') do |file2|
            end
          end
        end
      end
      bm.report("10k file copies") do
        10000.times do
          File.open(__FILE__) do |file|
            File.open(__FILE__ + ".tmp", 'w') do |file2|
              IO.copy_stream(file, file2)
            end
          end
        end
      end
    end

First, JRuby numbers (excluding rehearsal):

                          user     system      total        real
    Control           2.191000   0.000000   2.191000 (  2.191000)
    10k file copies   5.190000   0.000000   5.190000 (  5.190000)

Not bad for 10k copies of a file...it works out to about 3s above and beyond the cost of running the blocks and opening/closing the files. How about Ruby 1.9?

                          user     system      total        real
    Control           0.360000   0.550000   0.910000 (  0.903211)
    10k file copies   0.850000   2.450000   3.300000 (  5.363433)

Ruby 1.9's numbers come out around 4.4s above and beyond the control logic. Obviously we need to look at improving JRuby's numbers for the blocks and file opening, but it's good to know that Java's IO actually *can* be nice and fast when you need it to be.

This feature will only be available as part of JRuby's 1.9 support, which is still in development.
Posted almost 18 years ago by Ola Bini
I have been trying to figure out what characterizes some of the best programmers I know, or know about. It's a frustrating endeavor, of course, since most of these people are very different from each other, and have different experiences and ways to think and learn about stuff. Not to mention the fact that programmers tend to be highly individualistic.

But I think that I'm finally zeroing in on something that is general enough but also specific enough to categorize most of these people, and the general mind needed to be a really good programmer. In this blog post I'll call that "meta-level thinking", and I'll explain more about what I mean as we go along.

When people try to become better programmers, there are generally a few different kinds of advice you can get. The ones that seem prevalent right now are (among others):

- Learn - and understand - Lisp
- More generally, learn a new language that is sufficiently different from your current knowledge
- Work with domain specific languages
- Understand metaprogramming
- Read up on the available data structures (Knuth, anyone?)
- Implement a compiler and runtime system for a language

Of course, these are only a few examples, and the categories are a bit fuzzy. These items are based on most of my experiences, so they tend to be a bit language heavy. But I believe that what you practice by choosing any of these routes is an ability to abstract your thinking. In short, I believe that being able to abstract and understand what goes on in a programming language is one way to become more proficient in that language, but not only that - by changing your thinking to see this part of your environment, you generally end up programming differently in all languages.

This is why Lisp has a tendency to change the way people write Java code. Lisp teaches you metaprogramming and DSLs, but it doesn't really do so in a way that needs words. DSLs and metaprogramming are just part of Lisp. They're so much a part of the structure that you don't see them. But when you turn to Java you'll need to start working with these abstractions on a different level. Ruby programmers embrace metaprogramming, and this changes the way these Ruby programmers think about things.
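For anyone who hasn't seen it, here's a small taste of the kind of metaprogramming being talked about - methods defined at runtime instead of written out by hand (a made-up Settings class, purely for illustration):

    class Settings
      %w[host port user].each do |name|
        define_method(name)       { instance_variable_get("@#{name}") }
        define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
      end
    end

    s = Settings.new
    s.host = "example.com"
    s.host   # => "example.com"

Six accessor methods exist at runtime that appear nowhere in the source - and once you start thinking this way, it changes how you structure code in any language.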
I'm really happy to see metaprogramming and DSLs getting more and more focus, because I really believe that understanding them is a really good way to make programmers better. Of course, you can get the same effect by writing a compiler and runtime system - as Steve Yegge proposes. But I disagree that you really need that experience. There are other ways you can get the same way of looking at the world.

I call this meta-level thinking. I think it's mostly a learned ability, but also that there is an aptitude component. Some of the best programmers I've met just have a mind that fits very well. It's interesting to note that this kind of meta-level thinking is not an ability that is only applicable to programming. In fact, it's probably just a matter of genetic luck that the same ability works very well for programming as for many other things. I think that there is a connection between certain abilities and a capacity for meta-level thinking too. Like music - it's interesting to see how many programmers have artistic leanings, specifically towards music. It would be interesting to see some statistics about this.

In conclusion, this is my somewhat fuzzy view of one of the most important abilities that contributes to creating brilliant programmers. If you have any interesting ideas about other ways you can reliably practice this ability, please improve on my list.
Posted almost 18 years ago by Charles Oliver Nutter
It's been...oh my, almost two weeks since I broke the news about Duby. Since then I attended PyCon, and we managed to get JRuby 1.1 RC3 out the door, which is looking like it will become JRuby 1.1 final. But I've still been working on Duby in my spare time. It's kinda nice to have a different "spare time" project than JRuby for a while.

After my previous post, I continued to carry the compiler forward. I managed to get it compiling the following syntactic structures:

- until and while loops
- static method definitions and calls (using "def self.foo" syntax)
- array types (which, combined with static calls, allowed creating a "main" method that worked!)
- instance variables (using @foo to mean the "foo" field on a Java type)
- type resolution from any Java type
- imports (with "import foo.bar.Baz" syntax)
- packages (using "module Foo" syntax, though that will probably change)
- type inference across methods on the same type, including smarts for instance vs static

All very exciting stuff, and starting to get very close to a language that would be usable for simple algorithms. Except there was just one problem: every new feature I added made the codebase more difficult to understand.

Reloaded

I had just started to explore the next steps when I realized the codebase was becoming too hard to follow. After chasing around a few inference bugs I decided it was time to take a step back. I had also hoped all along to eliminate the hard requirement to declare a return type, since that at least should be apparent from inspecting the method body...but it wouldn't be possible with the original design.

The initial Duby compiler was entirely made up of decorations on the existing Ruby AST. A CallNode, for example, was decorated with methods called "compile", "declared_type", "signature" and so on that were called in various sequences to first infer types and then produce bytecode. There were a few problems with this approach that became apparent as work continued:

- The hypothetical Duby AST structure did not map 1:1 to the Ruby AST structure; it was richer, with type information, method signatures, and overloaded/hooked syntax. To support this in the existing AST, many layers of complexity had to be added to the compilation process: "is this call being used as a call or as a type declaration?"
- The decorator methods were inextricably entwined with Java/JVM-specific details, be they type names, bytecodes, or just obviously Java-centric syntax. This not only made it more difficult to evolve the language, but made it impossible for the language to live anywhere but on the JVM and be compiled by anything but JRuby.
- My grand plans for the language were quickly exceeding what would be possible by simply decorating the AST.

The third bullet may spark some interest, so I'll explain very briefly (since it's still just forming in my head). As I thought more about how to map Ruby to the JVM, I realized that very few algorithms and very little syntax couldn't be made to map to *something* fast, static-typed, and conceptually the same. Mix-ins, for example, could easily be implemented like C# 3.0's extension methods. Blocks could be implemented behind the scenes as anonymous types of particular signatures. Even operator overloading could easily be mapped to appropriate methods on Numeric types, Collection types, and many others. The key to all this was having type inferencing and compiler layers that were not just flexible, but infinitely pluggable.

The Dream of Duby

The dream, as I see it, would be to use a Ruby-like syntax to wire up any arbitrary types using what looks generally like idiomatic Ruby code. For example, our friend fib().
The same fib() method I showed before, executing against primitive integer values, could execute against boxed Integer objects or BigInteger objects just by specifying the right plugin. So if, within the context of a file, I declare that "fixnum" types are to be represented as BigInteger objects, and math operations against them are to call appropriate methods on BigInteger, so be it...that's what comes out the other side.

The term for this is, I believe, "specialization". In the case above, it's explicit specialization on a case-by-case basis. For many cases, this is all you need...you know the types you're dealing with ahead of time and can comfortably declare them or specify type mappings during compilation. But the more interesting side of this comes from general-purpose specialization in the form of parametric polymorphism...or in terms you might be more familiar with: generics.

I am certainly stepping beyond my current education and understanding here, but the pieces are starting to fit together for me. I and others like me have long been envious of languages like Haskell, which infer types so incredibly well. And living between the Ruby and Java worlds, I've often felt like there had to be some middle ground that would satisfy both ends of the spectrum: a static, verified type system flexible enough to consume and specialize against any incoming types just by loading a plugin or recompiling, combined with one of the cleanest, most expressive (while still being one of the most readable) dynamic language syntaxes around. And so that's where I'd like Duby to fit.

The New Version

So where does it stand now?

Things have been moving fast. With JRuby 1.1 RC3 out of the way, I've taken some time to go back and rework the Duby pipeline.

Now, the only decoration on the Ruby AST is in the form of transformation logic to produce a richer Duby syntax tree. Literals are the same. The three call types have been unified into two: Call and FunctionalCall (against "self"). The various types of code bodies have been unified into a single Body construct. Method definition is represented through MethodDefinition and StaticMethodDefinition, both of which aggregate a "signature" element to represent declared or inferred signature information. The several looping constructs (excluding for loops, which are block-based iterations) have been unified into Loop. And so on. Not all the syntax supported by the Duby prototype has been represented in transformation, but it's not far off. And I'm taking a much more measured approach now.

The new AST structure affords a much more realistic typing and compilation pipeline. Each of the Duby AST nodes defines a single method, "infer", which, when passed a Typer, will walk its children and infer as many types as it is able. Each child walks its own children, and so on, unifying and generalizing types as it goes (though the unification and generalization is stubbed out now to only allow exact matches...this will eventually be the responsibility of the back-end to handle). Simple code that calls fully-resolved target methods and has no unknown branches may completely resolve during this first pass.

In more complicated cases, each node that can't make an accurate determination about its inferred type registers itself with the Typer in its "deferred" list. Once the initial inference cycle has run, all nodes in the AST will have either successfully inferred their result type or registered with the deferred list.
At this point, you can either continue to pass ASTs through the Typer, or you can begin the resolution phase.

To start the resolution phase, the "resolve" method on the Typer is called, which attempts to iteratively resolve all deferred nodes in the AST. This resolution process in theory will loop until either the deferred list has been drained of all nodes (which will presumably then be 100% resolved), or until two successive resolution cycles fail to alter the list (perhaps because more information is needed or there are circular references for which the user must add hints). In general, this means that the deepest unresolved child nodes will fall out first. For example, suppose you have a method "foo" that calls a method "bar" before "bar" has been defined in the source:

    def foo
      bar
    end

    def bar
      1
    end

During the first inference cycle, bar will completely resolve but foo will be deferred. During the resolution phase, foo will now see that bar has been defined and resolved, and will itself resolve. Both return :fixnum (though the decision of what type ":fixnum" resolves to will be left to the compiler backend for a given system).
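The core of that resolution phase is a simple fixed-point loop; roughly (an illustrative sketch, not the real Typer code):

    def resolve(deferred)
      until deferred.empty?
        before = deferred.size
        deferred.reject! { |node| node.infer(self) }   # keep only still-unresolved nodes
        break if deferred.size == before               # no progress: report or give up
      end
    end

Each pass gives every deferred node another chance to infer, now that deeper nodes may have resolved.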
Back to fib()

Here's our friend fib(), which serves as a nice simple example:

    def fib(n)
      {n => :fixnum}
      if n < 2
        n
      else
        fib(n - 1) + fib(n - 2)
      end
    end

The fib() method is actually fairly interesting here because it recurses. If this were a simple recursion, it would be impossible to determine what actual type fib returns without an explicit declaration, since no other information is available, and this would produce an error of the following variety:

    ~/NetBeansProjects/jruby ➔ cat recurse.rb
    def recurse
      recurse
    end

    ~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb recurse.rb
    Could not infer typing for the following nodes:
      FunctionalCall(recurse) (child of MethodDefinition(recurse))
      MethodDefinition(recurse) (child of Script)
      Script (child of )
    AST:
    Script
      MethodDefinition(recurse) {:return=>Type(notype)}
        Arguments
        FunctionalCall(recurse)

Here we see a simple "recurse" method that just calls itself, and as you'd expect, type inference fails. Because the return value of "recurse" depends on knowing the return value of "recurse", resolution fails.

However, in the case of fib(), we don't have a simple recursion, we have a conditional recursion. The default behavior for the Duby "Simple" typer is to assume that if one branch of an If can successfully resolve, that's the type to use (temporarily) as the value of the If (while still marking the If as unresolved, and unifying the two bodies later). And since the root of the fib() method only contains an If, it can successfully resolve. Let's try it:

    ~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb
    Could not infer typing for the following nodes:
      Call(<) (child of Condition)
      Condition (child of If)
      Call(-) (child of FunctionalCall(fib))
      FunctionalCall(fib) (child of Call(+))
      Call(-) (child of FunctionalCall(fib))
      FunctionalCall(fib) (child of Call(+))
      Call(+) (child of If)
    ...

Ouch, what happened here? Actually, it's pretty easy to understand...the calls to "<", "-", and "+" were unknown to the Typer, and so they could not be resolved. As a result, the If Condition could not resolve, nor could the body of the Else statement. This is not necessarily a fatal state, merely an incomplete one. The "resolve" method on Typer can be called with "true" to force an error to be raised, or with no arguments to just "do the best it can do". In this case, using the simple call to the typer, it raises and displays the error, but there's no reason that more information couldn't be added to the system to allow a subsequent resolution to proceed.

Pluggable Inference

This is where having a pluggable engine starts to come in handy. Though the mechanism is currently pretty crude, there's already a basic ability to specify a plugin for method resolution. In order to enlist in method resolution, a plugin must define a "method_type" method that accepts a parent Typer, a target type, a method name, and a list of parameter types. If at any time method resolution fails in the Simple Typer, the plugin Typers will be called in turn. So in this case, I created a simple "Math" typer that's aware of a few simple operations with LHS and RHS of :fixnum.
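The shape of such a plugin is roughly this (a hedged sketch, not the actual MathTyper source):

    class MathTyper
      FIXNUM_OPS = { "+" => :fixnum, "-" => :fixnum, "<" => :boolean }

      # Called when the Simple typer fails; return nil to decline.
      def method_type(typer, target_type, name, parameter_types)
        return nil unless target_type == :fixnum && parameter_types == [:fixnum]
        FIXNUM_OPS[name]
      end
    end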
Let's try it:

    ~/NetBeansProjects/jruby ➔ jruby -rcompiler/duby/plugin/math lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb
    AST:
    Script
      MethodDefinition(fib) {:return=>Type(fixnum), :n=>Type(fixnum)}
        Arguments
          RequiredArgument(n)
        Body
          Noop
          If
            Condition
              Call(<)
                Local(name = n, scope = MethodDefinition(fib))
                Fixnum(2)
            Local(name = n, scope = MethodDefinition(fib))
            Call(+)
              FunctionalCall(fib)
                Call(-)
                  Local(name = n, scope = MethodDefinition(fib))
                  Fixnum(1)
              FunctionalCall(fib)
                Call(-)
                  Local(name = n, scope = MethodDefinition(fib))
                  Fixnum(2)

Notice now that the return type of fib() has been correctly inferred to be :fixnum. Huzzah! We are successful!

Debugging

Along with work to make the system more pluggable and the code easier to follow, I've also been trying to provide useful debugging output. Man, I think making debugging output useful and readable is harder than writing a type inference engine in the first place. I must have spent a good hour just tweaking output so it didn't look totally heinous. And it's still not great, but it's definitely usable. Here, for your edification, is the debugging output from type inference on fib.rb:

    * [Simple] Learned local type under MethodDefinition(fib) : n = Type(fixnum)
    * [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
    * [AST] [Fixnum] resolved!
    * [Simple] Method type for "<" Type(fixnum) on Type(fixnum) not found.
    * [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
    * [Math] Method type for "<" Type(fixnum) on Type(fixnum) = Type(boolean)
    * [AST] [Call] resolved!
    * [AST] [Condition] resolved!
    * [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
    * [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
    * [AST] [Fixnum] resolved!
    * [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
    * [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
    * [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
    * [AST] [Call] resolved!
    * [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
    * [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
    * [Math] Method type for "fib" Type(fixnum) on Type(script) not found
    * [Simple] Deferring inference for FunctionalCall(fib)
    * [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
    * [AST] [Fixnum] resolved!
    * [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
    * [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
    * [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
    * [AST] [Call] resolved!
    * [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
    * [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
    * [Math] Method type for "fib" Type(fixnum) on Type(script) not found
    * [Simple] Deferring inference for FunctionalCall(fib)
    * [Simple] Method type for "+" on not found.
    * [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
    * [Math] Method type for "+" on not found
    * [Simple] Deferring inference for Call(+)
    * [Simple] Deferring inference for If
    * [Simple] Learned method fib (Type(fixnum)) on Type(script) = Type(fixnum)
    * [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
    * [AST] [FunctionalCall] resolved!
    * [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
    * [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
    * [AST] [FunctionalCall] resolved!
    * [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
    * [Simple] Method type for "+" Type(fixnum) on Type(fixnum) not found.
    * [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
    * [Math] Method type for "+" Type(fixnum) on Type(fixnum) = Type(fixnum)
    * [AST] [Call] resolved!
    * [Simple] [Cycle 0]: Inferred type for Call(+): Type(fixnum)
    * [AST] [If] resolved!
    * [Simple] [Cycle 0]: Inferred type for If: Type(fixnum)
    * [Simple] Inference cycle 0 resolved all types, exiting

What's Next

I should clarify a few things before getting back to work:

- This codebase is mostly separate from, but heavily advised by, the original Duby prototype. I learned a lot from that code, and it's still more functional from a compilation standpoint, but it's not really something I can evolve. The new codebase will probably be at the same level of functionality within a week or so.
- Largely, the more measured pace of this new codebase is because of two key goals.

I'd like to move the system ever toward full, general type inference when possible. That includes inferring method parameters as well as return types.
That also means there will have to be a much more comprehensive type-inference engine than the current "simple" engine, but nothing about the current system will break forward compatibility with such work.

I'd also like to see the entire type inferencing pipeline, and ideally most of the compilation pipeline, be entirely platform-independent. There has been a surprising amount of interest in Duby from folks desiring to target LLVM, C, x86 ASM, Parrot, and others. Sadly, without a realistic codebase to work from--one which isolates typing logic from the underlying platform--none of that work has moved forward (though it's almost frightening how quickly people pounced on the idea). So the new system is almost entirely independent of Java, the JVM, and JRuby. Ideally, the only pieces you'd need to reimplement (or plug in) for a given platform would be the parser (producing a Duby AST through whatever mechanism you choose) and the type mapper/compiler for a given system. The parser would probably be the more difficult, but since the language syntax is "basically Ruby" and designed to support full AOT compilation, you can use any Ruby parser to produce a Ruby AST and then transform it in the same way I transform JRuby's AST. The compiler will mostly be a matter of mapping Duby syntactic structures and inferred generic types to code sequences and native types on your platform of choice.

Over the weekend I'll probably try to absorb the available literature on type inference, to learn what I'm doing wrong and what I could be doing better...but I think my "common sense" approach seems to be working out pretty well so far. We shall see. Suggestions for papers, ideally papers designed for mortals, are welcome.

So there you have it...the Duby update several of you have been asking for. Satisfied for now? :)

(BTW, the code is all available in the JRuby repository, under lib/ruby/site_ruby/1.8/compiler/duby. The new stuff is largely under the ast/ subdir and in the "transform.rb" and "typer2.rb" files. The tests of interest are in test/compiler/duby/test_ast.rb and test_typer.rb)
Posted almost 18 years ago by Ola Bini
It's interesting. After Charlie made an immense effort and rewrote our IO system, basically from scratch, I have started to find bugs. But these are generally not bugs in the IO code, but bugs in Ruby libraries that depend on the way MRI usually works. One of the more annoying ones is IO#read(n), where n is the length you want to read.

This method is not guaranteed to return a string of length n, even if we haven't hit EOF yet. You can NEVER be sure that what you get back is the length you requested. Ever. If you have code that doesn't check the length of the returned string from read, you are almost guaranteed to have a bug just waiting to happen.

Of course, it might work perfectly on your machine and every other machine you test it on. The reason for this is that read(n) will usually return n bytes, but that depends on the socket implementation or file reading implementation of the operating system; it depends on the size of the cache in the network interface; it depends on network latency; and many other things. Please, just make sure to check the returned string's length before going ahead and using it.
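The defensive pattern is a simple loop; something like this (read_exactly is an illustrative name, not a stdlib method):

    def read_exactly(io, n)
      buf = ""
      while buf.length < n
        chunk = io.read(n - buf.length)
        break if chunk.nil?   # EOF before we got n bytes
        buf << chunk
      end
      buf
    end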
Case in point: net/ldap has this exact problem. Look in net/ber.rb and you will see. There are also two - possibly three - bugs (a couple of years old, too) that report different failures because of this.

One thing that makes this problem hard to find is the fact that if you insert debug statements, you will affect the timing in such a way that the code might actually work with the debug statements but not without them.

Oh, did I mention that StringIO#read works the same way? It has exactly the same guarantees as IO#read.