|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
Well, I'm home from Prague, from another edition of TheServerSide Java Symposium. This year was definitely a few notches up from last year in Barcelona in my opinion. And being in beautiful Prague didn't really cause any trouble either. =)I landed on
... [More]
Tuesday, and worked quite heavily on my talks. Due to the ThoughtWorks AwayDay I was really out in the last second with my two slide decks. But I still got to see parts of the city in the evening. Very nice.I managed to sleep over the opening keynote, but dragged myself down to the main room to watch the session on Spring Dynamic Modules. This ended up being more about OSGi style things than really dynamic things, so I felt a bit cheated, and kept on working on my slides instead. Before lunch I sat in on Alex Popescu's talk about scripting databases with Groovy. Over all a very good overview of the database landscape from a Groovy point of view, including just using the language to make the JDBC API's more flexible, building a builder style DSL for working with SQL, to the full blown GORM framework. All in all quite nice. But the funniest part was definitely peoples reaction to the SQL DSL, where most in the room preferred the real SQL to the Groovy version.After lunch I had planned to see the session that compared different dependency injection frameworks, but the speaker never showed up, so I found myself listening to info about JSR-275, that provides support for units in a monetary system. Quite useful if you're working in that domain, but at the same time it felt like this would look so much cleaner in Ruby. Of course, that's how I react to most Java code nowadays.Holly Cummins gave a very good talk about Java Performance Tooling. Of course it was coming with a slight IBM slant, but that's fair. The tools built around their JVM is actually really good for identifying several kinds of performance problems. So I'm actually in a mind to try JRuby on the IBM JVM and see if we can glean some more interesting information from that.Geert gave his Terracotta talk about JVM clustering, and it's really interesting if you haven't seen it before. In this case I took the opportunity to listen while working on my slides.And that was the end of day one.Day two I was a good boy and was actually up in time for the keynote. This might have something to do with the fact that it was Neal Ford giving it, and he talked about Language-Oriented Programming. This is one of my favorite topics, and I'd only seen his slides to this talk before, not heard him give it. If you've been following the discussions about polyglot programming, the content made lots of sense. If you don't believe in polyglot programming, you might have been convinced.After the keynote, it was time for breakfast, so I didn't see the sessions in that slot. After breakfast I sat in on Guillaume's Groovy in the Enterprise: Case Studies. While the presentation were good, he spent more than half of it just giving an introduction to Groovy. I'm not one to throw stones in glass houses, though, so I have to admit that this is something I can be found guilty of too. I'm trying to improve on this though. It makes a disservice to the audience - if they have to sit through the same kind of intro they might already have seen to get to the actual meat. That's one of the reasons I tried to minimize introductionary material in my testing session.It was also in this session that a slide with the words "Groovy is the fastest dynamic language on the JVM" showed up. That's based only on the Alioth benchmarks, and it doesn't actually matter if it's true or not. It's a disservice to the audience. Especially in this case where even if Groovy actually on average is faster than JRuby, we are talking maximum 1-2% in average. The speed differences aren't really why you would be interested in using such a language, and in my opinion Groovy has got lots of other interesting features you can use to sell and market it. In summary, it felt a bit unnecessary.Directly after that session, me, Ted Neward, and Guillaume was featured in a panel on the languages of the next generation. Eugene Ciurana who was supposed to moderate didn't really show up, so John Davies and Kirk Pepperdine had to jump in instead. It ended up being quite fun, but no real heat in the discussion. In something like this, I think it would be useful to have someone with different views to spice it up. Me, Ted and Guillaume just agree about these things way too much. But we got some nice Czeckian vodka. That was good. =)After lunch I spent more time prepping my talk, and then it was finally time to give it. This was the JRuby on Rails introduction, and it ended up being quite nice. I had a good turn-up, and interestingly enough, many in the audience had actually tried Ruby already.After my session was up, I could relax, so I went to Kirks talk about Concurrency and High Performance, which included many things to think about while working on the performance of an enterprise scale application. Very useful material.Finally, at the end of the day it was time for the fireside chats, which is basically another word for BOFs. I sat in on the Zero Turnaround in Java Development session, which ended up not being as much discussion as I had expected, and more talking about the three principals different approachs (RIFE, Grails and JavaRebel).The Fireside Performance Clinic was good fun, and some useful material. In particular, knowing whether JRuby startup time is CPU or IO bound is something I have never thought about, and might yield some interesting insights.Day three felt a bit slower, as the last day usually does. The first session for me was Ted's Scala talk. I've seen it a few times before, but the most interesting part is actually the audience questions. As usual I wasn't disappointed. And Ted did his regular thing and weaved me into the examples. One of the more funny bits were when he was explaining the differences between var and val in Scala, and he decided that it might be good to be able to switch my surname. Then came the killer, where he said something like this: "well, and you might want to change the surname of Ola. Since Ola was just married, congratulations by the way, and he's from Sweden where the husband generally takes the surname of the wife, so we need to change his surname". At that point I had a hard time keeping it together.The session on what's new and exciting in JPA 2 ended up not exciting me at all, so frankly I don't remember anything at all about that. I have vague blurry images of many at-signs.Shashank Tiwari gave a presentation on how to choose your web framework, and this generated some discussion that were quite interesting. At this point I still wasn't finished with the examples for my testing session though, so I had to work on them. And I finally managed to finish it. Because lo, at that time I did the presentation on testing with JRuby. I spent some time on the different Ruby testing frameworks, first showing off how you can test Ruby code with them. Then I switched the model to a Java class, and used basically the same tests again. The cutest example is probably my story about a Stack. Not a literary master piece, but it's still prose.People seemed to like the session and get something out of it, and that feels great since this was the first time I showed JtestR to a larger group of people. My mocking domain consisting of Primates, Food and Factories also seemed to go home. I got the expected laughs at the source code line where a Chimpanzee tries to eat Tuna and "throw new Up();".Typesafe Embedded Java DSLs basically talked about how you can use the standard generic builder patterns to create DSLs that your IDE can help you quite much with. Sadly, my computer decided to give me a heart attach during this presentation, so I had to run out and give it CPR instead of sitting in on the rest of the session.And that was TSSJS-E. For me, the first day was quite weak, but the content of the other two days were definitely extremely good. I can recommend it to anyone next year. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
Well, I'm home from Prague, from another edition of TheServerSide Java Symposium. This year was definitely a few notches up from last year in Barcelona in my opinion. And being in beautiful Prague didn't really cause any trouble either. =)I landed on
... [More]
Tuesday, and worked quite heavily on my talks. Due to the ThoughtWorks AwayDay I was really out in the last second with my two slide decks. But I still got to see parts of the city in the evening. Very nice.I managed to sleep over the opening keynote, but dragged myself down to the main room to watch the session on Spring Dynamic Modules. This ended up being more about OSGi style things than really dynamic things, so I felt a bit cheated, and kept on working on my slides instead. Before lunch I sat in on Alex Popescu's talk about scripting databases with Groovy. Over all a very good overview of the database landscape from a Groovy point of view, including just using the language to make the JDBC API's more flexible, building a builder style DSL for working with SQL, to the full blown GORM framework. All in all quite nice. But the funniest part was definitely peoples reaction to the SQL DSL, where most in the room preferred the real SQL to the Groovy version.After lunch I had planned to see the session that compared different dependency injection frameworks, but the speaker never showed up, so I found myself listening to info about JSR-275, that provides support for units in a monetary system. Quite useful if you're working in that domain, but at the same time it felt like this would look so much cleaner in Ruby. Of course, that's how I react to most Java code nowadays.Holly Cummins gave a very good talk about Java Performance Tooling. Of course it was coming with a slight IBM slant, but that's fair. The tools built around their JVM is actually really good for identifying several kinds of performance problems. So I'm actually in a mind to try JRuby on the IBM JVM and see if we can glean some more interesting information from that.Geert gave his Terracotta talk about JVM clustering, and it's really interesting if you haven't seen it before. In this case I took the opportunity to listen while working on my slides.And that was the end of day one.Day two I was a good boy and was actually up in time for the keynote. This might have something to do with the fact that it was Neal Ford giving it, and he talked about Language-Oriented Programming. This is one of my favorite topics, and I'd only seen his slides to this talk before, not heard him give it. If you've been following the discussions about polyglot programming, the content made lots of sense. If you don't believe in polyglot programming, you might have been convinced.After the keynote, it was time for breakfast, so I didn't see the sessions in that slot. After breakfast I sat in on Guillaume's Groovy in the Enterprise: Case Studies. While the presentation were good, he spent more than half of it just giving an introduction to Groovy. I'm not one to throw stones in glass houses, though, so I have to admit that this is something I can be found guilty of too. I'm trying to improve on this though. It makes a disservice to the audience - if they have to sit through the same kind of intro they might already have seen to get to the actual meat. That's one of the reasons I tried to minimize introductionary material in my testing session.It was also in this session that a slide with the words "Groovy is the fastest dynamic language on the JVM" showed up. That's based only on the Alioth benchmarks, and it doesn't actually matter if it's true or not. It's a disservice to the audience. Especially in this case where even if Groovy actually on average is faster than JRuby, we are talking maximum 1-2% in average. The speed differences aren't really why you would be interested in using such a language, and in my opinion Groovy has got lots of other interesting features you can use to sell and market it. In summary, it felt a bit unnecessary.Directly after that session, me, Ted Neward, and Guillaume was featured in a panel on the languages of the next generation. Eugene Ciurana who was supposed to moderate didn't really show up, so John Davies and Kirk Pepperdine had to jump in instead. It ended up being quite fun, but no real heat in the discussion. In something like this, I think it would be useful to have someone with different views to spice it up. Me, Ted and Guillaume just agree about these things way too much. But we got some nice Czeckian vodka. That was good. =)After lunch I spent more time prepping my talk, and then it was finally time to give it. This was the JRuby on Rails introduction, and it ended up being quite nice. I had a good turn-up, and interestingly enough, many in the audience had actually tried Ruby already.After my session was up, I could relax, so I went to Kirks talk about Concurrency and High Performance, which included many things to think about while working on the performance of an enterprise scale application. Very useful material.Finally, at the end of the day it was time for the fireside chats, which is basically another word for BOFs. I sat in on the Zero Turnaround in Java Development session, which ended up not being as much discussion as I had expected, and more talking about the three principals different approachs (RIFE, Grails and JavaRebel).The Fireside Performance Clinic was good fun, and some useful material. In particular, knowing whether JRuby startup time is CPU or IO bound is something I have never thought about, and might yield some interesting insights.Day three felt a bit slower, as the last day usually does. The first session for me was Ted's Scala talk. I've seen it a few times before, but the most interesting part is actually the audience questions. As usual I wasn't disappointed. And Ted did his regular thing and weaved me into the examples. One of the more funny bits were when he was explaining the differences between var and val in Scala, and he decided that it might be good to be able to switch my surname. Then came the killer, where he said something like this: "well, and you might want to change the surname of Ola. Since Ola was just married, congratulations by the way, and he's from Sweden where the husband generally takes the surname of the wife, so we need to change his surname". At that point I had a hard time keeping it together.The session on what's new and exciting in JPA 2 ended up not exciting me at all, so frankly I don't remember anything at all about that. I have vague blurry images of many at-signs.Shashank Tiwari gave a presentation on how to choose your web framework, and this generated some discussion that were quite interesting. At this point I still wasn't finished with the examples for my testing session though, so I had to work on them. And I finally managed to finish it. Because lo, at that time I did the presentation on testing with JRuby. I spent some time on the different Ruby testing frameworks, first showing off how you can test Ruby code with them. Then I switched the model to a Java class, and used basically the same tests again. The cutest example is probably my story about a Stack. Not a literary master piece, but it's still prose.People seemed to like the session and get something out of it, and that feels great since this was the first time I showed JtestR to a larger group of people. My mocking domain consisting of Primates, Food and Factories also seemed to go home. I got the expected laughs at the source code line where a Chimpanzee tries to eat Tuna and "throw new Up();".Typesafe Embedded Java DSLs basically talked about how you can use the standard generic builder patterns to create DSLs that your IDE can help you quite much with. Sadly, my computer decided to give me a heart attach during this presentation, so I had to run out and give it CPR instead of sitting in on the rest of the session.And that was TSSJS-E. For me, the first day was quite weak, but the content of the other two days were definitely extremely good. I can recommend it to anyone next year. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
While writing the post yesterday about testing regular expressions, I realized that this problem is not really specific to regular expressions. I got a very good comment noting that testing any place that uses some kind of DSL is definitely prudent.
... [More]
SQL is another example.But these examples are both about actually testing the usage of them, and the problem becomes that you have two languages, but you're mostly only testing the code written in the outer language. This is due to several reasons. One of the most obvious ones is that our tools really doesn't make it that easy to do.Thinking about these issues made me start thinking about how we generally test languages. Having worked on several language implementations and worked on both new languages, and implementations of existing languages, I've come to the conclusion that the whole area of testing languages are actually quite complicated, and also there are no real best practices for doing it.First, there is a problem of terminology. Many implementations of languages that are really executable specifications of how the language should work. What's the difference? Well, testing the language according to such a spec, you are really only doing functional, black-box testing. I've looked at several of the open source language implementations, and I don't really see much usage of anything else than such language spec tests. This means basically that some parts of the implementation can be implemented wrongly, and by some freak chance it still works correctly in all the cases you have tests for, but it might fail in other ways.Unit tests for the actual implementation would help with this - it helps since you will be doing TDD on the unit level, it helps because you make a conscious decision about the implementation and what it should be doing in these cases. It still doesn't make everything clear cut and simple, but it absolutely would help. So why don't most implementations do unit testing of the internals? I don't really know. Maybe it's because implementations can be extremely complicated. But that should be a reason for testing more, not testing less. One reason I feel a bit about is that it makes larger changes quite hard. Large refactorings are one of the ways JRuby has used to get incredible performance improvements and new subsystems, but unit tests can sometimes act as inertia for these.I'm totally disregarding the academic approaches here. Yeah, in soem cases for simple languages, you can actually prove that it does what you want it to do, and for small enough implementations using a suitable language, you can actually prove the same things about the implementation. The problem is that this approach doesn't scale.And since a language almost always is turing complete, that means that you can't exhaustively test it. There is no way of testing all permutations - either manually or automatically. So what should a language spec do? The first thing that many languages do are to specify that whole areas of functionality result in undefined behavior. That makes it easier. But the real problems exist when you start combining different features which can interact in different ways.At the end of the day, I have no idea how to actually do this well. I would like to know though - how should I test the implementation, and how should I write an executable language specification? And these questions doesn't even touch on the question of testing the core libraries. Many of the some problems apply, but it gets even more complicated. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
While writing the post yesterday about testing regular expressions, I realized that this problem is not really specific to regular expressions. I got a very good comment noting that testing any place that uses some kind of DSL is definitely prudent.
... [More]
SQL is another example.But these examples are both about actually testing the usage of them, and the problem becomes that you have two languages, but you're mostly only testing the code written in the outer language. This is due to several reasons. One of the most obvious ones is that our tools really doesn't make it that easy to do.Thinking about these issues made me start thinking about how we generally test languages. Having worked on several language implementations and worked on both new languages, and implementations of existing languages, I've come to the conclusion that the whole area of testing languages are actually quite complicated, and also there are no real best practices for doing it.First, there is a problem of terminology. Many implementations of languages that are really executable specifications of how the language should work. What's the difference? Well, testing the language according to such a spec, you are really only doing functional, black-box testing. I've looked at several of the open source language implementations, and I don't really see much usage of anything else than such language spec tests. This means basically that some parts of the implementation can be implemented wrongly, and by some freak chance it still works correctly in all the cases you have tests for, but it might fail in other ways.Unit tests for the actual implementation would help with this - it helps since you will be doing TDD on the unit level, it helps because you make a conscious decision about the implementation and what it should be doing in these cases. It still doesn't make everything clear cut and simple, but it absolutely would help. So why don't most implementations do unit testing of the internals? I don't really know. Maybe it's because implementations can be extremely complicated. But that should be a reason for testing more, not testing less. One reason I feel a bit about is that it makes larger changes quite hard. Large refactorings are one of the ways JRuby has used to get incredible performance improvements and new subsystems, but unit tests can sometimes act as inertia for these.I'm totally disregarding the academic approaches here. Yeah, in soem cases for simple languages, you can actually prove that it does what you want it to do, and for small enough implementations using a suitable language, you can actually prove the same things about the implementation. The problem is that this approach doesn't scale.And since a language almost always is turing complete, that means that you can't exhaustively test it. There is no way of testing all permutations - either manually or automatically. So what should a language spec do? The first thing that many languages do are to specify that whole areas of functionality result in undefined behavior. That makes it easier. But the real problems exist when you start combining different features which can interact in different ways.At the end of the day, I have no idea how to actually do this well. I would like to know though - how should I test the implementation, and how should I write an executable language specification? And these questions doesn't even touch on the question of testing the core libraries. Many of the some problems apply, but it gets even more complicated. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
While writing the post yesterday about testing regular expressions, I realized that this problem is not really specific to regular expressions. I got a very good comment noting that testing any place that uses some kind of DSL is definitely prudent.
... [More]
SQL is another example.But these examples are both about actually testing the usage of them, and the problem becomes that you have two languages, but you're mostly only testing the code written in the outer language. This is due to several reasons. One of the most obvious ones is that our tools really doesn't make it that easy to do.Thinking about these issues made me start thinking about how we generally test languages. Having worked on several language implementations and worked on both new languages, and implementations of existing languages, I've come to the conclusion that the whole area of testing languages are actually quite complicated, and also there are no real best practices for doing it.First, there is a problem of terminology. Many implementations of languages that are really executable specifications of how the language should work. What's the difference? Well, testing the language according to such a spec, you are really only doing functional, black-box testing. I've looked at several of the open source language implementations, and I don't really see much usage of anything else than such language spec tests. This means basically that some parts of the implementation can be implemented wrongly, and by some freak chance it still works correctly in all the cases you have tests for, but it might fail in other ways.Unit tests for the actual implementation would help with this - it helps since you will be doing TDD on the unit level, it helps because you make a conscious decision about the implementation and what it should be doing in these cases. It still doesn't make everything clear cut and simple, but it absolutely would help. So why don't most implementations do unit testing of the internals? I don't really know. Maybe it's because implementations can be extremely complicated. But that should be a reason for testing more, not testing less. One reason I feel a bit about is that it makes larger changes quite hard. Large refactorings are one of the ways JRuby has used to get incredible performance improvements and new subsystems, but unit tests can sometimes act as inertia for these.I'm totally disregarding the academic approaches here. Yeah, in soem cases for simple languages, you can actually prove that it does what you want it to do, and for small enough implementations using a suitable language, you can actually prove the same things about the implementation. The problem is that this approach doesn't scale.And since a language almost always is turing complete, that means that you can't exhaustively test it. There is no way of testing all permutations - either manually or automatically. So what should a language spec do? The first thing that many languages do are to specify that whole areas of functionality result in undefined behavior. That makes it easier. But the real problems exist when you start combining different features which can interact in different ways.At the end of the day, I have no idea how to actually do this well. I would like to know though - how should I test the implementation, and how should I write an executable language specification? And these questions doesn't even touch on the question of testing the core libraries. Many of the some problems apply, but it gets even more complicated. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
This is just a small note, since this have bugged me for a while. Basically, I have lots of extra key bindings running around in my Emacs configuration. Now, I use local-set-key for many of these. The problem is I hadn't actually read the
... [More]
documentation for local-set-key enough.One example that annoyed me was this: I had some local key bindings for RSpec buffers, that differed from the regular Ruby buffers. My RSpec minor mode still uses the ruby-mode-map though. My assumption was that local-set-key did things exactly as all other things with "local" in their name, namely doing a buffer local modification only. I finally found out that this wasn't the case. Instead, when the RSpec minor mode was loaded for the first time, it ended up modifying the ruby-mode-map with its key bindings, which were then visible for all other Ruby buffers. Ouch.So, if you use local-set-key, make sure you actually want to set that key in the current mode map, instead of only for the current buffer.As far as I know, there is no way to set a real buffer local key binding without some acrobatics that unsets and resets the keys manually. I ended up solving my problem with the RSpec minor mode to having it clone the Ruby mode map and have its own mode map. Not an ideal solution, but it works for now. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
This is just a small note, since this have bugged me for a while. Basically, I have lots of extra key bindings running around in my Emacs configuration. Now, I use local-set-key for many of these. The problem is I hadn't actually read the
... [More]
documentation for local-set-key enough.One example that annoyed me was this: I had some local key bindings for RSpec buffers, that differed from the regular Ruby buffers. My RSpec minor mode still uses the ruby-mode-map though. My assumption was that local-set-key did things exactly as all other things with "local" in their name, namely doing a buffer local modification only. I finally found out that this wasn't the case. Instead, when the RSpec minor mode was loaded for the first time, it ended up modifying the ruby-mode-map with its key bindings, which were then visible for all other Ruby buffers. Ouch.So, if you use local-set-key, make sure you actually want to set that key in the current mode map, instead of only for the current buffer.As far as I know, there is no way to set a real buffer local key binding without some acrobatics that unsets and resets the keys manually. I ended up solving my problem with the RSpec minor mode to having it clone the Ruby mode map and have its own mode map. Not an ideal solution, but it works for now. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
This is just a small note, since this have bugged me for a while. Basically, I have lots of extra key bindings running around in my Emacs configuration. Now, I use local-set-key for many of these. The problem is I hadn't actually read the
... [More]
documentation for local-set-key enough.One example that annoyed me was this: I had some local key bindings for RSpec buffers, that differed from the regular Ruby buffers. My RSpec minor mode still uses the ruby-mode-map though. My assumption was that local-set-key did things exactly as all other things with "local" in their name, namely doing a buffer local modification only. I finally found out that this wasn't the case. Instead, when the RSpec minor mode was loaded for the first time, it ended up modifying the ruby-mode-map with its key bindings, which were then visible for all other Ruby buffers. Ouch.So, if you use local-set-key, make sure you actually want to set that key in the current mode map, instead of only for the current buffer.As far as I know, there is no way to set a real buffer local key binding without some acrobatics that unsets and resets the keys manually. I ended up solving my problem with the RSpec minor mode to having it clone the Ruby mode map and have its own mode map. Not an ideal solution, but it works for now. [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
Something has been worrying me a bit lately. Being test infected and all, and working for ThoughtWorks, where testing is part of the life blood, I think more and more about these issues. And one thing I've started noticing is that regular expressions
... [More]
seems to be a total blind spot in many cases. I first started thinking about it when I changed a quite complicated regular expression in RSpec. Now RSpec has coverage tests as part of their build, and if the test coverage is less than a 100%, the build will fail. Now, since I had changed something to add new functionality, but hadn't added any tests for it, I instinctively assumed that it would be caught be the coverage tool.Guess what? It wasn't. Of course, if I had changed the regexp to do something that the surrounding code couldn't support, one of the tests for surrounding lines of code would have caught it, but I got no mention from the coverage tool that I needed more tests to fully handle the regular expressions. This is logical if you think about it. There is no way that a coverage tool could find all the regular expressions in your source code, and then make sure that all branches and alternatives of that particular regular expression was exercised. So that means that the coverage tool doesn't do anything with them at all.OK, I can live with that, but it's still one of those points that would be very good to keep in mind. Every time you write a regular expression in your code, you need to take special care to actually exercise that part of the code with many inputs. What is many in this case? That's another part of the problem - it depends on the regular expression. It depends on how complicated it is, how long it is, how many special operators are used, and so on. There is no real way around it. To test a regular expression, you really need to understand how they work. The corollary is obvious - to use a regular expression in your code, you need to know how to test it. Conclusion - you need to understand regular expressions.In many code bases I haven't seen any tests for regular expressions at all. In most cases these have been crafted by writing them outside the code, testing them by hand, and then putting them in the code. This is brittle to say the least. In the cases where there are tests, it's much more common that they only test positives, and not negatives. And I've seldom heard of code bases with enough tests for regular expressions. One of the problems is that in a language like Ruby, they are so easy to use, so you stick them in all over the place. A standard refactoring could help here, by extracting all literal regular expressions to constants. But then the problem becomes another - as soon as you use regular expressions to extract values from a string, it's a pain to not have the regular expression at the same place as the extracted groups are used. Example:PhoneRegexp = /(\d{3})-?(\d{4})-?(\d{4})/# 200 lines of codeif phone_number =~ PhoneRegexpputs "phone number is: #$1-#$2-#$3"endIf the regular expression had been at the same place as the usage of the $1, $2 and $3 it would have been easy to tie them to the parts of the string. In this case it would be easy anyway, but in more complicated cases it's more complicated. The solution to this is easy - the dollar numbers are evil: don't use them. Instead use an idiom like this:area, number, extension = PhoneRegexp.match(phone_number).capturesIn Ruby 1.9 you will be able to use named captures, and that will make it even easier to make readable usage of the extracted parts of a string. But fact is, the difference between the usage point and the definition point can still cause trouble. A way of getting around this would be to take any complicated regular expression and putting it inside of a specific class for only that purpose. The class would then encapsulate the usage, and would also allow you to test the regular expression more or less in isolation. In the example above, maybe creating a PhoneNumberParser would be a good idea.At the end of the day, regular expressions are an extremely complicated feature, and in general we don't test the usage of them enough. So you should start. Begin by first creating both positive and negative tests for them. Figure out the boundaries, and see where they can go wrong. Know regular expressions well enough to know what happens in these strange circumstances. Think about unicode characters. Think about whitespace. Think about greedy and lazy matching. As an example of something that took a long time to cause trouble; what's wrong with this regexp that tries to discern if a string is a select statement or not?/^\s*\(*\s*SELECT\W+/iAnd this example actually covers most of the ground, already. It checks case insensitive. It checks for white space before any optional parenthesis, and for any white space after. It makes sure that the word SELECT isn't continued by checking for at least one non word character. So what's wrong with it? Well... It's the caret. Imagine if we had a string like this:"INSERT INTO foo(a,b,c)\nSELECT * FROM bar"The regular expression will in fact match this, even though it's not a select statement. Why? Well, it just so happens that the caret matches the beginning of lines, not the beginning of strings. The dollar sign works the same way, matching the end of lines. How do you solve it? Change the caret to \A and the dollar sign to \Z and it will work as expected. A similar problem can show up with the "." to match any character. Depending on which language you are using, the dot might or might not match a newline. Always make sure you know which one you want, and what you don't want.Finally, these are just some thoughts I had while writing it. There is much more advice to give, but it can be condensed to this: understand regular expressions, and test them. The dot isn't as simple as it seem. Regular expressions are a full blown language, even though it's not turing complete (in most implementations). That means that you can't test it completely, in the general case. This doesn't mean you shouldn't try to cover all eventualities.How are you testing your regular expressions? How much? [Less]
|
|
Posted
over 17 years
ago
by
[email protected] (Ola Bini)
Something has been worrying me a bit lately. Being test infected and all, and working for ThoughtWorks, where testing is part of the life blood, I think more and more about these issues. And one thing I've started noticing is that regular expressions
... [More]
seems to be a total blind spot in many cases. I first started thinking about it when I changed a quite complicated regular expression in RSpec. Now RSpec has coverage tests as part of their build, and if the test coverage is less than a 100%, the build will fail. Now, since I had changed something to add new functionality, but hadn't added any tests for it, I instinctively assumed that it would be caught be the coverage tool.Guess what? It wasn't. Of course, if I had changed the regexp to do something that the surrounding code couldn't support, one of the tests for surrounding lines of code would have caught it, but I got no mention from the coverage tool that I needed more tests to fully handle the regular expressions. This is logical if you think about it. There is no way that a coverage tool could find all the regular expressions in your source code, and then make sure that all branches and alternatives of that particular regular expression was exercised. So that means that the coverage tool doesn't do anything with them at all.OK, I can live with that, but it's still one of those points that would be very good to keep in mind. Every time you write a regular expression in your code, you need to take special care to actually exercise that part of the code with many inputs. What is many in this case? That's another part of the problem - it depends on the regular expression. It depends on how complicated it is, how long it is, how many special operators are used, and so on. There is no real way around it. To test a regular expression, you really need to understand how they work. The corollary is obvious - to use a regular expression in your code, you need to know how to test it. Conclusion - you need to understand regular expressions.In many code bases I haven't seen any tests for regular expressions at all. In most cases these have been crafted by writing them outside the code, testing them by hand, and then putting them in the code. This is brittle to say the least. In the cases where there are tests, it's much more common that they only test positives, and not negatives. And I've seldom heard of code bases with enough tests for regular expressions. One of the problems is that in a language like Ruby, they are so easy to use, so you stick them in all over the place. A standard refactoring could help here, by extracting all literal regular expressions to constants. But then the problem becomes another - as soon as you use regular expressions to extract values from a string, it's a pain to not have the regular expression at the same place as the extracted groups are used. Example:PhoneRegexp = /(\d{3})-?(\d{4})-?(\d{4})/# 200 lines of codeif phone_number =~ PhoneRegexpputs "phone number is: #$1-#$2-#$3"endIf the regular expression had been at the same place as the usage of the $1, $2 and $3 it would have been easy to tie them to the parts of the string. In this case it would be easy anyway, but in more complicated cases it's more complicated. The solution to this is easy - the dollar numbers are evil: don't use them. Instead use an idiom like this:area, number, extension = PhoneRegexp.match(phone_number).capturesIn Ruby 1.9 you will be able to use named captures, and that will make it even easier to make readable usage of the extracted parts of a string. But fact is, the difference between the usage point and the definition point can still cause trouble. A way of getting around this would be to take any complicated regular expression and putting it inside of a specific class for only that purpose. The class would then encapsulate the usage, and would also allow you to test the regular expression more or less in isolation. In the example above, maybe creating a PhoneNumberParser would be a good idea.At the end of the day, regular expressions are an extremely complicated feature, and in general we don't test the usage of them enough. So you should start. Begin by first creating both positive and negative tests for them. Figure out the boundaries, and see where they can go wrong. Know regular expressions well enough to know what happens in these strange circumstances. Think about unicode characters. Think about whitespace. Think about greedy and lazy matching. As an example of something that took a long time to cause trouble; what's wrong with this regexp that tries to discern if a string is a select statement or not?/^\s*\(*\s*SELECT\W+/iAnd this example actually covers most of the ground, already. It checks case insensitive. It checks for white space before any optional parenthesis, and for any white space after. It makes sure that the word SELECT isn't continued by checking for at least one non word character. So what's wrong with it? Well... It's the caret. Imagine if we had a string like this:"INSERT INTO foo(a,b,c)\nSELECT * FROM bar"The regular expression will in fact match this, even though it's not a select statement. Why? Well, it just so happens that the caret matches the beginning of lines, not the beginning of strings. The dollar sign works the same way, matching the end of lines. How do you solve it? Change the caret to \A and the dollar sign to \Z and it will work as expected. A similar problem can show up with the "." to match any character. Depending on which language you are using, the dot might or might not match a newline. Always make sure you know which one you want, and what you don't want.Finally, these are just some thoughts I had while writing it. There is much more advice to give, but it can be condensed to this: understand regular expressions, and test them. The dot isn't as simple as it seem. Regular expressions are a full blown language, even though it's not turing complete (in most implementations). That means that you can't test it completely, in the general case. This doesn't mean you shouldn't try to cover all eventualities.How are you testing your regular expressions? How much? [Less]
|