Something has been worrying me a bit lately. Being test infected and all, and working for ThoughtWorks, where testing is part of the life blood, I think more and more about these issues. And one thing I've started noticing is that regular expressions
... [More]
seems to be a total blind spot in many cases. I first started thinking about it when I changed a quite complicated regular expression in RSpec. Now RSpec has coverage tests as part of their build, and if the test coverage is less than a 100%, the build will fail. Now, since I had changed something to add new functionality, but hadn't added any tests for it, I instinctively assumed that it would be caught be the coverage tool.Guess what? It wasn't. Of course, if I had changed the regexp to do something that the surrounding code couldn't support, one of the tests for surrounding lines of code would have caught it, but I got no mention from the coverage tool that I needed more tests to fully handle the regular expressions. This is logical if you think about it. There is no way that a coverage tool could find all the regular expressions in your source code, and then make sure that all branches and alternatives of that particular regular expression was exercised. So that means that the coverage tool doesn't do anything with them at all.OK, I can live with that, but it's still one of those points that would be very good to keep in mind. Every time you write a regular expression in your code, you need to take special care to actually exercise that part of the code with many inputs. What is many in this case? That's another part of the problem - it depends on the regular expression. It depends on how complicated it is, how long it is, how many special operators are used, and so on. There is no real way around it. To test a regular expression, you really need to understand how they work. The corollary is obvious - to use a regular expression in your code, you need to know how to test it. Conclusion - you need to understand regular expressions.In many code bases I haven't seen any tests for regular expressions at all. In most cases these have been crafted by writing them outside the code, testing them by hand, and then putting them in the code. This is brittle to say the least. In the cases where there are tests, it's much more common that they only test positives, and not negatives. And I've seldom heard of code bases with enough tests for regular expressions. One of the problems is that in a language like Ruby, they are so easy to use, so you stick them in all over the place. A standard refactoring could help here, by extracting all literal regular expressions to constants. But then the problem becomes another - as soon as you use regular expressions to extract values from a string, it's a pain to not have the regular expression at the same place as the extracted groups are used. Example:PhoneRegexp = /(\d{3})-?(\d{4})-?(\d{4})/# 200 lines of codeif phone_number =~ PhoneRegexpputs "phone number is: #$1-#$2-#$3"endIf the regular expression had been at the same place as the usage of the $1, $2 and $3 it would have been easy to tie them to the parts of the string. In this case it would be easy anyway, but in more complicated cases it's more complicated. The solution to this is easy - the dollar numbers are evil: don't use them. Instead use an idiom like this:area, number, extension = PhoneRegexp.match(phone_number).capturesIn Ruby 1.9 you will be able to use named captures, and that will make it even easier to make readable usage of the extracted parts of a string. But fact is, the difference between the usage point and the definition point can still cause trouble. A way of getting around this would be to take any complicated regular expression and putting it inside of a specific class for only that purpose. The class would then encapsulate the usage, and would also allow you to test the regular expression more or less in isolation. In the example above, maybe creating a PhoneNumberParser would be a good idea.At the end of the day, regular expressions are an extremely complicated feature, and in general we don't test the usage of them enough. So you should start. Begin by first creating both positive and negative tests for them. Figure out the boundaries, and see where they can go wrong. Know regular expressions well enough to know what happens in these strange circumstances. Think about unicode characters. Think about whitespace. Think about greedy and lazy matching. As an example of something that took a long time to cause trouble; what's wrong with this regexp that tries to discern if a string is a select statement or not?/^\s*\(*\s*SELECT\W /iAnd this example actually covers most of the ground, already. It checks case insensitive. It checks for white space before any optional parenthesis, and for any white space after. It makes sure that the word SELECT isn't continued by checking for at least one non word character. So what's wrong with it? Well... It's the caret. Imagine if we had a string like this:"INSERT INTO foo(a,b,c)\nSELECT * FROM bar"The regular expression will in fact match this, even though it's not a select statement. Why? Well, it just so happens that the caret matches the beginning of lines, not the beginning of strings. The dollar sign works the same way, matching the end of lines. How do you solve it? Change the caret to \A and the dollar sign to \Z and it will work as expected. A similar problem can show up with the "." to match any character. Depending on which language you are using, the dot might or might not match a newline. Always make sure you know which one you want, and what you don't want.Finally, these are just some thoughts I had while writing it. There is much more advice to give, but it can be condensed to this: understand regular expressions, and test them. The dot isn't as simple as it seem. Regular expressions are a full blown language, even though it's not turing complete (in most implementations). That means that you can't test it completely, in the general case. This doesn't mean you shouldn't try to cover all eventualities.How are you testing your regular expressions? How much? [Less]
|
In a recent discussion around one of Steve Yegge's blog post, an incidental remark was that it's OK that a language makes it harder for a library creator than for an application developer. This point was made by David Pollak and Martin Odersky in
... [More]
relation to some of the complications that you need to handle when creating a Scala library that you can intuitively use without a full understanding of the Scala type system. Make no mistake, I have lots of respect for both Martin and David, it's just that in this case I think it's actually a quite damaging assumption to make. And they are not the only ones who reason like that either. Joshua Bloch's book Effective Java includes this assumption too, in many places.So what's wrong with it then? Isn't there a difference between developing an application and a library. Yes, there is a difference, but it's definitely not as large as people make it out to be. And even more importantly: it _shouldn't_ be that much of a difference. The argument from David was that when creating a library in Scala, he needs to focus and work with quite complicated parts of the type system so that the consumer gets a nice API to use the library through. This process is much harder than just using the library would be.Effective Java contains much good advice, but most of them are from the perspective of someone who creates libraries for a living, and there are a few places where Josh explicitly says that his advice isn't necessarily applicable when writing an application, since he doesn't have that point of view.Let's take a look at a fundamental question then. What is actually a library, and what is an application? In my opinion, a library is a module providing functionality of some kind, restricted to a specific domain. This can be a horizontal or vertical domain, that doesn't matter, but it's usually something that is usable in more than one circumstance. It's not uncommon that libraries use other libraries to implements its functionality. An application is usually a collection of libraries that provide functionality to an end user. That end user can be either a person, a program or another computer - that doesn't matter. But wait, isn't libraries usually also created to provide functionality to other pieces of code? And even though libraries have a tendency to contain more specific code, and less usage of other libraries, the line is extremely fuzzy.The way most applications seems to be built now, most of the work is done to collect libraries, provide the missing functionality and glue them together in some way. But that doesn't mean that the code you write in the application won't be used as a library by another consumer. In fact, it's more and more common to try to reuse as much as possible, and especially when you extend an existing application, it's extremely important that you can consume the existing functionality in a sane way.So why make the distinction? Doing that seems to me to be an excuse for writing bad code if it's in an application. Why won't we as programmers admit that we don't know if someone else will need to consume the code later, and write the best code we can, including creating usable ad well thought out public APIs? Yes, the cost and time will be higher, but that's true for writing tests too. I don't see any value in arguing that libraries should be designed with more care than application code. In fact, I think that attitude is actively detrimental to the industry. And adding a language feature to a language that is complicated, and then arguing that only "library developers" will need to understand it is definitely not the right way to go. A responsible developer using a language needs to understand how that language works. Otherwise that developer will sooner or later cause a great mess. It's just a matter of time. [Less]
|
In a recent discussion around one of Steve Yegge's blog post, an incidental remark was that it's OK that a language makes it harder for a library creator than for an application developer. This point was made by David Pollak and Martin Odersky in
... [More]
relation to some of the complications that you need to handle when creating a Scala library that you can intuitively use without a full understanding of the Scala type system. Make no mistake, I have lots of respect for both Martin and David, it's just that in this case I think it's actually a quite damaging assumption to make. And they are not the only ones who reason like that either. Joshua Bloch's book Effective Java includes this assumption too, in many places.So what's wrong with it then? Isn't there a difference between developing an application and a library. Yes, there is a difference, but it's definitely not as large as people make it out to be. And even more importantly: it _shouldn't_ be that much of a difference. The argument from David was that when creating a library in Scala, he needs to focus and work with quite complicated parts of the type system so that the consumer gets a nice API to use the library through. This process is much harder than just using the library would be.Effective Java contains much good advice, but most of them are from the perspective of someone who creates libraries for a living, and there are a few places where Josh explicitly says that his advice isn't necessarily applicable when writing an application, since he doesn't have that point of view.Let's take a look at a fundamental question then. What is actually a library, and what is an application? In my opinion, a library is a module providing functionality of some kind, restricted to a specific domain. This can be a horizontal or vertical domain, that doesn't matter, but it's usually something that is usable in more than one circumstance. It's not uncommon that libraries use other libraries to implements its functionality. An application is usually a collection of libraries that provide functionality to an end user. That end user can be either a person, a program or another computer - that doesn't matter. But wait, isn't libraries usually also created to provide functionality to other pieces of code? And even though libraries have a tendency to contain more specific code, and less usage of other libraries, the line is extremely fuzzy.The way most applications seems to be built now, most of the work is done to collect libraries, provide the missing functionality and glue them together in some way. But that doesn't mean that the code you write in the application won't be used as a library by another consumer. In fact, it's more and more common to try to reuse as much as possible, and especially when you extend an existing application, it's extremely important that you can consume the existing functionality in a sane way.So why make the distinction? Doing that seems to me to be an excuse for writing bad code if it's in an application. Why won't we as programmers admit that we don't know if someone else will need to consume the code later, and write the best code we can, including creating usable ad well thought out public APIs? Yes, the cost and time will be higher, but that's true for writing tests too. I don't see any value in arguing that libraries should be designed with more care than application code. In fact, I think that attitude is actively detrimental to the industry. And adding a language feature to a language that is complicated, and then arguing that only "library developers" will need to understand it is definitely not the right way to go. A responsible developer using a language needs to understand how that language works. Otherwise that developer will sooner or later cause a great mess. It's just a matter of time. [Less]
|
In a recent discussion around one of Steve Yegge's blog post, an incidental remark was that it's OK that a language makes it harder for a library creator than for an application developer. This point was made by David Pollak and Martin Odersky in
... [More]
relation to some of the complications that you need to handle when creating a Scala library that you can intuitively use without a full understanding of the Scala type system. Make no mistake, I have lots of respect for both Martin and David, it's just that in this case I think it's actually a quite damaging assumption to make. And they are not the only ones who reason like that either. Joshua Bloch's book Effective Java includes this assumption too, in many places.So what's wrong with it then? Isn't there a difference between developing an application and a library. Yes, there is a difference, but it's definitely not as large as people make it out to be. And even more importantly: it _shouldn't_ be that much of a difference. The argument from David was that when creating a library in Scala, he needs to focus and work with quite complicated parts of the type system so that the consumer gets a nice API to use the library through. This process is much harder than just using the library would be.Effective Java contains much good advice, but most of them are from the perspective of someone who creates libraries for a living, and there are a few places where Josh explicitly says that his advice isn't necessarily applicable when writing an application, since he doesn't have that point of view.Let's take a look at a fundamental question then. What is actually a library, and what is an application? In my opinion, a library is a module providing functionality of some kind, restricted to a specific domain. This can be a horizontal or vertical domain, that doesn't matter, but it's usually something that is usable in more than one circumstance. It's not uncommon that libraries use other libraries to implements its functionality. An application is usually a collection of libraries that provide functionality to an end user. That end user can be either a person, a program or another computer - that doesn't matter. But wait, isn't libraries usually also created to provide functionality to other pieces of code? And even though libraries have a tendency to contain more specific code, and less usage of other libraries, the line is extremely fuzzy.The way most applications seems to be built now, most of the work is done to collect libraries, provide the missing functionality and glue them together in some way. But that doesn't mean that the code you write in the application won't be used as a library by another consumer. In fact, it's more and more common to try to reuse as much as possible, and especially when you extend an existing application, it's extremely important that you can consume the existing functionality in a sane way.So why make the distinction? Doing that seems to me to be an excuse for writing bad code if it's in an application. Why won't we as programmers admit that we don't know if someone else will need to consume the code later, and write the best code we can, including creating usable ad well thought out public APIs? Yes, the cost and time will be higher, but that's true for writing tests too. I don't see any value in arguing that libraries should be designed with more care than application code. In fact, I think that attitude is actively detrimental to the industry. And adding a language feature to a language that is complicated, and then arguing that only "library developers" will need to understand it is definitely not the right way to go. A responsible developer using a language needs to understand how that language works. Otherwise that developer will sooner or later cause a great mess. It's just a matter of time. [Less]
|
JtestR allows you to test your Java code with Ruby frameworks.Homepage: http://jtestr.codehaus.orgDownload: http://dist.codehaus.org/jtestrJtestR 0.3 is the current release of the JtestR testing tool. JtestR integrates JRuby with several Ruby
... [More]
frameworks to allow painless testing of Java code, using RSpec, Test/Unit, Expectations, dust and Mocha.Features:- Integrates with Ant, Maven and JUnit- Includes JRuby 1.1, Test/Unit, RSpec, Expectations, dust, Mocha and ActiveSupport- Customizes Mocha so that mocking of any Java class is possible- Background testing server for quick startup of tests- Automatically runs your JUnit and TestNG codebase as part of the buildGetting started: http://jtestr.codehaus.org/Getting StartedThe 0.3 release has focused on stabilizing Maven support, and adding new capabilities for JUnit integration.New and fixed in this release:JTESTR-47 Maven with subprojects should work intuitivelyJTESTR-42 Maven dependencies should be automatically picked up by the test runJTESTR-41 Driver jtestr from junitJTESTR-37 Can't expect a specific Java exception correctlyJTESTR-36 IDE integration, possibility to run single testsJTESTR-35 Support XML output of test reportsTeam:Ola Bini - [email protected] Abramovici - [email protected] [Less]
|
JtestR allows you to test your Java code with Ruby frameworks.Homepage: http://jtestr.codehaus.orgDownload: http://dist.codehaus.org/jtestrJtestR 0.3 is the current release of the JtestR testing tool. JtestR integrates JRuby with several Ruby
... [More]
frameworks to allow painless testing of Java code, using RSpec, Test/Unit, Expectations, dust and Mocha.Features:- Integrates with Ant, Maven and JUnit- Includes JRuby 1.1, Test/Unit, RSpec, Expectations, dust, Mocha and ActiveSupport- Customizes Mocha so that mocking of any Java class is possible- Background testing server for quick startup of tests- Automatically runs your JUnit and TestNG codebase as part of the buildGetting started: http://jtestr.codehaus.org/Getting+StartedThe 0.3 release has focused on stabilizing Maven support, and adding new capabilities for JUnit integration.New and fixed in this release:JTESTR-47 Maven with subprojects should work intuitivelyJTESTR-42 Maven dependencies should be automatically picked up by the test runJTESTR-41 Driver jtestr from junitJTESTR-37 Can't expect a specific Java exception correctlyJTESTR-36 IDE integration, possibility to run single testsJTESTR-35 Support XML output of test reportsTeam:Ola Bini - [email protected] Abramovici - [email protected] [Less]
|
JtestR allows you to test your Java code with Ruby frameworks.Homepage: http://jtestr.codehaus.orgDownload: http://dist.codehaus.org/jtestrJtestR 0.3 is the current release of the JtestR testing tool. JtestR integrates JRuby with several Ruby
... [More]
frameworks to allow painless testing of Java code, using RSpec, Test/Unit, Expectations, dust and Mocha.Features:- Integrates with Ant, Maven and JUnit- Includes JRuby 1.1, Test/Unit, RSpec, Expectations, dust, Mocha and ActiveSupport- Customizes Mocha so that mocking of any Java class is possible- Background testing server for quick startup of tests- Automatically runs your JUnit and TestNG codebase as part of the buildGetting started: http://jtestr.codehaus.org/Getting+StartedThe 0.3 release has focused on stabilizing Maven support, and adding new capabilities for JUnit integration.New and fixed in this release:JTESTR-47 Maven with subprojects should work intuitivelyJTESTR-42 Maven dependencies should be automatically picked up by the test runJTESTR-41 Driver jtestr from junitJTESTR-37 Can't expect a specific Java exception correctlyJTESTR-36 IDE integration, possibility to run single testsJTESTR-35 Support XML output of test reportsTeam:Ola Bini - [email protected] Abramovici - [email protected] [Less]
|
Best quote this whole day, found in http://www.codinghorror.com/blog/archives/001131.html#comments.If Ruby offered something new I would have learned it fine tbh... its just difficult enough to not be able to "pick up and run with" like almost
... [More]
everything else out there... but honestly, it wouldn't let me do anything I can't already do.My brain almost exploded reading that. [Less]
|
Best quote this whole day, found in http://www.codinghorror.com/blog/archives/001131.html#comments.If Ruby offered something new I would have learned it fine tbh... its just difficult enough to not be able to "pick up and run with" like almost
... [More]
everything else out there... but honestly, it wouldn't let me do anything I can't already do.My brain almost exploded reading that. [Less]
|
Best quote this whole day, found in http://www.codinghorror.com/blog/archives/001131.html#comments.If Ruby offered something new I would have learned it fine tbh... its just difficult enough to not be able to "pick up and run with" like almost
... [More]
everything else out there... but honestly, it wouldn't let me do anything I can't already do.My brain almost exploded reading that. [Less]
|