Introducing the regex-tester library for Java

regex-tester version 0.1 is an open source project that removes the boilerplate code needed to test regular expressions with JUnit.

Regular expressions often contain business logic that is important to an application, yet they are rarely put through rigorous, automated testing. That’s unfortunate because it’s generally so easy to test a regular expression (regex, from here on). In many cases, you just want to know that a given string produces a match when the regex is applied to it. That’s easy to check even without regex-tester, yet it is done so infrequently in real software.

Running the regex against a handful of strings in a JUnit test is better coverage than none at all. That’s easy to do with regex-tester:

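import java.util.Arrays;
import java.util.List;

// Class-level annotations and regex-tester imports are not shown in this excerpt.
public class BasicRegexTest {

    // Supplies the test strings, each paired with the expected match result.
    @RegexTestStrings
    public static List<RegexTestStringInfo> getTestParameters() {
        return Arrays.asList(new RegexTestStringInfo[] {
                new RegexTestStringInfo(true, "com"),
                new RegexTestStringInfo(true, "com.thewonggei"),
                new RegexTestStringInfo(true, "com.thewonggei.regexTester"),
                new RegexTestStringInfo(false, ".com.thewonggei"),
                new RegexTestStringInfo(false, "")
        });
    }

    public void test() {}
}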

Running the test suite above will automatically create and execute a test case for each test string specified. Even that is a lot of typing when you want better coverage of the regex. If you want to test many more strings, you can put them in a simple properties file like this:

#A couple of out-of-range examples first
#These are just the wrong format

#However many test strings you desire

#A whole bunch of matching strings

Then the JUnit test gets even simpler:

@Regex(value="^(19|20)\\d\\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$")
public class DateFormatRegexTest {

    public void test() {}
}


In both cases, the crucial lines of code declare the important pieces under test: the specialized JUnit test suite to use (@RegexTestSuite), the regex to test with (@Regex) and the strings to execute against the regex (either the @RegexTestStrings method or @RegexTestStringsFile). For each string, a boolean value is supplied that indicates whether or not the regex should produce a match.

Therefore, the meaning of

RegexTestStringInfo(true, "com")

is “the given regex will produce a match against the string ‘com’”. If using the properties file

means “the given regex will not produce a match against the string ‘’”.
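The idea of pairing each test string with the expected match result is not specific to regex-tester. As a rough illustration only, here is a minimal Python sketch (it is not the regex-tester API) that runs the date regex from the DateFormatRegexTest example against a handful of strings. The test strings and the check_cases helper are my own assumptions, made up for this sketch.

```python
import re

# The date regex from the DateFormatRegexTest example above.
DATE_REGEX = re.compile(
    r"^(19|20)\d\d[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])$"
)

# Hypothetical test data: (should_match, test_string) pairs.
CASES = [
    (True, "2013-11-05"),
    (True, "1999/12/31"),
    (False, "1899-01-01"),   # century is out of range
    (False, "2013-13-01"),   # month is out of range
    (False, ""),
]


def check_cases(regex, cases):
    """Return every case whose actual match result differs from the expected one."""
    failures = []
    for should_match, text in cases:
        matched = regex.search(text) is not None
        if matched != should_match:
            failures.append((should_match, text))
    return failures


if __name__ == "__main__":
    for should_match, text in check_cases(DATE_REGEX, CASES):
        print("unexpected result for %r (expected match: %s)" % (text, should_match))
```

An empty list from check_cases means every string behaved as declared, which is the same contract the boolean in RegexTestStringInfo expresses.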

These two examples show both the library’s current capabilities and all that is required of you to start automating the testing of your regular expressions. There is much yet to do (e.g. I know you already wondered why you can’t test for the number and content of matches in regular expressions with groups defined) that will come out in later versions. For now, I’m anxious to know about any usage of the library and, especially, what you think needs to be added or fixed. Feel free to open up an issue on GitHub for any usage problems, bugs or feature requests.

Get the Code

The code is under the MIT License and is hosted on GitHub here. See the project’s README file for more details.

Get the JARs

Maven coordinates:


Or, download the JARs directly from Maven Central here.


Review of “Learning jQuery” by Ralph Steyer

Learning jQuery: A Hands-on Guide to Building Rich Interactive Web Front Ends is written by Ralph Steyer and published by Addison Wesley, © 2013 (paperback), ISBN 978-0-321-81526-2, 495 pp., $39.99 US.

Learning jQuery is a single entry in the pantheon of jQuery books that introduce web programmers to this JavaScript framework (Amazon has over 20 such books available as of November, 2013). Steyer’s book follows the same pattern as similar books I’ve read and explains the core functionality provided by jQuery: the jQuery object, selectors and filters, how to manipulate the DOM (Document Object Model), AJAX (Asynchronous JavaScript and XML) and animation. These and other topics are explained with clear, knowledgeable descriptions and ample code samples and tutorials that include both HTML and manipulation of it through JavaScript and jQuery. Steyer includes an adequate number of illustrative screenshots too.

The book begins with jQuery fundamentals and progresses to advanced topics at the end. Advanced topics include jQuery UI, plug-ins and jQuery Mobile. The fundamentals are the focus, however, taking up about ¾ of the book. Despite the tilt towards developers new to jQuery, this book assumes that the reader is already competent with JavaScript and HTML programming. In fact, tackling the plug-ins chapter will be the most difficult for a reader who lacks adequate experience programming in JavaScript.

Even though this book follows the same formula as many other books about jQuery, it is a worthwhile read. Steyer obviously understands jQuery thoroughly as he frequently describes common pitfalls in using the framework and how to avoid them. The information in the book is far more illustrative than the reference information available online and far more instructive than the thousands of sketchy examples and explanations you dig up with a web search.

Despite having experience with jQuery in production web apps, I learned quite a few new tricks from this book. I also found that I gained a deeper understanding of how jQuery works, which I believe will make my future jQuery development more precise and effective.

My Simple View of Testing for Software Developers

Assumptions + important edge cases. That’s it. That’s all I consider when I’m testing my code. But knowing that you aren’t satisfied with such a trivial explanation, I’ll explain.

Since I discovered Test Driven Development (TDD) a handful of years ago I have followed it, practiced it, tried to adhere to its loose tenets. There’s no question that, in order for me to produce code that I am comfortable releasing to QA and users, TDD is the sanest path we software developers now have to follow. The outcome of my embrace of and consequent struggles with TDD is that I have developed an intuition about testing my code that only now can I explain simply as assumptions + important edge cases. That formula is how I determine when I’ve done enough testing.


Assumptions

Assumptions are the things you take to be true about how your code should work. That covers both the input and output side of the code you are currently working on. Since TDD drives your work down to the individual functions, your assumptions are the current list of facts you possess about what might be input to your function and what output each of those inputs should produce.

You cannot and should not test functionality that you do not need; that is, perhaps, the most fundamental tenet of TDD. The popular aphorism You’re Not Gonna Need It (YAGNI) applies here. So once you are satisfied that you have collected all the facts about some code you need to write, you only need to work on it until all of your assumptions are codified in that function. Then stop. If you don’t, you’ll start testing for functionality that only you dreamed up for the code to do. The assumptions you have gathered, most likely through Agile processes that are well documented elsewhere, represent the entire universe of outcomes that anyone, at a static point in time, cares about the code producing. If the code does something outside of that tenuous list, so what? The assumption is that no one will ever ask the code to do that extra work anyway. You and the people who helped formulate the assumptions have a contract that only those assumptions matter.

Important Edge Cases

Of course there are always those conditions that your business sponsors and business analysts and such never think of—stuff that only programmers care about. Those are the edge cases. You have been trained to worry about the extremes in the input range when writing a function.

I worry about that too when I’m writing unit tests, even when the edge cases aren’t covered by the list of assumptions. For every function there are those input values that just wreak havoc when not handled properly. Maybe, for instance, you know that letting a null value be operated on by your function is an edge case that you absolutely must deal with. Even though handling null values in some graceful manner might not be in your list of assumed behaviors for your code, you know that you have to write code to handle that situation in a reasonable manner (Want to make the world a better place? Give it fewer NullPointerExceptions to deal with).

The null value example is an instance of an important edge case. It’s just one of a small list of edge cases that you should care about though. If you wrote the code to match your assumptions already, then all you have left to do is handle the few very important edge cases that keep your code from behaving particularly badly. And that is a good place to stop testing and writing code.
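To make the formula a little more concrete, here is a small Python sketch of what a test class organized as assumptions + important edge cases might look like. The normalize_username function and its requirements are hypothetical, invented only for this illustration; your own list of assumptions would come from your team.

```python
import unittest


def normalize_username(name):
    """Hypothetical function under test: trim whitespace and lower-case a username."""
    if name is None:
        # Important edge case: no value at all (Python's version of null).
        return ""
    return name.strip().lower()


class NormalizeUsernameTest(unittest.TestCase):
    # --- Assumptions: the behavior everyone agreed the code must have ---

    def test_lower_cases_the_name(self):
        self.assertEqual(normalize_username("NickW"), "nickw")

    def test_trims_surrounding_whitespace(self):
        self.assertEqual(normalize_username("  nickw "), "nickw")

    # --- Important edge cases: only the ones that would cause real havoc ---

    def test_none_is_handled_gracefully(self):
        self.assertEqual(normalize_username(None), "")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalize_username(""), "")


if __name__ == "__main__":
    unittest.main()
```

Once the two assumption tests and the two edge case tests pass, the heuristic says to stop: anything else would be testing functionality nobody asked for.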

There’s Always a Caveat (or two)

The caveat that has probably already caused you to think about some scathing comment to add to this post is that whole assumption part. It’s just so damn open-ended. That’s true but I’m ok with that (you don’t have to be though). The idea is that I can grasp the idea of figuring out what I, at any given time, assume about the operation of a piece of code. I can write out that finite list of assumptions and test each one. When I add the edge cases on the end of that list, I can test the code for everything I know about its expected functionality. In other words, assumptions + important edge cases is merely my heuristic for figuring out when I’m done writing and unit testing a piece of code.

The other thing about assumptions is that they can include more than just business-driven functionality. Depending on your development team, they could include standard programming practices around security, code monitoring, performance, etc. You might have a common list of assumptions that applies to every function you write just to satisfy the software development standards in your organization.


You will get some overlap between the two concepts of assumptions and important edge cases. The Venn diagram (I love these things) below shows that some of your assumptions will cover some important edge cases. It’s up to you to sort all of that out and just pick the edge cases that are still important to consider but didn’t happen to be on the list of assumptions.

One approach to unit testing is to consider just these two aspects of requirements.


That’s the attitude I take towards writing code and unit testing it, for better or worse. It’s an internalization of a lot of ideas that I have both learned about from others and worked out for myself. It’s not a prescription for the right way to write code nor is it an admonishment of how anyone else does their work.

StirTrek: Darkness Edition

After having a few very successful posts about the soapUI web service testing tool (see Configure HTTP Basic Auth for soapUI Test Suites and Some Thoughts on Integrating soapUI Functional Tests with Your Build) I’ve decided to speak at a local conference on the same topic. I will be delivering the talk Web Service Testing with soapUI: Write Once, Run Automatically at the StirTrek: Darkness Edition conference in Columbus, OH on May 17th.

If you’re going to be near Columbus on May 17th, it’s worth a day’s time to attend the conference. Tickets go on sale tomorrow, March 14th at 1:59pm EDT.

Here’s the abstract for the talk:

Writing a web service, especially with SOAP, is hard. Trying to test a web service, especially with SOAP, is even harder. Without a good tool, you can only test web services by manually piecing together SOAP messages or tinkering with REST URIs. This presentation will both introduce soapUI, a top-notch REST and SOAP web service test tool, and demonstrate how to automate the tests you write with it. First, you’ll learn the fundamental features that soapUI offers. Next, you’ll learn how to write and run web service tests in soapUI. Finally, you’ll learn how to automate soapUI test execution and integrate tests into an automated build system. Throughout the talk you’ll also see some of the advanced features of soapUI like Groovy and JavaScript validation scripting and how to validate responses with XPath and XQuery expressions.

XSLT in Groovy: Quick and Dirty

As soon as you start writing Extensible Stylesheet Language Transformations (XSLT) you’ll start to wonder if there is an alternative to the coarse language. Even after you’ve written many XSLT files and the sharpness of the angled brackets and other ugly XML syntax has dulled, you may still wish for the relaxed style of indentations and semi-colons in your favorite programming language. A natural question with a popularly sought answer for XSLT programmers is “is there a better syntax to be had that matches the utility of XSLT?”

Do some Google research and you’ll find that this is a common question, one that programmers have answered with everything from disgust to gusto: pointless rants on StackOverflow at one end, constructive solutions in the form of new languages and frameworks at the other. Despite the questioning and the attempts at innovation, XSLT remains a powerful tool for its intended purpose, apparent flaws and all. Groovy, however, does offer some features that replace common XSLT tasks and make for smoother reading too. This article will demonstrate those features of XSLT that can easily be replaced by Groovy.

Where to Begin?

The Groovy language is not a perfect replacement for XSLT. XSLT is a useful language that is well suited to its specific purpose and, despite the desire of many to replace it, has not been uprooted by any alternatives. So what I’m showing you in this article is not a way to discard XSLT but a way to reduce the amount of XSLT you have to write if you do simple transformations. Use Groovy only as long as it makes your work easier. If you find it difficult to get a transformation to work in Groovy, look again at what XSLT offers. It will probably have a feature that solves your complex problem in a way that Groovy does not.

Admonishment administered, let’s roll this article out as a set of questions and answers. If you’ve done even a few XSLT programs before, you’ll have a grasp on the basics of that language. If you’ve done even a few Groovy programs, you’ll have a grasp on the basics of that language too. And here is the sweet spot for Groovy. It does the common, basic work of an XSLT program but with Groovy’s flair for unceremonious readability and conciseness. The rest of the article unceremoniously lays out the common XSLT tasks to be done and concisely demonstrates how to do them with Groovy.

What is the “XSLT Processor” in Groovy?

XSLT is an interpreted language that requires a processing engine to figure out how to interpret the code and perform the transformation. This processing includes opening the input XML file, performing the encoded transformations and then outputting the result to the console or another file. Listing WAT-1 shows an Ant script that invokes an XSLT transformation using the popular Saxon XSLT processor. There are other ways to call the processor, such as through a command line call, but you are always dealing with your chosen XSLT processor. So where is the XSLT processor in Groovy?

The answer is the Groovy language, mostly, with support from some classes in the standard Groovy API. The front end of the XSLT processor requires using the XmlSlurper and GPathResult classes. Using XmlSlurper to parse an XML file results in a GPathResult object, which is a convenient way of navigating the XML structure. The back end is the MarkupBuilder class, which is designed to make outputting XML and HTML easy on the eye for programmers.

<?xml version="1.0" encoding="UTF-8"?>
<project name="XSLTFromAntExample">
    <path id="saxon9.classpath" location="D:\\saxon9he.jar"/>

    <target name="transform" depends="clean">
        <!-- The remaining xslt attributes (for example, the stylesheet to apply) are not shown. -->
        <xslt destdir="output">
            <outputproperty name="method" value="xml"/>
            <outputproperty name="indent" value="yes"/>
            <fileset dir="input"/>
            <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
    </target>
</project>
Listing WAT-1: XSLT transformation called from an Ant script

Why Don’t I Show You An Example?

As an example, Listing WAT-2 provides the steps required to parse the XML file in Listing WAT-3 and obtain a GPathResult object.

def xml = new XmlSlurper().parse(new File("example.xml"))

Listing WAT-2: Obtaining a GPathResult object

<?xml version="1.0" encoding="UTF-8"?>
                    <publisher-name name-type="full">Nick's Academic Journal</publisher-name>
                <pub-date pub-type="ppub">
                <pub-date pub-type="epub">

Listing WAT-3: A sample of an XML file for an academic journal article

Oops, I guess that was just one step. So now you have a representation of the entire XML structure in memory, held in the xml variable. With convenient and powerful GPath expressions you can walk the XML structure as easily as with XPath and use them to perform your transform.

The MarkupBuilder class is the output side of the transform processing in Groovy. With this class you can almost literally write out your output XML with Groovy in between to fill in the dynamic bits. Listing WAT-4 expands on Listing WAT-2 and generates a simple XML structure.

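import groovy.xml.MarkupBuilder
def xml = new XmlSlurper().parse(new File("example.xml"))
def builder = new MarkupBuilder()
builder.ArticleSet {
  Article {
    Journal {
      // Values are read from the parsed XML with GPath expressions.
      // The issue path is assumed to mirror the volume path.
      Volume(xml.front."article-meta".volume.text())
      Issue(xml.front."article-meta".issue.text())
    }
  }
}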

Listing WAT-4: A simple example of using the MarkupBuilder class

Listing WAT-5 shows the XML that is output from running the code in Listing WAT-4.

<?xml version="1.0" encoding="UTF-8"?>

Listing WAT-5: XML output from the previous listing’s code

I could talk about how this works with closures and all, but that would just confuse the issue. It’s plain to see from Listing WAT-4 both how to build your XML output from the parsed XML and why it is more attractive to look at than XSLT. However, I will explain one thing. Notice that to get the value for the Volume and Issue tags, I had to access methods on the xml object (of type GPathResult). That is a GPath. xml holds the in-memory XML representation parsed from the example.xml file and I use dot notation to “walk” the XML structure. I could get the same value with the XPath expression /article/front/article-meta/volume. The one difference to notice here is that the GPath always begins with a child of the root tag, which is article in this case. That’s why article is not in the GPath expression.

I’ve covered enough ground already that you could go off and do some transformations in Groovy. There are a couple of things to watch out for though, so I’ll cover those before finishing.

Don’t You Have to Replace XPath Too?

That’s true. XPath is a part of XSLT. In response to the desire to handle XML gracefully, Groovy has kindly provided us with the GPathResult class. GPathResult implements the Groovy concept of a GPath, which replaces XPath with an object-oriented syntax. The basic idea is that an XML document is turned into an object model dynamically after the XML has been parsed by the XmlSlurper class. This translation has a similar result to the one an Object-Relational Mapping tool has when converting a SQL result set to a hierarchy of objects.

How Do I Get the Value of an Attribute?

Listing WAT-4 covers how to get the value out of any XML tag. Getting the value of an attribute of a tag is just as easy. For example, to get the value of the `name-type` attribute of the `publisher-name` tag from the XML in Listing WAT-3, you would need the simple Groovy expression in Listing WAT-6.

def pubNameType = xml.front."journal-meta".publisher."publisher-name".@"name-type"

Listing WAT-6: Obtaining the value of an XML element’s attribute

How Do I Reference an XML Element by Attribute Value?

This is one area where Groovy’s solution is slightly less elegant than the one that XSLT provides. To access an element with a specific attribute value you have to use the find method of the GPathResult class. For example, the XPath expression to access only pub-date tags from the XML in Listing WAT-3 with an attribute of pub-type="ppub" is /article/front/article-meta/pub-date[@pub-type='ppub']. Groovy requires a syntax that is Groovy-esque but harder to read. Listing WAT-7 shows the Groovy equivalent of the prior XPath expression.

xml.front."article-meta"."pub-date".find {it.@"pub-type" == 'ppub'}

Listing WAT-7: The Groovy way to access XML elements by attribute value

If you’re not as worried about the exact path within the XML, you can use the shorter XPath expression //pub-date[@pub-type='ppub']. The equivalent in Groovy is shown in Listing WAT-8.

xml.**.find {it.@"pub-type" == 'ppub'}

Listing WAT-8: The Groovy way to access XML elements by attribute value with a broader path

The fact that I feel compelled to place the Groovy code from this section in listings while the XPath remains in line with the text proves that XPath wins on conciseness and readability here.
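If you want to poke at the XPath side of this comparison without setting up an XSLT processor, the following Python sketch evaluates the same kind of expressions with the standard library’s xml.etree.ElementTree module, which supports a limited subset of XPath. The sample document and its values are hypothetical, loosely shaped like Listing WAT-3; this is only an illustration and not part of the Groovy approach.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, loosely shaped like the journal article XML in Listing WAT-3.
SAMPLE = """<article>
  <front>
    <article-meta>
      <volume>7</volume>
      <pub-date pub-type="ppub"><year>2012</year></pub-date>
      <pub-date pub-type="epub"><year>2011</year></pub-date>
    </article-meta>
  </front>
</article>"""

root = ET.fromstring(SAMPLE)  # root is the <article> element

# Like a GPath, an ElementTree path starts from the children of the root tag,
# so "article" itself does not appear in the expression.
volume = root.find("front/article-meta/volume").text

# The counterpart of //pub-date[@pub-type='ppub'], searched from the root.
ppub_dates = root.findall(".//pub-date[@pub-type='ppub']")
ppub_years = [date.findtext("year") for date in ppub_dates]

if __name__ == "__main__":
    print(volume, ppub_years)
```

Dashed tag names need no special quoting here because the whole path is already a string.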

Did You Notice the XML Tags Containing Dashes?

If you happen to have an XML file with tag names containing dashes, you’ll have a little trouble referencing them in your GPath expressions in Groovy. Plainly, an expression like xml.front.article-meta won’t work. As I’ve done in the code in this article, just wrap the dashed tag names in quotes and Groovy will compensate accordingly: xml.front."article-meta".

How Do I Make an XSLT Template in Groovy?

Groovy (remember that it is a general purpose language) does not have a direct equivalent to the XSLT template command. Since the XSLT template is one of its strongest and most common features, we should find a good replacement. It turns out that a simple loop can step in for the template. In XSLT, a template is activated whenever its “match” XPath expression returns a result. So you could, for instance, transform all of the <pub-date> tags in the input XML into <PubDate> tags in the output with the XSLT template shown in Listing WAT-9.

<xsl:template match="pub-date">
    <PubDate>
        <Year><xsl:value-of select="year"/></Year>
        <Month><xsl:value-of select="month"/></Month>
        <Day><xsl:value-of select="day"/></Day>
    </PubDate>
</xsl:template>

Listing WAT-9: An example of an XSLT template

Listing WAT-10 adds to the previous Groovy example by adding the <PubDate> tags as children of the <Journal> tag. The resulting XML is provided in Listing WAT-11.

  PubDate {
    xml.front."article-meta"."pub-date".each {

Listing WAT-10: A foreach loop is the replacement for an XSLT template


Listing WAT-11: XML output that includes the `PubDate` tag

Where to End?

We began with some stern finger-shaking about how far you can go when substituting Groovy for XSLT, so let’s end with some more. In general, XSLT provides a thoughtful method of processing XML that is specific to that task. XSLT as a language supports that methodology. On the other hand, Groovy is a general-purpose programming language that has some convenient XML processing features built-in. You cannot always find direct correlations between XSLT features and Groovy. However, if you comprehend each system fundamentally, you can benefit by utilizing the terseness of Groovy to perform some of your XSLT tasks.

Python Code CAN Connect to an Oracle Database.

Despite the beaming praise for the simplicity of the cx_Oracle project on their SourceForge page, I had trouble using the Python module to create a connection to an Oracle database. It turns out that the module is quite nice once you get past a couple of problems that have nothing to do with the cx_Oracle module. In my case, my computer’s environment and some confusing information on the Internet were the cause of my troubles. So here I have organized some hints to help you if you are unfortunate and cannot immediately make a successful connection to your Oracle database using cx_Oracle.

Install the Correct Version of cx_Oracle

This was my biggest problem, though I didn’t realize it for a while. The cx_Oracle project has separate binary distributions for both OS and Oracle version, which support Windows and CentOS for Oracle 10.2, 11.1 and 11.2. On Windows for sure, installing the wrong binary will result in a broken installation. However, it’s not necessarily as simple as knowing which version your Oracle database is at.

Here’s what happened in my case. I have an Oracle ODBC driver installed on my Windows XP installation (OraClient10g) but I want to connect to an 11g database. I first assumed that I needed to install the 11g version of cx_Oracle and found that assumption to be wrong. I think because of the ODBC driver version, I had to install the 10g version of cx_Oracle. I simply could not connect to my database otherwise. I don’t have a way to confirm this ODBC complication because I don’t have authority to install different versions of drivers but it makes sense to me.

The indication I got that my connection was not working was the following error from my Python program:

cx_Oracle.DatabaseError: ORA-24315: illegal attribute type

This is a vague error that, I suppose, has something to do with the mismatch between the cx_Oracle code and the ODBC driver. Once I installed the cx_Oracle version that matched my ODBC driver version I was able to successfully connect.

Connection Strings

When you connect to an Oracle database using the connect method you have several ways to specify important parameters such as user, password and SID (referred to as Data Source Name, DSN, in cx_Oracle). The easiest way to connect is like this:

import cx_Oracle

# Connect using the ordered parameters user, password and SID.
dbconn = cx_Oracle.connect('user', 'password', 'SID')

You can also be more explicit by naming the parameters like this:

# Connect using named parameters.
dbconn = cx_Oracle.Connection(user='user', password='password', dsn='SID')

I suggest just using one of the above methods, as is done in the sample code that comes with the cx_Oracle module. There is another method though that utilizes the Oracle Easy Connect string. This string is purported by Oracle to be convenient but is not, especially if you have limited authority on the database. The Easy Connect string requires, instead of the easy to obtain and commonly known SID, that you know the service name for a database. Assuming you have authority to do so, you can execute the command below against your database to obtain the service name:

select sys_context('userenv', 'service_name') from dual;

Once you have the service name, you can connect to the database like so:

# Connect using Oracle's Easy Connect connection string.
dbconn = cx_Oracle.connect(u'user/password@db-server:1521/')
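Because the Easy Connect string is easy to get subtly wrong, it can help to build it from its parts. The sketch below is only a small helper of my own (easy_connect_string is not part of cx_Oracle), and the host and service name in the usage line are placeholders; it simply assembles a string in the same user/password@host:port/service_name shape used above.

```python
def easy_connect_string(user, password, host, port, service_name):
    """Assemble an Easy Connect style string: user/password@host:port/service_name.

    Hypothetical helper for illustration; it only formats the string and does
    not validate the values or contact a database.
    """
    if not service_name:
        # The service name is the piece that is easiest to forget.
        raise ValueError("a service name is required for an Easy Connect string")
    return "{0}/{1}@{2}:{3}/{4}".format(user, password, host, port, service_name)


if __name__ == "__main__":
    # Placeholder values; substitute the service name returned by the query above.
    print(easy_connect_string("user", "password", "db-server", 1521, "my_service"))
    # The result could then be passed to cx_Oracle.connect(...).
```

Keeping the formatting in one place also makes it obvious when the service name is missing, which is the most common reason this style of connection fails.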


The cx_Oracle module is a nice library to have around when you’re working with Oracle from Python code. You’ll probably not have the problems I did if you read the cx_Oracle documentation carefully and understand your OS environment properly. However, if you do have some problems, I hope this article helped solve them.

One thing to note is that when you install cx_Oracle it does install documentation. I think it’s in an odd place, but maybe this is common for Python modules. The documentation will be in <python-install-dir>/cx_Oracle-doc. This directory contains documentation, test cases and sample Python code.

Links and References

Here are links to materials that I’ve referenced and other useful links.

cx_Oracle project page –

cx_Oracle download page –

Oracle whitepaper describing the EasyConnect string –

Oracle tutorial on connecting to a database with cx_Oracle –

Some StackOverflow discussions on this topic that helped me:

M3Conf 2012 Slides

If you attended my talk called “Your First Enterprise App – From the Trenches” at M3Conf on October 26th, 2012, you can get the slides here: M3Conf2012-NickWatts

Thanks to everyone who attended for being so engaging. There were many great questions, which I thoroughly enjoyed answering. If you have any questions do not hesitate to comment below or tweet them to me @thewonggei.

