A New Maven Archetype for Starter Projects

I’ve recently put together a new Maven archetype, based on something I saw in Freenode’s ##java channel a few weeks ago. Basically, someone had built their own archetype for “standard projects,” with a few sensible dependencies and defaults, and while I thought it was a worthwhile effort, it didn’t fit what I found myself typically doing.

So I built my own, at https://github.com/jottinger/starter-archetype.

Primary features are:

  • Java 8 as a default Java version
  • Typical dependencies
  • Maven Shade

Java 8

Look, Java 7 and older have been end-of-lifed; you can pay Oracle for support and fixes, but you shouldn’t unless you have a real reason to do so. Java 8 is the current Java version. You should be using it, and so should I. Therefore, I do.

It’s an unfortunate aspect of Maven that its compiler plugin defaults to an older Java source and target version. My starter archetype presumes you want to live in the current year.

Typical Dependencies

My starter archetype has seven dependencies: either ones I include without thinking about it (because I know I’m going to want or need them), or ones so common that few projects would blink at their inclusion.

The dependencies are organized into three groups: runtime dependencies, one compile-time dependency, and testing dependencies.

The compile-time dependency is Lombok; it helps remove boilerplate from Java code, so I can build a Java object with mutators, accessors, toString(), hashCode(), and equals() very simply:

import lombok.Data;

@Data
public class Thing {
    // Lombok generates getName(), setName(String name), equals(), hashCode(), and toString()
    String name;
}
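For comparison, here is roughly what @Data saves us from writing for Thing. This is a hand-written approximation for illustration, not Lombok’s actual generated code; Lombok’s real equals() and hashCode() differ in detail (for example, it also generates a canEqual() method).

```java
import java.util.Objects;

// Hand-written approximation of what Lombok's @Data produces for Thing;
// not Lombok's exact output.
class Thing {
    String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Thing)) return false;
        return Objects.equals(name, ((Thing) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    // Lombok's toString() uses a ClassName(field=value) format
    @Override
    public String toString() {
        return "Thing(name=" + name + ")";
    }
}
```

That is roughly twenty-five lines for a single field, which is exactly the boilerplate Lombok removes.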

The runtime dependencies are Guava, Logback, and Apache’s commons-lang3. Guava and commons-lang3 overlap a lot, but both are very common. Logback is a native implementation of the SLF4J logging API, so it’s a workable default that doesn’t lock you in: code written against SLF4J can switch to a different logging backend if you don’t like Logback.

All together, they use roughly 3.5MB of disk space for a starting classpath, with Guava accounting for 2.3MB of it. Given how useful Guava and the others are, I think this is entirely worthwhile; most projects will end up with these libraries (or something very like them) anyway.

The testing libraries are TestNG, AssertJ, and H2.

It’s arguable that JUnit 5 might have caught up to TestNG in a lot of ways, but there are still some features I really like from TestNG that JUnit doesn’t have built-in support for (namely, data providers – although note that there is a project that provides data provider support for JUnit, unsurprisingly called junit-dataprovider).

AssertJ is a set of fluent assertions for Java. It’s not necessary (I’ve used TestNG’s built-in assertions for years without a problem), but the fluent assertion style is rather nice. The actual dependency is assertj-guava, which pulls in the core AssertJ library as well; I chose assertj-guava because Guava is already a default runtime dependency.

H2 is an embedded database. I use embedded databases so much for first-level integration tests that it seemed silly not to include it; I have a lot of sandbox projects that don’t use H2, but as soon as I do anything with a database, this gets included, so it makes sense as a trivial “default testing library.”

Maven Shade

I wanted to be able to generate an executable jar by default, not because I do that very much, but because it seemed to be a sane default. (Usually, my starter projects exist to support a test that demonstrates a feature, as opposed to being an independently useful project.)

Because I wanted others to get more use out of my starter project, I added Maven Shade to build an executable jar with a viable entry point.

Using the archetype

Right now, you’d need to run the following sequence at least once to use the archetype:

git clone https://github.com/jottinger/starter-archetype.git
cd starter-archetype
mvn install

Then, to build a project with the archetype, you’d run:

mvn archetype:generate \
    -DarchetypeGroupId=com.autumncode \
    -DarchetypeArtifactId=starter-archetype \
    -DarchetypeVersion=1.0-SNAPSHOT

Future Enhancements

It’s still very much a work in progress. Things I’d like to do:

  • Migrate into the main Maven repositories (publish, in other words)
  • Add publishing support to the archetype itself
  • Include better throwaway demonstrations of the dependencies (a difficult task, as the default classes are meant to be thrown away en masse)
  • Figure out better default libraries, if possible

You’re welcome to fork the project, create issues, comment, or use it with wild abandon, as you like. It’s licensed under the Apache License, Version 2.0.

A Simple Grappa Tutorial

Grappa is a parser library for Java. It’s a fork of Parboiled, which focuses more on Scala as a development environment; Grappa tries to feel more Java-like than Parboiled does.

Grappa is similar in focus to other libraries like ANTLR and JavaCC; the main advantage of using something like Grappa instead of ANTLR is the lack of a separate code-generation phase. With ANTLR and JavaCC, you write a grammar file, which a tool then turns into a lexer and a parser in Java source code. Then you compile that generated source to get your parser.

Grappa (and Parboiled) represent the grammar in actual source code, so there is no external phase; this makes programming with them feel faster. It’s certainly easier to integrate with tooling, since there is no separate tool to invoke apart from the compiler itself.

I’d like to walk through a simple example of using Grappa, to help show how it works.

The Goal

What I want to do is mirror a tutorial I found for ANTLR, “ANTLR 4: using the lexer, parser and listener with example grammar.” It’s an okay tutorial, but the main thing I thought after reading was: “Hmm, ANTLR’s great, everyone uses it, but let’s see if there are alternatives.”

That led me to Parboiled, but some Parboiled users recommended Grappa for Java, so here we are.

That tutorial basically writes a parser for drink orders. We’ll do more.

Our Bartender

Imagine an automated bartender: “What’re ya havin’?”

Well… let’s automate that bartender, so that he can parse responses like “A pint of beer.” We can imagine plenty of variations, but we’re going to concentrate on one form until near the end of the tutorial. We’d also like our bartender to accept orders from people who are a bit too inebriated to use the introductory article: “glass of wine” (no “a”) should be acceptable too.

If you’re interested, the code is on GitHub, in my grappaexample repository.

Let’s take a look at our Bartender’s source code, just to set the stage for our grammar. (Actually, we’ll be writing multiple grammars, because we want to take it in small pieces.)

package com.autumncode.bartender;

import com.github.fge.grappa.Grappa;
import com.github.fge.grappa.run.ListeningParseRunner;
import com.github.fge.grappa.run.ParsingResult;

import java.util.Scanner;

public class Bartender {
    public static void main(String[] args) {
        new Bartender().run();
    }

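    public void run() {
        final Scanner scanner = new Scanner(System.in);
        boolean done = false;
        do {
            writePrompt();
            // nextLine() throws at end of input rather than returning null,
            // so check hasNextLine() first and treat end of input as "done"
            if (scanner.hasNextLine()) {
                done = handleOrder(scanner.nextLine());
            } else {
                done = true;
            }
        } while (!done);
    }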

    private void writePrompt() {
        System.out.print("What're ya havin'? ");
        System.out.flush();
    }

    private boolean handleOrder(String order) {
        DrinkOrderParser parser
                = Grappa.createParser(DrinkOrderParser.class);
        ListeningParseRunner<DrinkOrder> runner
                = new ListeningParseRunner<>(parser.DRINKORDER());
        ParsingResult<DrinkOrder> result = runner.run(order);
        DrinkOrder drinkOrder;
        boolean done = false;
        if (result.isSuccess()) {
            drinkOrder = result.getTopStackValue();
            done = drinkOrder.isTerminal();
            if (!done) {
                System.out.printf("Here's your %s of %s. Please drink responsibly!%n",
                        drinkOrder.getVessel().toString().toLowerCase(),
                        drinkOrder.getDescription());
            }
        } else {
            System.out.println("I'm sorry, I don't understand. Try again?");
        }
        return done;
    }
}

This isn’t the world’s greatest command line application, but it serves to get the job done. We don’t have to worry about handleOrder yet – we’ll explain it as we go through generating a grammar.

What it does

Grappa describes a grammar as a set of Rules. A rule can describe a match or an action; both matches and actions return boolean values to indicate success. A sequence of matches and actions fails as soon as any one of them returns false.
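To make the boolean nature of rules concrete, here is a toy model in plain Java. It is not Grappa’s API: Cursor, ToyRule, and the literal(), action(), and sequence() helpers are all hypothetical names. It only shows the idea that matches and actions both report success with a boolean, and that a sequence stops at the first false.

```java
// A toy model of how rules compose. Cursor, ToyRule, and ToyRules are
// hypothetical names for illustration; none of this is Grappa's API.
class Cursor {
    final String input;
    int position;

    Cursor(String input) {
        this.input = input;
    }
}

interface ToyRule {
    // Returns true (and advances the cursor) on success, false on failure.
    boolean match(Cursor cursor);
}

class ToyRules {
    // A match: succeeds if the exact text appears at the current position.
    static ToyRule literal(String text) {
        return cursor -> {
            if (cursor.input.startsWith(text, cursor.position)) {
                cursor.position += text.length();
                return true;
            }
            return false;
        };
    }

    // An action: runs some code and reports success by returning true.
    static ToyRule action(Runnable body) {
        return cursor -> {
            body.run();
            return true;
        };
    }

    // A sequence: every element must succeed, in order. The first false stops
    // processing (later elements never run) and rewinds the cursor.
    static ToyRule sequence(ToyRule... rules) {
        return cursor -> {
            int start = cursor.position;
            for (ToyRule rule : rules) {
                if (!rule.match(cursor)) {
                    cursor.position = start;
                    return false;
                }
            }
            return true;
        };
    }
}
```

With sequence(literal("a "), literal("pint"), action(...)), the input "a pint" runs the action, while "a bowl" fails at the second literal and the action never runs.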

Let’s generate a very small parser for the sake of example. Our first parser (ArticleParser) is going to do nothing other than detect an article – a word like “a”, “an”, or “the.”

Those are, in fact, all of the articles in English; other languages have other forms of articles, but English has only those three.

The way you interact with a parser is pretty simple. The grammar class extends BaseParser<T>, where T is the type of value the parser produces; you can use Void to indicate that the parser doesn’t produce any output.

Therefore, our ArticleParser’s declaration will be:

public class ArticleParser extends BaseParser<Void> {

We need to add a Rule to our parser, so that we can define an entry point from which the parser should begin. As a first stab, we’ll create a Rule called article() that tries to match one of our words. With a trie. It’s cute that way.

A trie is a prefix tree (a radix tree is a compressed form of one), and tries tend to be super-fast at matching input against a fixed set of strings. Note that this method name may change in later versions of Grappa, because honestly, the actual search mechanism (the trie) isn’t important for the purpose of invoking the method.
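Here is a rough idea of what a case-insensitive trie lookup involves. This is a sketch of the general data structure using a hypothetical CaseInsensitiveTrie class, not Grappa’s implementation. It walks the input one character at a time, remembers the longest complete word it has passed, and ignores whatever follows that word.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// A minimal case-insensitive trie (prefix tree); a hypothetical illustration,
// not Grappa's implementation.
class CaseInsensitiveTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    CaseInsensitiveTrie(String... words) {
        for (String word : words) {
            Node node = root;
            for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.endOfWord = true;
        }
    }

    // Returns the length of the longest word matching at the start of input,
    // or -1 if no word matches. Characters after the match are ignored.
    int longestMatch(String input) {
        Node node = root;
        int longest = -1;
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(Character.toLowerCase(input.charAt(i)));
            if (node == null) {
                break;
            }
            if (node.endOfWord) {
                longest = i + 1;
            }
        }
        return longest;
    }
}
```

With the words "a", "an", and "the", the input "tHe" matches three characters, "an" matches two, and "afoo" matches one (just the “a”).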

import com.github.fge.grappa.parsers.BaseParser;
import com.github.fge.grappa.rules.Rule;

public class ArticleParser extends BaseParser<Void> {
    public Rule article() {
        return trieIgnoreCase("a", "an", "the");
    }
}

This rule should match any variant of “a”, “A”, “an”, “tHe”, or anything like that – while not matching any text that doesn’t somehow fit in as an article. Let’s write a test that demonstrates this, using TestNG so we can use data providers:

package com.autumncode.bartender;

import com.github.fge.grappa.Grappa;
import com.github.fge.grappa.run.ListeningParseRunner;
import com.github.fge.grappa.run.ParsingResult;
import com.github.fge.grappa.rules.Rule;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import static org.testng.Assert.assertEquals;

public class ArticleTest {
    @DataProvider
    Object[][] articleData() {
        return new Object[][]{
                {"a", true},
                {"an", true},
                {"the", true},
                {"me", false},
                {"THE", true},
                {" a", false},
                {"a ", true},
                {"afoo", true},
        };
    }

    @Test(dataProvider = "articleData")
    public void testOnlyArticle(String article, boolean status) {
        ArticleParser parser = Grappa.createParser(ArticleParser.class);
        testArticleGrammar(article, status, parser.article());
    }
    
    private void testArticleGrammar(String article, boolean status, Rule rule) {
        ListeningParseRunner<Void> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<Void> articleResult = runner.run(article);
        assertEquals(articleResult.isSuccess(), status,
                "failed check on " + article + ", parse result was "
                        + articleResult + " and expected " + status);
    }
}

So what is happening here?

First, each test creates an ArticleParser instance through Grappa. Then we create a ListeningParseRunner, with the entry point to the grammar as a parameter; this builds the internal model for the parser (stuff we don’t really care about, but it is memoized, so we can use that code over and over again without incurring the time it takes to process the grammar at runtime).

I used a utility method because the form of the tests themselves doesn’t change – only the inputs and the rules being applied. As we add more to our grammar, this will allow us to run similar tests with different inputs, results, and rules.

After we’ve constructed our parser’s runner, we do something completely surprising: we run it, with ParsingResult<Void> articleResult = runner.run(article). Thanks to the TestNG data provider, we call our parser with every one of those inputs as a separate test, and check that the parser’s verdict, reported by articleResult.isSuccess(), matches what we expect.

In most cases it’s pretty straightforward, since we are indeed passing in valid articles. Where we’re not, the parser reports an unsuccessful parse, such as when we pass it "me".

There are three cases where the result might be surprising: " a", "a ", and "afoo". Whitespace is significant to the parser: in our test, the article with a trailing space passes validation, as does "afoo", while the article with the leading space does not.

The leading space is easy: our parser doesn’t consume any whitespace, and Grappa treats whitespace as significant unless told otherwise (by Rules, of course). So the leading space doesn’t match an article, and the parse fails.

The trailing space (and "afoo") is a little odder, however. What’s happening there is that Grappa parses only as much of the input as is necessary to fulfill the grammar; once it has done that, it doesn’t care about anything that follows. So once it matches the initial text, the “a”, it doesn’t care what the rest of the content is. It’s not significant that "foo" follows the “a”; it matches the “a” and it’s done.

We can fix that, of course, by specifying a better rule – one that includes a terminal condition. That introduces a core concept for Grappa, the “sequence().” (This will factor very heavily into our grammar when we add the ability to say “please” at the end of the tutorial.)

Author’s note: I use “terminal” to mean “ending,” so a terminal anything is meant to indicate finality. In Grappa’s terms, however, a “terminal” is something that doesn’t delegate to anything else. So Grappa’s use of “terminal” might not be the same as mine.

Let’s expand our ArticleParser a little more. Now it looks like:

package com.autumncode.bartender;

import com.github.fge.grappa.parsers.BaseParser;
import com.github.fge.grappa.rules.Rule;

public class ArticleParser extends BaseParser<Void> {
    public Rule article() {
        return trieIgnoreCase("a", "an", "the");
    }

    public Rule articleTerminal() {
        return sequence(
                article(),
                EOI
        );
    }
}

What we’ve done is add a new Rule, articleTerminal(), which contains a sequence. That sequence is an article (which consumes “a,” “an,” or “the”) followed by the special EOI rule, which stands for “end of input.” That means our simple article grammar won’t accept leading or trailing spaces: the grammar fails if any content exists besides our article.

We can show that with a new test:

@DataProvider
Object[][] articleTerminalData() {
    return new Object[][]{
         {"a", true},
         {"an", true},
         {"the", true},
         {"me", false},
         {"THE", true},
         {" a", false},
         {"a ", false},
         {"afoo", false},
    };
}

@Test(dataProvider = "articleTerminalData")
public void testArticleTerminal(String article, boolean status) {
    ArticleParser parser = Grappa.createParser(ArticleParser.class);
    testArticleGrammar(article, status, parser.articleTerminal());
}

Now our test performs as we’d expect: it matches the article, and only the article – as soon as it has a single article, it expects the end of input. If it doesn’t find that sequence exactly, it fails to match and isSuccess() returns false.
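If regular expressions are more familiar, java.util.regex offers a close analogy for the difference between article() and articleTerminal(). This is only an analogy (Grappa does not use regular expressions here), and the PrefixVersusWhole class is hypothetical. Matcher.lookingAt() succeeds when the pattern matches at the start of the input, like article() on its own, while Matcher.matches() requires the whole input to match, like a sequence ending in EOI.

```java
import java.util.regex.Pattern;

// A regex analogy (not how Grappa works) for prefix versus whole-input matching.
class PrefixVersusWhole {
    // Case-insensitive alternation, standing in for trieIgnoreCase("a", "an", "the")
    static final Pattern ARTICLE = Pattern.compile("a|an|the", Pattern.CASE_INSENSITIVE);

    // Like article() on its own: an article at the start of the input is enough.
    static boolean prefixMatch(String input) {
        return ARTICLE.matcher(input).lookingAt();
    }

    // Like an article followed by an end-of-input check: the article must be
    // the entire input.
    static boolean wholeMatch(String input) {
        return ARTICLE.matcher(input).matches();
    }
}
```

Fed the inputs from articleData(), prefixMatch() gives the same answers as our first test, including "a " and "afoo" succeeding and " a" failing; wholeMatch() gives the same answers as articleTerminalData(), rejecting "a " and "afoo" too.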

It’s not very kind of us to reject whitespace, though: we probably want to parse " a " as a valid article, but not " a the" or anything like that.

It shouldn’t be very surprising that we can use sequence() for that, too, along with a few new rules from Grappa itself. Here’s our Rule for articles with surrounding whitespace:

public Rule articleWithWhitespace() {
    return sequence(
            zeroOrMore(wsp()),
            article(),
            zeroOrMore(wsp()),
            EOI
    );
}

What we’ve done is add two extra parsing rules around our article() rule: zeroOrMore(wsp()). The wsp() rule matches a whitespace character (spaces and tabs, for example). The zeroOrMore() rule seems faintly self-explanatory, but just in case: it matches its contained rule as many times as it can, including zero times, so it never fails on its own.

Therefore, our new rule will match however much whitespace we have before an article, then the article, and then any whitespace after the article – but nothing else. That’s fun to say, I guess, but it’s a lot more fun to show:

@DataProvider
Object[][] articleWithWhitespaceData() {
    return new Object[][]{
            {"a", true},
            {"a      ", true},
            {"     the", true},
            {"me", false},
            {" THE ", true},
            {" a an the ", false},
            {"afoo", false},
    };
}

@Test(dataProvider = "articleWithWhitespaceData")
public void testArticleWithWhitespace(String article, boolean status) {
    ArticleParser parser = Grappa.createParser(ArticleParser.class);
    testArticleGrammar(article, status, parser.articleWithWhitespace());
}

Believe it or not, we’re actually most of the way to being able to build our full drink order parser – we need to figure out how to get data from our parser (hint: it’s related to that <Void> in the parser’s declaration), but that’s actually the greatest burden we have remaining.

One other thing that’s worth noting as we go: our code so far actually runs twenty-three tests. On my development platform, it takes 64 milliseconds to run all twenty-three – the first one takes 49, where it’s building the parser for the first time. The rest take somewhere between 0 and 4 milliseconds – and I’m pretty sure that 4ms reading is an outlier. Our grammar isn’t complex, and I imagine we could write something without a grammar that would be faster – maybe HashSet<String>.contains(input.trim()) – but we’re about to step into territory that would end up being a lot less maintainable as our grammar grows.

I ran the tests one hundred times each and the same pattern showed up: every now and then a test would run slower. My initial guess is that this is related to garbage collection or some other housekeeping chore on the JVM’s part, but I haven’t verified it.
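For comparison, here is roughly what the grammar-free HashSet<String>.contains(input.trim()) version mentioned above might look like. It is a sketch, and the ArticleCheck class is hypothetical. It also lowercases the input, since our grammar ignores case. Note that String.trim() strips every leading and trailing character at or below U+0020, including newlines, so it may be a little more permissive than zeroOrMore(wsp()).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// A grammar-free article check: trim, lowercase, and look the word up in a set.
class ArticleCheck {
    private static final Set<String> ARTICLES =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    static boolean isArticle(String input) {
        return ARTICLES.contains(input.trim().toLowerCase(Locale.ROOT));
    }
}
```

It handles every case in articleWithWhitespaceData(), but each new piece of the order (a vessel, the word “of”, a free-form drink) would mean more hand-written string handling, which is where a grammar starts to pay for itself.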

Getting Data out of our Parser

Grappa uses an internal stack of values to track and expose information. We can tell it the type of the stack values – and in fact, we already did so in our ArticleParser. It’s the <Void> we used – that says that we have a stack of Void values, which is a cute way of saying “no value at all.” (If you remember carefully, we pointed that out when we first started describing the ArticleParser – this is where that information is useful!)

Therefore, all we need to do is expose a type, and then manipulate that stack of values. We do so with a special type of Rule, a function that returns a boolean that indicates whether the Rule was successful.

Our goal in this tutorial is to parse drink orders of the general form “a VESSEL of DRINK.” We already worked on a parser that handles the “a” there; it’s time to think about parsing the next term, which we’ll call a “vessel.” Or, since we’re using Java, a Vessel, which we’ll encapsulate in an enum so we can easily add Vessels.

The Vessel itself is pretty simple:

package com.autumncode.bartender;

public enum Vessel {
    PINT,
    BOWL,
    GLASS,
    CUP,
    PITCHER,
    MAGNUM,
    BOTTLE,
    SPOON
}

What we want to do is create a parser such that we can hand it “a glass” and get Vessel.GLASS out of it.

Since a parser’s type parameter is its “return type,” our VesselParser should extend BaseParser<Vessel>, and so it does. In fact, our VesselParser isn’t even very surprising, given what we’ve learned from our ArticleParser:

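import com.github.fge.grappa.parsers.BaseParser;
import com.github.fge.grappa.rules.Rule;

import java.util.Collection;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class VesselParser extends BaseParser<Vessel> {
    // The names of every Vessel constant, computed once
    static final Collection<String> vessels = Stream
            .of(Vessel.values())
            .map(Enum::name)
            .collect(Collectors.toList());

    public Rule vessel() {
        return trieIgnoreCase(vessels);
    }
}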

What does this do? Well, most of it is building a List of the Vessel names, extracted from Vessel.values(). It’s marked static final so that List is only initialized once; the Rule (vessel()) simply uses the same technique we used in parsing articles. It doesn’t actually do anything with the match, though: it simply fails if it’s handed text that doesn’t match a Vessel type.

Incidentally, the Java Language Specification suggests the order of static final, in section 8.3.1, Field Modifiers.

Let’s try it out, using the same sort of generalized pattern we saw in our ArticleParser tests. (We’re going to add a new generalized test method, when we add in the type that should be returned, but this will do for now.)

public class VesselTest {
    private void testGrammar(String corpus, boolean status, Rule rule) {
        ListeningParseRunner<Vessel> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<Vessel> result = runner.run(corpus);
        assertEquals(result.isSuccess(), status,
                "failed check on " + corpus + ", parse result was "
                        + result + " and expected " + status);
    }
    
    @DataProvider
    Object[][] simpleVesselParseData() {
        return new Object[][]{
                {Vessel.PINT.name(), true,},
                {Vessel.BOWL.name(), true,},
                {Vessel.GLASS.name(), true,},
                {Vessel.CUP.name(), true,},
                {Vessel.PITCHER.name(), true,},
                {Vessel.MAGNUM.name(), true,},
                {Vessel.BOTTLE.name(), true,},
                {Vessel.SPOON.name(), true,},
                {"hatful", false,},
        };
    }

    @Test(dataProvider = "simpleVesselParseData")
    public void testSimpleVesselParse(String corpus, boolean valid) {
        VesselParser parser = Grappa.createParser(VesselParser.class);
        testGrammar(corpus, valid, parser.vessel());
    }
}

The idiom that Grappa uses – and that I will use, in any event – involves the use of the push() and match() methods.

Basically, when we match a Vessel (using that handy vessel() rule), we push() the Vessel whose name corresponds to the text that was just matched. We can get that text with the rather-handily-named match() method.

It’s actually simpler to program than it is to describe:

// in VesselParser.java
public Rule VESSEL() {
    return sequence(
            vessel(),
            push(Vessel.valueOf(match().toUpperCase()))
    );
}

This is a rule that encapsulates the matching of the vessel name – thus, vessel() – and then, assuming the match is found, calls push() with the Vessel whose text is held in match().

That’s fine to say, but much better to show. Here’s a test of our VESSEL() rule, following the same sort of generalized pattern we saw for parsing articles, along with a new generalized test runner that examines the returned value if the input data is valid according to the grammar:

private void testGrammarResult(String corpus, boolean status, Vessel value, Rule rule) {
    ListeningParseRunner<Vessel> runner
            = new ListeningParseRunner<>(rule);
    ParsingResult<Vessel> result = runner.run(corpus);
    assertEquals(result.isSuccess(), status,
            "failed check on " + corpus + ", parse result was "
                    + result + " and expected " + status);
    if(result.isSuccess()) {
        assertEquals(result.getTopStackValue(), value);
    }
}

@DataProvider
Object[][] simpleVesselReturnData() {
    return new Object[][]{
            {Vessel.PINT.name(), true, Vessel.PINT},
            {Vessel.BOWL.name(), true, Vessel.BOWL},
            {Vessel.GLASS.name(), true, Vessel.GLASS},
            {Vessel.CUP.name(), true, Vessel.CUP},
            {Vessel.PITCHER.name(), true, Vessel.PITCHER},
            {Vessel.MAGNUM.name(), true, Vessel.MAGNUM},
            {Vessel.BOTTLE.name(), true, Vessel.BOTTLE},
            {Vessel.SPOON.name(), true, Vessel.SPOON},
            {"hatful", false, null},
    };
}

@Test(dataProvider = "simpleVesselReturnData")
public void testSimpleVesselResult(String corpus, boolean valid, Vessel value) {
    VesselParser parser = Grappa.createParser(VesselParser.class);
    testGrammarResult(corpus, valid, value, parser.VESSEL());
}

Note that we’re testing with a Rule of parser.VESSEL() – the one that simply matches a vessel name is named parser.vessel(), and the one that updates the parser’s value stack is parser.VESSEL().

This is a personal idiom. I reserve the right to change my mind if sanity demands it. In fact, I predict that I will have done just this by the end of this article.

So what this does is very similar to our prior test, except it also tests the value on the parser’s stack (accessed via result.getTopStackValue()) against the value that our DataProvider says should be returned, as long as the parse was expected to be valid.
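Conceptually, the stack handling in VESSEL() works something like the following plain-Java sketch: match the text, and only if that succeeds, convert the matched text with Vessel.valueOf() and push the result onto a value stack, where it becomes the top stack value. VesselStackSketch, matchVessel(), and the ArrayDeque are hypothetical stand-ins for Grappa’s internals, and the Vessel enum is repeated so the example compiles on its own.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;

// Repeated here so the sketch compiles on its own.
enum Vessel { PINT, BOWL, GLASS, CUP, PITCHER, MAGNUM, BOTTLE, SPOON }

// A hypothetical stand-in for a parser with a value stack; not Grappa's API.
class VesselStackSketch {
    final Deque<Vessel> valueStack = new ArrayDeque<>();

    // Stand-in for vessel(): returns the matched text, or null if nothing matched.
    static String matchVessel(String text) {
        for (Vessel v : Vessel.values()) {
            if (v.name().equalsIgnoreCase(text)) {
                return text;
            }
        }
        return null;
    }

    // Stand-in for push(): an action that always reports success.
    boolean push(Vessel value) {
        valueStack.push(value);
        return true;
    }

    // Roughly sequence(vessel(), push(Vessel.valueOf(match().toUpperCase()))):
    // the push happens only if the match succeeded.
    boolean vesselRule(String text) {
        String match = matchVessel(text);
        if (match == null) {
            return false; // the sequence fails and nothing is pushed
        }
        return push(Vessel.valueOf(match.toUpperCase(Locale.ROOT)));
    }
}
```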

All this is well and good (we can hand it "glass" and get Vessel.GLASS), but we haven’t fulfilled everything we want out of a VesselParser. We want to be able to ask for " a pint " (note the whitespace!) and get Vessel.PINT. We need to add in our article parsing.

First, let’s write our tests, so we know when we’re done:

@DataProvider
Object[][] articleVesselReturnData() {
    return new Object[][]{
            {"a pint", true, Vessel.PINT},
            {"the bowl", true, Vessel.BOWL},
            {"  an GLASS", true, Vessel.GLASS},
            {"a     cup", true, Vessel.CUP},
            {"the pitcher    ", true, Vessel.PITCHER},
            {" a an magnum", false, null},
            {"bottle", true, Vessel.BOTTLE},
            {"spoon   ", true, Vessel.SPOON},
            {"spoon  bottle ", false, null},
            {"hatful", false, null},
            {"the stein", false, null},
    };
}

@Test(dataProvider = "articleVesselReturnData")
public void testArticleVesselResult(String corpus, boolean valid, Vessel value) {
    VesselParser parser = Grappa.createParser(VesselParser.class);
    testGrammarResult(corpus, valid, value, parser.ARTICLEVESSEL());
}

Our tests should ignore the leading article and any whitespace. Any malformed input (as you see in " a an magnum") should fail, and any vessel type that isn’t valid ("hatful" and "the stein") should fail.

Our Rule is going to look like a monster, because it has to handle a set of possibilities, but it’s actually pretty simple. Let’s take a look, then walk through the grammar:

public Rule article() {
    return trieIgnoreCase("a", "an", "the");
}

public Rule ARTICLEVESSEL() {
    return sequence(
            zeroOrMore(wsp()),
            optional(
                    sequence(
                            article(),
                            oneOrMore(wsp())
                    )),
            VESSEL(),
            zeroOrMore(wsp()),
            EOI);
}

First, we added our article() Rule, from our ArticleParser. It might be tempting to copy all the whitespace handling from that parser as well, but we shouldn’t – all we care about is the articles themselves (“lexemes,” if we’re trying to look all nerdy.)

It’s the ARTICLEVESSEL() Rule that’s fascinating. What that is describing is a sequence, consisting of:

  • Perhaps some whitespace, expressed as zeroOrMore(wsp()).
  • An optional sequence, consisting of:
    • An article.
    • At least one whitespace character.
  • A vessel (which, since we’re using VESSEL(), means the parser’s stack is updated.)
  • Perhaps some whitespace.
  • The end of input.

Any input that doesn’t follow that exact sequence ("spoon bottle", for example) fails.
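As another regex analogy (an illustration of which inputs the grammar accepts, not of how Grappa works), ARTICLEVESSEL() corresponds roughly to the pattern below, with the vessel alternation built from the enum names just as VesselParser builds its trie. The ArticleVesselRegex class is hypothetical, the character class [ \t] assumes wsp() means spaces and tabs, and the Vessel enum is repeated so the example compiles on its own.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Repeated here so the sketch compiles on its own.
enum Vessel { PINT, BOWL, GLASS, CUP, PITCHER, MAGNUM, BOTTLE, SPOON }

class ArticleVesselRegex {
    // whitespace*, then optionally (article, whitespace+), then a vessel,
    // then whitespace*; the capturing group holds the vessel name
    static final Pattern ARTICLE_VESSEL = Pattern.compile(
            "[ \\t]*(?:(?:a|an|the)[ \\t]+)?("
                    + Stream.of(Vessel.values())
                            .map(Enum::name)
                            .collect(Collectors.joining("|"))
                    + ")[ \\t]*",
            Pattern.CASE_INSENSITIVE);

    // Returns the parsed Vessel, or null if the input doesn't fit the grammar.
    static Vessel parse(String input) {
        Matcher m = ARTICLE_VESSEL.matcher(input);
        // matches() requires the whole input to match, playing the role of EOI
        return m.matches() ? Vessel.valueOf(m.group(1).toUpperCase(Locale.ROOT)) : null;
    }
}
```

Run against articleVesselReturnData(), it produces the same results as ARTICLEVESSEL(): " a an magnum" and "spoon  bottle " are rejected because nothing in the pattern can absorb the extra word.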

Believe it or not, we’re now very much on the downhill slide for our bar-tending program.

We need to add a preposition (“of”), generalized text handling for the type of drink, and a container type for the whole order; of these, only the type of drink will add any real complexity to our parser.

Rounding out the Bartender

Our VesselParser is actually a pretty good model for the DrinkOrderParser that our Bartender will use. What we need to add is matching for two extra tokens: “of,” as mentioned, and then a generalized description of a drink.

We’re not going to be picky about the description; we could validate it (just as we’ve done for Vessel), but there are better lessons to be learned by leaving it free-form.

Let’s take a look at the operative part of Bartender again, which will set the stage for the full parser.

DrinkOrderParser parser
        = Grappa.createParser(DrinkOrderParser.class);
ListeningParseRunner<DrinkOrder> runner
        = new ListeningParseRunner<>(parser.DRINKORDER());
ParsingResult<DrinkOrder> result = runner.run(order);
DrinkOrder drinkOrder;
boolean done = false;
if (result.isSuccess()) {
    drinkOrder = result.getTopStackValue();
    done = drinkOrder.isTerminal();
    if (!done) {
        System.out.printf("Here's your %s of %s. Please drink responsibly!%n",
                drinkOrder.getVessel().toString().toLowerCase(),
                drinkOrder.getDescription());
    }
} else {
    System.out.println("I'm sorry, I don't understand. Try again?");
}
return done;

The very first thing we’re going to do is create a DrinkOrder class that holds the information about our drink order.

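import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data @NoArgsConstructor @AllArgsConstructor // Lombok writes the boilerplate
public class DrinkOrder {
    Vessel vessel;
    String description;
    boolean terminal;
}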

I’m actually using Lombok in the project: @Data for the accessors, mutators, equals(), hashCode(), and toString(), plus @NoArgsConstructor and @AllArgsConstructor for the two constructors (@Data alone doesn’t provide an all-arguments constructor). For the sake of example, imagine that we have the standard boilerplate accessors and mutators for each of those attributes. Thus, we can call setDescription(), et al., even though we’re not showing that code. We also have equals() and hashCode(), as well as a no-argument constructor and a constructor that takes every property.

In other words, it’s a fairly standard Javabean, but we’re not showing all of the boilerplate code – and thanks to Lombok, we don’t even need the boilerplate code. Lombok makes it for us.

If you do need the code for equals(), hashCode(), toString(), or the mutators, accessors, and constructors shown, you may be reading the wrong tutorial. How did you make it this far?

Before we dig into the parser – which has only one really interesting addition to the things we’ve seen so far – let’s take a look at our test. This is the full test, so it’s longer than some of our code has been. The DrinkOrderParser will be much longer.

public class DrinkOrderParserTest {
    private void testGrammarResult(String corpus, boolean status, DrinkOrder value, Rule rule) {
        ListeningParseRunner<DrinkOrder> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<DrinkOrder> result = runner.run(corpus);
        assertEquals(result.isSuccess(), status,
                "failed check on " + corpus + ", parse result was "
                        + result + " and expected " + status);
        if (result.isSuccess()) {
            assertEquals(result.getTopStackValue(), value);
        }
    }
    
    @DataProvider
    public Object[][] drinkOrderProvider() {
        return new Object[][]{
                {"a glass of water", true, new DrinkOrder(Vessel.GLASS, "water", false)},
                {"a pitcher of old 66", true, new DrinkOrder(Vessel.PITCHER, "old 66", false)},
                {"a    pint  of duck   vomit   ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {"a shoeful of motor oil", false, null},
                {"nothing", true, new DrinkOrder(null, null, true)},
        };
    }
    
    @Test(dataProvider = "drinkOrderProvider")
    public void testDrinkOrderParser(String corpus, boolean valid, DrinkOrder result) {
        DrinkOrderParser parser = Grappa.createParser(DrinkOrderParser.class);
        testGrammarResult(corpus, valid, result, parser.DRINKORDER());
    }
}

Most of this should be fairly simple; it’s the same pattern we’ve seen used in our other tests.

I don’t actually drink, myself, so… I keep imagining some biker bar in the American southwest selling a beer called “Old 66,” and in my imagination “duck vomit” is the kind of wine that comes in a resealable plastic bag.

A lot of the DrinkOrderParser will be very familiar. Let’s dive in and take a look at all of it and then break it down:

public class DrinkOrderParser extends BaseParser<DrinkOrder> {
    Collection<String> vessels = Stream
            .of(Vessel.values())
            .map(Enum::name)
            .collect(Collectors.toList());

    public boolean assignDrink() {
        peek().setDescription(match().toLowerCase().replaceAll("\\s+", " "));
        return true;
    }

    public boolean assignVessel() {
        peek().setVessel(Vessel.valueOf(match().toUpperCase()));
        return true;
    }

    public boolean setTerminal() {
        peek().setTerminal(true);
        return true;
    }

    public Rule ARTICLE() {
        return trieIgnoreCase("a", "an", "the");
    }

    public Rule OF() {
        return ignoreCase("of");
    }

    public Rule NOTHING() {
        return sequence(
                trieIgnoreCase("nothing", "nada", "zilch", "done"),
                EOI,
                setTerminal()
        );
    }

    public Rule VESSEL() {
        return sequence(
                trieIgnoreCase(vessels),
                assignVessel()
        );
    }

    public Rule DRINK() {
        return sequence(
                join(oneOrMore(firstOf(alpha(), digit())))
                        .using(oneOrMore(wsp()))
                        .min(1),
                assignDrink()
        );
    }

    public Rule DRINKORDER() {
        return sequence(
                push(new DrinkOrder()),
                zeroOrMore(wsp()),
                firstOf(
                        NOTHING(),
                        sequence(
                                optional(sequence(
                                        ARTICLE(),
                                        oneOrMore(wsp())
                                )),
                                VESSEL(),
                                oneOrMore(wsp()),
                                OF(),
                                oneOrMore(wsp()),
                                DRINK()
                        )
                ),
                zeroOrMore(wsp()),
                EOI
        );
    }
}

We’re reusing the mechanism for creating a collection of Vessel names. We’re also repeating the Rule used to detect an article.
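If that stream pipeline is unfamiliar, here’s a standalone sketch of it. It assumes a Vessel enum whose constants are the ones listed in the informal grammar later in this post (the project’s real enum may differ), and VesselNames and toVessel() are illustrative names. It also shows why assignVessel() upper-cases the match before calling valueOf(): the trie matches case-insensitively, but enum lookup is exact.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class VesselNames {
    // Assumed constants, borrowed from the informal grammar; the real enum may differ
    enum Vessel { PINT, BOWL, GLASS, CUP, PITCHER, MAGNUM, BOTTLE, SPOON }

    // Same pipeline as the parser: the name() of every constant, as Strings
    static final Collection<String> VESSELS = Stream
            .of(Vessel.values())
            .map(Enum::name)
            .collect(Collectors.toList());

    // What assignVessel() does with matched text: enum names are upper-case, input may not be
    static Vessel toVessel(String matched) {
        return Vessel.valueOf(matched.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(VESSELS);           // [PINT, BOWL, GLASS, CUP, PITCHER, MAGNUM, BOTTLE, SPOON]
        System.out.println(toVessel("Glass")); // GLASS
    }
}
```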

We’re adding a Rule for the detection of the preposition “of”, which is a mandatory element in our grammar. We use ignoreCase(), because we respect the rights of drunkards to shout at their barkeeps:

public Rule OF() {
    return ignoreCase("of");
}

Note how I’m skirting my own rule about naming. I said I was reserving the right to change my mind, and apparently I’ve done so even while writing this article. According to the naming convention I described earlier, it should be of() and not OF(), because it doesn’t alter the parser’s stack. The same rule applies to ARTICLE(). It’s my content; I’ll write it how I want to, unless I decide to fix it later.

I’m also creating methods to mutate the parser state:

protected boolean assignDrink() {
    peek().setDescription(match().toLowerCase().replaceAll("\\s+", " "));
    return true;
}

protected boolean assignVessel() {
    peek().setVessel(Vessel.valueOf(match().toUpperCase()));
    return true;
}

protected boolean setTerminal() {
    peek().setTerminal(true);
    return true;
}

These are a little interesting, in that they use peek(). The actual base rule in our grammar is DRINKORDER(), which immediately pushes a DrinkOrder reference onto the parser stack. That means that there is a DrinkOrder that other rules can modify at will; peek() gives us that reference. Since it’s typed via Java’s generics, we can call any method that DrinkOrder exposes.
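Here’s a plain-Java sketch of that arrangement, with no Grappa involved: an ArrayDeque stands in for the parser’s value stack, Order is a hypothetical cut-down DrinkOrder, and sequence() is a hypothetical helper that runs boolean steps in order and stops at the first failure. The lambdas only stand in for the fact that actions run during the parse; this doesn’t claim to be how Grappa implements it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.function.BooleanSupplier;

public class ValueStackSketch {
    // Hypothetical stand-in for DrinkOrder, holding only what the actions touch
    static class Order {
        String vessel;
        String description;
        boolean terminal;
    }

    // Plays the role of the parser's value stack
    private final Deque<Order> stack = new ArrayDeque<>();

    boolean push(Order order) {
        stack.push(order);
        return true;
    }

    // Returns the top of the stack without removing it
    Order peek() {
        return stack.peek();
    }

    // Actions: mutate whatever order is on top of the stack, and always succeed
    boolean assignVessel(String matched) {
        peek().vessel = matched.toUpperCase(Locale.ROOT);
        return true;
    }

    boolean assignDrink(String matched) {
        peek().description = matched.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        return true;
    }

    boolean setTerminal() {
        peek().terminal = true;
        return true;
    }

    // Hypothetical helper: runs steps in order, stopping at the first one that fails
    static boolean sequence(BooleanSupplier... steps) {
        for (BooleanSupplier step : steps) {
            if (!step.getAsBoolean()) {
                return false;
            }
        }
        return true;
    }
}
```

Swapping a middle step for one that returns false leaves every later action unrun, so an action only ever sees an order that the earlier steps have already set up.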

These utilities all return true. None of them can fail, because they won’t be called unless a prior rule has matched; these methods are for convenience only. Actually, let’s show the NOTHING() and VESSEL() rules, so we can see how these methods are invoked:

public Rule NOTHING() {
    return sequence(
            trieIgnoreCase("nothing", "nada", "zilch", "done"),
            EOI,
            setTerminal()
    );
}

public Rule VESSEL() {
    return sequence(
            trieIgnoreCase(vessels),
            assignVessel()
    );
}

This leaves two new rules to explain: DRINK() and DRINKORDER(). Here’s DRINK():

public Rule DRINK() {
    return sequence(
            join(oneOrMore(firstOf(alpha(), digit())))
                    .using(oneOrMore(wsp()))
                    .min(1),
            assignDrink()
    );
}

This rule basically builds a list of words. It’s a sequence of operations; the first builds the match of the words, and the second operation assigns the matched content to the DrinkOrder’s description.

The match of the words is really just a series of words, each a run of alphanumeric characters, separated by whitespace. It requires at least one word to exist (that’s the min(1)), but will consume as many as there are in the input.
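Here’s a hand-rolled sketch of what join(...).using(...).min(1) is doing in DRINK(), with no Grappa involved; JoinSketch, joinedWords() and normalize() are hypothetical names. It assumes that alpha() and digit() mean ASCII letters and digits, and (consistent with the test expecting “duck vomit” for “a    pint  of duck   vomit   ”) that a separator is only consumed when another word follows it, which leaves trailing whitespace for DRINKORDER() to eat.

```java
import java.util.Locale;

public class JoinSketch {
    // Stand-in for firstOf(alpha(), digit()), assumed to be ASCII letters and digits
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Stand-in for wsp(): space or tab
    static boolean isWsp(char c) {
        return c == ' ' || c == '\t';
    }

    // Length of the word starting at pos, or 0 if there isn't one
    static int word(String s, int pos) {
        int i = pos;
        while (i < s.length() && isWordChar(s.charAt(i))) {
            i++;
        }
        return i - pos;
    }

    // End index of whitespace-joined words starting at pos, or -1 if there is no word at all (min(1))
    static int joinedWords(String s, int pos) {
        int len = word(s, pos);
        if (len == 0) {
            return -1;
        }
        int end = pos + len;
        while (true) {
            int i = end;
            while (i < s.length() && isWsp(s.charAt(i))) {
                i++;
            }
            int next = (i > end) ? word(s, i) : 0;
            if (next == 0) {
                return end; // no separator, or a separator with no word after it: stop before it
            }
            end = i + next;
        }
    }

    // What assignDrink() stores: lower-cased, with runs of whitespace collapsed to one space
    static String normalize(String matched) {
        return matched.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }
}
```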

Now for the Rule that does most of the work: DRINKORDER().

public Rule DRINKORDER() {
    return sequence(
            push(new DrinkOrder()),
            zeroOrMore(wsp()),
            firstOf(
                    NOTHING(),
                    sequence(
                            optional(sequence(
                                    ARTICLE(),
                                    oneOrMore(wsp())
                            )),
                            VESSEL(),
                            oneOrMore(wsp()),
                            OF(),
                            oneOrMore(wsp()),
                            DRINK()
                    )
            ),
            zeroOrMore(wsp()),
            EOI
    );
}

Again, we have a sequence. It works something like this:

  • First, push a new DrinkOrder onto the stack, to keep track of our order’s state.
  • Consume any leading whitespace.
  • Either:
    • Check for the terminal condition (“nothing”, for example), or
    • Check for a new sequence, of the following form:
      • An optional sequence:
        • An article
        • One or more whitespace characters after the article
      • A vessel
      • One or more whitespace characters
      • The lexeme matching “of”
      • One or more whitespace characters
      • The drink description
  • Any trailing whitespace
  • The end of input
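For anyone who thinks in regular expressions, here’s a rough java.util.regex equivalent of DRINKORDER(), assuming the same vessel names as the informal grammar later on. It’s an approximation, since regexes backtrack and Grappa’s rules don’t, but it produces the expected values from the test above. DrinkOrderRegex and parse() are illustrative names, and the string result stands in for a real DrinkOrder.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DrinkOrderRegex {
    private static final String WSP = "[ \\t]";         // wsp(): space or tab
    private static final String WORD = "[A-Za-z0-9]+"; // oneOrMore(firstOf(alpha(), digit()))

    private static final Pattern ORDER = Pattern.compile(
            "^" + WSP + "*"                                           // leading whitespace
                    + "(?:(nothing|nada|zilch|done)$"                 // NOTHING(), which demands end of input
                    + "|(?:(?:a|an|the)" + WSP + "+)?"                // optional ARTICLE() plus whitespace
                    + "(pint|bowl|glass|cup|pitcher|magnum|bottle|spoon)" // VESSEL() (assumed names)
                    + WSP + "+of" + WSP + "+"                         // OF(), with whitespace around it
                    + "(" + WORD + "(?:" + WSP + "+" + WORD + ")*)"   // DRINK(): words joined by whitespace
                    + WSP + "*$)",                                    // trailing whitespace, then EOI
            Pattern.CASE_INSENSITIVE);

    // Returns "terminal", "VESSEL:description", or null if the order doesn't parse
    static String parse(String input) {
        Matcher m = ORDER.matcher(input);
        if (!m.matches()) {
            return null;
        }
        if (m.group(1) != null) {
            return "terminal";                                        // like setTerminal()
        }
        String vessel = m.group(2).toUpperCase(Locale.ROOT);          // like assignVessel()
        String drink = m.group(3).toLowerCase(Locale.ROOT).replaceAll("\\s+", " "); // like assignDrink()
        return vessel + ":" + drink;
    }
}
```

The regex gets hard to read quickly, which is a decent argument for building the grammar out of small named rules instead.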

We’ve basically built most of this through our parsers, bit by bit; armed with the ability to peek() and push(), we can build some incredibly flexible parsers with fairly simple code.

Adding Politesse

All this has been great, so far. We can actually “order” from a Bartender, giving us this scintillating conversation:

$ java -jar bartender-1.0-SNAPSHOT.jar
What're ya havin'? a glass of water
Here's your glass of water. Please drink responsibly!
What're ya havin'? a toeful of shoe polish
I'm sorry, I don't understand. Try again?
What're ya havin'? a pint of indigo ink
Here's your pint of indigo ink. Please drink responsibly!
What're ya havin'? nothing
$

The only problem is that it’s not very humane or polite. We can’t say “please,” we can’t be very flexible. What we need is to add politesse to our grammar.

What we really want to do is modify our DrinkOrderParser so that we can ask for “a cup of pinot noir, 1986 vintage, please?” It should be able to tell us that we’ve ordered “pinot noir, 1986 vintage” and not “pinot noir, 1986 vintage, please?” – that’d be silly.

However, we need to alter our grammar in some core ways – particularly in how we match the drink names – and use a new Rule, testNot. First, though, let’s take a look at our test code, because that’s going to give us a workable indicator of whether we’ve succeeded or not.

public class PoliteDrinkOrderParserTest {
    private void testGrammarResult(String corpus, boolean status, DrinkOrder value, Rule rule) {
        ListeningParseRunner<DrinkOrder> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<DrinkOrder> result = runner.run(corpus);
        assertEquals(result.isSuccess(), status,
                "failed check on " + corpus + ", parse result was "
                        + result + " and expected " + status);
        if (result.isSuccess()) {
            assertEquals(result.getTopStackValue(), value);
        }
    }

    @DataProvider
    public Object[][] drinkOrderProvider() {
        return new Object[][]{
                {"a glass of water please", true, new DrinkOrder(Vessel.GLASS, "water", false)},
                {"a pitcher of old 66, please", true, new DrinkOrder(Vessel.PITCHER, "old 66", false)},
                {"a pitcher of old 66", true, new DrinkOrder(Vessel.PITCHER, "old 66", false)},
                {"a glass of pinot noir, 1986", true, new DrinkOrder(Vessel.GLASS, "pinot noir, 1986", false)},
                {"a glass of pinot noir, 1986, ok?", true, new DrinkOrder(Vessel.GLASS, "pinot noir, 1986", false)},
                {"glass of pinot noir, 1986, ok?", true, new DrinkOrder(Vessel.GLASS, "pinot noir, 1986", false)},
                {"cup , pinot noir, 1986 vintage, ok?", true, new DrinkOrder(Vessel.CUP, "pinot noir, 1986 vintage", false)},
                {"cup,pinot noir, 1986,ok!", true, new DrinkOrder(Vessel.CUP, "pinot noir, 1986", false)},
                {"a    pint  of duck   vomit   ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {"a    pint  of duck   vomit  , please ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {" pint , duck   vomit please  ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {"a shoeful of motor oil", false, null},
                {"nothing", true, new DrinkOrder(null, null, true)},
        };
    }

    @Test(dataProvider = "drinkOrderProvider")
    public void testDrinkOrderParser(String corpus, boolean valid, DrinkOrder result) {
        PoliteDrinkOrderParser parser = Grappa.createParser(PoliteDrinkOrderParser.class);
        testGrammarResult(corpus, valid, result, parser.ORDER());
    }
}

You’ll notice that we’ve changed some other things, too. Our original grammar was pretty simple in formal-ish terms:

DRINKORDER ::= nothing | article? vessel `of` drink
article ::= a | an | the
vessel ::= pint | bowl | glass | cup | pitcher | magnum | bottle | spoon
drink ::= [a-zA-Z0-9]*
nothing ::= nothing | nada | zilch | done

Note that this isn’t an actual formal grammar – I’m cheating. It just looks as if it might be something near formal, with a particular failure in the “drink” term.

Our new one seems to be more flexible:

DRINKORDER ::= (nothing | article? vessel of drink) interjection? eos?
article ::= a | an | the
vessel ::= pint | bowl | glass | cup | pitcher | magnum | bottle | spoon
of ::= , | of
drink ::= !interjection
interjection ::= ','? please | ok | okay | pls | yo 
eos ::= '.' | '!' | '?'
nothing ::= nothing | nada | zilch | done

Here, we’re no more actually formal than we were – the “!interjection” is trying to say that a drink is everything where a drink would be appropriate, up to the interjection.

I don’t care for Backus-Naur form, and I’m using something that looks like it because I thought it might help. Your mileage may vary as to whether I was correct or not.

At any rate, our new grammar should allow us to say “please” and eliminate the unnecessary “of” – although I’m not willing to concede that a bartender should respond well to “pint beer.” “pint, beer.” I can accept – but that comma is significant, by golly.

I’ll leave it as an exercise for the reader to make the comma not necessary – and to write the test that proves it.

However, one thing remains: we haven’t seen our grammar. Most of it’s the same: the article, the vessel, and the action rules (the things that construct our returned drink order) haven’t changed, but we have a slew of new rules (for the end of the sentence and the interjection) and we’ve modified some old ones (drink, and of). Let’s take a look at the changes held in PoliteDrinkOrderParser:

public Rule OF() {
    return firstOf(
            sequence(
                    zeroOrMore(wsp()),
                    COMMA(),
                    zeroOrMore(wsp())
            ),
            sequence(
                    oneOrMore(wsp()),
                    ignoreCase("of"),
                    oneOrMore(wsp())
            )
    );
}

public Rule DRINK() {
    return sequence(
            oneOrMore(
                    testNot(INTERJECTION()),
                    ANY
            ),
            assignDrink());
}

public Rule DRINKORDER() {
    return sequence(
            optional(sequence(
                    ARTICLE(),
                    oneOrMore(wsp())
            )),
            VESSEL(),
            OF(),
            DRINK()
    );
}

public Rule COMMA() {
    return ch(',');
}

public Rule INTERJECTION() {
    return sequence(
            zeroOrMore(wsp()),
            optional(COMMA()),
            zeroOrMore(wsp()),
            trieIgnoreCase("please", "pls", "okay", "yo", "ok"),
            TERMINAL()
    );
}

public Rule EOS() {
    return anyOf(".!?");
}

public Rule TERMINAL() {
    return sequence(zeroOrMore(wsp()),
            optional(EOS()),
            zeroOrMore(wsp()),
            EOI
    );
}

public Rule ORDER() {
    return sequence(
            push(new DrinkOrder()),
            zeroOrMore(wsp()),
            firstOf(DRINKORDER(), NOTHING()),
            optional(INTERJECTION()),
            TERMINAL()
    );
}

We’ve had to move whitespace handling around a little, too, because of the use of OF() to serve as a connector rather than the simple word “of.”

OF() now has to serve as a syntax rule for a single comma – with no whitespace – as you’d see in the string “pint,beer”. It also has to handle whitespace around the comma – as you’d find in “pint , beer”.

However, it needs to mandate whitespace for the actual word “of” – because “pintofbeer” doesn’t work.
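Here’s a minimal sketch of the new OF() as a single regular expression, using lookingAt() to check for a connector right after the vessel. PoliteOfSketch and connectorEnd() are hypothetical names, and the regex approximates the Grappa rule rather than replacing it: the first alternative allows optional whitespace around a comma, and the second demands whitespace on both sides of “of”.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoliteOfSketch {
    // firstOf(sequence(wsp*, ',', wsp*), sequence(wsp+, "of", wsp+)), case-insensitive like ignoreCase("of")
    private static final Pattern OF = Pattern.compile(
            "[ \\t]*,[ \\t]*|[ \\t]+of[ \\t]+", Pattern.CASE_INSENSITIVE);

    // Index just past the connector starting at pos, or -1 if there's no connector there
    static int connectorEnd(String input, int pos) {
        Matcher m = OF.matcher(input);
        m.region(pos, input.length());
        return m.lookingAt() ? m.end() : -1; // end() is an index into the whole input
    }
}
```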

Another exercise for the reader: fix OF() to handle “pint, of beer”.

DRINK() has a new sequence: oneOrMore(testNot(INTERJECTION()), ANY).

Pay attention to this.

This means “match any character (ANY), as long as the INTERJECTION() rule doesn’t match starting at that character.” The sequence order is important – it tries to match the rules in order, so it first checks the upcoming text (by looking ahead) against INTERJECTION(). testNot() never consumes input itself; it succeeds only when INTERJECTION() fails (remember, we’re looking for something that is not an INTERJECTION()), and only then does the parser check whether the text matches ANY.

Given that ANY matches anything, it succeeds – as long as the tokens are not tokens that match the INTERJECTION() rule.

And what does INTERJECTION() look like? Well, it’s a normal rule – this is where Grappa really shines. Our INTERJECTION() has optional whitespace and punctuation, and its own case-insensitive matching:

public Rule INTERJECTION() {
    return sequence(
            zeroOrMore(wsp()),
            optional(COMMA()),
            zeroOrMore(wsp()),
            trieIgnoreCase("please", "pls", "okay", "yo", "ok"),
            TERMINAL()
    );
}

It also includes the terminal condition for the order, because something might look like an interjection without actually being one. Consider this input: “glass,water,please fine.” The “,please” looks like an INTERJECTION(), but because INTERJECTION() includes the TERMINAL() rule – which means “optional whitespace, an optional end-of-sentence mark, optional whitespace, and then a definite end-of-input” – “,please fine.” fails the INTERJECTION() match, and DRINK() falls back to ANY.

EOI can legally match several times. That’s why we can match it in our INTERJECTION() rule while still having it match at the end of our ORDER() rule. Because TERMINAL() is a series of optional elements followed by EOI, if it has already matched as part of INTERJECTION(), the TERMINAL() at the end of ORDER() simply matches again without consuming anything. Such is life.

We can also order something like this: “glass,water, please ok?” – and our drink would be a glass of “water, please”, because only the final “ ok?” satisfies the INTERJECTION() rule; the earlier “, please” isn’t followed by the end of input.
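To see the lookahead in isolation, here’s a plain-Java sketch that uses java.util.regex instead of Grappa. The regex stands in for INTERJECTION() with its trailing TERMINAL(), and calling lookingAt() on a region plays the role of testNot(): it peeks at the remaining input without consuming anything, and ANY becomes “advance one character”. PoliteDrinkSketch and drink() are hypothetical names, and since regexes backtrack where Grappa’s rules don’t, treat this as an approximation that happens to agree with the examples above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoliteDrinkSketch {
    // INTERJECTION() plus TERMINAL(): wsp*, optional comma, wsp*, a polite word,
    // wsp*, optional end-of-sentence mark, wsp*, then the end of the input
    private static final Pattern INTERJECTION = Pattern.compile(
            "[ \\t]*,?[ \\t]*(?:please|pls|okay|yo|ok)[ \\t]*[.!?]?[ \\t]*$",
            Pattern.CASE_INSENSITIVE);

    // oneOrMore(testNot(INTERJECTION()), ANY): returns the drink text from start,
    // or null if not even one character could be consumed
    static String drink(String input, int start) {
        Matcher lookahead = INTERJECTION.matcher(input);
        int i = start;
        while (i < input.length()) {
            lookahead.region(i, input.length()); // "$" matches at the region end, i.e. end of input
            if (lookahead.lookingAt()) {
                break;  // testNot() fails here, so the drink ends before this character
            }
            i++;        // ANY consumes exactly one character
        }
        return i > start ? input.substring(start, i) : null;
    }
}
```

Note the “yoghurt” case in the checks: “yo” is a valid interjection word, but TERMINAL() isn’t satisfied right after it, so the lookahead fails and the drink keeps growing.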

Our bartender’s a great guy, but he likes his grammar.

Our PoliteBartender class is different from our Bartender only in the Parser it uses and the originating Rule – and, of course, in the flexibility of the orders it accepts.

$ java -cp bartender-1.0-SNAPSHOT.jar com.autumncode.bartender.PoliteBartender
What're ya havin'? a glass of water
Here's your glass of water. Please drink responsibly!
What're ya havin'? a toeful of shoe polish
I'm sorry, I don't understand. Try again?
What're ya havin'? a pint of indigo ink, please
Here's your pint of indigo ink. Please drink responsibly!
What're ya havin'? A SPOON OF DOM PERIGNON, 1986, OK?
Here's your spoon of dom perignon, 1986. Please drink responsibly!
What're ya havin'? magnum,water,pls, please
Here's your magnum of water,pls. Please drink responsibly!
What're ya havin'? nothing
$

Colophon

By the way, much appreciation goes to the following individuals, who helped me write this in various important ways, and in no particular order:

  • Francis Galiegue, who helped by reviewing the text, by pointing out various errors in my grammars, and by writing Grappa in the first place
  • Chris Brenton, who reviewed (a lot!) and helped me tune the messaging
  • Andreas Kirschbaum, who also reviewed quite a bit for the article, especially in early form

Teams

One of the things I’ve been carrying around with me for years is the concept I call “teams” — like, how I more or less rank my peers. It’s sort of an evaluation tool.

I don’t tell people where they are in my rankings, because some people might feel that they should be ranked higher than they are, and I don’t want to hurt people’s feelings. It’s really a way of gauging how I feel about people, anyway… some of the rankings aren’t based on actual technical skills, but on personality clashes. That’s okay, you know? Someone might be a perfect fit technically speaking, but just be too arrogant to work with comfortably.

I’m mostly writing it up because I think it might be interesting to others to read about and think about.

Five teams

The way I see it, there are five classifications of people in the field.

The Stars

These are the “names.” They’re people whose skills are beyond question, and if they called me and said, “Hey, Joe, where are you going with that…” wait, wrong quote. If they called and said, “Hey, Joe, come join my team, we need a janitor!” I’d be saying, “Yes, sir, on my way, sir, do you want me to mop clockwise or counterclockwise, sir?”

This group’s pretty exclusive. Peter van der Linden and Magnus Stenman are in this group, and it’s not easy to get here. There are a lot of names that you’d expect to be here… that just aren’t. (Maybe it’s personality… but usually it’s “That guy is amazing, but he just doesn’t have that IT FACTOR for me.”)

If one of these guys asked a stupid question, I’d be pretty sure they were trolling me. Alternatively, I’m so stupid by comparison to them that I don’t even understand the question.

The A-Team

These are those people with whom I’d love to work on a daily basis. We’re peers, we are able to get along, fights are seen in their natural and right context (i.e., disagreements, we work them out and move on), we’re friends although perhaps we don’t necessarily socialize together, we’re mature.

If people on this list are looking for work, well, that highlights the stupidity of the industry – these people should be snapped up in minutes if they’re available. (The only reasons they wouldn’t be are based on bill rates – they’re expensive, because they should be – or because of marketing, because most people that I know of dislike marketing themselves.)

This is a tiny list, just like the “stars” list. There are six names on it. Most of the people on it – if they know about the list at all – suspect strongly that they’d be on it.

If a billionaire came to me and said “build a team, cost’s no object, they should be able to do anything,” these are the people I would call.

When these guys ask stupid questions, it’s fine – they’re either trolling me, teaching me, or using rubber duck techniques – they don’t remain stupid for long. I can point out the thinking behind the answer and they get it. Quickly. Discussions here tend to be one-sentence affairs: “I need to do this.” “Do you have that available?” “Got it, done.”

The B-Team

This is a much larger group than the A-Team. (This is actually most of the industry.) These people are really good at what they do. We tend to get along well. (Sometimes we don’t – some people who would be on the A-team end up here because of personality conflicts.) These are people from whom I can learn, and who can generally pick up things from me as well.

When these blokes say something silly, it’s usually because they’re competent enough to know better, but they just don’t see it at first. I’m pretty sure they’ll come around. I don’t resent the question, because I have faith in their abilities.

Membership on this list is a sign of respect. I’d be happy to work with these people – in fact, I’d put myself on this list. Most of the people with whom I’ve worked are also in this group, although I suppose quite a few would be on…

The C-Team

This is the reservoir of the Great Unwashed. (Note the sarcasm.) These are people who might be competent, but… they just don’t have the “it factor.”

Maybe they’re just not interested enough in how things work, or why they work. Maybe they ask just the wrong stupid question.

It’s hard to say, but these are people I’m happy to help and teach. I don’t have anything against them, but I’d find being recognized as their peer a little insulting. (That’s not to say that it might not be true.)

Some people are here because of aptitude; they just really aren’t that good. Maybe flipping burgers is the right option, you know?

Some people are here because of attitude; when something is explained to them, they get it, but they need to have it explained in depth because maybe they just don’t care that much.

Silly questions from this group… well, I want to answer them, and I try to, because I want them to show me that they belong in the B-team. But after a while, the questions get tiresome. Too many of them, and the person risks falling into …

The Gutter

These are people who you probably couldn’t pay me to work with. They’re actively stupid, or doltish, or offensive. They’re jerks, or they’re ignorant beyond belief.

them: how do I echo 'hello world' to stdout?
me  : println("hello world")
them: but how do I get it to say 'hello world' on stdout
me  : um, println("hello world") just like I just said?
them: why didn't you just answer me
them: I tried pritnln("kelpto whirl") and it didn't print hello world
me  : did you try copy and paste on the code I showed you?
them: no, that'd be dumb, i just want to print hello world
them: you dirty Jew, Jews need to be killed on sight
them: did you know that Jews drink blood at night
them: I hope you get cancer and die

Sadly, this kind of conversation has actually happened.

This is, thankfully, a small list. People here are the ones who work to get on it. You don’t just find your way to be in the gutter; it takes effort and persistence. (After all, everyone has a bad day every now and then, and I want to forgive just like I want to be forgiven, right? Do not do to others as you would not want done to you, and all that?)

This is R. Hillel’s “golden rule.” Jesus inverted it to a positive form: “Do unto others as you would have done to you.” Your mileage may vary as to which one you prefer; common culture refers to the positive form as the “golden rule” and the negative form as the “silver rule.”

People on this list are trying to offend, through rank ignorance or through trying to be offensive.

I don’t like people being here. It means that I have failed to forgive them, and that I am unable to empathize with them somehow.

But it happens.

I definitely don’t tell people when they’re on this list – for one thing, that would mean talking to them (and their being in this group means I find talking to them distasteful) and for another, it would probably be insulting to them, and I don’t want to insult people unnecessarily.

Why?

Hmm, good question, Anonymous Reader!

I make these lists because it is a way for me to help determine who I want to learn from, and how much. It’s also a way of complimenting them in my internal dialogue, which makes it easier to compliment them in my external dialogue (i.e., the bits people can read and hear.) It’s good practice for me.

Actually, now that I think about it, if you’re careful, you can tell who’s on each list… except for the gutter. People there are difficult to detect, because I try not to ever communicate with or about them, and if I do mention them, it’s not by name.

MySQL Log Sequence Error: solved, sort of

Back in November, I noticed that I was getting a lot of MySQL crashes, with MySQL’s logs saying that it had a log sequence error. It was actually MariaDB, the community-managed version of MySQL (as opposed to Oracle’s “MySQL proper”), but for all intents and purposes, it was MySQL.

I tried everything I could think of: delete the logs, restore from binary backup, restore from text backup, everything. After switching away from MariaDB and to the “actual” MySQL, it seemed to be more stable at last…

But last week my wife reported that her site was having some of the same problems. I took a look at the logs again, and… the log sequence errors were back. What’s more, while watching the logs (via tail -f mysql.log), I could see new failures happen through routine usage: MySQL crashed, then restarted. Then crashed, and restarted.

Back to the drawing board: first, I repeated some of the same steps I’d done before, making backups just in case. I cleared out the database, and recreated it from scratch; same problems.

By this time I was really frustrated, but then I noticed someone mentioning… memory.

I immediately cranked up top (actually, htop), and noticed mysqld was using a ton of RAM, and that my server was swapping quite a bit.

I’d not tuned MySQL (as the RamNode wiki actually suggests you do!) but my first thought was that I wanted this server to actually be used for more things – so I bumped up my VPS to give it three gigabytes of RAM. It’s a new service plan, but the nice thing about it was that as soon as I ordered it, the instance was allocated more RAM – I literally watched it happen via htop.

And all my MySQL crashes went away. It hasn’t had a crash since then.

My other server – oddly enough, the one that runs this site – is still on a more constrained instance. I may move it at some point, because I don’t know that I need or want two servers; they’re just cheap enough that it wasn’t a big deal.

Playing with Meteor: Account Management is awesome

I started playing with Meteor – a Node.js and MongoDB framework – last week, building a toy app to help some friends, and it is actually really nice. I don’t know yet how serious it is – I’ve only spent maybe five hours playing with it – but it has one thing over almost every other framework I’ve played with lately: a trivial way to handle account and user management.

meteor add accounts-ui accounts-password alanning:roles

… and done. To provide a login component, you add an invocation to your HTML template:

{{> loginButtons}}

Role management isn’t quite as trivial; you have to set up the roles for the users, but checking for roles is pretty easy too:

{{#if currentUser}} 
    You're logged in.
    {{#if isInRole 'admin' 'default-group'}}
        You're in the admin group.
    {{/if}}
{{else}}
    You're not logged in.
{{/if}}

Considering that this is one of the first things an application needs to set up, I have no idea why no other framework I’ve looked at has anything quite so trivial – or if it does, why it’s so hard to find. I’ve looked at multiple languages – Python, Ruby, Java – and multiple frameworks for most of those languages, and Meteor has them all beaten, hands down.

So for the first time I’m actually looking at using Javascript – via Node – seriously as a development platform for myself. It’s … interesting; my thought that Javascript just isn’t that great remains, and the Meteor interface to Mongo is actually quite constrained, but I think I can manage that.

The account management isn’t perfect – but it is good enough that you can actually get started with it, and focus on your app, while using what you need from the users collection.

In fact, some of the things about it are confusing – for example, account names seem to be populated (by registration) inconsistently – but these are problems that are probably caused by my lack of experience, rather than actual issues with the mechanisms themselves.

Well played, Node. Well played.

I still prefer Java.

I still prefer Java over other languages.

The Background: Javabot

I’m a fairly regular contributor to javabot, an IRC bot written for the Freenode ##java channel. I don’t know that I’d be considered a major contributor (I’m not listed in the credits, for example, so maybe I do know – and I’m not a major contributor) but I have a few solved issues to my credit…

But I think my contributions to the Javabot project are nearing an end, and the reason is really rather sad: Kotlin. Javabot recently underwent a conversion from Java to Kotlin.

Kotlin, as a language, looks really neat. It has features that I think Java could use; it’s probably a viable alternative to Java and, to some degree, Scala. I know a number of programmers who are using Kotlin, and what they describe sounds good.

But I don’t know Kotlin. I know a lot of coders, and very few of them use Kotlin – and even though Kotlin looks neat, I can’t use it apart from cottage projects like javabot. The ecosystem around Kotlin is just too small.

That means that any contribution I had for Javabot would be severely limited by my own skill level in Kotlin – which is not experienced enough to even describe as “kotlin newbie” but is still stuck at “laughable.” Any code I wrote for javabot would be more of a burden than a benefit, since the code would have to be severely vetted.

I wouldn’t want a contribution to be a burden – any contribution of mine would be intended to make the world better, not harder. So I think that my participation in the project is limited by my own good intentions.

Prefer Java. Really.

The result, then, is that I’d suggest that coders use Java, even though it’s not as cool or as full-featured as some other languages might be. I’d be willing to consider projects in Scala, which has a viable ecosystem at this point (and has no difficulty leveraging Java’s ecosystem), and even Groovy (which leverages the Java ecosystem more than its own, as far as I can tell).

That’s not to say that Kotlin doesn’t have viability in its future, nor is it to say that I have no interest in learning Kotlin – it’s just a recognition that a new language is a burden to contributors, and because I want neither to shoulder nor to cause that burden, I’d suggest sticking with a language that’s common for the programming environment in which you live.

Addendum

I don’t blame the author for moving to Kotlin – javabot is a cottage project, really, and he actually did ask contributors their opinions before migrating. I voted for the migration. I just didn’t anticipate how I feel about it today as a result.

Python REST service with Django

This is a record of my experiences parsing JSON in a REST service written with Django, in Python.

I’m following various tutorials (including the Django REST framework tutorial), but I was really struggling to get a snippet of JSON actually processed on the server side.

The service in question was really quite simple: a name-completion service that takes a partial band name, wrapped in a Suggestion object.

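from io import BytesIO

from django.http import HttpResponse
from rest_framework.parsers import JSONParser

# SuggestionSerializer and bandcompletion() are defined elsewhere in the app.
def bandnames(request):
    if request.method == 'POST':
        # request.body is a bytestring; JSONParser.parse() wants a stream
        stream = BytesIO(request.body)
        data = JSONParser().parse(stream)
        serializer = SuggestionSerializer(data=data)
        # bandcompletion() builds the JSONResponse for the requested prefix
        return bandcompletion(request, serializer.initial_data["band"])
    else:
        return HttpResponse("Bad request: Wrong method")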

The bandcompletion method actually does the work of returning a JSONResponse – this is basically a wrapper method to accept JSON from the body of a POST request.

I was unimpressed with the sequence: I get a bytestring from request.body, but then I have to wrap it in a stream (with BytesIO) before JSONParser.parse will accept it, which was actually the point at which I was getting lost. Apart from that – which I think was obscure only to me – everything was pretty straightforward.
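For anyone else tripped up by that step, here’s a stdlib-only sketch of the same bytes-to-stream-to-parse sequence, outside Django. It’s an illustration rather than the view itself: extract_band is a hypothetical name, and json.load stands in for Django REST framework’s JSONParser, which isn’t what the view actually calls.

```python
import json
from io import BytesIO


def extract_band(body):
    """Pull the "band" prefix out of a raw JSON request body.

    body is a bytestring, like the one a Django view sees as request.body.
    """
    # Mirror the view's steps: wrap the bytes in a file-like stream, then
    # parse it. json.load stands in here for JSONParser().parse(stream).
    data = json.load(BytesIO(body))
    return data["band"]
```

In Python 3.6 and later, json.loads accepts bytes directly, so json.loads(body) skips the stream entirely; the BytesIO step only matters when an API insists on a file-like object, as JSONParser.parse does.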

Now I can issue an HTTP request via curl:

curl -i -H "Content-Type: application/json" -X POST \
  -d "{ \"band\":\"r\" }" \
  http://127.0.0.1:8000/bands
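The same request can be made from Python with the standard library’s urllib.request, which is handy for scripting checks against the service. This is a sketch: build_band_request is a hypothetical helper, and the URL is just the local development address from the curl command above.

```python
import json
import urllib.request


def build_band_request(band, url="http://127.0.0.1:8000/bands"):
    """Build (but don't send) the same JSON POST the curl command makes."""
    body = json.dumps({"band": band}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it requires the Django development server to be running:
# with urllib.request.urlopen(build_band_request("r")) as response:
#     print(response.status, response.read().decode("utf-8"))
```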

Postman can submit the data as raw body text for the request as well.

There’s no error checking here yet, and the error message for the wrong method is terrible, but all of that is going to be fixed. I’m still playing around.
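As a rough idea of what that error checking could look like, here’s a framework-free sketch that sorts requests into a wrong method (405 Method Not Allowed), a malformed body (400 Bad Request), or a usable band prefix. check_band_request is hypothetical; it returns a status plus either the band or a message, and in a real view Django’s HttpResponseNotAllowed and HttpResponseBadRequest would carry those statuses back to the client.

```python
import json
from http import HTTPStatus


def check_band_request(method, body):
    """Validate a band-completion request before doing any work.

    Returns (status, band_or_message). A hypothetical helper, not tied
    to Django; a view would turn the status into an HttpResponse.
    """
    if method != "POST":
        return HTTPStatus.METHOD_NOT_ALLOWED, "Only POST is supported"
    try:
        data = json.loads(body)
    except ValueError:
        # json.JSONDecodeError (and UnicodeDecodeError) subclass ValueError
        return HTTPStatus.BAD_REQUEST, "Body is not valid JSON"
    # The body must be a JSON object with a string "band" member
    band = data.get("band") if isinstance(data, dict) else None
    if not isinstance(band, str):
        return HTTPStatus.BAD_REQUEST, 'Expected a JSON object with a string "band"'
    return HTTPStatus.OK, band
```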

Exciting stuff, I know, but I’m a big believer in recording things I find significant, just so other people can correct me, or learn from me if I managed to see something they didn’t.

Test-driven development can be great.

“Test-driven development” is one of those things that causes hives among some programmers, who immediately stand up and stake the claim that it’s worthless, puerile, deceptive, and generates awful code… never mind that others manage to use it effectively.

I have been playing with a 2D cellular automaton, largely inspired by Stephen Wolfram’s “A New Kind of Science.” I have a version in Java, and a simpler implementation in Python. TDD made the Java version work properly, and a lack of TDD left a hole in the Python version.

“A New Kind of Science” wasn’t the only inspiration, of course. A friend of mine published a JavaScript version of Conway’s Game of Life – a 3D automaton that’s pretty well-known. A different friend was looking for simple projects he could use to help teach kids how to program; a 2D automaton came to mind, so he asked me for an implementation, which is why I wrote the Python version.

This post is not about the automaton itself. I’ll write that up later. (If you’re interested, you can see the source.)

What is TDD?

Test-driven development is, loosely defined, a practice in which tests are written before anything else, before there’s any code that could make them pass.

For example, if I want to write a program to generate “Hello, world,” I would write a test that validated that “Hello, world” was generated – before I had anything that might create the output. My tests would fail; they wouldn’t even compile until I had some sort of implementation written.

However, by writing a test before anything else:

  • I more or less force myself into having some sort of specification that says what “correctness” means for my program (it is “correct” when it generates “Hello, world”)
  • I also force myself into writing something that has a reasonable interface (because I’m writing how I think it would be called, before writing the guts of the implementation)

By writing the tests first, I’ve effectively given myself a criterion for completeness. When my tests pass, I know I’ve “finished,” because my tests define a specification.
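In Python, that first step might look like this sketch using the standard library’s unittest. greeting() is a hypothetical name; the test is written first, and it fails (with a NameError) until the implementation underneath it exists.

```python
import unittest


# Written first: this test is the specification. Until greeting()
# exists, running it fails with a NameError.
class GreetingTest(unittest.TestCase):
    def test_says_hello_world(self):
        self.assertEqual(greeting(), "Hello, world")


# Written second: the simplest implementation that makes the test pass.
def greeting():
    return "Hello, world"
```

Saved as test_greeting.py, it runs with python -m unittest test_greeting.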

There’s nothing wrong with the specification being incomplete, of course; it may so happen that later, I want to greet someone specific. With tests in place, not only do I have a record of the specification, but I also have a way to extend it without breaking code: I’d simply add tests that correspond to the changed specification, and the existing tests would tell me if my changes broke anything.
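Sticking with Python and unittest, here’s how that might look for a hypothetical greeting() function: a new test covers greeting someone specific, and the original test stays, so a change that broke the plain “Hello, world” case would fail it. The name parameter and the name “Alice” are illustrative assumptions.

```python
import unittest


def greeting(name=None):
    # The default keeps the behavior the original test specified
    if name is None:
        return "Hello, world"
    return f"Hello, {name}"


class GreetingTest(unittest.TestCase):
    # The original specification: it still has to pass after the change
    def test_says_hello_world(self):
        self.assertEqual(greeting(), "Hello, world")

    # The new requirement, written as a test before changing greeting()
    def test_greets_someone_specific(self):
        self.assertEqual(greeting("Alice"), "Hello, Alice")
```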

How did TDD work out for my Automaton?

Here’s the thing: I wrote the Java implementation using TDD, and the automaton is kinda neat; it generates some fascinating patterns even without entropy or a variable starting cell structure. Here’s pattern 171, using a color rendering mechanism:

[Image: pattern 171, color rendering]

The Python version was written because a friend of mine wanted to consider using it for a class he’s teaching. The Python version is very much simpler than the Java version, because it doesn’t do as much (it can’t output to multiple formats, for example).

It was not written with testing in mind. Why would it be? I had written the Java version from tests first; I was only writing a simple port to Python.

It was also wrong. A 2D automaton can “grow” to the right and to the left, depending on the pattern it’s given; the Python version could only grow on the right, because I had an off-by-one error in a core routine.

TDD would have caught that early (and it did catch problems like that, in the Java version).
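To make that concrete, here’s a hypothetical Python sketch (not the automaton’s actual source) of one generation step for an elementary automaton driven by a Wolfram rule number, along with unittest tests of the kind that would have caught a right-only growth bug. It assumes a row of 0/1 cells and an even rule number, so cells outside the row stay blank; rule 90, used in the tests, grows symmetrically from a single cell.

```python
import unittest


def step(cells, rule):
    """Return the next generation of an elementary cellular automaton.

    cells is a list of 0/1 values and rule is a Wolfram rule number
    (0-255). The row is padded with two blank cells on each side, so the
    next generation is one cell wider in both directions: the pattern
    can grow to the left as well as to the right.

    Assumes an even rule number (neighborhood 000 -> 0), so the blank
    background outside the row stays blank.
    """
    padded = [0, 0] + cells + [0, 0]
    next_row = []
    for i in range(1, len(padded) - 1):
        # Read left, center, right as a 3-bit number (0-7) ...
        neighborhood = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        # ... and look up the matching bit of the rule number
        next_row.append((rule >> neighborhood) & 1)
    return next_row


class StepTest(unittest.TestCase):
    def test_row_grows_by_one_cell_on_each_side(self):
        self.assertEqual(len(step([1], 90)), 3)

    def test_rule_90_grows_left_and_right(self):
        # A version that can only grow rightward can't produce the leading 1
        self.assertEqual(step([1], 90), [1, 0, 1])
        self.assertEqual(step([1, 0, 1], 90), [1, 0, 0, 0, 1])
```

The length check alone wouldn’t catch a version that widens the row on one side only; asserting the actual cells is what pins down the direction of growth.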

TDD also provided me the opportunity to fix the names of structures (renaming Dataset to Generation, for example) because the tests made it obvious that the names were inaccurate.

Could I have done it without TDD? Of course. TDD isn’t the only way to write programs well. It’s not the only tool used to work out good names, or good processes, or even to validate that programs work properly – the Python version of the automaton was fixed without TDD, for example.

If you’re wondering why I didn’t use TDD for the Python version, it’s because I’m too much of a newbie with Python to know how, yet – and as I’m not really a Python programmer, there’s not a lot of need. However, seeing the differences in the development process between my Python implementation and the Java implementation, I might look into TDD with Python anyway.

Essential Slick: a review

“Essential Slick” is a book by Richard Dallaway and Jonathan Ferguson, published on underscore.io. It’s designed to be a compact guide to Slick, a database-access library for Scala, and it succeeds admirably in that goal, even in early-access form.

The book is very easy to read, and it’s published in multiple formats (EPUB, HTML, and PDF). I chose the HTML version, since I’m reading it on a MacBook and HTML just seemed the most generic.

However, the content is the most important part.

I’ve tried to play with database access in Scala before; I usually end up working with a model written in Java and accessed via Hibernate, partly because I’m familiar with Hibernate and, more importantly, because the documentation for the various Scala database access mechanisms has been inaccessible to me – generally unclear, or simply not working.

I have tried the Slick tutorials, for example, and the Hello, Slick example projects – only to have them fail out of the box or otherwise not work, without clear explanations.

I’m pleased to report that this has not been the case with Essential Slick – the code has worked very well, and been explained clearly.

Because the book is in early access, there are a few minor problems – for example, the book’s source code uses durations early on without importing or describing them. (The example project, however, does include all required types.) Likewise, the output from the example project differs slightly from the book’s project code (being far less verbose).

These are not real problems, at all; durations are fairly obvious, and the debug output is actually very informative. It just wasn’t entirely expected.

The exercises are useful, and are accompanied by explanations of various problems you might encounter; this is very newbie-friendly (as one would hope from an introductory text) and therefore targets its audience perfectly.

Topics covered over the course of the book include selection, modeling, and combining actions (i.e., building complex transactions). Along the way, some Scala knowledge is assumed, but the book isn’t written arrogantly – Scala beginners can expect to understand the content.

By the end of the book, even in early access form, readers can expect to have a functional, useful knowledge of Slick and to be able to write workable applications that use relational databases from Scala.

I’ve found the book to be highly informative – and, if you’re using Slick, necessary, compared to the other Slick resources out there. Highly recommended.

MySQL log sequence error

I just had the most fun time ever with MySQL backing an instance of WordPress. I’m writing up what happened and how I “fixed” it – note the quotes – just so others are aware of it. Maybe someone has insight into how it happened or has a better idea of how to fix it for the future.

The Errors

The timeline for the error stretches back months, to an ill-advised VPS update on my part back in August or September, I think.

I use RamNode as a hosting provider; I chose Fedora as my OS (because it’s Fedora, and I like Fedora). However, RamNode’s Fedora option is limited to Fedora 20. I wanted to run something more current (since Fedora 20’s a little bit outdated), so I went through the update process – which broke everything catastrophically.

RamNode justified itself, by providing me with a system image backup (which isn’t, by the way, their responsibility – they just came through anyway. I highly recommend RamNode.) So I rebuilt the image. I copied all of the WordPress files back into place, and did the same for the MySQL database directories.

The truth is: it was MariaDB, not MySQL. All of the commands are the same and the database files are binary-compatible; I prefer MariaDB for social reasons. By the end of the adventure, though, I was using the community version of MySQL instead of MariaDB just in case it was something about the MariaDB release. Voodoo debugging on my part. 🙁

I got a few log sequence number errors there, but removing the old log files cleared that up, or so I figured. At the very least, I didn’t see any problems.

The “Fix”

Now let’s zoom up to November. My wife told me her site was performing very poorly, and since her stuff’s pretty important to me (it’s hers!) I took a look at the logs to try to figure out what was happening to the database.

It turns out that the log sequence problem was back, with a vengeance. Now every database interaction was firing off dozens of log sequence errors, resulting in the database being killed and restarted. No wonder her site was performing poorly.

I removed the log files again (to try to get the log sequence reset, because ignorance) and that didn’t fix anything; I tried to do a mysqldump and it couldn’t even read the data. I’d get errors trying to make a backup.

This was not good.

I then copied the backup (made with tar cvzf) to a virtual machine (through VirtualBox, on a Fedora image). Here’s the odd thing: the machine in VirtualBox had no problem reading the database. Record counts were fine, and all of the data was there.

I didn’t alter the files at all – but took the opportunity to dump the data (via mysqldump again, this time getting a valid SQL dump.)

I went back to the VPS, uninstalled the database, cleared the directories, and then restarted the database (i.e., exactly what I’d done with my virtual image). (This is when I switched from MariaDB to MySQL, incidentally.)

… and what happened, you ask? Well, the same thing – lots of log sequence errors. Here’s a sample:

151115 15:42:11  InnoDB: Error: page 1749 log sequence number 632633543
InnoDB: is in the future! Current system log sequence number 510316477.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: for more information.
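As I understand it, the message means InnoDB found a page stamped with a log sequence number (LSN) ahead of the newest LSN the server’s own log knows about, which is why it suggests the tablespace was copied without its matching log files. When errors like this fire by the dozen, a small script that pulls the numbers out of the error log can show how many pages are affected and how far ahead they are. This is a sketch: future_lsn_pages is a hypothetical helper, and the regular expression matches only the message format in the sample above, which may differ in other MySQL or MariaDB versions.

```python
import re

# Matches the two-line InnoDB message shown above: a page whose LSN is
# ahead of the server's current system LSN.
LSN_ERROR = re.compile(
    r"page (\d+) log sequence number (\d+)\s+"
    r"InnoDB: is in the future! Current system log sequence number (\d+)"
)


def future_lsn_pages(log_text):
    """Return (page, page_lsn, system_lsn, gap) for each error in log_text."""
    results = []
    for page, page_lsn, system_lsn in LSN_ERROR.findall(log_text):
        page_lsn, system_lsn = int(page_lsn), int(system_lsn)
        # gap: how far the page's LSN runs ahead of the system LSN
        results.append((int(page), page_lsn, system_lsn, page_lsn - system_lsn))
    return results
```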

This was concerning. I was about to have to open up the MySQL files and alter them directly (which I’d been trying to avoid.) But I wanted to try one more thing, since I’d done the data dump on my virtual image…

So I shut down the database, and removed the data again. I restarted the database server, and created the database and database user.

Then I fed the SQL into the mysql client, and ran some simple queries to see record counts and some data. No errors.

Then I restarted WordPress … and lo and behold, everything seems to be okay.

The moral of the story is, of course, to make regular backups and watch your servers… I’m hardly a MySQL admin, and clearly I need to get better about all of this.
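For the “regular backups” part, here’s a minimal Python sketch of a dated mysqldump run that could be scheduled with cron. It assumes credentials live in a MySQL option file such as ~/.my.cnf (so no password appears on the command line), and the database name is a placeholder; backup_command and run_backup are hypothetical helpers. The command is built separately from running it, so it can be checked without touching a server.

```python
import datetime
import subprocess


def backup_command(database, when):
    """Build a mysqldump command that writes a dated SQL dump.

    Assumes credentials come from a MySQL option file (e.g. ~/.my.cnf).
    """
    outfile = f"{database}-{when:%Y-%m-%d}.sql"
    # --single-transaction dumps InnoDB tables from a consistent snapshot
    # without locking them for the duration of the dump
    argv = ["mysqldump", "--single-transaction", f"--result-file={outfile}", database]
    return argv, outfile


def run_backup(database):
    argv, outfile = backup_command(database, datetime.date.today())
    # check=True raises CalledProcessError if mysqldump fails
    subprocess.run(argv, check=True)
    return outfile
```

A dump like this is plain SQL, which is what got the site back in the end: it can be fed to the mysql client on a freshly initialized server.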