JNDI

Things I have learned recently:

  • People still don’t really get JNDI, and the Java frameworks around today make it easy to ignore, even though it’s still a core technology. It’s not difficult to see how it can be confusing: context in JNDI is everything, and context makes it a challenge to create examples that make sense in the general case.
  • At some point I’d like to learn Go.
  • Not something I’ve learned, but something I’ve been reflecting on this morning because … uh… I have no idea why: I wonder if Adidas shoes are any good, or what they’re good for. I tend to wear Vans Ultrarange shoes these days because they’re light, comfortable, and last forever – I have two working pairs, one for working in the yard and one for wearing – but… Adidas.
  • I really wish officials and announcers wouldn’t show bias during football games. As an FSU guy, I’m really, really, really tired of this – but I’ve been watching other teams’ bowl games (because FSU didn’t go bowling this year, first time in 40+ years) and it happens for them, too, often egregiously. The announcers I don’t care as much about, but the referees… those guys need to be fair, for real. The fact that there’s no urgency in making sure they’re fair is incredibly frustrating and erodes the game. In one game, a team had two defenders ejected for targeting… and the other team had an obvious false start missed, and a few targeting possibilities ignored by the guys in stripes. Let’s just say nope to all that. There needs to be a way for the league to tell these refs what they’re missing, and to either call it fairly or get out. It’s gotten really bad over the last few years, with FSU losing multiple games due to bad or missed calls.
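A footnote on the JNDI point above: even the simplest lookup shows why context is everything. Here’s a minimal sketch in Java – the name being looked up is a conventional Java EE-style name, assumed purely for illustration, and what you get back depends entirely on where the code runs:

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class JndiSketch {
    // Attempts a JNDI lookup, returning either the bound object's class name
    // or the name of the NamingException raised. Which one you get depends
    // entirely on the environment -- hence "context is everything."
    static String tryLookup(String name) {
        try {
            Context ctx = new InitialContext();
            Object value = ctx.lookup(name);
            return value.getClass().getSimpleName();
        } catch (NamingException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // "java:comp/env/jdbc/ExampleDS" is a hypothetical datasource name.
        System.out.println(tryLookup("java:comp/env/jdbc/ExampleDS"));
    }
}
```

Run standalone, this typically reports NoInitialContextException, because no naming provider is configured; run the same code inside an application server and the same name could resolve to a real DataSource. That’s exactly what makes general-purpose JNDI examples so hard to write.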

Picking Content

Things I have learned recently:

  • Picking items for these lists can be a drag, because I’m trying to be very careful to have positive (meaning “uplifting”) assertions where I can. It’s not a perfect system; even in this post I have “I don’t like…” but hopefully it’s neutral enough to pass my criteria. My overall rule of content hasn’t changed for decades: I’m very uncomfortable posting anything about which I’d have been ashamed to tell my grandparents. No sexual content, and few (if any) curses, for example. I’m happy with that; I don’t really feel any temptation to write about those things, but it’s a good metric to have to keep my own writing “above board.”
  • The world needs more success stories. I try to do everything I can to help create them… and I’m fine with not being featured in those success stories. The success is enough. “Tikkun olam” – a Jewish idea that means “healing the world” – drives me in this.
  • On the other side of the previous point: I will always be grateful for those who try to ensure my success. It’s one of the ways I differentiate benevolent acquaintances from my friends: if someone works to ensure my success, that person is my friend. I will always appreciate them. Period. Acquaintances are those people with whom my success is of tangential importance. There’s nothing wrong with that, but those people are rarely my friends.
  • When writing a crawler, it’s always nice to save the downloaded files in proper format rather than, well, improper format. Whoops.

Blogs are so eh

Things I think I’ve learned today:

  • Blogs are so yesterday, man. It’s probably the medium and platform I’ve chosen – I don’t use Medium, for example, although I have an account there – and I don’t publish often enough, or with enough direct focus, to really attract users, because I’m really not trying to build a popular platform. But at the same time, my pride is hurt just a little that traffic is so low.
  • Crawler4j makes it really easy to write a working web spider in Java – or, in my case, Kotlin. I looked into Scrapy for a bit, but couldn’t figure out why what should have been a trivial recipe was so hard to find, so I switched to Kotlin because I really just needed to make progress. Progress was made. Scrapy’s probably fine – maybe the information I was looking for was out there, right in front of my face, and I just couldn’t recognize it – but that didn’t help me make the progress I needed.
  • Had a good example of pragmatism in action show up yesterday, too. We issued a pull request for a migration, and one of our developers pointed out a number of situations in the new codebase that could have been problematic. I asked him to demonstrate the errors (or potential errors) with tests so we could validate them, but he wasn’t sure how much effort would be involved in writing those tests… so we progressed with the merge. I think we made the right choice; he’s not wrong in his observations (we might have introduced errors with the new changes) but without validation, we can’t know, and we’d be chasing ghosts. We made notes that the code might be problematic, and we’re going to watch for problems that come from it.
  • Aaron’s Thinking Putty is cooler than it should be.

JUCE, tox, Euclidean beats

Things I’ve learned today:

  • tox is a Python library designed to “standardize testing in Python” – including testing a given project across Python versions (so you could use it to create a library for both Python2 and Python3, and test in both environments.) I’m working on such a library right now; I am using two shells, two directory trees, two virtual environments… which is a pain. tox looks like a library to help me get around that.
  • JUCE is a library designed to help deliver music … applications. It can generate UIs, VSTs, and AAX plugins using C++. One of the things I keep thinking I want (although I’m not sure I actually do) is a Euclidean beat generator; I also wouldn’t mind doing a cellular automaton to generate music, so JUCE looks interesting. I haven’t done C++ for real since my Alcyone project (my MIDI foot controller hardware), so I might have to approach this slowly.
  • I like this format of data capture more than I enjoy sites like Twitter. Sure, Twitter’s probably fine for simple assertions, but I don’t like simple assertions; there’s no room for nuance, and in the real world, there’s… nuance. So far, this allows me to make an assertion and explain it without worrying about incomplete, piecemeal consumption. I just have to build the habit, and work on classification.
  • Since I mentioned it earlier: Euclidean beats are apparently found in real world musical forms, and that’s kinda awesome… but every time I’ve played with them, I don’t care for the output much. Euclidean beats tend to be regular (therefore, well, Euclidean) and my own percussion approach, when I focus on it, tends to focus on the unexpected hits rather than regularly timed hits. Euclidean beats spread out hits over known periods; I cluster hits inside those known periods instead. Which approach is better? Well, you can find famous percussion virtuosos who use Euclidean approaches, and I’m neither famous nor a percussion virtuoso. Hmm.
  • Wildwood Guitars has amazing prices on Rickenbacker guitars, and from what I’ve been able to tell from asking Rickenbacker players and from community reviews, they’re quite well respected. And yes, they carry the basses – and have used equipment as well.
  • While I’m thinking about Rickenbacker basses – which happens a lot more than I expect it should, really – there are two main products, the 4003 and the 4003S. The 4003 differs in the fretboard inlays (the 4003 has a sharktooth inlay, the 4003S has a dot) and the 4003 has two outputs (“Ric-O-Sound”, where each pickup has its own output jack) and the 4003S has only a single mono output. The 4003 also has a bound body and the 4003S is unbound; apparently some people find the unbound body more comfortable. I have not done this comparison myself… but if I were to figure out my ideal Rickenbacker bass, it’d be a Midnight Blue 4003, although the others are pretty too. I do not have a Rickenbacker bass, nor is that likely to change, as I’m not a working musician and I don’t need another bass to replace my Jazz… I’d just like a Rickenbacker just because.
  • I find it extraordinarily difficult to trust Donald Trump. His wife shouldn’t trust him, and his ex-wives clearly shouldn’t have trusted him either; why should I trust him, when the people to whom he owes trustworthiness most can’t rely on him? And he employs “the best people”… and doesn’t trust their expertise when it suits him to counter their opinions. Yeesh. We elected this guy. I hope we deserve better.
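Since Euclidean beats came up above: they’re surprisingly easy to generate. Here’s a minimal sketch in Java, using a simple Bresenham-style formulation that yields rotations of Bjorklund’s classic patterns – a sketch for illustration, not the canonical algorithm:

```java
// A sketch of a Euclidean beat generator: spread 'pulses' hits as evenly
// as possible over 'steps' slots. A hit lands wherever (i * pulses) % steps
// wraps below pulses -- a Bresenham-style approximation of Bjorklund.
public class EuclideanBeats {
    static String euclidean(int pulses, int steps) {
        StringBuilder pattern = new StringBuilder();
        for (int i = 0; i < steps; i++) {
            // 'x' marks a hit, '.' a rest
            pattern.append((i * pulses) % steps < pulses ? 'x' : '.');
        }
        return pattern.toString();
    }

    public static void main(String[] args) {
        // E(3,8) is the classic tresillo pattern: x..x..x.
        System.out.println(euclidean(3, 8));
    }
}
```

Note how regular the output is – the hits are spread evenly over the period, which is precisely the property I find uninteresting compared to clustered hits.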

Python docs, more on wget

Things I have learned today:

  • Python has a lot of modules that are documented well enough to make you cry. Other modules are documented so poorly that they will make you cry. Why am I using GNU parallel? Because creating a bounded threadpool in Python, a task that seems like it should be pretty straightforward, was documented so confusingly that I just ended up using the command line instead.
  • wget is a surprisingly easy way to hammer your CPU; run eighteen simultaneous processes and watch the CPU bleed. Great fun for all! (If you can’t guess: parallel is being used to fix this.)
  • I will be fascinated when I learn Gutenberg well enough to leverage it. It’s supposed to be like Medium’s editor, and I suppose it is; I don’t like Medium’s editor either.
  • The best and worst thing about programs to allow you to play Solitaire is how easy it is to play a new game; you end up not valuing a given hand, because if it gets difficult… redeal. That means you lose some hands you could win (“eh, too hard”) and means that you also don’t really value winning as much as you used to, because you can play so many hands so quickly.
  • I find no irony or contradiction in appreciating the Avett Brothers alongside Rush and Yes and Pink Floyd and Led Zeppelin and Weather Report, but I expect others to be surprised at my choices in music. I think other people think I am easily pigeonholed, and maybe I am, but not along the lines of music genres… I think.
  • I ache when my friends ache. Sometimes I wish I did not, but I think the world would be much, much sadder for me if I couldn’t share others’ pain.
  • GNU parallel uses perl. This is amusing. It works; it’s great; it’s still amusing.
  • I like jokes that don’t have victims, generally speaking, but I have no problem using a few specific people as the targets of jokes… Paul Finebaum comes to mind. Give us a rest, Paul. We know you like the SEC.
  • Other people whose voices I could do without: Stephen A. Smith; Sean Hannity; Tucker Carlson. By the way, Tucker, not that you’ll ever read this, but…
  • YES, diversity is a good thing, and while we can argue about specific granularity and I have no problem conceding that there has to be a certain amount of homogeneity in value systems, only a total moron would dare argue seriously that cultural diversity, in and of itself, is a Bad Thing. Shut up, you knob. I appreciate that you use a cannon where a scalpel is better suited, and I hope you know that this is what you’re doing (and therefore you’re being obtuse on purpose) but every now and then it’s good to remember that nuance is A Thing To Use.
  • I decided I was going to try to publish one of these a day, and that streak lasted for ONE DAY.
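For contrast with the bounded-threadpool gripe above: in Java (my usual language), a bounded pool is nearly a one-liner via Executors. A sketch – note the pool’s thread count is bounded, though the work queue behind it is not:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPool {
    // Runs 'tasks' trivial jobs on a pool bounded at 'poolSize' threads,
    // returning how many completed.
    static int runBounded(int poolSize, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Eighteen wget-style jobs, but only four at a time -- the CPU survives.
        System.out.println(runBounded(4, 18) + " tasks completed");
    }
}
```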

GNU Parallel, wget, Avett Brothers

Things I have learned today:

  • GNU Parallel is actually pretty nice. It will take some time to get used to how it applies the command line and interpolates the actual command to run, but the documentation is pretty thorough and my needs as of right now are pretty light.
  • That said, parallel --bibtex is annoying… and necessary. (Otherwise it demands you cite parallel in your … paper. Which I’m not producing.)
  • wget is much, much, much faster with the -nv option than without.
  • My middle son can appreciate the Avett Brothers’ talent, but doesn’t really care for them much. No hard feelings, kid.
  • I don’t care for WordPress’ Gutenberg editor much yet.

Five Laws of Open Source

In order to work effectively, open source as a paradigm has a set of rules. There are a few examples of rules out there, but I think I can help, because I’ve got no problem flouting the rules when I feel like it. (Guess which ones I’m flouting!)

  1. Be thou not a jerk.
  2. Conserve not thy knowledge, for knowledge, unlike material goods, loses value in scarcity and gains value in propagation.
  3. Respect thine ego as it deserves, which is to say: not at all, for upon the shoulders of humble giants dost thou stand, and if afraid to look stupid had they been, their shoulders available for thy standing would not have been.
  4. Don’t talk like Yoda. It violates the first law, geez.
  5. Expect participation to be its own reward.

Be Thou Not A Jerk

This should be fairly obvious. While open source definitely has its share of prima donnas, try hard not to join their ranks; most of the prima donnas out there really don’t have the social capital to justify their behavior.

You really shouldn’t attack anyone. Avoid all ad hominem attacks; if an idea isn’t as strong as it should be, well, say so – and say why – but even a bad idea doesn’t mean that the idea’s author is stupid.

Maybe the idea is perfect for what the author is trying to do, after all, and they’re speaking from their perspective; even if the idea is a terrible one, it’s worth using gentleness.

If forceful behavior is justified (i.e., someone has a terrible idea, it’s been rejected through reasonable explanation, and the author gets abusive or insistent beyond reason), then… honestly? Excommunication in context is the best approach in most cases. Just mute the poor fellow in the context of the project; document the idea and why it isn’t a good approach for the project at large (see the next rule!), and move on.

Abuse is dangerous; it poisons the community, and makes you look bad, even if it’s justified in your mind.

Obviously, trolling is a terrible idea for everyone. I’d say the same for sarcasm, but I’m pretty sarcastic myself and my ego wouldn’t survive that kind of self-examination.

Conserve Not Thy Knowledge

In the Real World, scarcity drives up value. If you’re the only person in the world with a particularly nice guitar, well, that guitar is going to (potentially) be worth a lot, because of scarcity; if anyone else wants it, well, they’ll have to compete for that unique object.

Likewise, if a resource is limited – like, say, oil – people with need of that oil are subject to market scarcity; the less oil that is made available to the consumer, the higher the price that the consumer must be willing to pay.

Knowledge is not like that. If you’re the only person who knows something, that knowledge isn’t worth much. But if you tell everyone, all of a sudden that knowledge is incredibly valuable.

There are obvious retorts to this; if you’re good at something, and people need someone who’s good at whatever it is, they’ll (hopefully) be willing to pay more to acquire the services of skilled personnel. (Look at basketball teams: everyone knows how to shoot a basketball, but the people who are actually really good at it get paid bazillions.)

But I’d say that in open source, that doesn’t work; there are exceptions, of course, but the general situation is such that knowledge is worth more when it’s shared.

If you figure something out, say what it is, from beginning to end. Describe what it is that you’ve figured out, and describe how it works, why it works, where you’d use it. Go overboard. Drown people in your knowledge. Make it so that if you were to be hit by a bus tomorrow (God forbid! … or The Cold Unfeeling Universe forbid! … or whatever) that what you’ve learned could be replicated 100% by someone who could read the documented knowledge you’ve left behind.

It doesn’t matter, really, even where you left it – as long as it’s publicly available. If Google can find it, it’s good. I’d suggest avoiding social networks, though, because social networks artificially inflate the propagation to a specific subset of people (the social network itself), and the idea is to be egalitarian.

Respect Thine Ego As It Deserves

Having an ego is healthy. However, remember that your ego – your pride in self – is best served in open source by remembering that you stand on the shoulders of those who came before you.

You may have reached a pinnacle of knowledge, but in open source, that usually means that you, yourself, took that last step to reach the top of the mountain – while everyone else carried you to the previous position. Everyone else built the ladder that you have climbed.

That’s not to devalue that last step – someone has to do it! But bragging about being the one who made the leap is in very poor taste.

In open source, we all build together.

Someone else (probably) built the operating system you use. Someone else (probably) designed the language you used to write your project. Someone else (probably) developed the methodologies you used for development. Someone else (probably) defined the environment in which your project will operate.

Very few of us are capable of actually working alone in all of this; we all leverage the information that others have made available.

So… yeah, be proud of who you are and what you’ve contributed, but remember that it’s another diamond in a wasteland full of such jewels.

Humility is your friend… and it saves you a lot of embarrassment in the rare event that you’ve screwed up. Let pride drive you to do things well, but don’t let pride dictate how you interact with others. Let others sing your praises; about yourself, be silent. (Enough moralizing from the moron at the keyboard.)

Don’t Talk Like Yoda

Look, I get it; Star Wars is fantastic in a lot of ways that matter. (Great special effects, especially for the time; a grand storyline; great ideas.)

However, it’s awful in a lot of other ways: the dialogue, especially when George Lucas wrote it, was terrible, and a lot of the problems had really simple solutions – often as simple as “eliminate the so-called ‘good guys.’” The ‘bad guys’ wouldn’t be all that bad; they’re mostly evil because the filmmakers wanted to have good villains. Except Jar-Jar. He’s raw malice wrapped in a can.

And then there’s Yoda.

I won’t bore you with criticism of Yoda beyond his speech patterns, which are… unique, and surprisingly hard to replicate well.

The simplest way to imitate Yoda, though, is to invert the structure of a sentence, fronting the object or the verb. Speak like this, you should do! Learn this way of speaking, you should not!

… okay, this rule was me picking on myself, for writing the first three rules in Yoda-speak. Don’t follow my example; don’t use Yoda-speak. Really. (Or: “Speak like Yoda you should not, ha ha ha! Why you are making faces like that, I do not know.”)

Expect participation to be its own reward

This is the hardest one, I think (unless it’s “Don’t talk like Yoda”.)

But this is the one that makes it all work.

When you participate in open source – when you ask a question, when you tell a newbie how to do something simple, even when the docs show it somewhere… you contribute value. You build. You make the world a better place.

If you want more than that… well, you might get it, but you probably won’t. The truth is that the knowledge that you’ve helped improve the world around you is the most likely reward you will see.

Being honest, if you participate a lot, you will probably gain a sort of social capital; people respect that participation, and the value you contribute eventually will come back to you.

I’m proof of that, I think; when someone asks a question, I’ll often try to answer and guide. Sometimes I’m flat-out wrong, and that’s fine; someone more knowledgeable will correct me (sometimes even gently) and then everyone learns… but the key is that I can usually ask my own questions and people will try hard to answer me, because they know that I will share that knowledge with others, and because they (for better or for worse) respect my intentions to participate.

That’s social capital. I wouldn’t know (or want to know) how much social capital I have, nor would I know how to quantify it, but what social capital I do have, I think I can directly attribute to my expectations of reward.

If someone asks a question that I can answer, I don’t expect anything from them apart from their growing knowledge. And I shouldn’t.

Again, sometimes it works differently; sometimes you help someone and they return the favor to everyone’s benefit. Sometimes they pay it back; often they pay it forward (a more desirable result).

But if you act in open source in such a way that you want what you offer paid back… that’s a violation of open source principles.

Don’t do it. Pay everything forward, and expect that from others. Everyone wins that way.

Your Infrastructure Uses Programming Principles Too

It’s important to remember that your development and deployment infrastructure uses programs, written by programmers. Because of that, sometimes things will break, and while that’s not fun, it’s still just programming – report bugs, try to figure out fixes and workarounds, discuss, and contribute, just like you would with any other programming situation.

Of course, if you encounter bugs and refuse to report them or try to fix them or even document them, I’d say you’re working against your own best interests. You’re far better off participating, even if it’s as shallow a participation as “Hey, I found a bug when I …” because that at least helps track the number of people affected by a problem.

And in the end, it’s a programming problem – it’s likely to get fixed, because we, as programmers, like things to work. It’s pointless to just throw shade at systems, regardless of what they are or even if they deserve it.

Problems are just problems. There are solutions.

In March of 2016, a programmer removed left-pad from the JavaScript ecosystem. In the process, he broke thousands of dependent projects that used left-pad.

This was a big problem. (Thank you, Captain Obvious!)

But why is it such a problem, in the grand sense of things? I’d say it was a serious failure, to be sure. But triaging the issue realistically, workarounds (namely, replacements for left-pad) appeared in very little time – and the episode also highlighted the importance of redundancy and licensing in the JavaScript ecosystem. Two problems, then: one fairly limited (the dependency on left-pad itself) and one broader in scope that people had normally been able to ignore (the availability of published packages).
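Part of what made the left-pad episode so striking is how small the function was; the original was roughly a dozen lines of JavaScript. A sketch of the same idea, in Java:

```java
public class LeftPad {
    // Pads 'input' on the left with 'ch' until it reaches 'length';
    // returns the input unchanged if it is already long enough.
    static String leftPad(String input, int length, char ch) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < length - input.length()) {
            sb.append(ch);
        }
        return sb.append(input).toString();
    }

    public static void main(String[] args) {
        System.out.println(leftPad("42", 5, '0')); // prints "00042"
    }
}
```

That a function this trivial could break thousands of builds is the redundancy problem in miniature.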

Both ended up being addressed, one way or the other. People learned. People moved on. Sure, there was a lot of griping about a brittle ecosystem, and maybe it was deserved to some degree, but… truthfully, the only comments worth paying attention to were the ones that tried to fix the problem, or at least acknowledged that it was only a problem – and problems have solutions.

Closer to home, some of my infrastructure tooling at work failed thanks to a Docker upgrade; Docker itself didn’t fail, but a tool that used the Docker daemon to build images started failing.

One of the comments from a fellow coder: “This is why I don’t like the JVM.”

My response – kept to myself, I hope – was not entirely positive, let’s say. That’s a dumb response. That’s a knee-jerk response; it doesn’t acknowledge that infrastructure tools are programmed, too, and things happen.

Docker changed; the tooling is changing, too; it just takes time. The timing’s inconvenient, sure (I changed the build process to avoid that mechanism), but it’s just a problem. It will be fixed. The bug was reported. Information is being exchanged. The tooling will be fixed, because there’s no way I’m the only person who’s encountering this bug (and I’m not; others have reported it as well.)

But the proper reaction is definitely not to decide that the entire ecosystem associated with what you’re working with is broken; after all, other ecosystems are going to have their own bugs, too.

The solution is to communicate and participate; use a workaround (like I did) if you have to, and sometimes you do, but don’t abandon the process. Report. Heck, patch if you can.

A New Maven Archetype for Starter Projects

I’ve recently put together a new Maven archetype, based on something I saw in Freenode’s ##java channel a few weeks ago. Basically, someone had built their own archetype for “standard projects,” with a few sensible dependencies and defaults, and while I thought it was a worthwhile effort, it didn’t fit what I found myself typically doing.

So I built my own, at https://github.com/jottinger/starter-archetype.

Primary features are:

  • Java 8 as a default Java version
  • Typical dependencies
  • Maven Shade

Java 8

Look, Java 7 and older have been end-of-lifed; you can pay Oracle for support and fixes, but you shouldn’t unless you have a real reason to do so. Java 8 is the current Java version. You should be using it, and so should I. Therefore, I do.

It’s an unfortunate aspect of Maven that it defaults to an older version of the Java specification. My starter archetype presumes you want to live in the current year.

Typical Dependencies

My starter archetype has seven dependencies; they are the ones I either include without thinking about it (because I know I’m going to want or need them), or they’re the dependencies that are so common that few projects would blink at their inclusion.

The dependencies are organized into three groups: runtime dependencies, one compile-time dependency, and testing dependencies.

The compile-time dependency is Lombok; it helps remove boilerplate from Java code, so I can build a Java object with mutators, accessors, toString(), hashCode(), and equals() very simply:

import lombok.Data;

@Data
public class Thing {
    // Lombok generates getName(), setName(String name), equals(),
    // hashCode(), and toString() for me
    String name;
}
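For a sense of what that annotation saves, here’s roughly what you’d write by hand to get the same behavior – a sketch, not Lombok’s exact generated output:

```java
import java.util.Objects;

// Roughly the hand-written equivalent of a Lombok @Data class
// with a single 'name' field.
public class Thing {
    String name;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Thing)) return false;
        return Objects.equals(name, ((Thing) o).name);
    }

    @Override
    public int hashCode() { return Objects.hash(name); }

    @Override
    public String toString() { return "Thing(name=" + name + ")"; }
}
```

Multiply that by every field in every data class, and the appeal of generating it at compile time is obvious.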

The runtime dependencies are Guava, Logback, and Apache’s commons-lang3. Guava and commons-lang3 have a lot of overlap, but both are very common; Logback is a logging library that implements the slf4j API, so it’s a workable default logging library that doesn’t force you to stick with it if you don’t like it.

All together, they use up roughly 3.5MB of disk space for a starting classpath, with Guava using 2.3MB of it. Given how useful Guava, et al, are, I think this is entirely worthwhile; most projects will have these libraries (or something nearly like them), so it’s acceptable.

The testing libraries are TestNG, assertj, and H2.

It’s arguable that JUnit 5 might have caught up to TestNG in a lot of ways, but there are still some features I really like from TestNG that JUnit doesn’t have built-in support for (namely, data providers – although note that there is a project that provides data provider support for JUnit, unsurprisingly called junit-dataprovider).

AssertJ is a set of fluent assertions for Java. It’s not necessary – for example, I’ve used TestNG’s innate assertions for years without a problem – but the fluent assertion style is rather nice. The actual dependency is assertj-guava – which includes the base library for assertj – but I chose assertj-guava because of the inclusion of Guava as a default runtime dependency.

H2 is an embedded database. I use embedded databases so much for first-level integration tests that it seemed silly not to include it; I have a lot of sandbox projects that don’t use H2, but as soon as I do anything with a database, this gets included, so it makes sense as a trivial “default testing library.”

Maven Shade

I wanted to be able to generate an executable jar by default, not because I do that very much, but because it seemed to be a sane default. (Usually, my starter projects exist to support a test that demonstrates a feature, as opposed to being an independently useful project.)

Because I wanted others to be able to see more use out of my starter project, I added Maven Shade to create a viable entry point and an executable jar.

Using the archetype

Right now, you’d need to run the following sequence at least once to use the archetype:

git clone https://github.com/jottinger/starter-archetype.git
cd starter-archetype
mvn install

Then, to build a project with the archetype, you’d run:

mvn archetype:generate \
    -DarchetypeGroupId=com.autumncode \
    -DarchetypeArtifactId=starter-archetype \
    -DarchetypeVersion=1.0-SNAPSHOT

Future Enhancements

It’s still very much a work in progress. Things I’d like to do:

  • Migrate into the main Maven repositories (publish, in other words)
  • Add publishing support to the archetype itself
  • Include better throwaway demonstrations of the dependencies (a difficult task, as the default classes are meant to be thrown away en masse)
  • Figure out better default libraries, if possible

You’re welcome to fork the project, create issues, comment, or use it with wild abandon, as you like. It’s licensed under the Apache License, version 2.0.

A Simple Grappa Tutorial

Grappa is a parser library for Java. It’s a fork of Parboiled, which focuses more on Scala as a development environment; Grappa tries to feel more Java-like than Parboiled does.

Grappa’s similar in focus to other libraries like ANTLR and JavaCC; the main advantage to using something like Grappa instead of ANTLR is the lack of a separate code-generation phase. With ANTLR and JavaCC, you have a grammar file, which generates a lexer and a parser in Java source code. Then you compile that generated source to get your parser.

Grappa (and Parboiled) represent the grammar in actual source code, so there is no external phase; this makes programming with them feel faster. It’s certainly easier to integrate with tooling, since there is no separate tool to invoke apart from the compiler itself.

I’d like to walk through a simple experience of using Grappa, to perhaps help expose how Grappa works.

The Goal

What I want to do is mirror a tutorial I found for ANTLR, “ANTLR 4: using the lexer, parser and listener with example grammar.” It’s an okay tutorial, but the main thing I thought after reading was: “Hmm, ANTLR’s great, everyone uses it, but let’s see if there are alternatives.”

That led me to Parboiled, but some Parboiled users recommended Grappa for Java, so here we are.

That tutorial basically writes a parser for drink orders. We’ll do more.

Our Bartender

Imagine an automated bartender: “What’re ya havin?”

Well… let’s automate that bartender, such that he can parse responses like “A pint of beer.” We can imagine more variations on this, but we’re going to center on one, until we get near the end of the tutorial: we’d also like to allow our bartender to parse orders from people who’re a bit too inebriated to use the introductory article: “glass of wine” (no a) should also be acceptable.

If you’re interested, the code is on GitHub, in my grappaexample repository.

Let’s take a look at our bartender’s source code, just to set the stage for our grammar. (Actually, we’ll be writing multiple grammars, because we want to take it in small pieces.)

package com.autumncode.bartender;

import com.github.fge.grappa.Grappa;
import com.github.fge.grappa.run.ListeningParseRunner;
import com.github.fge.grappa.run.ParsingResult;

import java.util.Scanner;

public class Bartender {
    public static void main(String[] args) {
        new Bartender().run();
    }

    public void run() {
        final Scanner scanner = new Scanner(System.in);
        boolean done = false;
        do {
            writePrompt();
            String order = scanner.nextLine();
            done = order == null || handleOrder(order);
        } while (!done);
    }

    private void writePrompt() {
        System.out.print("What're ya havin'? ");
        System.out.flush();
    }

    private boolean handleOrder(String order) {
        DrinkOrderParser parser
                = Grappa.createParser(DrinkOrderParser.class);
        ListeningParseRunner<DrinkOrder> runner
                = new ListeningParseRunner<>(parser.DRINKORDER());
        ParsingResult<DrinkOrder> result = runner.run(order);
        DrinkOrder drinkOrder;
        boolean done = false;
        if (result.isSuccess()) {
            drinkOrder = result.getTopStackValue();
            done = drinkOrder.isTerminal();
            if (!done) {
                System.out.printf("Here's your %s of %s. Please drink responsibly!%n",
                        drinkOrder.getVessel().toString().toLowerCase(),
                        drinkOrder.getDescription());
            }
        } else {
            System.out.println("I'm sorry, I don't understand. Try again?");
        }
        return done;
    }
}

This isn’t the world’s greatest command line application, but it gets the job done. We don’t have to worry about handleOrder yet – we’ll explain it as we build up the grammar.

What it does

Grappa describes a grammar as a set of Rules. A rule can describe a match or an action; both matches and actions return boolean values to indicate success. A rule has failed when processing sees false in its stream.

Let’s generate a very small parser for the sake of example. Our first parser (ArticleParser) is going to do nothing other than detect an article – a word like “a”, “an”, or “the.”

Actually, those are all of the articles in English – the language has exactly those three, and no others.

The way you interact with a parser is pretty simple. The grammar itself can extend BaseParser<T>, where T represents the output from the parser; you can use Void to indicate that the parser doesn’t have any output internally.

Therefore, our ArticleParser‘s declaration will be:

public class ArticleParser extends BaseParser<Void> {

We need to add a Rule to our parser, so that we can define an entry point from which the parser should begin. As a first stab, we’ll create a Rule called article(), that tries to match one of our words. With a trie. It’s cute that way.

A trie is a type of radix tree. They tend to be super-fast at certain kinds of classifications. Note that this method name may change in later versions of Grappa, because honestly, the actual search mechanism – the trie – isn’t important for the purpose of invoking the method.
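If tries are unfamiliar, here’s a toy sketch of the idea – illustrative only, and emphatically not Grappa’s implementation. The point is that membership lookup walks one node per character, so cost scales with the word’s length rather than with the number of stored words.

```java
import java.util.HashMap;
import java.util.Map;

// A toy case-insensitive trie. Lookup walks one node per character,
// which is why tries are fast at classifying words.
public class TinyTrie {
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    void add(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(
                    Character.toLowerCase(c), k -> new Node());
        }
        current.terminal = true;
    }

    boolean contains(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.get(Character.toLowerCase(c));
            if (current == null) {
                return false;
            }
        }
        return current.terminal;
    }

    public static void main(String[] args) {
        TinyTrie articles = new TinyTrie();
        articles.add("a");
        articles.add("an");
        articles.add("the");
        System.out.println(articles.contains("tHe")); // true
        System.out.println(articles.contains("me"));  // false
    }
}
```

Note that "thee" would fail the lookup even though "the" succeeds, because no stored word ends at the second "e" – the terminal flag matters, not just the path.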

public class ArticleParser extends BaseParser<Void> {
    public Rule article() {
        return trieIgnoreCase("a", "an", "the");
    }
}

This rule should match any variant of “a”, “A”, “an”, “tHe”, or anything like that – while not matching any text that doesn’t somehow fit in as an article. Let’s write a test that demonstrates this, using TestNG so we can use data providers:

package com.autumncode.bartender;

import com.github.fge.grappa.Grappa;
import com.github.fge.grappa.run.ListeningParseRunner;
import com.github.fge.grappa.run.ParsingResult;
import com.github.fge.grappa.rules.Rule;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import static org.testng.Assert.assertEquals;

public class ArticleTest {
    @DataProvider
    Object[][] articleData() {
        return new Object[][]{
                {"a", true},
                {"an", true},
                {"the", true},
                {"me", false},
                {"THE", true},
                {" a", false},
                {"a ", true},
                {"afoo", true},
        };
    }

    @Test(dataProvider = "articleData")
    public void testOnlyArticle(String article, boolean status) {
        ArticleParser parser = Grappa.createParser(ArticleParser.class);
        testArticleGrammar(article, status, parser.article());
    }
    
    private void testArticleGrammar(String article, boolean status, Rule rule) {
        ListeningParseRunner<Void> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<Void> articleResult = runner.run(article);
        assertEquals(articleResult.isSuccess(), status,
                "failed check on " + article + ", parse result was "
                        + articleResult + " and expected " + status);
    }
}

So what is happening here?

First, we create a global (for the test) ArticleParser instance through Grappa. Then we create a ListeningParseRunner, with the entry point to the grammar as a parameter; this builds the internal model for the parser (stuff we don’t really care about, but it is memoized, so we can use that code over and over again without incurring the time it takes for processing the grammar at runtime.)

I used a utility method because the form of the tests themselves doesn’t change – only the inputs and the rules being applied. As we add more to our grammar, this will allow us to run similar tests with different inputs, results, and rules.

After we’ve constructed our Parser’s Runner, we do something completely surprising: we run it, with ParsingResult<Void> articleResult = runner.run(article);. Adding in the TestNG data provider, this means we’re calling our parser with every one of those articles as a test, and checking to see if the parser’s validity – shown by articleResult.isSuccess() – matches what we expect.

In most cases, it’s pretty straightforward, since we are indeed passing in valid articles. Where we’re not, the parser says that it’s not a successful parse, such as when we pass it me.

There are three cases where the result might be surprising: " a", "a ", and "afoo". The whitespace is significant, for the parser; for our test, the article with a trailing space passes validation, as does “afoo“, while the article with the leading space does not.

The leading space is easy: our parser doesn’t consume any whitespace, and Grappa assumes whitespace is significant unless told otherwise (by Rules, of course.) So that space doesn’t match our article; it fails to parse, because of that.

However, the trailing space (and "afoo") is a little more odd. What’s happening there is that Grappa is parsing as much of the input as is necessary to fulfill the grammar; once it’s finished doing that, it doesn’t care about anything that follows the grammar. So once it matches the initial text – the “a” – it doesn’t care what the rest of the content is. It’s not significant that "foo" follows the “a“; it matches the “a” and it’s done.

We can fix that, of course, by specifying a better rule – one that includes a terminal condition. That introduces a core concept for Grappa, the “sequence().” (This will factor very heavily into our grammar when we add the ability to say “please” at the end of the tutorial.)

Author’s note: I use “terminal” to mean “ending.” So a terminal anything is meant to indicate finality. However, a “terminal” is also used to describe something that doesn’t delegate to anything else, in Grappa’s terms. So for Grappa, the use of “terminal” might not be the same as my use of the word “terminal”.

Let’s expand our ArticleParser a little more. Now it looks like:

package com.autumncode.bartender;

import com.github.fge.grappa.parsers.BaseParser;
import com.github.fge.grappa.rules.Rule;

public class ArticleParser extends BaseParser<Void> {
    public Rule article() {
        return trieIgnoreCase("a", "an", "the");
    }

    public Rule articleTerminal() {
        return sequence(
                article(),
                EOI
        );
    }
}

What we’ve done is added a new Rule – articleTerminal() – which contains a sequence. That sequence is “an article” — which consumes “a,” “an”, or “the” – and then the special EOI rule, which stands for “end of input.” That means that our simple article grammar won’t consume leading or trailing spaces – the grammar will fail if any content exists besides our article.

We can show that with a new test:

@DataProvider
Object[][] articleTerminalData() {
    return new Object[][]{
         {"a", true},
         {"an", true},
         {"the", true},
         {"me", false},
         {"THE", true},
         {" a", false},
         {"a ", false},
         {"afoo", false},
    };
}

@Test(dataProvider = "articleTerminalData")
public void testArticleTerminal(String article, boolean status) {
    ArticleParser parser = Grappa.createParser(ArticleParser.class);
    testArticleGrammar(article, status, parser.articleTerminal());
}

Now our test performs as we’d expect: it matches the article, and only the article – as soon as it has a single article, it expects the end of input. If it doesn’t find that sequence exactly, it fails to match and isSuccess() returns false.

It’s not really very kind for us to not accept whitespace, though: we probably want to parse " a " as a valid article, but not " a the" or anything like that.

It shouldn’t be very surprising that we can use sequence() for that, too, along with a few new rules from Grappa itself. Here’s our Rule for articles with surrounding whitespace:

public Rule articleWithWhitespace() {
    return sequence(
            zeroOrMore(wsp()),
            article(),
            zeroOrMore(wsp()),
            EOI
    );
}

What we’ve done is added two extra parsing rules, around our article() rule: zeroOrMore(wsp()). The wsp() rule matches whitespace – spaces and tabs, for example. The zeroOrMore() rule seems faintly self-explanatory, but just in case: it says “this rule will match if zero or more of the contained rules match.”

Therefore, our new rule will match however much whitespace we have before an article, then the article, and then any whitespace after the article – but nothing else. That’s fun to say, I guess, but it’s a lot more fun to show:

 @DataProvider
 Object[][] articleWithWhitespaceData() {
     return new Object[][]{
             {"a", true},
             {"a      ", true},
             {"     the", true},
             {"me", false},
             {" THE ", true},
             {" a an the ", false},
             {"afoo", false},
     };
 }

 @Test(dataProvider = "articleWithWhitespaceData")
 public void testArticleWithWhitespace(String article, boolean status) {
     ArticleParser parser = Grappa.createParser(ArticleParser.class);
     testArticleGrammar(article, status, parser.articleWithWhitespace());
 }

Believe it or not, we’re actually most of the way to being able to build our full drink order parser – we need to figure out how to get data from our parser (hint: it’s related to that <Void> in the parser’s declaration), but that’s actually the greatest burden we have remaining.

One other thing that’s worth noting as we go: our code so far actually runs twenty-three tests. On my development platform, it takes 64 milliseconds to run all twenty-three – the first one takes 49, where it’s building the parser for the first time. The rest take somewhere between 0 and 4 milliseconds – and I’m pretty sure that 4ms reading is an outlier. Our grammar isn’t complex, and I imagine we could write something without a grammar that would be faster – maybe HashSet<String>.contains(input.trim()) – but we’re about to step into territory that would end up being a lot less maintainable as our grammar grows.

I ran the tests one hundred times each, and the same pattern showed up: every now and then a test would run slower. My initial guess is that this is related to garbage collection or some other housekeeping chore on the JVM’s part, but I haven’t verified it.
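For comparison, the non-grammar alternative alluded to above might look something like this throwaway sketch (the class and method names here are mine, not part of the project). It handles bare articles fine, but grows unwieldy as soon as you need sequences, optional elements, or extracted values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// The simpler-but-less-maintainable alternative: a flat set lookup.
// Trimming and lowercasing approximate the whitespace and case
// handling that the grammar gives us via zeroOrMore(wsp()) and
// trieIgnoreCase().
public class ArticleSet {
    private static final Set<String> ARTICLES =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    public static boolean isArticle(String input) {
        return ARTICLES.contains(input.trim().toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(isArticle(" THE ")); // true
        System.out.println(isArticle("afoo"));  // false
    }
}
```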

Getting Data out of our Parser

Grappa uses an internal stack of values to track and expose information. We can tell it the type of the stack values – and in fact, we already did so in our ArticleParser. It’s the <Void> we used – that says that we have a stack of Void values, which is a cute way of saying “no value at all.” (If you remember carefully, we pointed that out when we first started describing the ArticleParser – this is where that information is useful!)

Therefore, all we need to do is expose a type, and then manipulate that stack of values. We do so with a special type of Rule, a function that returns a boolean that indicates whether the Rule was successful.

Our goal with the article is to parse drink orders, of the general form of “a VESSEL of DRINK.” We already worked on a parser that demonstrates parsing the “a” there – it’s time to think about parsing the next term, which we’ll call a “vessel.” Or, since we’re using Java, a Vessel – which we’ll encapsulate in an Enum so we can easily add Vessels.

The Vessel itself is pretty simple:

package com.autumncode.bartender;

public enum Vessel {
    PINT,
    BOWL,
    GLASS,
    CUP,
    PITCHER,
    MAGNUM,
    BOTTLE,
    SPOON
}

What we want to do is create a parser such that we can hand it “a glass” and get Vessel.GLASS out of it.

Given that we’ve said that a parser can be constructed with the “return type”, that tells us that our VesselParser wants to extend BaseParser<Vessel>, and so it does. In fact, our VesselParser isn’t even very surprising, given what we’ve learned from our ArticleParser:

import com.github.fge.grappa.parsers.BaseParser;
import com.github.fge.grappa.rules.Rule;

import java.util.Collection;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class VesselParser extends BaseParser<Vessel> {
    static final Collection<String> vessels = Stream
            .of(Vessel.values())
            .map(Enum::name)
            .collect(Collectors.toList());

    public Rule vessel() {
        return trieIgnoreCase(vessels);
    }
}

What does this do? Well, most of it is building a List of the Vessel values, by extracting the values from Vessel. It’s marked static final so it will only initialize that List once; the Rule (vessel()) simply uses the exact same technique we used in parsing articles. It doesn’t actually do anything with the match, though. It would simply fail if it was handed text that did not match a Vessel type.

Incidentally, the Java Language Specification suggests the order of static final, in section 8.3.1, Field Modifiers.

Let’s try it out, using the same sort of generalized pattern we saw in our ArticleParser tests. (We’re going to add a new generalized test method, when we add in the type that should be returned, but this will do for now.)

public class VesselTest {
    private void testGrammar(String corpus, boolean status, Rule rule) {
        ListeningParseRunner<Vessel> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<Vessel> result = runner.run(corpus);
        assertEquals(result.isSuccess(), status,
                "failed check on " + corpus + ", parse result was "
                        + result + " and expected " + status);
    }
    
    @DataProvider
    Object[][] simpleVesselParseData() {
        return new Object[][]{
                {Vessel.PINT.name(), true,},
                {Vessel.BOWL.name(), true,},
                {Vessel.GLASS.name(), true,},
                {Vessel.CUP.name(), true,},
                {Vessel.PITCHER.name(), true,},
                {Vessel.MAGNUM.name(), true,},
                {Vessel.BOTTLE.name(), true,},
                {Vessel.SPOON.name(), true,},
                {"hatful", false,},
        };
    }

    @Test(dataProvider = "simpleVesselParseData")
    public void testSimpleVesselParse(String corpus, boolean valid) {
        VesselParser parser = Grappa.createParser(VesselParser.class);
        testGrammar(corpus, valid, parser.vessel());
    }
}

The idiom that Grappa uses – and that I will use, in any event – involves the use of the push() and match() methods.

Basically, when we match a Vessel – using that handy vessel() rule – we push() the Vessel whose name corresponds to the text we just matched. We can get that text with the rather-handily-named match() method.

It’s actually simpler to program than it is to describe:

 // in VesselParser.java
 public Rule VESSEL() {
     return sequence(
             vessel(), 
             push(Vessel.valueOf(match().toUpperCase()))
     );
 }

This is a rule that encapsulates the matching of the vessel name – thus, vessel() – and then, assuming the match is found, calls push() with the Vessel whose text is held in match().

That’s fine to say, but much better to show. Here’s a test of our VESSEL() rule, following the same sort of generalized pattern we saw for parsing articles, along with a new generalized test runner that examines the returned value if the input data is valid according to the grammar:

private void testGrammarResult(String corpus, boolean status, Vessel value, Rule rule) {
    ListeningParseRunner<Vessel> runner
            = new ListeningParseRunner<>(rule);
    ParsingResult<Vessel> result = runner.run(corpus);
    assertEquals(result.isSuccess(), status,
            "failed check on " + corpus + ", parse result was "
                    + result + " and expected " + status);
    if(result.isSuccess()) {
        assertEquals(result.getTopStackValue(), value);
    }
}

@DataProvider
Object[][] simpleVesselReturnData() {
    return new Object[][]{
            {Vessel.PINT.name(), true, Vessel.PINT},
            {Vessel.BOWL.name(), true, Vessel.BOWL},
            {Vessel.GLASS.name(), true, Vessel.GLASS},
            {Vessel.CUP.name(), true, Vessel.CUP},
            {Vessel.PITCHER.name(), true, Vessel.PITCHER},
            {Vessel.MAGNUM.name(), true, Vessel.MAGNUM},
            {Vessel.BOTTLE.name(), true, Vessel.BOTTLE},
            {Vessel.SPOON.name(), true, Vessel.SPOON},
            {"hatful", false, null},
    };
}

@Test(dataProvider = "simpleVesselReturnData")
public void testSimpleVesselResult(String corpus, boolean valid, Vessel value) {
    VesselParser parser = Grappa.createParser(VesselParser.class);
    testGrammarResult(corpus, valid, value, parser.VESSEL());
}

Note that we’re testing with a Rule of parser.VESSEL() – the one that simply matches a vessel name is named parser.vessel(), and the one that updates the parser’s value stack is parser.VESSEL().

This is a personal idiom. I reserve the right to change my mind if sanity demands it. In fact, I predict that I will have done just this by the end of this article.

So what this does is very similar to our prior test – except it also tests the value on the parser’s stack (accessed via result.getTopStackValue()) against the value that our DataProvider says should be returned, as long as the parse was expected to be valid.

All this is well and good – we can hand it "glass" and get Vessel.GLASS — but we haven’t fulfilled everything we want out of a VesselParser. We want to be able to ask for " a pint " — note the whitespace! — and get Vessel.PINT. We need to add in our article parsing.

First, let’s write our tests, so we know when we’re done:

@DataProvider
Object[][] articleVesselReturnData() {
    return new Object[][]{
            {"a pint", true, Vessel.PINT},
            {"the bowl", true, Vessel.BOWL},
            {"  an GLASS", true, Vessel.GLASS},
            {"a     cup", true, Vessel.CUP},
            {"the pitcher    ", true, Vessel.PITCHER},
            {" a an magnum", false, null},
            {"bottle", true, Vessel.BOTTLE},
            {"spoon   ", true, Vessel.SPOON},
            {"spoon  bottle ", false, null},
            {"hatful", false, null},
            {"the stein", false, null},
    };
}

@Test(dataProvider = "articleVesselReturnData")
public void testArticleVesselResult(String corpus, boolean valid, Vessel value) {
    VesselParser parser = Grappa.createParser(VesselParser.class);
    testGrammarResult(corpus, valid, value, parser.ARTICLEVESSEL());
}

Our tests should be able to ignore the leading article and any whitespace. Any wrongful formation (as you see in " a an magnum") should fail, and any vessel type that isn’t valid ("hatful" and "the stein") should fail.

Our Rule is going to look like a monster, because it has to handle a set of possibilities, but it’s actually pretty simple. Let’s take a look, then walk through the grammar:

public Rule article() {
    return trieIgnoreCase("a", "an", "the");
}

public Rule ARTICLEVESSEL() {
    return sequence(
            zeroOrMore(wsp()),
            optional(
                    sequence(
                            article(),
                            oneOrMore(wsp())
                    )),
            VESSEL(),
            zeroOrMore(wsp()),
            EOI);
}

First, we added our article() Rule, from our ArticleParser. It might be tempting to copy all the whitespace handling from that parser as well, but we shouldn’t – all we care about is the articles themselves (“lexemes,” if we’re trying to look all nerdy.)

It’s the ARTICLEVESSEL() Rule that’s fascinating. What that is describing is a sequence, consisting of:

  • Perhaps some whitespace, expressed as zeroOrMore(wsp()).
  • An optional sequence, consisting of:
    • An article.
    • At least one whitespace character.
  • A vessel (which, since we’re using VESSEL(), means the parser’s stack is updated.)
  • Perhaps some whitespace.
  • The end of input.

Any input that doesn’t follow that exact sequence ("spoon bottle", for example) fails.

Believe it or not, we’re now very much on the downhill slide for our bar-tending program.

We need to add a preposition (“of”) and then generalized text handling for the type of drink, and we need to add the container type – but of this, only the type of drink will add any actual complexity to our parser.

Rounding out the Bartender

Our VesselParser is actually a pretty good model for the DrinkOrderParser that our Bartender will use. What we need to add is matching for two extra tokens: “of,” as mentioned, and then a generalized description of a drink.

We’re not going to be picky about the description; we could validate it (just like we’ve done for Vessel) but there are actually better lessons to be gained by leaving it free-form.

Let’s take a look at the operative part of Bartender again, which will set the stage for the full parser.

DrinkOrderParser parser
        = Grappa.createParser(DrinkOrderParser.class);
ListeningParseRunner<DrinkOrder> runner
        = new ListeningParseRunner<>(parser.DRINKORDER());
ParsingResult<DrinkOrder> result = runner.run(order);
DrinkOrder drinkOrder;
boolean done = false;
if (result.isSuccess()) {
    drinkOrder = result.getTopStackValue();
    done = drinkOrder.isTerminal();
    if (!done) {
        System.out.printf("Here's your %s of %s. Please drink responsibly!%n",
                drinkOrder.getVessel().toString().toLowerCase(),
                drinkOrder.getDescription());
    }
} else {
    System.out.println("I'm sorry, I don't understand. Try again?");
}
return done;

The very first thing we’re going to do is create a DrinkOrder class, that contains the information about our drink order.

public class DrinkOrder {
    Vessel vessel;
    String description;
    boolean terminal;
}

I’m actually using Lombok in the project (and the @Data annotation) but for the sake of example, imagine that we have the standard boilerplate accessors and mutators for each of those attributes. Thus, we can call setDescription(), et al, even though we’re not showing that code. We’re also going to have equals() and hashCode() created (via Lombok), as well as a no-argument constructor and another constructor for all properties.

In other words, it’s a fairly standard Javabean, but we’re not showing all of the boilerplate code – and thanks to Lombok, we don’t even need the boilerplate code. Lombok makes it for us.

If you do need the code for equals(), hashCode(), toString(), or the mutators, accessors, and constructors shown, you may be reading the wrong tutorial. How did you make it this far?

Before we dig into the parser – which has only one really interesting addition to the things we’ve seen so far – let’s take a look at our test. This is the full test, so it’s longer than some of our code has been. The DrinkOrderParser will be much longer.

public class DrinkOrderParserTest {
    private void testGrammarResult(String corpus, boolean status, DrinkOrder value, Rule rule) {
        ListeningParseRunner<DrinkOrder> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<DrinkOrder> result = runner.run(corpus);
        assertEquals(result.isSuccess(), status,
                "failed check on " + corpus + ", parse result was "
                        + result + " and expected " + status);
        if (result.isSuccess()) {
            assertEquals(result.getTopStackValue(), value);
        }
    }
    
    @DataProvider
    public Object[][] drinkOrderProvider() {
        return new Object[][]{
                {"a glass of water", true, new DrinkOrder(Vessel.GLASS, "water", false)},
                {"a pitcher of old 66", true, new DrinkOrder(Vessel.PITCHER, "old 66", false)},
                {"a    pint  of duck   vomit   ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {"a shoeful of motor oil", false, null},
                {"nothing", true, new DrinkOrder(null, null, true)},
        };
    }
    
    @Test(dataProvider = "drinkOrderProvider")
    public void testDrinkOrderParser(String corpus, boolean valid, DrinkOrder result) {
        DrinkOrderParser parser = Grappa.createParser(DrinkOrderParser.class);
        testGrammarResult(corpus, valid, result, parser.DRINKORDER());
    }
}

Most of this should be fairly simple; it’s the same pattern we’ve seen used in our other tests.

I don’t actually drink, myself, so… I keep imagining some biker bar in the American southwest selling a beer called “Old 66,” and in my imagination “duck vomit” is the kind of wine that comes in a resealable plastic bag.

A lot of the DrinkOrderParser will be very familiar. Let’s dive in and take a look at all of it and then break it down:

import com.github.fge.grappa.parsers.BaseParser;
import com.github.fge.grappa.rules.Rule;

import java.util.Collection;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DrinkOrderParser extends BaseParser<DrinkOrder> {
    Collection<String> vessels = Stream
            .of(Vessel.values())
            .map(Enum::name)
            .collect(Collectors.toList());

    protected boolean assignDrink() {
        peek().setDescription(match().toLowerCase().replaceAll("\\s+", " "));
        return true;
    }

    protected boolean assignVessel() {
        peek().setVessel(Vessel.valueOf(match().toUpperCase()));
        return true;
    }

    protected boolean setTerminal() {
        peek().setTerminal(true);
        return true;
    }

    public Rule ARTICLE() {
        return trieIgnoreCase("a", "an", "the");
    }

    public Rule OF() {
        return ignoreCase("of");
    }

    public Rule NOTHING() {
        return sequence(
                trieIgnoreCase("nothing", "nada", "zilch", "done"),
                EOI,
                setTerminal()
        );
    }

    public Rule VESSEL() {
        return sequence(
                trieIgnoreCase(vessels),
                assignVessel()
        );
    }

    public Rule DRINK() {
        return sequence(
                join(oneOrMore(firstOf(alpha(), digit())))
                        .using(oneOrMore(wsp()))
                        .min(1),
                assignDrink()
        );
    }

    public Rule DRINKORDER() {
        return sequence(
                push(new DrinkOrder()),
                zeroOrMore(wsp()),
                firstOf(
                        NOTHING(),
                        sequence(
                                optional(sequence(
                                        ARTICLE(),
                                        oneOrMore(wsp())
                                )),
                                VESSEL(),
                                oneOrMore(wsp()),
                                OF(),
                                oneOrMore(wsp()),
                                DRINK()
                        )
                ),
                zeroOrMore(wsp()),
                EOI
        );
    }
}

We’re reusing the mechanism for creating a collection of Vessel references. We’re also repeating the Rule used to detect an article.

We’re adding a Rule for the detection of the preposition “of”, which is a mandatory element in our grammar. We use ignoreCase(), because we respect the rights of drunkards to shout at their barkeeps:

public Rule OF() {
    return ignoreCase("of");
}

Note how I’m skirting my own rule about naming. I said I was reserving the right to change my mind, and apparently I’ve done so even while writing this article. According to the naming convention I described earlier, it should be of() and not OF() because it doesn’t alter the parser’s stack. The same rule applies to ARTICLE(). It’s my content, I’ll write it how I want to unless I decide to fix it later.

I’m also creating methods to mutate the parser state:

protected boolean assignDrink() {
    peek().setDescription(match().toLowerCase().replaceAll("\\s+", " "));
    return true;
}

protected boolean assignVessel() {
    peek().setVessel(Vessel.valueOf(match().toUpperCase()));
    return true;
}

protected boolean setTerminal() {
    peek().setTerminal(true);
    return true;
}

These are a little interesting, in that they use peek(). The actual base rule in our grammar is DRINKORDER(), which immediately pushes a DrinkOrder reference onto the parser stack. That means that there is a DrinkOrder that other rules can modify at will; peek() gives us that reference. Since it’s typed via Java’s generics, we can call any method that DrinkOrder exposes.
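As a loose analogy – this is ordinary java.util code, not Grappa’s actual value stack – the push()/peek() interplay works like this:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A loose analogy for Grappa's value stack: DRINKORDER() pushes a fresh
// DrinkOrder, and later action methods peek() at that same instance to
// mutate it as rules match. Nothing is popped until the parse finishes.
public class ValueStackAnalogy {
    static class DrinkOrder {
        String description;
    }

    public static void main(String[] args) {
        Deque<DrinkOrder> stack = new ArrayDeque<>();
        stack.push(new DrinkOrder());        // what push(new DrinkOrder()) does
        stack.peek().description = "water";  // what assignDrink() does via peek()
        System.out.println(stack.peek().description); // water
    }
}
```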

These utilities all return true. None of them can fail, because they won’t be called unless a prior rule has matched; these methods are for convenience only. Actually, let’s show the NOTHING() and VESSEL() rules, so we can see how these methods are invoked:

public Rule NOTHING() {
    return sequence(
            trieIgnoreCase("nothing", "nada", "zilch", "done"),
            EOI,
            setTerminal()
    );
}

public Rule VESSEL() {
    return sequence(
            trieIgnoreCase(vessels),
            assignVessel()
    );
}

This leaves two new rules to explain: DRINK() and DRINKORDER(). Here’s DRINK():

public Rule DRINK() {
    return sequence(
            join(oneOrMore(firstOf(alpha(), digit())))
                    .using(oneOrMore(wsp()))
                    .min(1),
            assignDrink()
    );
}

This rule basically builds a list of words. It’s a sequence of operations; the first builds the match of the words, and the second operation assigns the matched content to the DrinkOrder‘s description.

The match of the words is really just a sequence of alphanumeric characters. It requires at least one such sequence to exist, but will consume as many as there are in the input.
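For readers who think in regular expressions: the language DRINK() accepts is roughly this pattern (an approximation of mine, which ignores the whitespace normalization that assignDrink() performs afterward):

```java
import java.util.regex.Pattern;

// Roughly the same language as DRINK(): one or more alphanumeric
// "words" separated by runs of whitespace, with no leading or
// trailing whitespace of its own.
public class DrinkRegex {
    static final Pattern DRINK =
            Pattern.compile("[A-Za-z0-9]+(\\s+[A-Za-z0-9]+)*");

    public static void main(String[] args) {
        System.out.println(DRINK.matcher("old 66").matches());    // true
        System.out.println(DRINK.matcher("  leading").matches()); // false
    }
}
```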

Now for the Rule that does most of the work: DRINKORDER().

public Rule DRINKORDER() {
    return sequence(
            push(new DrinkOrder()),
            zeroOrMore(wsp()),
            firstOf(
                    NOTHING(),
                    sequence(
                            optional(sequence(
                                    ARTICLE(),
                                    oneOrMore(wsp())
                            )),
                            VESSEL(),
                            oneOrMore(wsp()),
                            OF(),
                            oneOrMore(wsp()),
                            DRINK()
                    )
            ),
            zeroOrMore(wsp()),
            EOI
    );
}

Again, we have a sequence. It works something like this:

  • First, push a new DrinkOrder onto the stack, to keep track of our order’s state.
  • Consume any leading whitespace.
  • Either:
    • Check for the terminal condition (“nothing”, for example), or
    • Check for a new sequence, of the following form:
      • An optional sequence:
        • An article
        • Any trailing whitespace after the article
      • A vessel
      • One or more whitespace characters
      • The lexeme matching “of”
      • One or more whitespace characters
      • The drink description
  • Any trailing whitespace
  • The end of input

We’ve basically built most of this through our parsers, bit by bit; armed with the ability to peek() and push(), we can build some incredibly flexible parsers with fairly simple code.

Adding Politesse

All this has been great, so far. We can actually “order” from a Bartender, giving us this scintillating conversation:

$ java -jar bartender-1.0-SNAPSHOT.jar
What're ya havin'? a glass of water
Here's your glass of water. Please drink responsibly!
What're ya havin'? a toeful of shoe polish
I'm sorry, I don't understand. Try again?
What're ya havin'? a pint of indigo ink
Here's your pint of indigo ink. Please drink responsibly!
What're ya havin'? nothing
$

The only problem is that it’s not very humane or polite. We can’t say “please,” and we can’t be very flexible. What we need is to add politesse to our grammar.

What we really want to do is modify our DrinkOrderParser so that we can ask for “a cup of pinot noir, 1986 vintage, please?” It should be able to tell us that we’ve ordered “pinot noir, 1986” and not “pinot noir, 1986, please?” — that’d be silly.

However, we need to alter our grammar in some core ways – particularly in how we match the drink names — and use a new Rule, testNot. First, though, let’s take a look at our test code, because that’s going to give us a workable indicator of whether we’ve succeeded or not.

public class PoliteDrinkOrderParserTest {
    private void testGrammarResult(String corpus, boolean status, DrinkOrder value, Rule rule) {
        ListeningParseRunner<DrinkOrder> runner
                = new ListeningParseRunner<>(rule);
        ParsingResult<DrinkOrder> result = runner.run(corpus);
        assertEquals(result.isSuccess(), status,
                "failed check on " + corpus + ", parse result was "
                        + result + " and expected " + status);
        if (result.isSuccess()) {
            assertEquals(result.getTopStackValue(), value);
        }
    }

    @DataProvider
    public Object[][] drinkOrderProvider() {
        return new Object[][]{
                {"a glass of water please", true, new DrinkOrder(Vessel.GLASS, "water", false)},
                {"a pitcher of old 66, please", true, new DrinkOrder(Vessel.PITCHER, "old 66", false)},
                {"a pitcher of old 66", true, new DrinkOrder(Vessel.PITCHER, "old 66", false)},
                {"a glass of pinot noir, 1986", true, new DrinkOrder(Vessel.GLASS, "pinot noir, 1986", false)},
                {"a glass of pinot noir, 1986, ok?", true, new DrinkOrder(Vessel.GLASS, "pinot noir, 1986", false)},
                {"glass of pinot noir, 1986, ok?", true, new DrinkOrder(Vessel.GLASS, "pinot noir, 1986", false)},
                {"cup , pinot noir, 1986 vintage, ok?", true, new DrinkOrder(Vessel.CUP, "pinot noir, 1986 vintage", false)},
                {"cup,pinot noir, 1986,ok!", true, new DrinkOrder(Vessel.CUP, "pinot noir, 1986", false)},
                {"a    pint  of duck   vomit   ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {"a    pint  of duck   vomit  , please ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {" pint , duck   vomit please  ", true, new DrinkOrder(Vessel.PINT, "duck vomit", false)},
                {"a shoeful of motor oil", false, null},
                {"nothing", true, new DrinkOrder(null, null, true)},
        };
    }

    @Test(dataProvider = "drinkOrderProvider")
    public void testDrinkOrderParser(String corpus, boolean valid, DrinkOrder result) {
        PoliteDrinkOrderParser parser = Grappa.createParser(PoliteDrinkOrderParser.class);
        testGrammarResult(corpus, valid, result, parser.ORDER());
    }
}

You’ll notice that we’ve changed some other things, too. Our original grammar was pretty simple in formal-ish terms:

DRINKORDER ::= nothing | article? vessel `of` drink
article ::= a | an | the
vessel ::= pint | bowl | glass | cup | pitcher | magnum | bottle | spoon
drink ::= [a-zA-Z0-9]*
nothing ::= nothing | nada | zilch | done

Note that this isn’t an actual formal grammar – I’m cheating. It just looks as if it might be something near formal, with a particular failure in the “drink” term.

Our new one seems to be more flexible:

DRINKORDER ::= (nothing | article? vessel of drink) interjection? eos?
article ::= a | an | the
vessel ::= pint | bowl | glass | cup | pitcher | magnum | bottle | spoon
of ::= , | of
drink ::= !interjection
interjection ::= ','? (please | ok | okay | pls | yo)
eos ::= '.' | '!' | '?'
nothing ::= nothing | nada | zilch | done

Here, we’re actually no more formal than we were before – the “!interjection” is trying to say that a drink is everything appearing where a drink would be appropriate, up to the interjection.

I don’t care for Backus-Naur form, and I’m using something that looks like it because I thought it might help. Your mileage may vary as to whether I was correct or not.

At any rate, our new grammar should allow us to say “please” and eliminate the unnecessary “of” – although I’m not willing to concede that a bartender should respond well to “pint beer.” “Pint, beer” I can accept – but that comma is significant, by golly.

I’ll leave it as an exercise for the reader to make the comma not necessary – and to write the test that proves it.

However, one thing remains: we haven’t seen our new parser. Most of it’s the same: the article, the vessel, and the action rules (the things that construct our returned drink order) haven’t changed, but we have a slew of new rules (for the end of the sentence and the interjection) and we’ve modified some old ones (drink and of). Let’s take a look at the changes held in PoliteDrinkOrderParser:

public Rule OF() {
    return firstOf(
            sequence(
                    zeroOrMore(wsp()),
                    COMMA(),
                    zeroOrMore(wsp())
            ),
            sequence(
                    oneOrMore(wsp()),
                    ignoreCase("of"),
                    oneOrMore(wsp())
            )
    );
}

public Rule DRINK() {
    return sequence(
            oneOrMore(
                    testNot(INTERJECTION()),
                    ANY
            ),
            assignDrink());
}

public Rule DRINKORDER() {
    return sequence(
            optional(sequence(
                    ARTICLE(),
                    oneOrMore(wsp())
            )),
            VESSEL(),
            OF(),
            DRINK()
    );
}

public Rule COMMA() {
    return ch(',');
}

public Rule INTERJECTION() {
    return sequence(
            zeroOrMore(wsp()),
            optional(COMMA()),
            zeroOrMore(wsp()),
            trieIgnoreCase("please", "pls", "okay", "yo", "ok"),
            TERMINAL()
    );
}

public Rule EOS() {
    return anyOf(".!?");
}

public Rule TERMINAL() {
    return sequence(zeroOrMore(wsp()),
            optional(EOS()),
            zeroOrMore(wsp()),
            EOI
    );
}

public Rule ORDER() {
    return sequence(
            push(new DrinkOrder()),
            zeroOrMore(wsp()),
            firstOf(DRINKORDER(), NOTHING()),
            optional(INTERJECTION()),
            TERMINAL()
    );
}

We’ve had to move whitespace handling around a little, too, because OF() now serves as a connector rather than matching the simple word “of.”

OF() now has to serve as a syntax rule for a single comma – with no whitespace – as you’d see in the string “pint,beer”. It also has to handle whitespace – as you’d find in “pint , beer”.

However, it needs to mandate whitespace for the actual word of – because pintofbeer doesn’t work.

Another exercise for the reader: fix OF() to handle “pint, of beer“.
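
Stated as plain pattern matching, the connector amounts to “a comma with optional surrounding whitespace, or the word of with mandatory surrounding whitespace.” A throwaway regex sketch of that behavior – again, an illustration, not the Grappa rule itself:

```java
import java.util.regex.Pattern;

public class OfConnectorSketch {
    // Comma with optional whitespace, OR "of" with mandatory whitespace.
    private static final Pattern OF =
            Pattern.compile("\\s*,\\s*|\\s+(?i:of)\\s+");

    public static boolean connects(String s) {
        return OF.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(connects(","));    // true: "pint,beer"
        System.out.println(connects(" , "));  // true: "pint , beer"
        System.out.println(connects(" of ")); // true: "pint of beer"
        System.out.println(connects("of"));   // false: "pintofbeer" shouldn't work
    }
}
```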

DRINK() has a new sequence: oneOrMore(testNot(INTERJECTION()), ANY).

Pay attention to this.

This means to match everything (as per the ANY) that does not match the INTERJECTION() rule. The sequence order is important – it tries to match the rules in order, so it checks the tokens (by looking ahead) against INTERJECTION() first, and failing that check (and therefore succeeding in the match – remember, we’re looking for something that is not an INTERJECTION()) it checks to see if the text matches ANY.

Given that ANY matches anything, it succeeds – as long as the tokens are not tokens that match the INTERJECTION() rule.
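
To make that concrete outside of Grappa, here’s a self-contained sketch of the same loop: look ahead without consuming, and only consume a character when the lookahead fails. It’s a plain string scan, not the actual matcher machinery, and the single interjection marker is simplified for illustration:

```java
public class TestNotSketch {
    // Consume characters until the remaining input starts with an
    // interjection marker (here, just ", please" for illustration).
    public static String drinkUpTo(String input, String interjection) {
        StringBuilder drink = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            // testNot(INTERJECTION()): look ahead without consuming.
            if (input.startsWith(interjection, i)) {
                break;
            }
            // ANY: consume exactly one character.
            drink.append(input.charAt(i));
        }
        return drink.toString();
    }

    public static void main(String[] args) {
        System.out.println(drinkUpTo("pinot noir, 1986, please", ", please"));
        // prints "pinot noir, 1986"
    }
}
```

Note that the first comma survives: “, 1986” looks ahead as not-an-interjection, so it’s consumed like any other text.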

And what does INTERJECTION() look like? Well, it’s a normal rule – this is where Grappa really shines. Our INTERJECTION() has optional whitespace and punctuation, and its own case-insensitive matching:

public Rule INTERJECTION() {
    return sequence(
            zeroOrMore(wsp()),
            optional(COMMA()),
            zeroOrMore(wsp()),
            trieIgnoreCase("please", "pls", "okay", "yo", "ok"),
            TERMINAL()
    );
}

It also has the terminal condition for the order, because something might look like an interjection but wouldn’t be. Consider this input: “glass,water,please fine.” The ,please matches an INTERJECTION(), but because the INTERJECTION() includes the TERMINAL() rule – which means “optional whitespace, an optional end-of-sentence, optional whitespace, and then a definite end-of-input” – “,please fine” fails the INTERJECTION() match, and falls back to ANY.
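
That “interjection plus terminal” behavior can be sketched with a throwaway regex over the tail of the input: the interjection word must be followed by nothing but optional punctuation, optional whitespace, and the end of input. (An illustration only – the word list mirrors the rule above, but this is not the Grappa machinery.)

```java
import java.util.regex.Pattern;

public class InterjectionSketch {
    // Optional comma and whitespace, an interjection word, then the
    // TERMINAL shape: optional end-of-sentence punctuation and end of input.
    private static final Pattern INTERJECTION = Pattern.compile(
            "\\s*,?\\s*(?i:please|pls|okay|yo|ok)\\s*[.!?]?\\s*");

    public static boolean isInterjection(String tail) {
        return INTERJECTION.matcher(tail).matches();
    }

    public static void main(String[] args) {
        System.out.println(isInterjection(",please"));       // true: ends the order
        System.out.println(isInterjection(",please fine.")); // false: not at end of input
    }
}
```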

EOI can legally match several times. That’s why we can match it in our INTERJECTION() rule while still having it match the end of our ORDER() rule. The nature of TERMINAL() – being a series of optional elements – means that if it’s matched as part of INTERJECTION(), it won’t match at the end of ORDER(). Such is life.

We can also order something like this: “glass,water, please ok?” — and our drink would be a glass of “water, please” because “ok” would match the INTERJECTION() rule.

Our bartender’s a great guy, but he likes his grammar.

Our PoliteBartender class is different from our Bartender only in the Parser it uses and the originating Rule – and, of course, in the flexibility of the orders it accepts.

$ java -cp bartender-1.0-SNAPSHOT.jar com.autumncode.bartender.PoliteBartender
What're ya havin'? a glass of water
Here's your glass of water. Please drink responsibly!
What're ya havin'? a toeful of shoe polish
I'm sorry, I don't understand. Try again?
What're ya havin'? a pint of indigo ink, please
Here's your pint of indigo ink. Please drink responsibly!
What're ya havin'? A SPOON OF DOM PERIGNON, 1986, OK?
Here's your spoon of dom perignon, 1986. Please drink responsibly!
What're ya havin'? magnum,water,pls, please
Here's your magnum of water,pls. Please drink responsibly!
What're ya havin'? nothing
$

Colophon

By the way, much appreciation goes to the following individuals, who helped me write this in various important ways, and in no particular order:

  • Francis Galiegue, who helped by reviewing the text, by pointing out various errors in my grammars, and by writing Grappa in the first place
  • Chris Brenton, who reviewed (a lot!) and helped me tune the messaging
  • Andreas Kirschbaum, who also reviewed quite a bit for the article, especially in early form