Repost: Playing with Java 8 Streams

I’ve been playing around with some more neural network algorithms lately, which has given me yet another chance to revisit a machine learning library. Since Java 8 is due later this month, I’ve decided it’s time to take the plunge and start using it.

Overall, I don’t think I’m leveraging it much at all. I know it has new features, of course - duh - but I’m only using a few of them, mainly where I stumble onto improvements by accident.

That’s not very efficient, especially considering how neural networks use lots and lots (and lots) of loops - for which Java 8 offers the Streams API as a potential improvement, as it turns out.

Thus, I have an ideal opportunity to get my feet wet - in a real way, “in anger,” you might say - with the new Java 8 lambda features, to really kick the tires.

This post is only the start of my explorations; I’m not even going to pretend that it’s groundbreaking. It’s just something I’m writing to save what I’ve done, so I don’t end up forgetting - and if someone reads it and sees something I should have done, well, then I’ll learn.

So: since networks tend to build slices of matrices out of slices of matrices, intersections make sense. Let’s start off with building two lists, and determining the intersection. For a first run, I’ll dump them to stdout.

@Test
public void testIntersectionToStdoutOld() {
    List<Integer> l1 = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    List<Integer> l2 = Arrays.asList(2, 4, 6, 8, 10, 12, 14, 16, 18, 20);
    System.out.println("Intersection: ");
    for (Integer i : l1) {
        if (l2.contains(i)) {
            System.out.println(i);
        }
    }
}

Well, isn’t that exciting… not really. Let’s spruce it up some. Here’s my first stab at the streaming version:

@Test
public void testIntersectionToStdoutStreaming() {
    List<Integer> l1 = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    List<Integer> l2 = Arrays.asList(2, 4, 6, 8, 10, 12, 14, 16, 18, 20);

    System.out.println("Intersection: ");
    l1.stream().filter(l2::contains).forEach(System.out::println);
}

Is that any more exciting? Hmm, I suppose. It’s shorter; the method references are actually pretty convenient.

I like this, but it’s just shorthand so far; on a source code level, it’s … shorter, but not necessarily more clear because the data types are so simple.
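As a side note, printing is only one thing you can do at the end of a stream. Here is a minimal sketch (the class name IntersectionSketch and the intersection method are made up for illustration, not part of the tests above) that collects the intersection into a new list instead. It also copies the second list into a HashSet first, on the assumption that repeated contains() calls on a List would get slow for larger inputs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class IntersectionSketch {

    // Returns the elements of 'left' that also appear in 'right', in left's order.
    static List<Integer> intersection(List<Integer> left, List<Integer> right) {
        // A HashSet gives constant-time contains() on average,
        // instead of a linear scan of the second list per element.
        Set<Integer> lookup = new HashSet<>(right);
        return left.stream()
                .filter(lookup::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> l1 = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> l2 = Arrays.asList(2, 4, 6, 8, 10, 12, 14, 16, 18, 20);
        // prints "Intersection: [2, 4, 6, 8, 10]"
        System.out.println("Intersection: " + intersection(l1, l2));
    }
}
```

Whether the HashSet is worth it for ten elements is doubtful; it is there to show that the method reference works the same way on any collection.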

Let’s do better. Let’s create a, um, fictional team. What’s odd about this team is that the communication is representable as a directed graph: messages have to go along certain routes, in specific directions.

We’ll have two types for our data model, an Enum for our team members, and a Connection for the, well, connections between people.

enum PERSON {
    DAVE, GARRETT, DANIEL, RUTH, JASON, CARL, ROBYN, TOM, JOE, KARSTEN
}

class Connection {
    PERSON from;
    PERSON to;

    public Connection(PERSON from, PERSON to) {
        this.from = from;
        this.to = to;
    }
}

Now, let’s seed some collections (Lists) with data. We’ll even do it with the streaming API:

List<Connection> connections;
List<PERSON> community = Arrays.asList(RUTH, JASON, TOM, JOE);

@BeforeMethod
public void setUp() {
    PERSON[][] connectionsSource = new PERSON[][]{
            {DAVE, RUTH},
            {RUTH, DANIEL},
            {RUTH, JOE},
            {GARRETT, JASON},
            {GARRETT, CARL},
            {CARL, JASON},
            {DANIEL, ROBYN},
            {DANIEL, TOM},
            {JASON, ROBYN},
            {CARL, KARSTEN},
            {RUTH, KARSTEN},
            {RUTH, TOM},
            {CARL, DAVE},
            {KARSTEN, RUTH},
    };
    connections = new ArrayList<>();
    Arrays.stream(connectionsSource).
            forEach(data -> connections.add(new Connection(data[0], data[1])));
}

What this tells us is this: Dave can talk to Ruth; Ruth can talk to Daniel, Joe, Tom, and Karsten; Carl can talk to Jason, Karsten, and Dave, and so forth and so on.

You’ll notice that we have a few names separated off as community. This is a smaller team within the larger group.
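An aside on the seeding code in setUp(): it uses forEach to add to a list that lives outside the stream. That works, but a commonly suggested alternative is to map each pair to a Connection and let the stream build the list itself. The sketch below shows that variant under some assumptions: SeedingSketch and toConnections are illustrative names, and the enum is trimmed to four people to keep it short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, trimmed-down version of the data model, for illustration only.
class SeedingSketch {
    enum PERSON { DAVE, RUTH, DANIEL, JOE }

    static class Connection {
        final PERSON from;
        final PERSON to;

        Connection(PERSON from, PERSON to) {
            this.from = from;
            this.to = to;
        }
    }

    // Maps each {from, to} pair to a Connection and collects the results
    // into a new list, rather than mutating a list from inside forEach.
    static List<Connection> toConnections(PERSON[][] source) {
        return Arrays.stream(source)
                .map(pair -> new Connection(pair[0], pair[1]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PERSON[][] source = {
                {PERSON.DAVE, PERSON.RUTH},
                {PERSON.RUTH, PERSON.DANIEL},
                {PERSON.RUTH, PERSON.JOE},
        };
        toConnections(source)
                .forEach(c -> System.out.println(c.from + " -> " + c.to));
    }
}
```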

Now, let’s see who they can talk to, because scanning the data manually is making me cross.

What we’ll do is create a stream of Connections, and filter the results based on whether the connection’s source (its from field) is in the community team. We’ll still use stdout for output, just because.

@Test
public void whoDoesTheCommunityTalkTo() {
    connections.stream()
            .filter(c -> community.contains(c.from))
            .forEach(c -> System.out.println(c.from + " talks to " + c.to));
}

Running this gets us some good results:

RUTH talks to DANIEL
RUTH talks to JOE
JASON talks to ROBYN
RUTH talks to KARSTEN
RUTH talks to TOM

However, it’s slightly inbred; we don’t want to see community team members who can talk to other community team members. We can do this by amending our filter (adding “&& !community.contains(c.to)”) or by adding another filter altogether, giving us this:

@Test
public void whoDoesTheCommunityTalkToOutside() {
    System.out.println("community talks outside to...");
    connections.stream()
            .filter(c -> !community.contains(c.to))
            .filter(c -> community.contains(c.from))
            .forEach(c -> System.out.println(c.from + " talks to " + c.to));
}

Hmm. It’s… interesting, I suppose, and I’ll happily admit that my problem definition could use some work, but I still don’t see a massive advantage. In fact, I’d say that it has a disadvantage because it’s harder to think through on first glance. Familiarity might repair that.
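For comparison, here is a sketch of the other option mentioned above: one filter with both conditions joined by &&. To keep it self-contained it uses a subset of the connections (in the same relative order), nests the types inside an illustrative class called OutsideContactsSketch, and returns the lines as a list instead of printing inside the stream. None of those names come from the original tests.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class, for illustration only.
class OutsideContactsSketch {
    enum PERSON { DAVE, DANIEL, RUTH, JASON, CARL, ROBYN, TOM, JOE, KARSTEN }

    static class Connection {
        final PERSON from;
        final PERSON to;

        Connection(PERSON from, PERSON to) {
            this.from = from;
            this.to = to;
        }
    }

    // The community team, held in an EnumSet (an assumption; the post uses a List).
    static final Set<PERSON> COMMUNITY =
            EnumSet.of(PERSON.RUTH, PERSON.JASON, PERSON.TOM, PERSON.JOE);

    // A subset of the post's connections, in the same relative order.
    static List<Connection> sampleConnections() {
        return Arrays.asList(
                new Connection(PERSON.DAVE, PERSON.RUTH),
                new Connection(PERSON.RUTH, PERSON.DANIEL),
                new Connection(PERSON.RUTH, PERSON.JOE),
                new Connection(PERSON.CARL, PERSON.JASON),
                new Connection(PERSON.JASON, PERSON.ROBYN),
                new Connection(PERSON.RUTH, PERSON.KARSTEN),
                new Connection(PERSON.RUTH, PERSON.TOM));
    }

    // Keeps connections that start inside the community and end outside it.
    static List<String> outsideContacts(List<Connection> connections,
                                        Set<PERSON> community) {
        return connections.stream()
                .filter(c -> community.contains(c.from) && !community.contains(c.to))
                .map(c -> c.from + " talks to " + c.to)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints RUTH talks to DANIEL, JASON talks to ROBYN, RUTH talks to KARSTEN
        outsideContacts(sampleConnections(), COMMUNITY).forEach(System.out::println);
    }
}
```

Two chained filters and one combined filter select the same elements; which reads better is mostly taste.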

Let’s see one last streaming example. Let’s say that I want to know who can connect to Joe (me) in two hops, no more, and no less.

So what I want to do is find every route to myself, where the starting point is connected to someone who can talk to me.

First, here’s some old school Java code to do this:

// should be: karsten, dave
for (Connection start : connections) {
    for (Connection middle : connections) {
        if (middle.to == JOE && middle.from == start.to) {
            System.out.println(start.from + " can reach JOE through "
                + middle.from);
        }
    }
}

Exciting, but not. Now let’s see if streaming can make it better, as I’m thinking of it right now:

connections.stream()
    .filter(c -> c.to == JOE)
    .forEach(middle ->
        connections.stream()
            .filter(start -> start.to == middle.from)
            .forEach(start -> System.out.printf("%s can reach JOE through %s%n",
                start.from, middle.from)));

Now, is this any better?

I don’t know. I think it’s probably harder to screw up, once you get it right.
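One possible variation on the nested forEach, offered as a sketch rather than a verdict: flatMap can flatten the inner stream into the outer one, so the whole thing becomes a single pipeline that produces a list. The class name TwoHopSketch, the twoHopsTo method, and the reduced data set are all assumptions made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class, for illustration only.
class TwoHopSketch {
    enum PERSON { DAVE, DANIEL, RUTH, CARL, JOE, KARSTEN }

    static class Connection {
        final PERSON from;
        final PERSON to;

        Connection(PERSON from, PERSON to) {
            this.from = from;
            this.to = to;
        }
    }

    // A subset of the post's connections, in the same relative order.
    static List<Connection> sampleConnections() {
        return Arrays.asList(
                new Connection(PERSON.DAVE, PERSON.RUTH),
                new Connection(PERSON.RUTH, PERSON.DANIEL),
                new Connection(PERSON.RUTH, PERSON.JOE),
                new Connection(PERSON.CARL, PERSON.DAVE),
                new Connection(PERSON.KARSTEN, PERSON.RUTH));
    }

    // Finds every start -> middle -> target route of exactly two hops.
    static List<String> twoHopsTo(PERSON target, List<Connection> connections) {
        return connections.stream()
                // second hop: connections that end at the target
                .filter(middle -> middle.to == target)
                // first hop: connections that end where the second hop begins
                .flatMap(middle -> connections.stream()
                        .filter(start -> start.to == middle.from)
                        .map(start -> start.from + " can reach " + target
                                + " through " + middle.from))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints "DAVE can reach JOE through RUTH"
        // and "KARSTEN can reach JOE through RUTH"
        twoHopsTo(PERSON.JOE, sampleConnections()).forEach(System.out::println);
    }
}
```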

I can definitely see where, in a neural network, streaming might help the expressiveness of quite a bit of code. The syntax is nice, as well; if you’re just calling a method, the method reference syntax (“collection.stream().forEach(System.out::println);”) makes some things quite nice, I think, even though it’s not complete (or I don’t know how to do something with it, which is more likely).

Wait, what is it that I'd like to be able to do? Well, look at the "who can the community talk to" filters. Here they are again:

connections.stream()
    .filter(c -> !community.contains(c.to))
    .filter(c -> community.contains(c.from))
    .forEach(c -> System.out.println(c.from + " talks to " + c.to));

I'd love to have some way to say "use this expression instead of the element", such that the filter might look something like:

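// Wished-for syntax only: this is not valid Java and will not compile.
// (The negation on the first filter is left out here for brevity.)
connections.stream()
    .filter(c.to -> community::contains)
    .filter(c.from -> community::contains)
    .forEach(c -> System.out.println(c.from + " talks to " + c.to));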

Again, there might be a way to do this, without jumping through too many hoops (you could always map the value and use *that*, but I'm not sure how you'd get back to the containing object). I just don't know it, and I keep thinking it'd be convenient if they're going to allow me the short way to express the method call in the first place.
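One way to get close, sketched here with a made-up helper: a small static method that takes an extractor function and a predicate on the extracted value, and returns a predicate on the containing object. The name on is hypothetical (nothing like it ships with the JDK as far as I know), and the class, data subset, and EnumSet are assumptions for the sake of a self-contained example. Predicate.and and Predicate.negate are standard Java 8 methods.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical class, for illustration only.
class PredicateOnSketch {
    enum PERSON { DAVE, DANIEL, RUTH, JASON, CARL, ROBYN, TOM, JOE, KARSTEN }

    static class Connection {
        final PERSON from;
        final PERSON to;

        Connection(PERSON from, PERSON to) {
            this.from = from;
            this.to = to;
        }
    }

    // Hypothetical helper, not part of the JDK: extracts a value from the
    // element, then applies 'test' to that value. The element itself is
    // what stays in the stream.
    static <T, R> Predicate<T> on(Function<T, R> extractor, Predicate<R> test) {
        return element -> test.test(extractor.apply(element));
    }

    static List<String> outsideContacts() {
        Set<PERSON> community =
                EnumSet.of(PERSON.RUTH, PERSON.JASON, PERSON.TOM, PERSON.JOE);
        // A subset of the post's connections, in the same relative order.
        List<Connection> connections = Arrays.asList(
                new Connection(PERSON.DAVE, PERSON.RUTH),
                new Connection(PERSON.RUTH, PERSON.DANIEL),
                new Connection(PERSON.RUTH, PERSON.JOE),
                new Connection(PERSON.JASON, PERSON.ROBYN),
                new Connection(PERSON.RUTH, PERSON.KARSTEN),
                new Connection(PERSON.RUTH, PERSON.TOM));

        // Assigning to typed variables gives the compiler a target type,
        // so the lambdas can stay as short as c -> c.from.
        Predicate<Connection> fromCommunity = on(c -> c.from, community::contains);
        Predicate<Connection> toCommunity = on(c -> c.to, community::contains);

        return connections.stream()
                .filter(fromCommunity.and(toCommunity.negate()))
                .map(c -> c.from + " talks to " + c.to)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints RUTH talks to DANIEL, JASON talks to ROBYN, RUTH talks to KARSTEN
        outsideContacts().forEach(System.out::println);
    }
}
```

It is not as terse as the wished-for syntax, but the method reference survives and the stream still carries whole Connection objects.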

But is this desirable, or just “nice,” “neato,” “I’m glad that other languages that have this will stop making fun of Java since it doesn’t?”

So far, I’m leaning towards the latter. But I’m going to keep working on this, because I can see the shadows of something fascinating based on this - but I’m just not there yet.