Repost: Open Source and football

My son has joined a local football team. I try to be an involved dad, so I’m trying my hand at helping to coach.

He’s surprisingly good at defensive end; I’m very proud of him.

However, in their most recent practice (Thursday), something stood out to me.

We were running a half-line; basically the center, guard, and tackle on the offensive line, along with a defensive end, an outside linebacker, and a nose guard for defense. (Also in the mix: a tight end, wide receiver, and the three backs.)

We were doing this because we wanted to run offensive plays with some defense. (The team’s small; we don’t have enough players to run an offense and a defense, and most of our players play both.)

Here’s what stood out: the defensive players wanted to hit, of course, but we were trying to hold back on that – we wanted the offense to get experience running the plays, and our defense is much farther along.

But then, the defense was asking themselves how they could improve, what they could run on defense outside of a base 4-3.

They started talking. They started asking themselves what the offense might do, and how to counter it.

And they weren’t just asking themselves as position players – they were talking about it.

The tackle was suggesting a tactic for my son, because they were triple-teaming him – which meant my son (the DE) would have a specific lane open to him. But if he was going to use that, the OLB needed to cover the gap my son was abandoning.

It wasn’t the lane assignments that were impressive. It was the feedback loops. They were encouraging each other to try things, to work together, adjusting for their strengths and weaknesses; they were starting to play football instead of just playing positions.

It spread: the safeties got involved, the cornerbacks paid attention and started watching for what they could do, and they started being part of the line’s chatter as well. Huddles stopped being “what you do between plays” and started becoming miniature strategy sessions.

It spread to the offense, too: they noticed the changes, and started asking themselves what they could do to counter it. The linemen started talking about assignments, how to open lanes, and who to team up against. The backs and receivers were becoming part of it.

It was football becoming football – chess with pads.

What stuck out to me was how open it was: the defense was sharing with the offense what was going on, and vice versa. Each side was invested in what happened across the line of scrimmage, so the offense would run a play, and then the defense would tell the offense what they could have done differently.

As a result, you could see the improvement. You could see the camaraderie build. And by the time that particular drill was done, you had a visible improvement among the entire team.

It’s almost like the football team was using open source methodology: openness, discussion, improvement for the whole instead of improvement for the individual.

It was awesome to watch.

The epilogue, of course: they played a game Saturday, and the practice didn’t quite translate to the field. They started slowly, and ended up losing 6-0; the other team scored early on a long pass play. As the game wore on, they got a lot tougher; the defense was stellar, and the offense, well, it’s still got a little way to go.

It’s okay. The team is new, with few of the players having ever played before; the ones who had played, had not played as a team. They’re learning, and I’m learning as well; I’ve never coached before, and thank God that we have a head coach who knows what he’s doing, even if he learned at Maryland and not a good school like Florida State. </humor>

The players can see what’s happening, and we’ve already explained to them that our goal is to have the best candidates for varsity in the league when the season’s done; if we lose every game but achieve that objective, we’re okay.

Of course, if we have done that, chances are our players are good enough to win, and we expect that. However, we’d rather teach them fundamentals that they’d need on a higher level than do whatever we need to do to win a particular game.

The target’s the war, not the battles. And the war is these players’ trip to varsity, in the short term, and these players’ understanding of team in the long run.

Repost: Review, Outies

I’m presently reading Jennifer Pournelle’s “Outies,” via Amazon’s Cloud Reader service. I’m a big fan of her father’s and Larry Niven’s Mote series (consisting of “The Mote in God’s Eye” and “The Gripping Hand”) and I was really looking forward to a new installment… if it lived up to the merit of its predecessors.

It doesn’t.


Repost: Faux artists

You know, I have a problem with non-artists. You know the people of whom I’m thinking; maybe you even admire them.

They’re bands like KISS. They’re storytellers like Stan Lee.

They’re the artists who create immensely popular stuff, perhaps for niche audiences, knowing that it’s perhaps not “high art” but it still appeals to innumerable fans.


Repost: Writing prompt: a greeting card and a response

So I’m trying to write more often than I have, and by golly, I’m going to try to use writing prompts if necessary to make it happen.

So here’s today’s:

Write a poem in the disguise of a postcard message. Continue by writing a reply postcard message.
Thinking of you with words so trite
They're not very nice but feel so right
I'll say hello within this page
It's not fair for eggs to be laid in cage.


Thanks for the thought, you lazy git
You picked some rhymes and made them fit.
What can you expect from a card that's free
Better, I'd hope, but it's not to be.

You’re welcome.

Repost: DavidRM’s The Journal

I’ve recently decided to try – really try – journaling. It’s like blogging, but private.

Why private, when I have a blog? For a few reasons, really.

One reason is that it requires a lot less thought to construct decent posts. When I’m writing for a general audience (i.e., more than myself) I want the thoughts to be complete, well-formed, concrete, and relatable. I want there to be a valid takeaway point, a payload; that means actually working out a point and making sure that I’m actually supporting it somehow.

Repost: Actor and object

In drama, every character has two roles beyond the dramatic role itself: “actor” and “object.” There’s a lot of term confusion, because in drama an actor is one who plays a role, and in this context it’s the same term with a different interpretation. Let’s see if I can make it easier by shifting genres; I might even come up with new terms as I go.

Repost: What framework authors are doing wrong

This is a repost, from an old blog of mine. Originally posted on 9 March, 2006.

I finally figured it out. I’ve been struggling to work out how to explain to various framework authors and supporters why I think what they are telling people is not right, or why the frameworks are flawed, with very limited success.

Today, though, it all crystallized.


Repost: Tips on writing articles

As an editor in the past, I had a lot of opportunity to see what worked in online writing and what didn’t. Since it was a core part of my job to be efficient online, I also did a lot of research into various techniques, and while I certainly can’t claim to be an SEO expert of any kind, I can describe some things that can help online writing be more efficient.

This was originally written for another site, and was linked on its post submissions page; since then, the formatting of old posts has been mangled so they’re nearly unreadable, although the .Net version of the article is still formatted properly.

Tips on Writing

Writing articles for the web is a lot like writing anything else, with the main differences being that you have fewer limitations, and thus even more chances to lose your audience.

When you write online, you’re writing for posterity. Thus, write it as well as you can. You want to be very clear, and you want to make sure you’re not assuming things on the part of your audience.

Don’t say things in passing. Be specific. Your audience doesn’t have the time or interest to try to figure out what you meant. Trying to be fancy will only make you less effective as a writer. There’s no issue with having a personality or a style in your writing, but if the dominant feature in your writing is your style (and not the content itself) then you’re going to lose your audience.

Losing your audience is a bad thing.

The first sentence in your article is by far the most important. It needs to communicate why a given article should be read. If that one sentence is not effective, your article will start off with a smaller audience, because you will lose readers immediately.

The first sentence in each paragraph is also important. On the web, readers tend to scan more than read. If you put the topic sentence after the first sentence in the paragraph, chances are good that the readers will not reach it. They will have moved on to the next paragraph, or perhaps they will have stopped reading altogether.

It is all right if the subject of a paragraph cannot be completed in one sentence, but the topic sentence should be enough to communicate what the paragraph means. If the reader wants more explanation, the rest of the paragraph exists to provide it.

Short sentences are good. Long sentences are fun to write, and they are often quite natural for authors, but they are not efficient for web readers to scan.

Don’t use emphasis techniques like bold or italics any more than you absolutely need. If you feel that bold is necessary to make your point, then it’s likely that your sentence or paragraph is organized poorly. Highlights like bold or italics draw the eye long after the bold text is read, and the highlights actually lower comprehension.

Code samples are great, and usually required, but make them clear and complete. References to third party sources are all right, but the readers are best served by full code. Boilerplate code, like accessors and mutators, can be ignored with a comment, but it could be more effective if you used properties directly for simplicity … or, instead of including a long list of accessors and mutators, use Lombok. If you’re using Java, that is.

Remember that writing online is still writing. The writing process rarely lends itself to single drafts. It can happen, but it’s rare, and usually not effective.

For efficiency and good writing style, follow a set of simple tips:

  • Make an outline.
  • If the first sentence of a given paragraph isn’t enough to understand what the paragraph is about, rewrite the paragraph.
  • Make sure the article matches your premise! If the subject of the article is “Object Databases and Efficiency,” don’t spend half your text discussing the failures of relational databases. Talk about object databases and efficiency instead.
  • Make sure your spelling is correct.
  • If you use a word processor that provides grammar checking, allow it to suggest changes. Check your reading level, Flesch-Kincaid scores, and any other data you can get. The average reader wants to read at a sixth-grade level. If you consistently score much higher, your article will be hard to read. On the web, ‘hard to read’ usually means ‘unread.’ (This document, as originally published, received a Flesch-Kincaid grade level score of 7.8, for example. I haven’t checked the revised version, because I’m afraid to.)
  • Have people read your draft, and listen to every suggestion they offer. Don’t offer it only to experts; offer it to willing newbies, too. It’s all right to decide not to use suggestions, but your wider audience is going to think of many of the same things your test audience tells you. Constructive criticism is good, especially if it’s received before the article is published.
  • Avoid parentheticals like the plague (the plague is bad, no matter which plague it is.) If you have to use parentheticals, go ahead, but try hard to not overuse them. (They’re hard to read, and break the flow of text. Plus, they’re annoying.)

What I Do When I Write

I actually mind-map most of the things I write, with Freeplane, an open-source (and free) mind-mapping tool. I then draft a rough map for the article.

My central node is, naturally, the subject. (This article would have “tips on writing.”) Then I create child nodes for the central points that I think I need to make – the things without which I’d decide the article wasn’t worth reading.

This becomes my structure. If I have a child node that isn’t something I have to have, then it’s extraneous to the document, or my subject thesis isn’t correct.

These first vertices are the most important part of writing an article, to me; the rest I can usually figure out as I go, if I have to, because my important points (the second-level nodes of the map) should be clear enough and relevant enough to use as a guide for everything.

I usually fill out another few levels under each of the second-level nodes as well, because they should have supporting thoughts associated with them. (Otherwise they’re not supported, and they’d better be able to stand on their own.)

Then, I draft the article itself. The space each supporting statement gets should be roughly proportional to the size of the corresponding node in the map; if a node in the map is very short, yet the text for that node is very long, then perhaps my map isn’t very complete… or perhaps the section is getting too much emphasis, which is by far the more common case. (Or, alternatively, I’m showing code, which drastically affects the size of a block of text. There’s no way around this, sadly, other than hiding the code or providing a link to an external page, neither of which is effective.)

If it’s stated simply in the mind map, there’s no reason it shouldn’t be stated simply in the final product. (Corollary: state things simply in the mind map.)

Then I reread the article, a lot. I find willing victims as often as I can, and make them read it; usually I get unhelpful and generic responses like “it’s good,” which boosts my ego some, but realistically, those responses don’t do much for me or for the article. I’m looking for constructive criticism, questions that come from the reader, things like that.

Remember: it’s a draft. As long as you keep that in mind, barbs thrown at you from readers won’t sting much. If they do, well, sorry – listen to your readers. Maybe you won’t factor in what they say (remember the list of tips earlier in this article?) but there’s usually a reason they think what they think.

Sometimes a point made by a reader emphasizes what you wanted to have happen – maybe you wanted the reader to wonder something, you know? If the comments are in line with what you desired to happen, well, that’s a win. If they’re not, well, that’s why you draft and that’s why you rewrite.

The concept is that a mapped article, if it can be mapped properly, will naturally have a better, more cohesive structure for your readers, reducing noise and guiding the author’s efforts. A draft process, detached from your ego, is designed to make sure that you aren’t using your writing as pure ego, which makes it more broadly appealing and useful.


I’ve spent a long time learning how to write well online, with varying results. I, too, am ego-driven, after all. However, I love to read well-written stuff, and as an editor for online and dead tree material, I’ve learned what works and what doesn’t. Hopefully you will find these tips and processes useful; I’d love to hear about alternative processes, too, because there’s more than one way to dig a hole.

Repost: I miss email.

I miss asynchronous conversation.

I miss the ability to have an actual thread of thought preserved in something less ephemeral than memory, or in some chat log somewhere on one of my systems’ hard drives.

I miss the ability to not be there if someone has an observation I’m interested in. I don’t want to have to observe in real time.

I miss email. If someone has something to say, is it that hard to write it in such a way that it can be understood clearly, with topics and explanations?

I say no.

I use Twitter, but not Facebook; my use of Twitter isn’t “normal,” I think, and it’s fairly inefficient.

I can make 140-character thoughtlines, I think, but they lack a core representation of my personality in them. While I recognize that the point is the message and not the messenger, often the messenger creates the message not as a set of words, but with the force of personality and intent.

The message is the thing. The messenger makes the message, and becomes part of it.

Twitter’s limitations on messages force a very tight focus, which is a good thing – it’s an excellent training ground for learning how to focus what you say – but tight focus lacks conviction.

I miss the chance to see that conviction.

There’s social commentary here, too, even if I don’t know how to frame it well. Recently, I had an email exchange with someone, and he complained that I had taken too much time to explain my position on something, that I clearly wasn’t focused on my responsibilities if I had time to explain myself in detail.

I was horrified and amused – the dismissiveness was funny, really, but the intent behind it was not so good.

I still don’t know if what he meant was that my reasons were specious, or that he had no interest in reasoning at all. (It’s my personal feeling that convictions establish the meaning behind what people think; I can accept the silliest concepts from people who have reasons to hold them, even if I don’t agree with them.)

I miss email.

A lot.

Repost: Caches, an unpopular opinion, explained

I have an unpopular opinion: caches indicate a general failure, when used for live data. This statement tends to ruffle feathers, because caches are very common, accepted as a standard salve for a very common problem; I’ve been told that cloud-based vendors say that caches are the key to scaling well in the cloud.

They’re wrong and they’re right. In fact, they’re mostly right – as long as some crucial conditions are fulfilled. We’ll get to those conditions, but first let’s get some important details clear.

Caches are fine for many situations.

In data, it’s important to think in terms of read/write ratios. Broadly, you have read-only data (like, oh, a list of the US States), read-mostly data (user profiles), read/write data (live order information), and write-mostly or write-only data (audit transactions or event logging). Obviously someone might read audit logs some day, so the definitions aren’t strict, but we’re talking in the scope of a given application’s normal use, so it might be appropriate to think of audit logs as write-only, because a given application might not use them itself.

The read/write status is crucial for determining whether a cache is appropriate. It’s entirely appropriate to cache read-only data, given resources.
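That classification can be sketched as a small enum; the categories mirror the text, while the `safeToCache` rule and all names here are illustrative assumptions, not from any real library:

```java
// Sketch of the read/write classification above (examples mirror the text).
public enum AccessPattern {
    READ_ONLY,    // e.g., a list of US states
    READ_MOSTLY,  // e.g., user profiles
    READ_WRITE,   // e.g., live order information
    WRITE_MOSTLY, // e.g., event logging
    WRITE_ONLY;   // e.g., audit transactions

    // Per the rule above, only read-only data is an unambiguously safe
    // candidate for caching; livelier data needs the caveats discussed
    // in the rest of the article.
    public boolean safeToCache() {
        return this == READ_ONLY;
    }
}
```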

In addition, temporary data that’s reused can be cached. Think of, oh, a rendered stylesheet or a blog post: it changes rarely (unless it’s written by me, in which case it gets constantly edited), is requested often (unless it’s written by me, in which case it has one reader), and the render phase is slow in comparison to the delivery of the content (because Markdown, while very fast, isn’t as fast as retrieving already-rendered content).

Caches for Live Data

The use of cache on live data, where the data is read-mostly to write-only, is what I find distasteful. There are circumstances which justify their use, as I’ve already said, but in general the existence of a cache indicates an opportunity for improvement.

In my opinion, you should plan on spending zero lines of code on caching of live data. That said, let’s point out when that’s not true or possible.

In the typical architecture for caches, you have an application that, on read, checks the cache (a “side cache”) for a given key value; if the key exists in the cache, the cached data is used. If not, the database is read, and the cache is updated with the key value (for future use). When writes occur, the cache is updated at the same time the database is, so that for a given cache node the database and the cache are synchronized.
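The read and write paths just described can be sketched in a few lines. This is a minimal illustration, not a real caching library: a `HashMap` stands in for both the cache and the database, and all names are made up for the example.

```java
// Minimal sketch of the side-cache flow described above. The HashMap
// "database" stands in for the real system of record.
public class SideCache {
    private final java.util.Map<String, String> cache = new java.util.HashMap<>();
    private final java.util.Map<String, String> database = new java.util.HashMap<>();

    // Read path: check the cache first; on a miss, read the database
    // and populate the cache for future reads.
    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = database.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Write path: update the database and the cache together, so this
    // node's cache stays synchronized with the system of record.
    public void write(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }

    // Simulates another client writing to the shared database directly --
    // an update this node's cache never sees (the distribution problem).
    public void externalWrite(String key, String value) {
        database.put(key, value);
    }
}
```

The `externalWrite` method exists only to make the next paragraph’s problem visible: once a second client updates the system of record, this node’s cached copy is silently stale.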

The database here is the “System of Record,” the authoritative data source for a given data element. The cache holds a copy of the data element, and shouldn’t be considered “correct” unless it agrees with the system of record.

You can probably see a quick issue (and one that’s addressed by many caching systems): distribution. If you have many client systems, you have many caches, and therefore many copies, each considered accurate as long as they agree with the system of record. If one system updates the system of record, the other cached copies are now wrong until they are synchronized.

Depending on the nature of the data, maintaining accurate copies could require polling the database even before the cached copy is used. Cache performance in this situation gets flushed down the tubes, because a cache provides a real enhancement only when the data element is large enough that transferring it to the application costs much more than the validation does. (A data item that’s 70 kb, for example, is probably going to take a few milliseconds to transfer – more than checking a timestamp would – so you’d still see a benefit even while checking timestamps.)
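That timestamp check can be sketched too; here a version number stands in for the timestamp, and a `fullTransfers` counter makes the expensive path visible. All names are illustrative, not from any real product:

```java
// Sketch of timestamp validation: before trusting the cached copy, the
// client checks the system of record's (cheap) version stamp and only
// transfers the (large) value on a mismatch.
public class ValidatingCache {
    static class Entry {
        final String value;
        final long version;
        Entry(String value, long version) { this.value = value; this.version = version; }
    }

    private final java.util.Map<String, Entry> cache = new java.util.HashMap<>();
    private final java.util.Map<String, Entry> database = new java.util.HashMap<>();
    int fullTransfers = 0; // counts expensive value transfers from the database

    public void dbWrite(String key, String value, long version) {
        database.put(key, new Entry(value, version));
    }

    public String read(String key) {
        Entry db = database.get(key);
        if (db == null) return null;
        Entry cached = cache.get(key);
        // Cheap check: compare versions instead of transferring the value.
        if (cached != null && cached.version == db.version) {
            return cached.value;
        }
        fullTransfers++; // expensive path: transfer the value itself
        cache.put(key, db);
        return db.value;
    }
}
```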

Some caching systems (most of them, really) provide a distributed cache, so that a write to one cache node is reflected in the other caches, too. This evades the whole “out of sync” issue, though it introduces some concerns of its own; even so, distribution is something you should have before even considering caching.

If you’re going to use a cache, it should be distributed. You should look for a peer-to-peer distribution model, and transaction support; if your cache doesn’t have these two features, you should look for another cache. (Or you could just use GigaSpaces XAP, which does this and more; read on for an explanation.)

So what should you really do?

To me, the problem lies in the determination of the System of Record. A cached value is a copy; I don’t think it’s normally necessary to have that copy, and it’s actually fairly dangerous. So what’s the alternative?

Why not use a System of Record that’s as fast as a cache? If the data happens to be cached, you don’t care beyond reaping the benefits; your data architecture gets much simpler (no more code for caches). Your application speeds up (dramatically, actually, because data access time is in microseconds rather than milliseconds… or seconds). Your transactions collide less because the transaction times go down so much. Everybody wins.

The term for this kind of thing is “data grid.” Most data grids are termed “in-memory data grids,” which sounds scary for a System of Record, but there are easy ways to add reliability. Let’s work out how that happens for GigaSpaces, because that’s who I work for.

In a typical environment, you’d have a group of nodes participating in a given cluster. These nodes have one of three primary roles: primary, backup, and mirror (with the mirror being a single node.) A backup node is assigned to one and only one primary; a primary can have multiple backups. The primaries are peers (using a mesh-style network topology, communicating directly with each other.)

Let’s talk about reads, first, because writes want more examination than reads will require. A client application has a direct connection to the primaries; depending on the nature of the queries, requests for a given data element are either distributed across all primaries (much like a map/reduce operation) or routed directly to a calculated primary (i.e., the client knows where a given piece of data should be, and doesn’t spam the network).

Now, before we jump into writes, let’s consider my originating premise, stated too simply: caches are bad. Caches exist because data retrieval is slow. In this situation, the reads themselves are very, very fast because you’re talking to an in-memory data grid, not a filesystem at all – but you still have to factor in network travel time, right?

No, not always. XAP is an application platform, not just a data grid. The cluster nodes we’ve been talking about can hold your application, not just your application’s data – you can co-locate your business logic (and presentation logic) right alongside your data. If you partition your data carefully, you might not talk to the network at all in the process of a transaction.

Co-located data and business logic comprise the ideal architecture; in this case, you have in-memory access speeds (just like a cache) with far more powerful query capabilities. And with that, we jump to data updates, because that’s the next logical consideration: what happens when we update our data grid that’s as fast as a cache? It’s in memory, so it’s dangerous, right?

No. Being in-memory doesn’t imply unreliability, because of the primary/backup/mirror roles, and synchronization between them. When an update is made to data in a primary, it immediately copies the updated data (and only the updated data) to its backups. This is normally a synchronous operation (so the write operation doesn’t conclude until the primary has replicated the update to its backups).

If a primary should crash (because someone unplugged a server, perhaps, or perhaps someone cut a network cable), a backup is immediately promoted, a new backup is allocated as a replacement for the promoted backup, and the process resumes.

The mirror is a sink; updates are replicated to it, too, but asynchronously. (If the mirror has an issue, mirror updates will queue up so when the mirror resumes function all of the mirror updates occur in sequence.)
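The primary/backup/mirror behavior can be sketched roughly like this. To be clear, this is an illustration of the replication roles as the text describes them, not GigaSpaces code; all names are made up:

```java
// Sketch of the replication roles described above: a write to the primary
// is copied synchronously to its backup, while mirror updates queue up
// and are applied asynchronously.
public class GridNodeSketch {
    final java.util.Map<String, String> primary = new java.util.HashMap<>();
    final java.util.Map<String, String> backup = new java.util.HashMap<>();
    final java.util.Map<String, String> mirror = new java.util.HashMap<>();
    private final java.util.Deque<String[]> mirrorQueue = new java.util.ArrayDeque<>();

    // The write does not conclude until the backup holds the update.
    public void write(String key, String value) {
        primary.put(key, value);
        backup.put(key, value);                     // synchronous replication
        mirrorQueue.add(new String[] {key, value}); // asynchronous sink
    }

    // If the mirror is down, updates simply wait in the queue and are
    // applied in sequence once it resumes.
    public void flushMirror() {
        while (!mirrorQueue.isEmpty()) {
            String[] update = mirrorQueue.poll();
            mirror.put(update[0], update[1]);
        }
    }

    // A crashed primary is replaced by promoting its backup.
    public void promoteBackup() {
        primary.clear();
        primary.putAll(backup);
    }
}
```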

In this configuration – again, this is typical – the database becomes fairly static storage. Nothing writes directly to it, because it’s secondary; the actual system of record is the data grid. The secondary storage is appropriate for read-only operations – reports, for example.

Does this mean that the data grid is a “cache with primacy”? No. The data grid in this configuration is not a cache; it’s where the data “lives.” The database is a copy of the data grid, and not vice versa.

Does it mean we have to use a special API to get to our data? No. We understand that different data access APIs have different strengths, and different users have different requirements. As a result, we have many APIs to reach your data: a native API, multiple key/value APIs (a map abstraction as well as memcached), the Java Persistence API, JDBC, and more.

Does it mean we still might want a cache for read-only data? Again, no. The data grid is able to provide cache features for you; we actually refer to the above configuration as an “inline cache,” despite its role as a system of record (which makes it not a cache at all, in my opinion). But should you need to hold temporary or rendered data, it’s entirely appropriate to use the data grid as a pure cache, because you can tell the data grid what should and should not be mirrored or backed up.

The caching scenarios documentation on our wiki actually points out cases where a data grid can help you even if it’s not the system of record, too. For example, we’re a powerful side cache when you don’t have enough physical resources (in terms of RAM, for example) to hold your entire active data set.

Side Cache on Steroids

One of the things that I don’t like about side caches is that they typically infect your code. You know the drill: you write a service that queries the cache for a key, and if it isn’t found, you run to the database and load the cache.

This drives me crazy. It’s ugly, unreliable, and forces the cache in your face when it’s not really supposed to be there.

What XAP does is really neat: it can hold a subset of data in the data grid (just like a standard cache would), and it can also load data from secondary storage on demand for you with the external data source API. This means you get all the benefits of a side cache, with none of the code; you write your code as if XAP were the system of record even if it’s not.
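A generic read-through loader – not the actual XAP external data source API, just the idea – might look like this sketch. Callers simply read from the grid, and the miss-then-load logic never appears in their code:

```java
// Generic read-through sketch: the grid loads missing entries from
// secondary storage itself, so calling code reads as if the grid were
// the system of record. All names are illustrative.
public class ReadThroughGrid {
    private final java.util.Map<String, String> grid = new java.util.HashMap<>();
    private final java.util.function.Function<String, String> externalSource;
    int externalLoads = 0; // counts trips to secondary storage

    public ReadThroughGrid(java.util.function.Function<String, String> externalSource) {
        this.externalSource = externalSource;
    }

    // Callers just read; on a miss, the grid consults secondary storage
    // and retains the result for future reads.
    public String read(String key) {
        return grid.computeIfAbsent(key, k -> {
            externalLoads++;
            return externalSource.apply(k);
        });
    }
}
```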

So do I REALLY hate caches?

If the question is whether I truly dislike caches or not, the answer would have to be a qualified “no.” Caches really are acceptable in a wide range of circumstances. What I don’t like is the use of caches in a way that exposes their weaknesses and flaws to programmers, forcing the programmers to understand the nuances of the tools they use.

XAP provides features out of the box that protect programmers from having to compensate for the cache. As a result, it provides an ideal environment in which caches are available in multiple ways, configurable by the user to fit specific requirements and service levels.

A final note

You know, I say things like “caches are evil,” because it attracts attention and helps drive discussion in which I can qualify the statement. As I said above, I don’t actually think that – and there are lots of situations in which people have to adapt to local conditions regardless of the “right way” to do things.

Plus, being honest, there’s more than one way to skin a cat, so to speak. My “right way” is very strong, but your “right way” works, too.

And that’s the bottom line, really. Does it work? If yes, regardless of whether the solution is a bloody hack or not, then the solution is right by definition. Pragmatism wins. What I’ve been discussing is a shorter path to a solution that people run into over and over again, when they really shouldn’t. I think it’s the shortest path. (I’ve found no shorter, and I have looked.) It’s not the only path.