How can I automatically expire data from a database?
If you're forced to use a relational database (often the case when you work for a "real company" that doesn't mind violating Java EE specifications at will but is afraid of anything that doesn't involve Oracle on the back end), you're going to have to do something like a cron job that periodically scans for stale data and deletes whatever it finds. (Think of a "last_accessed" column on your tables: any row whose timestamp is older than some cutoff gets deleted.) You can run the job with cron, Quartz, an EJB Timer bean, a java.util.Timer, or whatever your environment supports.
This is icky. It works. It's still icky. Avon loves this idea, because there are a lot of pigs out there that need a lot of lipstick.
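For the record, the icky approach looks roughly like the following minimal sketch. It uses java.util.concurrent's ScheduledExecutorService in place of cron or Quartz, and to keep it self-contained a ConcurrentHashMap stands in for a table keyed by id with a last_accessed timestamp. Against a real database, sweep() would instead run something like `DELETE FROM sessions WHERE last_accessed < ?` through JDBC. The table name `sessions`, the class `StaleDataSweeper` and the 30-minute time-to-live are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for a table with a "last_accessed" column.
class StaleDataSweeper {
    private final Map<String, Instant> lastAccessed = new ConcurrentHashMap<>();
    private final Duration timeToLive;

    StaleDataSweeper(Duration timeToLive) {
        this.timeToLive = timeToLive;
    }

    // Plays the role of an INSERT or UPDATE that sets last_accessed.
    void touch(String id, Instant now) {
        lastAccessed.put(id, now);
    }

    boolean contains(String id) {
        return lastAccessed.containsKey(id);
    }

    // Plays the role of: DELETE FROM sessions WHERE last_accessed < ?
    // Returns the number of "rows" deleted.
    int sweep(Instant now) {
        Instant cutoff = now.minus(timeToLive);
        int removed = 0;
        for (Map.Entry<String, Instant> row : lastAccessed.entrySet()) {
            // remove(key, value) skips rows that were touched again after we read them.
            if (row.getValue().isBefore(cutoff)
                    && lastAccessed.remove(row.getKey(), row.getValue())) {
                removed++;
            }
        }
        return removed;
    }

    // The "cron job": run sweep() periodically on a background thread.
    // Callers must shut the returned scheduler down when they are finished.
    ScheduledExecutorService startSweeping(Duration period) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long millis = period.toMillis();
        scheduler.scheduleAtFixedRate(() -> sweep(Instant.now()), millis, millis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        StaleDataSweeper table = new StaleDataSweeper(Duration.ofMinutes(30));
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        table.touch("a", t0);
        table.touch("b", t0.plus(Duration.ofMinutes(20)));
        // At t0 + 40 minutes the cutoff is t0 + 10 minutes, so only "a" is stale.
        System.out.println(table.sweep(t0.plus(Duration.ofMinutes(40))) + " row(s) deleted");
    }
}
```

Note that the data lives in a permanent store the whole time, and expiry is only as precise as the sweep interval: a row can outlive its time-to-live by up to one full period.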
The best way to do this, however, is to not use a permanent datastore for transient data. You could use something that provides distributed access to a cache, like Coherence (a commercial product from Oracle that's actually very successful), OSCache (an open-source cache solution), or other similar cache products...
Or you could use a datastore that's been designed with this kind of capability in mind: JavaSpaces.
JavaSpaces is an API that provides a distributed, shared memory space. You can think of it as a cache if you like, but it's really closer to a datastore that doesn't force you to think in terms of permanent storage unless you decide to.
It has four basic operations:
- Read - a nondestructive retrieval of an object from the space
- Write - an insertion into the space under a lease; when the lease runs out, the entry expires
- Take - a destructive read, which removes the object from the space
- Notify - a registration to receive events when matching objects are written to the space
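To make the semantics of those four operations concrete, here is a toy, single-process model of a space. It is not the JavaSpaces API: the real net.jini.space.JavaSpace interface matches entries against Entry templates and its methods take a Transaction plus a timeout or lease, whereas this sketch swaps in plain Java predicates, an injectable clock, and local callbacks. The class name `ToySpace` and every method signature here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// A toy, single-process model of the four space operations.
// NOT the JavaSpaces API: no Entry templates, transactions or remote events.
class ToySpace<T> {
    private static final class Stored<V> {
        final V value;
        final long expiresAt;
        Stored(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private static final class Listener<V> {
        final Predicate<V> filter;
        final Consumer<V> callback;
        Listener(Predicate<V> filter, Consumer<V> callback) { this.filter = filter; this.callback = callback; }
    }

    private final List<Stored<T>> entries = new ArrayList<>();
    private final List<Listener<T>> listeners = new ArrayList<>();
    private final LongSupplier clock; // current time in ms; injectable so expiry is testable

    ToySpace(LongSupplier clock) { this.clock = clock; }

    // Write: store the entry under a lease; it expires when the lease runs out.
    synchronized void write(T value, long leaseMillis) {
        long now = clock.getAsLong();
        long expiresAt = leaseMillis >= Long.MAX_VALUE - now ? Long.MAX_VALUE : now + leaseMillis;
        entries.add(new Stored<>(value, expiresAt));
        for (Listener<T> l : listeners) {
            if (l.filter.test(value)) l.callback.accept(value); // Notify
        }
    }

    // Read: non-destructive; the matching entry stays in the space.
    synchronized Optional<T> read(Predicate<T> template) {
        purgeExpired();
        for (Stored<T> s : entries) {
            if (template.test(s.value)) return Optional.of(s.value);
        }
        return Optional.empty();
    }

    // Take: destructive read; the matching entry is removed from the space.
    synchronized Optional<T> take(Predicate<T> template) {
        purgeExpired();
        Iterator<Stored<T>> it = entries.iterator();
        while (it.hasNext()) {
            Stored<T> s = it.next();
            if (template.test(s.value)) {
                it.remove();
                return Optional.of(s.value);
            }
        }
        return Optional.empty();
    }

    // Notify: register a callback for future writes that match the filter.
    synchronized void notify(Predicate<T> filter, Consumer<T> callback) {
        listeners.add(new Listener<>(filter, callback));
    }

    // Drop every entry whose lease has run out.
    private void purgeExpired() {
        long now = clock.getAsLong();
        entries.removeIf(s -> s.expiresAt <= now);
    }

    public static void main(String[] args) {
        long[] now = {0};
        ToySpace<String> space = new ToySpace<>(() -> now[0]);
        space.notify(s -> s.startsWith("job"), s -> System.out.println("notified: " + s));
        space.write("job-1", 1_000);                            // 1-second lease
        System.out.println(space.read(s -> s.equals("job-1"))); // found; still in the space
        now[0] = 1_000;                                         // the lease has run out
        System.out.println(space.take(s -> s.equals("job-1"))); // expired, so empty
    }
}
```

The point of the lease is that nobody has to write a sweeper: once the lease on an entry runs out, the space simply stops returning it.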
With these four operations, you can actually do a heck of a lot. That said, it's still sometimes odd to work with something that looks like a cache but can do much more.
Let's be frank here: I work for a company called GigaSpaces Technologies, which makes something called the eXtreme Application Platform (XAP), largely based on space-based architecture. I was working for a large media conglomerate, and was so impressed by the technology that I went to work for GigaSpaces instead. I'm biased, okay? But my bias is based on objective observation.
With XAP, you get a lot of features that bare JavaSpaces doesn't give you: easy access to the space, a GUI administration tool, the Spring programming model, custom processes embedded into the cluster itself, colocation of data and processes, automagical mirroring of the space into a relational database should you desire it... it's great stuff.
So what I would do, given the task of creating expiring data, is use the space to hold all data - transient and otherwise - and have a backend process running in the space that writes non-transient data into a relational database for warehousing and reporting.
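The shape of that arrangement is a write-behind: every write lands in memory first, and a separate background process copies only the durable items to the relational store. Below is a minimal sketch in plain Java, assuming hypothetical `Item`, `Warehouse` and `WriteBehindMirror` types; it is not XAP's mirror service or its API, and the `Warehouse` interface stands in for whatever JDBC code would actually do the inserts.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical data item: transient items live only in the space.
final class Item {
    final String id;
    final boolean durable;
    Item(String id, boolean durable) { this.id = id; this.durable = durable; }
}

// Hypothetical stand-in for the relational warehouse (JDBC in practice).
interface Warehouse {
    void insert(Item item);
}

// Background "mirror": a worker thread copies only durable items to the warehouse,
// so transient data never touches the relational database.
final class WriteBehindMirror implements AutoCloseable {
    private static final Item POISON = new Item("", false); // shutdown marker
    private final BlockingQueue<Item> pending = new LinkedBlockingQueue<>();
    private final Thread worker;

    WriteBehindMirror(Warehouse warehouse) {
        worker = new Thread(() -> {
            try {
                for (Item item = pending.take(); item != POISON; item = pending.take()) {
                    if (item.durable) warehouse.insert(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "write-behind-mirror");
        worker.start();
    }

    // Called after the item has been written to the in-memory space.
    void onWrite(Item item) {
        pending.add(item);
    }

    // Lets the worker drain everything queued so far, then waits for it to stop.
    @Override
    public void close() throws InterruptedException {
        pending.add(POISON);
        worker.join();
    }

    public static void main(String[] args) throws InterruptedException {
        try (WriteBehindMirror mirror = new WriteBehindMirror(
                item -> System.out.println("INSERT " + item.id))) {
            mirror.onWrite(new Item("session-42", false)); // transient: space only
            mirror.onWrite(new Item("order-7", true));     // durable: also warehoused
        }
    }
}
```

Because the database write happens off the request path, a slow warehouse delays reporting rather than the application; the trade-off is that anything still queued is lost if the process dies before close() drains it.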
I don't mean this as a sales pitch for XAP. I'm obviously a believer, but the main point here is that relational databases shouldn't be all there is to data storage - and you can and should think outside the boxes Java EE gives you to solve problems.
Resources:
- My data structures have changed - how can I migrate my data model?
- The Vietnam of Computer Science, Ted Neward's excellent essay on why object/relational mapping is a bad thing