“Just-in-time” inserting rows into a database
If you're writing software using a database system such as MySQL, your code doesn't have access to all the data at any point in time; you SELECT just the data you need to operate on.
If you want to do some database-wide operation, you need to ask the database to do it, because you don't have all the rows.
When you want to enforce uniqueness, whether across a whole table (for example, document names must be unique) or across a particular part of a table (for example, each user must have documents with unique names), you again need to ask the database to do it, because you don't have all the rows.
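For concreteness, here is a minimal sketch of the kind of constraint I mean. The "documents" table, its columns, and the connection details are assumptions for illustration, and the same hypothetical table is reused in the sketches further down.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateDocumentsTable {
    public static void main(String[] args) throws Exception {
        // The JDBC URL and credentials are placeholders; adjust for your environment.
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "password");
             Statement s = c.createStatement()) {
            // Uniqueness across part of a table: each user's document names must be unique.
            // For uniqueness across the whole table, the constraint would be UNIQUE (name) instead.
            s.executeUpdate(
                "CREATE TABLE documents (" +
                "  user_id  BIGINT       NOT NULL," +
                "  name     VARCHAR(200) NOT NULL," +
                "  contents TEXT," +
                "  UNIQUE (user_id, name)" +
                ")");
        }
    }
}
```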
There is only one acceptable way to do this with a SQL database. It happens when you write a new row into a table where a uniqueness constraint must be enforced:
- Insert a new row
- If the insert succeeds, there wasn't a row there before, and now there is
- If the insert throws a unique constraint violation, the row is already there
- If you want to update the row (i.e. an "insert or update" operation to maintain a "lazy singleton" in the database), you can safely update it after the unique constraint violation, as you can be certain the row is already there.
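Here is a minimal JDBC sketch of that pattern, assuming the hypothetical documents table above. How a driver reports a unique constraint violation varies (a dedicated exception class, or a generic SQLException with an SQLState in class "23"), so the check below covers both.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

public class DocumentDao {

    /** "Insert or update": try the insert first, fall back to an update only on a unique constraint violation. */
    public void insertOrUpdate(Connection c, long userId, String name, String contents)
            throws SQLException {
        try (PreparedStatement insert = c.prepareStatement(
                "INSERT INTO documents (user_id, name, contents) VALUES (?, ?, ?)")) {
            insert.setLong(1, userId);
            insert.setString(2, name);
            insert.setString(3, contents);
            insert.executeUpdate();
            // The insert succeeded: there was no such row before, and now there is.
        } catch (SQLException e) {
            if (!isUniqueConstraintViolation(e)) {
                throw e; // some other database error
            }
            // Unique constraint violation: the row is certainly there, so updating it is safe.
            try (PreparedStatement update = c.prepareStatement(
                    "UPDATE documents SET contents = ? WHERE user_id = ? AND name = ?")) {
                update.setString(1, contents);
                update.setLong(2, userId);
                update.setString(3, name);
                update.executeUpdate();
            }
        }
    }

    private static boolean isUniqueConstraintViolation(SQLException e) {
        return e instanceof SQLIntegrityConstraintViolationException
                || (e.getSQLState() != null && e.getSQLState().startsWith("23"));
    }
}
```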
None of the following methods is acceptable:
- Do a "select" to find out how many rows there are. If there aren't any, do an "insert". However someone may have inserted a row between your "select" and your "insert".
- Do an "update" and if the database says that 0 rows have been updated, do an "insert". Again, someone may have inserted a row between your "update" and the "insert".
- Do a "select for update" statement (Oracle, Postgres, InnoDB) to check that there aren't any rows while creating a lock, and then do an insert. However that statement only locks the rows it returns, so if it doesn't return any rows, it doesn't create any locks, so you still can't be certain that no one has inserted a row between the "select" and the "insert".
- Lock the whole table and do one of the above. This works, but it means that all write access is "serialized", i.e. one write happens after another. Any other operation, writing something completely unrelated, will now also have to wait until the end of your transaction, when it shouldn't have to. This reduces concurrency.
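To make the race window in the first of these concrete, here is a sketch of the "select, then insert" variant against the same hypothetical documents table; the comment marks the point where another transaction can slip in.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RacyInsert {

    /** NOT SAFE: the "select, then insert" approach, shown only to make the race visible. */
    public static void insertIfAbsent(Connection c, long userId, String name, String contents)
            throws SQLException {
        try (PreparedStatement check = c.prepareStatement(
                "SELECT COUNT(*) FROM documents WHERE user_id = ? AND name = ?")) {
            check.setLong(1, userId);
            check.setString(2, name);
            try (ResultSet rs = check.executeQuery()) {
                rs.next();
                if (rs.getLong(1) > 0) {
                    return; // a row appears to exist already
                }
            }
        }
        // Another transaction can commit a matching row at exactly this point,
        // after our "select" but before our "insert".
        try (PreparedStatement insert = c.prepareStatement(
                "INSERT INTO documents (user_id, name, contents) VALUES (?, ?, ?)")) {
            insert.setLong(1, userId);
            insert.setString(2, name);
            insert.setString(3, contents);
            insert.executeUpdate(); // so this can still fail with a unique constraint violation
        }
    }
}
```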
The way I program this is the following: on the "insert" statement, I catch the database error and see if it's a "unique constraint violation"-type error. If it is, I throw an (unchecked) Exception. The calling code can catch that and do something with it (or not, if the statement should not generate such an error, in which case it will propagate to the main loop like any other database exception). I have had the pleasure of introducing this to easyname.eu and now also the pleasure of introducing it to the WebTek framework.
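As a sketch of that pattern (the class and method names are hypothetical, not easyname.eu's or WebTek's actual code):

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

/** Unchecked exception signalling that an insert hit a unique constraint. */
class UniqueConstraintViolation extends RuntimeException {
    UniqueConstraintViolation(SQLException cause) { super(cause); }
}

class Db {
    /** Runs an insert, translating unique constraint violations into a specific unchecked exception. */
    static int insert(PreparedStatement stmt) {
        try {
            return stmt.executeUpdate();
        } catch (SQLException e) {
            if (e instanceof SQLIntegrityConstraintViolationException
                    || (e.getSQLState() != null && e.getSQLState().startsWith("23"))) {
                throw new UniqueConstraintViolation(e);
            }
            // Any other database error propagates as an unchecked exception, like before.
            throw new RuntimeException(e);
        }
    }
}
```

Calling code that expects the row may already exist catches UniqueConstraintViolation and reacts to it; code that doesn't expect it simply lets it propagate like any other database error.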
It is extremely frustrating to work with big, complex frameworks, whose usage is much more involved than just writing SQL manually, and not be able to do the above properly.
- Hibernate clearly states in its documentation that if any database error occurs, the Session (the main object managing all persistence) must be destroyed, as its state will be out of sync with the database. Yet, as far as I know, there is no way other than the approach above to do this sort of check.
- OptimalJ generates code that, if a database error occurs, sets the transaction to roll back, including any other work you may have done in it.
I mean, checking unique constraints is something every database application needs, so the fact that it's not supported by major frameworks is just unbelievable.