Atomic operations over filesystem and database
I recently had a situation where I needed a "button"; when the user clicks the button, the following should happen:
- An "svn up" should occur, to update some XML config files + data files (CSV)
- If the config files or CSV files have errors, the operation should be aborted and an error printed
- If correct, the CSV data should be imported into the database ready for processing
- If correct, the config files should be made available for use
The config files are a bunch of XML files; each file has a complex structure, and they in turn live in a complex directory structure. One option would have been to put them into the database as well, but I felt a simpler solution was to leave them on the disk, where they could be parsed directly. Directories, files and XML more naturally fit the hierarchical nature of the configuration in question, and I didn't want to maintain two representations of it (one being directories/XML; the other being a database).
This is all OK, but the whole operation, spanning both the filesystem and the database, needs to be atomic. That means:
- If anything goes wrong, the entire operation fails and rolls back with an error message
- If it succeeds then the entire operation succeeds
- If the server crashes, then any update in progress was either completed or rolled back (but nothing in between)
I came up with a nice solution (a code sketch follows the list):
- A database transaction is started
- The "svn up" process programmatically connects to the Subversion server and does a brand new checkout into a directory with a random name (i.e. actually "svn co" not "svn up").
- The XML files are parsed and checked; if there is an error, the operation is aborted
- The CSV files are loaded into the database within the transaction; if there is an error, the operation is aborted (and rolled back)
- The new random directory name is written into a table "current directory name", also in the transaction
- The transaction is committed
- Each time access is needed to the XML files, a lookup is first done in the table to retrieve the directory name; the files are then read from this directory.
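Concretely, the update might look like the following. This is a minimal sketch, not the actual code: it assumes SVNKit for the checkout and plain JDBC for the transaction, and the `current_directory` table (one row, column `name`), the `repoUrl` and `baseDir` parameters, and the `validateXmlFiles`/`importCsvFiles` helpers are all hypothetical stand-ins for the real project's code.

```java
import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.UUID;

import org.tmatesoft.svn.core.SVNDepth;
import org.tmatesoft.svn.core.SVNURL;
import org.tmatesoft.svn.core.internal.io.dav.DAVRepositoryFactory;
import org.tmatesoft.svn.core.wc.SVNClientManager;
import org.tmatesoft.svn.core.wc.SVNRevision;

public class ConfigUpdater {

    /** Checks out, validates and imports a new configuration atomically. */
    public static void update(Connection db, String repoUrl, File baseDir) throws Exception {
        // Fresh checkout into a directory with a random name ("svn co", not "svn up")
        DAVRepositoryFactory.setup();  // register the http(s) protocol with SVNKit
        File checkoutDir = new File(baseDir, UUID.randomUUID().toString());
        SVNClientManager svn = SVNClientManager.newInstance();
        svn.getUpdateClient().doCheckout(
                SVNURL.parseURIEncoded(repoUrl), checkoutDir,
                SVNRevision.HEAD, SVNRevision.HEAD, SVNDepth.INFINITY, false);

        db.setAutoCommit(false);  // start the transaction
        try {
            validateXmlFiles(checkoutDir);    // throws if any config file has errors
            importCsvFiles(db, checkoutDir);  // throws on bad data; inserts join the transaction

            // Make the new checkout "live", inside the same transaction
            try (PreparedStatement ps =
                    db.prepareStatement("UPDATE current_directory SET name = ?")) {
                ps.setString(1, checkoutDir.getName());
                ps.executeUpdate();
            }
            db.commit();    // CSV data and directory pointer become visible together
        } catch (Exception e) {
            db.rollback();  // CSV import undone; pointer still names the old directory
            throw e;
        }
    }

    // Hypothetical placeholders for the project-specific parsing and import logic
    static void validateXmlFiles(File dir) throws Exception { /* ... */ }
    static void importCsvFiles(Connection db, File dir) throws Exception { /* ... */ }
}
```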
This has the consequence that:
- Either the "commit" occurs and the new XML files are "live" (table points to new directory name) and the CSV data has been imported
- Or a "rollback" (or crash and auto-rollback) occurs in which case the CSV import is rolled back, and the table points to the old version of the XML files which are still on the disk
I'm pleased with this solution :-)
One should really also write a clean-up script, deleting directories which (a) aren't referenced in the database and (b) were created more than a certain amount of time ago, e.g. a few days. That way, old directories and directories from checkouts which had errors or failed are removed, while directories which are still in the process of being checked out are left alone.
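A sketch of such a clean-up, under the same assumptions as above (the three-day threshold is arbitrary, and the directory's last-modified time is used as an approximation of its creation time):

```java
import java.io.File;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.TimeUnit;

/** Deletes checkout directories that aren't referenced in the database
 *  and are older than a few days. */
static void cleanUp(Connection db, File baseDir) throws Exception {
    String live;
    try (Statement st = db.createStatement();
         ResultSet rs = st.executeQuery("SELECT name FROM current_directory")) {
        rs.next();
        live = rs.getString("name");
    }
    long cutoff = System.currentTimeMillis() - TimeUnit.DAYS.toMillis(3);
    File[] entries = baseDir.listFiles();
    if (entries == null) return;
    for (File dir : entries) {
        // Keep the live directory, and anything recent enough that it may
        // still be a checkout in progress
        if (dir.isDirectory() && !dir.getName().equals(live)
                && dir.lastModified() < cutoff) {
            deleteRecursively(dir);
        }
    }
}

static void deleteRecursively(File f) {
    File[] children = f.listFiles();
    if (children != null) for (File c : children) deleteRecursively(c);
    f.delete();
}
```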
P.S. Allowing the user to submit new versions of the data by checking it into Subversion is a nice idea:
- It's a bunch of structured data (bunch of XML and CSV files in a particular directory format)
- This is easy to visualize with Windows Explorer and TortoiseSVN
- This would not be easy or nice to do with multiple <input type=file> file upload buttons
- The "svn co" command in the software is easy; there are various libraries for programmatic access to SVN: I used svnkit: I found it well documented, seemingly bug free, it just worked first time.
- Obligatory link about why I didn't use Git ;-) (the person involved can use TortoiseSVN but wouldn't have been able to handle Git)
Update: the user interface for this algorithm is described here: /offer-ready-publisher/