Toilet


Toilet started out as a "systems approach to a database," originally conceived from the top down: that is, we wanted something "better" than SQL. After working on that for a while, it became clear that we needed to start from the bottom up instead, and see where the techniques we wanted to use for the lower layers would lead us.

So, what techniques did we want to use for the lower layers of toilet? First and foremost, logging. Absolutely every change to any kind of data in toilet results in some data being written to an append-only log. We want to do things this way because disks are actually surprisingly fast these days at writing consecutive data, but still pretty slow at seeking. So, we want to be able to write all updates to a consecutive log.
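
To make the append-only idea concrete, here is a minimal sketch in Python. It is not toilet's actual code: the length-prefixed record format and the names append_record, read_records, and demo_log are all made up for illustration. The only point is that every change becomes a write at the end of one file, so the disk never has to seek.

```python
import os
import struct
import tempfile


def append_record(log_file, payload: bytes) -> None:
    """Append one length-prefixed record to the end of the log."""
    # Hypothetical format: 4-byte little-endian length, then the payload.
    log_file.write(struct.pack("<I", len(payload)))
    log_file.write(payload)
    log_file.flush()
    os.fsync(log_file.fileno())  # ask the OS to push the bytes to the disk


def read_records(path):
    """Replay the log from start to end, in the order it was written."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break  # end of log
            (length,) = struct.unpack("<I", header)
            records.append(f.read(length))
    return records


def demo_log():
    path = os.path.join(tempfile.mkdtemp(), "example.log")
    # Mode "ab" means every write lands at the current end of the file.
    with open(path, "ab") as log:
        append_record(log, b"set a=1")
        append_record(log, b"set b=2")
        append_record(log, b"set a=3")
    return read_records(path)
```

Note that the second change to "a" does not overwrite the first one; both stay in the log, in order, and it is up to whoever replays the log to let the later one win.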

Doing this, of course, leaves us with an arrangement on the disk which is hardly optimized for reading. But that's OK: normally, we never actually read the log anyway. Everything we write to the log we also store in RAM, and periodically we write the data from RAM out to disk in a read-optimized format. In fact, the data written here is not only read-optimized but read-only once written. In keeping with our desire to avoid seeks, we write this data out consecutively as well. Once we've written this data, we can free all the RAM we were using to store it and append an entry to the log indicating that the data has been "digested" and can be removed from the log in the future.
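
The write path and the digest step described above can be sketched roughly as follows. This is an illustration under our own assumptions, not toilet's implementation: the class TinyStore, the JSON-lines file formats, and the "digested" log entry layout are all hypothetical.

```python
import json
import os
import tempfile


class TinyStore:
    def __init__(self, directory):
        self.directory = directory
        self.log_path = os.path.join(directory, "log")
        self.memory = {}       # in-RAM copy of everything not yet digested
        self.store_paths = []  # read-only stores written so far, oldest first

    def _log(self, entry):
        # Mode "a" appends, so the log is only ever written consecutively.
        with open(self.log_path, "a") as log:
            log.write(json.dumps(entry) + "\n")

    def put(self, key, value):
        self._log({"op": "put", "key": key, "value": value})
        self.memory[key] = value  # reads are served from RAM, not the log

    def digest(self):
        """Write RAM out as a sorted, read-only store, then free the RAM."""
        path = os.path.join(self.directory, "store-%d" % len(self.store_paths))
        with open(path, "w") as out:
            for key in sorted(self.memory):  # one consecutive pass, in key order
                out.write(json.dumps([key, self.memory[key]]) + "\n")
        self.store_paths.append(path)
        self.memory.clear()
        # Record that the log entries up to here are no longer needed.
        self._log({"op": "digested", "store": os.path.basename(path)})
        return path


def demo_digest():
    store = TinyStore(tempfile.mkdtemp())
    store.put("b", 2)
    store.put("a", 1)
    path = store.digest()
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    with open(store.log_path) as f:
        last = json.loads(f.readlines()[-1])
    return rows, store.memory, last["op"]
```

The store file comes out sorted by key even though the log holds the changes in arrival order, which is the whole trade: the log is cheap to write, the digested store is cheap to read.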

So clearly we're going to end up with a bunch of these read-only data stores, since we create a new one every time we digest the log. (How frequently we digest the log depends on the use pattern, so it's customizable.) Later stores may contain updated versions of data in earlier ones, or even special negative entries that mark data items in earlier stores as having been removed. While it's OK to have a few of these around at a time, we don't want them to build up too much, since they introduce overhead while reading and can waste disk space storing obsolete data.
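
Reading through a stack of such stores might look something like the sketch below. Plain dictionaries stand in for the read-only stores, and the REMOVED marker and the lookup function are names we invented for the example, not part of toilet.

```python
REMOVED = object()  # hypothetical "negative entry" marker


def lookup(key, stores):
    """Find key in a list of stores ordered oldest first."""
    # Search newest first, so updates and removals hide older data.
    for store in reversed(stores):
        if key in store:
            value = store[key]
            return None if value is REMOVED else value
    return None  # no store has ever heard of this key


oldest = {"a": 1, "b": 2}
middle = {"b": 20}                  # updated version of "b"
newest = {"a": REMOVED, "c": 3}     # "a" has been removed
stores = [oldest, middle, newest]
```

Here lookup("b", stores) finds the updated value in the middle store and never looks at the oldest one, while lookup("a", stores) stops at the negative entry. The overhead is also visible: a lookup for a key that does not exist has to check every store, which is why we don't want too many of them lying around.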

The solution here is very similar to the one we apply to the log data: periodically, we generate a new read-only store from the composite data of the other stores, and delete the source read-only stores once it has been written. We don't have to combine all existing read-only stores, of course - we may only want to combine several recent ones into one larger one, and then perhaps later combine that with others of similar size to form an even larger one. Toilet makes this very flexible, as it must be to accommodate a variety of different use patterns.
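
A rough sketch of the combining step, again with dictionaries standing in for read-only stores. The names REMOVED and combine, and the includes_oldest flag, are assumptions made for this example. One detail worth noticing is that we assume negative entries can only be thrown away when the oldest store takes part in the merge; otherwise they still have older data to hide.

```python
REMOVED = object()  # hypothetical "negative entry" marker


def combine(stores, includes_oldest):
    """Merge stores (ordered oldest first) into one new store."""
    merged = {}
    for store in stores:
        merged.update(store)  # later stores overwrite earlier ones
    if includes_oldest:
        # Nothing older is left for a negative entry to hide, so drop them.
        merged = {k: v for k, v in merged.items() if v is not REMOVED}
    # Keep the result in key order, like any other read-only store.
    return dict(sorted(merged.items(), key=lambda item: item[0]))


oldest = {"a": 1, "b": 2}
middle = {"b": 20}
newest = {"a": REMOVED, "c": 3}

# Combine only the two recent stores: the negative entry for "a" must stay.
recent = combine([middle, newest], includes_oldest=False)

# Later, combine that with the oldest store: "a" disappears entirely.
everything = combine([oldest, recent], includes_oldest=True)
```

After the second merge the three source stores could be deleted, leaving a single store with just "b" and "c" in it and no obsolete data.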

More text will go here, but the text above needs some pictures and diagrams to make it less dense before more is added.


Oh, why is it called "toilet," you ask? Yeah... well... let's just say there are a lot of great puns here, but the best of them is that when we're done with this project, we can write the "toilet paper!"