Toilet
Toilet started out as a "systems approach to a database," originally conceived
from the top down: that is, we wanted something ""
than SQL. After a while of working on that, it became clear that we needed to
start from the bottom up, and see where the techniques we wanted to use for the
lower layers would lead us.
So, what techniques did we want to use for the lower layers of Toilet? First and
foremost, logging. Absolutely every change to any kind of data in Toilet results
in some data being written to an append-only log.
We want to do things this way because disks are actually surprisingly fast these
days at writing consecutive data, but still pretty slow at seeking. So, we want
to be able to write all updates to a consecutive log.
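As a rough illustration of the idea (not Toilet's actual log format), here is a minimal sketch of an append-only log in Python. The class name AppendOnlyLog, the helper read_log, and the length-prefixed record layout are all assumptions made up for this example.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Hypothetical append-only log: every record goes at the end of one file."""

    def __init__(self, path):
        # Mode "ab" means every write lands at the current end of the file,
        # so the disk sees consecutive writes rather than seeks.
        self._file = open(path, "ab")

    def append(self, key: bytes, value: bytes) -> None:
        # Assumed record layout: two 4-byte big-endian lengths, then key, then value.
        header = struct.pack(">II", len(key), len(value))
        self._file.write(header + key + value)
        self._file.flush()
        os.fsync(self._file.fileno())  # ask the OS to push the record to disk

    def close(self) -> None:
        self._file.close()


def read_log(path):
    """Replay all records in the order they were written."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # end of log (or a truncated trailing record)
            key_len, value_len = struct.unpack(">II", header)
            records.append((f.read(key_len), f.read(value_len)))
    return records
```

Calling append(b"k1", b"v1") and then read_log on the same path gives back the records in write order. Syncing after every record is the simplest choice for a sketch; a real system would more likely batch several records per sync.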
Doing this, of course, leaves us with an arrangement on the disk which is hardly
optimized for reading. But that's OK: normally, we never actually read the log
anyway. Everything we write to the log we also store in RAM, and periodically we
write the data from RAM out to disk in a read-optimized format.
In fact, the data written here is not only read-optimized but read-only once
written. In keeping with our desire to avoid seeks, we write this data out
consecutively as well. Once we've written this data, we can free all the RAM we
were using to store it and append an entry to the log indicating that the data
has been "digested" and can be removed from the log in the future.
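The write path and the digest step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name Store, the JSON-lines file formats, the file names, and the choice of "sorted by key" as the read-optimized layout are all hypothetical and not taken from Toilet itself.

```python
import json
import os
import tempfile


class Store:
    """Hypothetical sketch: log every write, keep it in RAM, digest periodically."""

    def __init__(self, directory):
        self.directory = directory
        self.log_path = os.path.join(directory, "log.jsonl")
        self.memtable = {}      # in-RAM copy of everything not yet digested
        self.digest_count = 0   # used only to name the read-only store files

    def _log(self, record):
        # Append one JSON record per line to the end of the log.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")

    def put(self, key, value):
        self._log({"op": "put", "key": key, "value": value})
        self.memtable[key] = value

    def digest(self):
        # Write the RAM contents out consecutively, sorted by key (assumed layout).
        name = f"store-{self.digest_count:04d}.jsonl"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as out:
            for key in sorted(self.memtable):
                out.write(json.dumps([key, self.memtable[key]]) + "\n")
        # Record in the log that everything before this point has been digested.
        self._log({"op": "digested", "store": name})
        self.memtable.clear()   # the RAM can be freed now
        self.digest_count += 1
        return path
```

After a few put calls, digest writes one new read-only file and leaves the in-RAM table empty. The log itself is not truncated here; the "digested" record only marks which earlier entries could be removed later.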
So clearly we're going to end up with a bunch of these read-only data stores,
since we create a new one every time we digest the log. (How frequently we
digest the log depends on the use pattern, so it's customizable.) Later stores
may contain updated versions of data in earlier ones, or even special negative
entries that mark data items in earlier stores as having been deleted.
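To make the layering concrete, here is a small sketch of how a read might consult several stores, newest first. Plain dicts stand in for the on-disk read-only stores, and the names lookup and TOMBSTONE are hypothetical; TOMBSTONE plays the role of a negative entry.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a negative entry


def lookup(key, stores, default=None):
    """Search stores newest first; the first store that mentions the key wins."""
    for store in reversed(stores):  # stores are ordered oldest to newest
        if key in store:
            value = store[key]
            # A negative entry hides any value in an older store.
            return default if value is TOMBSTONE else value
    return default


# Example data: the newer store updates "a" and marks "b" as removed.
oldest = {"a": 1, "b": 2}
newer = {"a": 10, "b": TOMBSTONE}
```

With these two stores, looking up "a" finds the updated value in the newer store, while looking up "b" stops at the negative entry and never reaches the older value. Note that a key found nowhere has to be checked against every store.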
While it's OK to have a few of these around at a time, we don't want them to
build up too much, since they introduce overhead while reading and can waste
disk space storing obsolete data.
The solution here is very similar to the one we apply to the log data:
periodically, we generate a new read-only store from the composite data of the
other stores, and delete the source stores once the new one has been written.
We don't have to combine all existing read-only stores, of course - we may only
want to combine several recent ones into one larger one, and then perhaps later
combine that with others of similar size to form an ever larger one. Toilet
makes this very flexible, as it must be to accommodate a variety of different use
patterns.
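Combining stores can be sketched along the following lines. Again this is only an illustration: dicts stand in for read-only stores, and merge_stores, TOMBSTONE, and the drop_tombstones flag are names invented for this example rather than anything from Toilet.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a negative entry


def merge_stores(stores, drop_tombstones=False):
    """Combine several stores (ordered oldest to newest) into one new store."""
    merged = {}
    for store in stores:
        merged.update(store)  # entries from later stores replace earlier ones
    if drop_tombstones:
        # Only safe when the oldest existing store is part of the merge;
        # otherwise the negative entry is still needed to hide older data.
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    # Return the entries in key order, as they would be written out consecutively.
    return dict(sorted(merged.items()))
```

Merging only the most recent stores keeps their negative entries, since older stores that were left out may still hold the data being hidden. Merging everything lets the negative entries, and the obsolete data they cover, be dropped for good.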
More text will go here, but the text above needs some pictures and diagrams to
make it less dense before more is added.
Oh, why is it called "toilet" you ask? Yeah... well... let's just say there are
a lot of great puns here, but the best of them is that when we're done with this
project, we can write the "toilet paper!"