Simple is Beautiful | Technology, Programming, Video Games
This blog is about technology, programming, video games, books and other related topics. It is published by Mark Papadakis.

Update on CloudDS

Here is a progress update on my current main project. We call it 'CloudDS', short for cloud data store; it is a silly name, but it will have to do until we find a replacement.
I have been working on the data store component of the service. It has taken at least four times as much time and effort as I thought it would. A prime reason for the underestimate is that the initial list of features I wanted to implement doubled in size. In addition, testing most of the logic paths that could result in failure also took a long time: some of that testing was automated, but not all of it, and validating results is harder than setting up the test environment.

In such a service, it matters little if most of the underlying components fail (the I/O and tasks scheduler, garbage collector, cache subsystem, etc.) as long as the data management component is not affected. Suffering a service outage is bad; data corruption and/or data loss is something that has to be prevented by any means necessary.

As it stands, that component now handles reads and writes, self-healing and caching, and performs faster than I hoped it would. The data model is based on BigTable, Dynamo, Cassandra and some earlier prototypes/projects we toyed with in the past. It borrows Cassandra's ColumnFamily/SuperColumn/Column key-value representation model. Data is pushed into MemTables and an append-only commit log, and MemTables are flushed to disk as SSTables.
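To make the write path concrete, here is a minimal Python sketch of the commit log, MemTable and SSTable flow described above. It is not CloudDS code: the class name `TinyStore`, the `flush_threshold` parameter, the in-memory log and the flat key/value layout (no ColumnFamily or SuperColumn levels) are all assumptions made for illustration.

```python
import io
import json


class TinyStore:
    """Toy write path: commit log -> memtable -> sorted, immutable 'SSTables'.

    Hypothetical illustration only; everything lives in memory.
    """

    def __init__(self, flush_threshold=3):
        self.commit_log = io.StringIO()  # stands in for an append-only file
        self.memtable = {}               # mutable, in-memory key -> value map
        self.sstables = []               # flushed tables, oldest first
        self.flush_threshold = flush_threshold  # assumed tuning knob

    def put(self, key, value):
        # 1. Append to the log first, so the write could be replayed after a crash.
        self.commit_log.write(json.dumps([key, value]) + "\n")
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable holds enough distinct keys.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memtable:
            return
        # An SSTable is sorted by key and never modified after it is written.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        # The memtable always holds the most recent writes.
        if key in self.memtable:
            return self.memtable[key]
        # Otherwise search flushed tables from newest to oldest.
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


store = TinyStore(flush_threshold=2)
store.put("a", "1")
store.put("b", "2")   # second distinct key triggers a flush
store.put("a", "3")   # newer version of "a" now lives in the memtable
```

A real store would write the log and the tables to disk and use an index instead of a linear scan; the sketch only shows the ordering of the steps and why reads consult the MemTable before any SSTable.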

The GCollector merges SSTables whenever required to reclaim space, resolve conflicts, extract a single value out of multiple versions, etc. All operations supported by Cassandra are implemented (query by path, predicate, column names, key ranges, etc.), and CloudDS clients/users will also be able to use a scripting language to describe explicitly, down to the byte, what they need (e.g. give me the first couple of bytes of those values, or give me a concatenation of those values).
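The merge step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the GCollector is actually written: the function name `compact`, the `(key, timestamp, value)` row layout, the newest-timestamp-wins rule and the use of `None` as a delete marker are all hypothetical.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest version of each key.

    Each table is a list of (key, timestamp, value) rows. A value of None is
    treated as a delete marker (an assumption of this sketch) and is dropped
    from the output, which is where the space gets reclaimed.
    """
    newest = {}  # key -> (timestamp, value) of the newest version seen so far
    for table in sstables:
        for key, timestamp, value in table:
            current = newest.get(key)
            if current is None or timestamp > current[0]:
                newest[key] = (timestamp, value)

    merged = []
    for key in sorted(newest):  # output stays sorted by key
        timestamp, value = newest[key]
        # Dropping a delete marker is only safe when every table that may
        # hold an older version of the key is part of this merge.
        if value is not None:
            merged.append((key, timestamp, value))
    return merged


older = [("a", 1, "x"), ("b", 1, "y")]
newer = [("a", 2, "z"), ("b", 3, None), ("c", 2, "w")]
merged = compact([older, newer])
```

Here `merged` keeps the timestamp-2 version of "a", drops "b" because its newest version is a delete marker, and carries "c" over unchanged.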

Now that that component is out of the way, I can move on to the rest, which is relatively straightforward to implement (the tasks scheduler and the network I/O subsystems are mostly done).

Tagged with: databases, programming, phaistos networks
Published at Friday, 26 March 2010 9:19 pm
