Thoughts dump, for Jul-06-2010
Our large-scale, high-performance and highly available data store (those were the goals, anyway; I hope we attained them) has been more or less ready for production for a few weeks now.
We have yet to actually deploy it, although two forthcoming (in-development) projects will be built on top of it. There are a few things here and there we could, and will, change (clean up the client library API and so on), but as far as I can tell, there are no real issues left to resolve. During testing, we got up to 40K GET (value by key) and over 50K PUT (value by key) operations per second on a 3-node system (quorum arrangement). Adding nodes increased both capacity and throughput, which was one of the design goals.
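To illustrate what a "quorum arrangement" on 3 nodes means, here is a minimal Python sketch of Dynamo-style quorum replication (Dynamo is one of the papers CloudDS draws on). It is not CloudDS code, whose internals were never published. It assumes N = 3 replicas, W = 2 write acknowledgements and R = 2 read replies; the names `QuorumStore` and `quorums_overlap`, and the single global version counter, are hypothetical simplifications.

```python
from itertools import combinations

N, W, R = 3, 2, 2  # assumed replication parameters, not CloudDS's actual configuration


def quorums_overlap(n, w, r):
    """True if every write quorum of size w shares a replica with every read quorum of size r."""
    nodes = range(n)
    return all(set(wq) & set(rq)
               for wq in combinations(nodes, w)
               for rq in combinations(nodes, r))


class QuorumStore:
    """Toy in-memory model: each replica maps key -> (version, value)."""

    def __init__(self, n=N, w=W, r=R):
        self.replicas = [dict() for _ in range(n)]
        self.w, self.r = w, r
        # Global counter standing in for real versioning (e.g. vector clocks).
        self.clock = 0

    def put(self, key, value, acked):
        """Apply the write to the replicas in `acked`; fail unless at least W acknowledged."""
        if len(acked) < self.w:
            raise RuntimeError("write quorum not reached")
        self.clock += 1
        for i in acked:
            self.replicas[i][key] = (self.clock, value)

    def get(self, key, replied):
        """Read from the replicas in `replied`; return the value with the highest version."""
        if len(replied) < self.r:
            raise RuntimeError("read quorum not reached")
        found = [self.replicas[i][key] for i in replied if key in self.replicas[i]]
        return max(found, key=lambda vv: vv[0])[1] if found else None


store = QuorumStore()
store.put("user:42", "v1", acked=[0, 1, 2])
store.put("user:42", "v2", acked=[0, 1])   # replica 2 still holds the stale "v1"
print(store.get("user:42", replied=[1, 2]))  # "v2": replica 1 is in both quorums
print(quorums_overlap(3, 2, 2), quorums_overlap(3, 1, 2))  # True False
```

Because R + W > N (2 + 2 > 3), any read quorum shares at least one replica with the most recent write quorum, so the newest version is always among the replies. With W = 1 that guarantee disappears, which is what `quorums_overlap(3, 1, 2)` reports.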
We have a few more similar projects in the pipeline; more building blocks for our services stack. We are going to build two different file systems (one optimized for very high-performance file access, the other for availability and for storing files of unlimited size), a MapReduce framework/infrastructure, and a new distributed lock manager, which will also replace the ad-hoc solutions we currently rely on.
I am very proud of our team; they are smart and inventive, passionate and hard-working. They let me toy around with ideas that do not always make sense, and they always find ways to make me feel great about what we do. Good times ahead.
Thank you for your kind suggestions. We do not really publish papers or pursue any such avenues; however, CloudDS really is based on publications by Google, Amazon, Yahoo and others (BigTable, Dynamo, PNUTS), or at least its main ideas stem from those papers.
Christos Trochalakis 7/7/2010 11:15
Very interesting! Congrats.
We'd love to read a technical overview if you guys have time.
Thank you, Christos. If you have any questions, please email me/us (I doubt we'll publish anything anytime soon).
adamo 6/7/2010 21:51
Cool! Maybe we'll see a paper describing it submitted to a Greek conference (such as HDMS) someday?