Below is a video showing off the fault tolerance and recovery properties of FoundationDB under some nasty conditions (single & multiple machine failures, network partition, full cluster power loss) on a live demo cluster we bring with us to events:
The Demo Cluster Backstory:
Back in August of last year, we were a few weeks away from flying out to San Francisco for our first sponsored event - a table at TechCrunch Disrupt. We had a database that was unlike anything that existed - we had all of the NoSQL buzzwords covered (scalable / distributed, elastic, fault tolerant, shared-nothing…), but we also had high performance ACID transactions that could span all data in a cluster. We were proud of our creation, but perplexed - how could we make it real to people? How could we demo a database? Let people set key “foo” to “bar” and then get “foo” and see that it worked? Not exactly thrilling stuff.
We decided instead on something more hands-on - we would let people try to break it. We set up a portable cluster of five machines that each ran both a FoundationDB server (with data replication set to 3 copies) and a synchronized status display application, and then we put up a sign that said “Break our Database”. During the conference, we let people test FoundationDB’s fault tolerance by unplugging anything they wanted.
"Breaking" the database could have looked like:
Powering down or disconnecting up to two out of the five machines, and having the database become unavailable to the remaining machines.
Having any disconnected or powered down machine fail to re-join the cluster after power / connection was restored.
Powering down or disconnecting three or more machines and having the remaining machine(s) stay available, despite the fact that they did not have a quorum.
Creating a network partition (we brought three network switches to allow this) and having either both sides stop working, or both sides keep working (only the side with the majority should remain up).
After a total power failure, having the cluster fail to pick up right where it left off as soon as three of the machines had booted back up.
Any violation of ACID.
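The availability rules in the list above come down to a simple majority check: with five machines, any group of three or more has a quorum, and any group of two or fewer does not. Below is a minimal sketch of that arithmetic in Python. It is only an illustration of the majority rule described here, not FoundationDB's actual implementation, and the names `has_quorum` and `partition_outcome` are hypothetical.

```python
CLUSTER_SIZE = 5  # machines in the demo cluster described above


def has_quorum(reachable: int, cluster_size: int = CLUSTER_SIZE) -> bool:
    """Return True when `reachable` machines form a strict majority.

    Hypothetical helper: models only the majority rule from the text,
    not how FoundationDB itself decides availability.
    """
    if not 0 <= reachable <= cluster_size:
        raise ValueError("reachable must be between 0 and cluster_size")
    return reachable > cluster_size // 2


def partition_outcome(side_a: int, side_b: int) -> tuple[bool, bool]:
    """For a network split into two sides, say which side should stay up."""
    cluster_size = side_a + side_b
    return has_quorum(side_a, cluster_size), has_quorum(side_b, cluster_size)


if __name__ == "__main__":
    # Walk through every possible number of failed machines.
    for down in range(CLUSTER_SIZE + 1):
        up = CLUSTER_SIZE - down
        state = "available" if has_quorum(up) else "unavailable"
        print(f"{down} down, {up} up: expected {state}")
```

Because a strict majority is required, at most one side of any partition can pass the check, which is why "both sides keep working" would count as breaking the database.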
We were confident that, given the hell we’d put the database through in our internal testing environment, FoundationDB would remain undefeated, even after three days of non-stop random power and connection failures. What we weren’t sure of was whether people would find this very interesting.
Well, it turned out that people really liked it - a lot. Many people commented that we had the second coolest demo there - with the coolest being a gyro-stabilized two-wheeled car (tough competition for a database!). The interest generated by the demo got us featured at the top of TechCrunch at the end of the first day of Disrupt, and that story drove more people to chat with us and reach out during the conference. Over a few days, participation in our alpha program more than doubled, inbound interest really picked up, and we were officially on the NoSQL map.
It was a good thing for us, and now we take the demo cluster to most meetups or conferences we attend, and it’s always a hit. Lots of database companies claim fault tolerance, but we’ve never seen one that let people unplug machines with the world looking on. In all of our time running the demo at various conferences, FoundationDB has never been “broken”. If you want to give it a shot for yourself, come say hi to us at an event!