We’ve been talking a lot about transactions lately, and I wanted to amplify a few of the key points of the conversation.
The real usefulness of transactions comes from their ability to coordinate and synchronize a variety of data elements when multiple clients perform updates concurrently. For transactions to play this role, the developer must have the ability to specify whatever data elements are logically needed. A system may implement more limited operations, like Compare And Swap (CAS), with one or more of the ACID properties, but these operations aren’t transactions.
Transactions allow you to define a set of database reads and writes that are handled as a unit with ACID guarantees. All reads in the transaction see the same snapshot of the database (i.e., they don’t see changes resulting from other transactions executing concurrently). The writes within a transaction either all succeed or all fail. (Failure might be caused, for example, by a connection loss.) After a transaction commits, the writes are permanently stored.
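The all-or-nothing behavior can be illustrated with a toy in-memory model (invented for this post, not any real database's API; real systems also add conflict detection, which this sketch omits):

```python
import threading

class MiniDB:
    """Toy key-value store illustrating atomic commit: a transaction's
    writes are buffered and then applied all at once, or not at all."""
    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()

    def transact(self, fn):
        with self.lock:
            snapshot = dict(self.data)  # all reads see one snapshot
        writes = {}
        fn(snapshot, writes)            # may raise: then nothing is applied
        with self.lock:
            self.data.update(writes)    # all writes land together

db = MiniDB()
db.data["alice"] = 100
db.data["bob"] = 0

def transfer(reads, writes):
    # Move 30 from alice to bob as a single unit
    writes["alice"] = reads["alice"] - 30
    writes["bob"] = reads["bob"] + 30

db.transact(transfer)
```

If `transfer` raised partway through (say, on a connection loss), neither write would be applied, matching the atomicity guarantee described above.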
When combined with a high-performance storage substrate, transactions give you the flexibility to design and modify your own data model. This flexibility becomes much more than a convenience when your application begins to evolve with changing requirements. No doubt, you can sometimes build an application without transactions by designing your data schema to support the specific access patterns you need today. However, when your requirements evolve, transactions can make the difference between an easy change in a modeling layer and a major overhaul of your architecture.
ACID transactions can be composed to create new abstractions, support more efficient data structures, and enforce integrity constraints. They become major building blocks for correct design, making it easier to get your code right. Applications built with them are more reliable and extensible. Operations limited to a single row, document, or adjacent graph elements may be useful tools, but they don’t serve this basic engineering purpose.
Transactions allow you to define and preserve strong data guarantees at each level of your application. Going without transactions has negative consequences for any application that depends on the correctness of data undergoing concurrent updates. From a developer’s perspective, transactions are the simplest and strongest programming model available for concurrency control. They are a fundamental tool for engineering distributed systems, and they promise to play an increasingly prominent role as NoSQL technology matures. If you’re building a data-centric application that needs to scale, and you don’t have transactions, you’re incurring a technical debt that may be difficult to repay.
Cassandra 2.0 was released a few days ago, and its headline feature is a compare-and-swap operation. I’m happy to see Cassandra added CAS, which is a very useful capability — that makes the world a little better. And the database world needs all the “better” it can get. I’m less happy to see the feature referred to as “lightweight transactions”, because that is misleading and makes the world a little more confusing. And the database world does not need any more confusion.
Compare-and-swap is not a general transaction facility. At bottom, transactions provide the ability for an application to do multiple operations “together” without interference from other concurrent access to the database. Compare and swap provides a special case of this: the ability to do two operations “together”, one of which is a read from a row (or “partition” in Cassandra), and the other a write to the same row. Being a special case of a transaction is not very exciting in and of itself, because transactions are so general: just about anything is a special case of a transaction.
Transactions are useful for a huge range of purposes: maintaining and validating invariants (things that are always true of the data in the database), like a friend of a user always having that user as a friend, a product present in an order always being present in the collection of products, or a user having a unique username and e-mail address; maintaining useful data structures like scalable and consistent indexes; implementing one data model in terms of another; and moving a record from one collection to another. Of these examples, uniqueness checks are the only one that's obviously implementable with compare-and-swap alone, and even those enjoy caveats and complications if there is more than one uniqueness constraint on the same object.
Implement the following function correctly and scalably with Cassandra’s CAS operations. Don’t forget that the client can fail…
Users = directory.create_or_open(db, "users")
Emails = directory.create_or_open(db, "user_emails")

def addUser(tr, username, email, name):
    # The outer check makes retries idempotent: a re-run after a client
    # failure sees its own prior write and does nothing.
    if tr[Users[username]] != fdb.tuple.pack((email, name)):
        if tr[Users[username]] != None:
            raise Exception("Username already in use")
        if tr[Emails[email]] != None:
            raise Exception("Email already in use")
        tr[Users[username]] = fdb.tuple.pack((email, name))
        tr[Emails[email]] = username
Transactions also make your application correct by default under concurrency: if your application’s transactions work right in any sequential order, then with a serializable database they also work under any concurrent execution. Some applications may need to relax serializable consistency for performance in specific cases, but it’s much better to do that, and deal with the much more challenging requirements of getting concurrency right with even slightly weaker guarantees, only where you have actual predicted or measured performance problems. Compare-and-swap doesn’t provide that same peace of mind, even when it is implemented efficiently enough that you can use it ubiquitously (as Cassandra does not currently do). Think about how much harder it is to write concurrent shared-memory software using only CAS-like features (http://en.wikipedia.org/wiki/Non-blocking_algorithm) than to write it using synchronization primitives (or even better, with transactional memory, if you are lucky enough to use an environment that supports it).
None of this says that CAS isn’t a very useful database feature, especially in the context of Cassandra’s data model and update resolution semantics! But I’m concerned, and I’m afraid Datastax is hoping, that many less sophisticated users will say “oh, ok, it has transactions” and not realize how much they are giving up relative to a truly transactional database until they actually get burned. Maybe this is a lost cause - there are other vendors marketing exactly the same CAS feature as “ACID transactions” - but I think we should use the term “transaction” only for the real thing.
It’s also worth talking about another issue: usually the effective level of fault tolerance that your application enjoys is pretty much the worst level of fault tolerance of any of the things it does. So an application built on top of Cassandra can theoretically be fully available in a minority partition — as long as it doesn’t need to do anything that uses quorum consistency or more. If your application uses CAS for any common or important operation, it has effectively chosen CAP Consistency over CAP Availability and will not be available in minority partitions. You are getting, potentially, the worst of both worlds - most of your application (and the users thereof) has to deal with all the concurrency and consistency problems of an eventually consistent database, and no better availability than you could get with a fully transactional database. It’s much better, in my opinion, to start with a tool that gives you strong guarantees by default, and then relax them selectively where there is a clear performance benefit and you have done the careful analysis needed to prevent bugs.
In fact, the fault tolerance of Paxos replication (the implementation of Cassandra’s CAS feature) is not the best that you can do in practice. (This criticism applies to other systems using Paxos replication for database contents, including those like Google Spanner that actually provide transactions.) Paxos maintains consistency during partitions by requiring a majority of replicas to be available, which means that a Paxos replica set with N replicas can only handle (N-1)/2 failed or partitioned replicas. So for example in a three datacenter setup with triple replication, the loss of one datacenter and one machine in another datacenter is likely to make the database unavailable. FoundationDB uses a Paxos variant for coordination, but only stores a negligible amount of metadata using Paxos replication, and it is cheap to have more replicas of this tiny coordination state to handle more failures. Actual database contents are replicated using a variation of “good old” synchronous replication, and so can remain available with N-1 simultaneous failures of N replicas. That’s twice as many failures, and it is much faster, too.
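The replica arithmetic above can be stated directly as a small illustration (our own sketch of the claim, not either system's actual code): a majority-quorum system with N replicas survives ⌊(N-1)/2⌋ failures, while synchronous replication of the kind described survives N-1.

```python
def paxos_tolerance(n):
    """Failures a majority-quorum (Paxos-style) replica set of n can survive:
    a majority of replicas must remain reachable."""
    return (n - 1) // 2

def sync_replication_tolerance(n):
    """Failures survivable when any single surviving replica suffices."""
    return n - 1

# Triple replication: a Paxos group needs 2 of 3 alive (tolerates 1 failure),
# while synchronous replication needs only 1 of 3 (tolerates 2 failures).
paxos_ok = paxos_tolerance(3)
sync_ok = sync_replication_tolerance(3)
```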
(Disclaimer: I’m not an expert on Cassandra or its implementation, and so it’s possible I’ve said something wrong about it. If so, please let me know.)
At the NoSQL Now conference in San Jose, I presented the case for why I believe future generations of NoSQL databases will need to support ACID transactions in order for developers to more easily build, deploy and scale applications.
I also noticed that customers are starting to ask about ACID transactions, and that more NoSQL vendors are talking about the transactional or ACID capabilities of their products.
Unfortunately, as a rule, the transactional capabilities that NoSQL vendors tout apply only to a degenerate type of transaction involving only a single data element (such as a document or a row). This degenerate case is sometimes known as a compare-and-set (CAS) operation. In fact, Cassandra 2.0 was announced just yesterday and is receiving a good amount of attention, as it now supports “lightweight” transactions via CAS. These “transactions” aren’t really comparable to the ACID transactions provided by relational databases and systems like Spanner and FoundationDB.
In contrast, FoundationDB supports true high-performance global ACID transactions involving multiple data elements spanning nodes. (Another exception is MarkLogic, which can support global transactions by falling back to a traditional low-performance locking and two-phase-commit strategy.)
It’s hard to overstate how much more useful global transactions are than CAS operations. In my presentation, I explained that when true global ACID transactions and NoSQL come together, it opens up a huge range of possibilities. Transactions allow developers to reason locally rather than globally about their code and to easily build robust applications. With transactions, developers can easily build strong abstractions and multiple data models on top of the underlying data model.
I also spoke about the history of NoSQL, common misunderstandings of the CAP theorem, approaches to and challenges in trying to build and test an ACID database, and the details behind improving performance and testability by developing our own programming language, Flow, which adds actor model concurrency to C++.
You can check out my full presentation via Slideshare or watch the recording below:
The Google F1 database paper just became generally available. It’s the database that runs AdWords, so I suspect Google kind of cares about it. Google F1 bears an amazing resemblance to our FoundationDB SQL layer. Both leverage an underlying scalable and transactional storage substrate to allow unprecedented scalability at both the storage and SQL processing levels. Both leverage hierarchical schemas, arranging tables into a hierarchy that is mirrored in storage by interleaving rows of parents with rows of children. Both systems make the tradeoff that higher latency is acceptable for gaining almost unlimited bandwidth and scale. A query may require more round-trips and take a (bounded) longer time, but the overall throughput of the system, measured in number of queries that can run in parallel, is much higher.
The fact that Google has built and is using F1 for their AdWords business demonstrates how the architectural approach works for even the most demanding of applications. However, F1 is only available to Google’s internal teams, while our open-source FoundationDB SQL Layer is intended to have a somewhat broader audience.
Below are seven of the strongest similarities; all quotes come directly from the above-mentioned Google F1 paper. Read on and make up your own mind.
1. Built on a strong storage substrate:
Both systems rely on, and inherit from, a (i) transactional, (ii) scalable, (iii) highly available, and (iv) ordered storage substrate: FoundationDB in our case, Spanner in Google's.
“F1 is built on top of Spanner, which provides extremely scalable data storage, synchronous replication, and strong consistency and ordering properties. F1 inherits those features from Spanner and adds several more… Spanner handles lower-level storage issues like persistence, caching, replication, fault tolerance, data sharding and movement, location lookups, and transactions.”
2. Stateless SQL:
Because all state is in the consistent storage substrate, both SQL engines retain no state of their own. As a result SQL nodes can be easily added or removed, scaling overall SQL capacity. This allows very strong scalability and load balancing.
“F1 servers are mostly stateless, allowing a client to communicate with a different F1 server for each request.”
3. Insisting on truly transactional systems:
Both systems are strongly consistent and support ACID properties even for complex multi-object transactions. A lot has been written about why transactions are important. To anyone who is still confused, I'd simply suggest the exercise of trying to write code to maintain an index, in a system that is not strongly consistent, while many users are changing the data.
“We also have a lot of experience with eventual consistency systems at Google. In all such systems, we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date. We think this is an unacceptable burden to place on developers and that consistency problems should be solved at the database level. Full transactional consistency is one of the most important properties of F1.”
4. Limited transaction duration:
In order to avoid the need to retain full logging history, both systems limit transaction duration to 5-10 seconds. In FoundationDB this may be increased over time, but the intent is to explicitly disallow transactions that change data from running for long periods of time. Like Google, we believe that this limit should suffice for most application types.
“Spanner provides a global safe timestamp, below which no in-flight or future transaction can possibly commit. The global safe timestamp typically lags current time by 5-10 seconds.”
5. Hierarchical schema - a relational schema with an explicit table hierarchy:
Both systems rely on clustering related rows from different tables. In the FoundationDB SQL Layer we call the concept Table-Groups. In essence, Table-Groups define and store nested data. Table-Groups organize a set of user-defined tables into a hierarchy that reflect an object structure, e.g. a CUSTOMERS table will be the parent of ORDERS table, which will be the parent of the ITEMS table. This Table-Group definition is then reflected in the physical organization of data. Related rows from the tables are stored in a hierarchy, interleaved according to the Table-Group structure, e.g. a CUSTOMER row is stored followed by its first ORDER row, followed by that order’s ITEMS, followed by the next ORDER and ITEMS for that customer, thereby reflecting the hierarchy of the tables.
“At the logical level, F1 has a relational schema similar to that of a traditional RDBMS… Logically, tables in the F1 schema can be organized into a hierarchy.”“Physically, F1 stores each child table clustered with and interleaved within the rows from its parent table. Tables from the logical schema cannot be arbitrarily interleaved: the child table must have a foreign key to its parent table as a prefix of its primary key… All of these properties of a hierarchical schema help mitigate the latency effects of having remote data… This clustering was critical to achieving acceptable latency.”
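The interleaving idea can be illustrated with tuple-style keys (a hypothetical encoding made up for this post, not the SQL Layer's actual on-disk format): each row's key is prefixed by its ancestors' primary keys, so sorting the keys places every child row directly after its parent.

```python
# Keys are tuples; ancestors' primary keys form the prefix, so plain
# lexicographic ordering interleaves children after their parents.
rows = {
    ("customers", 1): {"name": "Acme"},
    ("customers", 1, "orders", 10): {"date": "2013-09-01"},
    ("customers", 1, "orders", 10, "items", 100): {"sku": "A"},
    ("customers", 1, "orders", 10, "items", 101): {"sku": "B"},
    ("customers", 1, "orders", 11): {"date": "2013-09-05"},
    ("customers", 2): {"name": "Globex"},
}

# A single ordered range scan over the ("customers", 1) prefix returns the
# customer, its orders, and their items, already in hierarchical order.
history = [k for k in sorted(rows) if k[:2] == ("customers", 1)]
```

This is why an "order history" page needs one contiguous scan rather than one random read per row.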
Taking advantage of this hierarchical row storage requires operators that are aware of this co-location…
“This data model allows us to efficiently join a parent table and a descendant table by their shared primary key prefix…F1 uses a merge-join-like algorithm which we call cluster join. The cluster join operator only needs to buffer one row from each table, and returns the joined results in a streaming fashion as the Spanner input data is received.”
Our SQL Layer takes this much further. Many more operators become ‘Table-Group’ or ‘Clustering’ aware. Our GroupScan operator is analogous to cluster-join. Other operators can leverage clustering information in indexes too, for example, the Index Intersection that can intersect indexes from any number of tables in a single Table-Group.
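The cluster-join idea quoted above can be sketched in a few lines (a toy version; the table names and stream shape are invented for the example): because parent and child rows arrive in interleaved key order, one streaming pass can pair them while buffering only the current parent row.

```python
def cluster_join(interleaved):
    """Pair each child row with its current parent row.
    `interleaved` yields (table, key, value) in hierarchical key order,
    so a child always follows the parent it belongs to."""
    parent = None
    for table, key, value in interleaved:
        if table == "customer":
            parent = (key, value)       # buffer just one row per table
        else:
            yield parent, (key, value)  # stream joined results immediately

stream = [
    ("customer", 1, "Acme"),
    ("order", 10, "2013-09-01"),
    ("order", 11, "2013-09-05"),
    ("customer", 2, "Globex"),
    ("order", 20, "2013-09-07"),
]
joined = list(cluster_join(stream))
```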
6. NoSQL and SQL interfaces live in harmony:
This is a direct result of the hierarchical schema that keeps an object in storage as a series of clustered rows. It is straightforward then to provide other interfaces to this data. The Google paper does not go into much detail on their API. The FoundationDB SQL Layer provides a RESTful API to the data as JSON objects in a fully consistent manner.
“F1 supports a NoSQL key/value based interface that allows for fast and simple programmatic access to rows. Read requests can include any set of tables, requesting specific columns and key ranges for each. Write requests specify inserts, updates, and deletes by primary key, with any new column values, for any set of tables. This interface is used by the ORM layer under the hood.”
7. Batch pipelining query execution for parallel execution and latency hiding:
Traditional relational databases operate on local data and optimize for disk access, which was traditionally the high-contention point. As a result, traditional execution engines approach query processing as a one-row-at-a-time affair. This is true for virtually all major RDBMSs today. A very high-bandwidth, over-the-network backend changes the picture completely. Each request may take longer, but a very large number of requests can be serviced in parallel. Both F1 and the FoundationDB SQL Layer employ similar techniques to batch many parallel requests to the backend, amortizing latency and driving performance.
“…remote data accesses involve highly variable network latency. In contrast, traditional database systems generally perform processing on the same machine that hosts their data, and they mostly optimize to reduce the number of disk seeks and disk accesses. Network latency and disk latency are fundamentally different in two ways. First, network latency can be mitigated by batching or pipelining data accesses. F1 uses extensive batching to mitigate network latency. Secondly, disk latency is generally caused by contention for a single limited resource, the actual disk hardware. This severely limits the usefulness of sending out multiple data accesses at the same time. In contrast, F1’s network based storage is typically distributed over many disks, … so scheduling multiple data accesses in parallel often results in near-linear speedup until the underlying storage system is truly overloaded.”
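The latency-hiding effect can be demonstrated with a toy model (ours, not either engine's code): issue many simulated storage reads at once instead of one at a time, and total time approaches one round-trip rather than N.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.05  # pretend each storage request costs one 50 ms round-trip

def fetch(key):
    time.sleep(LATENCY)  # simulated network round-trip
    return ("value-for", key)

keys = list(range(20))

# One at a time: pay ~20 round-trips of latency.
start = time.time()
sequential = [fetch(k) for k in keys]
seq_time = time.time() - start

# Batched/pipelined: all requests in flight at once, ~1 round-trip.
start = time.time()
with ThreadPoolExecutor(max_workers=len(keys)) as pool:
    batched = list(pool.map(fetch, keys))
batch_time = time.time() - start
```

The results are identical; only the wall-clock time differs, which is exactly the "near-linear speedup" the quote describes.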
So what are we really saying? That we’ve got an F1/Spanner like combination for the masses?
After a successful 18-month Alpha and Beta testing program involving more than 2,000 participants, we’re very excited to announce that we’ve released version 1.0 of FoundationDB and general availability pricing!
Built on a distributed shared-nothing architecture, FoundationDB is a unique database technology that combines the time-proven power of ACID transactions with the scalability, fault tolerance, and operational elegance of distributed NoSQL databases.
You can download FoundationDB and use it under our Community License today and run as many server processes as you’d like to in non-production use, and use up to six processes in production for free! You don’t even have to sign up – just go to our download page for instant access. You’ll get all the technical goodness of FoundationDB - exceptional fault tolerance, high performance distributed ACID transactions, and access to our growing catalog of open source layers – regardless of whether you’re a community user or a paying customer.
Have a big application that needs more than six processes in production, or want your FoundationDB cluster supported? We’re also offering commercial licensing and support priced starting at $99 per server process per month. Check out our commercial license and support plans on our pricing page.
Our goal is to be the easiest database company to do business with, and we believe our licensing model and transparent pricing reflect just that. We’re completely redefining the price and performance of NoSQL databases while simultaneously delivering features like high performance distributed ACID transactions that our competitors can’t match.
Throughout our Beta program, we’ve received extremely encouraging feedback from our customers and users, and we believe we’re on the path to re-defining data storage for modern distributed systems. We’re thankful for all of you that have been with us this far, and can’t wait to see all the new applications FoundationDB will power as we move ahead!
At FoundationDB, we believe passionately in the engineering benefits of transactions. We put together this video to provide a quick overview of these benefits and explain how transactions can help you build a solid, reliable application. Here are some of the things the video covers:
What’s a transaction?
A transaction is a set of database reads and writes with some crucial properties: reads are not affected by writes from other transactions (Isolation), writes all succeed or fail together (Atomicity), and writes are permanently stored (Durability). Each of these is one of the ACID guarantees.
Which applications need transactions?
Any application with multiple clients operating concurrently needs transactions. Transactions aren’t just for financial applications, like e-commerce or banking. They’re fundamentally an engineering tool for concurrency control, one that’s better for application-building than any of the alternatives.
How do transactions simplify concurrency?
Isolation makes concurrency easy for developers. Serializability, the strong form of Isolation supported by FoundationDB, lets you treat each transaction as if it were sequential, which greatly simplifies reasoning about interactions among them.
What’s the difference between local and global transactions?
Some NoSQL systems advertise support for ACID transactions, but there’s usually a catch. Often, they support transactions for a limited set of predefined operations restricted by data locality (e.g., a single document or local graph components). The real power of transactions comes when they’re global, meaning they can act on any set of data elements chosen by the developer.
How do transactions enable abstraction?
Global transactions let you build new functionality over your data and expose it through an API. A simple example is an index. At one level, an index is just another copy of your data with a new access path. What makes indexing hard is the need to keep the index synchronized with your data as clients perform concurrent updates. Transactions make this synchronization easy.
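As a sketch of that point (using a dict-like stand-in for a transaction; the key layout and function names are illustrative, not FoundationDB's API), keeping an index synchronized is just one extra write inside the same transaction:

```python
def set_user_city(tr, user_id, new_city):
    """Update a user's city and its index entry atomically.
    `tr` is a dict-like transaction: all reads and writes commit as a unit."""
    old_city = tr.get(("user", user_id, "city"))
    if old_city is not None:
        del tr[("index", "city", old_city, user_id)]  # drop the stale entry
    tr[("user", user_id, "city")] = new_city
    tr[("index", "city", new_city, user_id)] = b""    # add the new entry

# A plain dict standing in for a committed transaction:
tr = {("user", 42, "city"): "Boston",
      ("index", "city", "Boston", 42): b""}
set_user_city(tr, 42, "Vienna")
```

Because both writes commit together, no concurrent reader can ever observe a user whose index entry disagrees with their data.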
Aren’t transactions computationally expensive?
Contrary to what is often assumed, transactions don’t limit scalability or performance in distributed systems, and they don’t represent a fundamental tradeoff in the design space. What is true is that transactions require a great deal of engineering effort to implement well in a distributed environment. FoundationDB has put in that effort. We’ve done extensive benchmarking and have found that our transactional processing accounts for less than 10% of total system CPU load. Guaranteeing durability does increase write latency, but durability is a key feature for fault tolerance and is usually worth the cost.
How will transactions impact the NoSQL movement?
Many applications employ concurrency with multiple clients. Without transactions, the burden of concurrency control falls on developers, and this burden becomes less manageable as applications scale out. As the NoSQL field matures and is deployed more widely, transactions will become a foundational element of distributed database design.
Interested in FoundationDB’s distributed ACID transactions?
Today we are excited to announce our acquisition of Akiban Technologies. This acquisition means that we will be able to provide true SQL capabilities on top of FoundationDB much earlier than we anticipated. When we were introduced to the Akiban team and technology it was easy to see the potential of a partnership—FoundationDB and Akiban have been pursuing different parts of the same vision for the past four years. It didn’t take long to realize that the best outcome would be if we could all work as a single team. I’m glad to say that in a very short time we’ve made that happen and our new expanded team is united on an amazing set of future products.
Akiban builds a SQL database with a novel object-structured design. Using the SQL database interface to abstract a novel data organization has been successful in the past. See, for example, the rise of column-oriented databases which store compressed and normalized columns instead of rows. By adopting this new underlying structure, a huge range of queries became orders of magnitude more efficient than they were when the data was stored as rows.
Akiban noticed that many SQL databases contain multiple tables that are almost always accessed together. These “table groups” often store the logical equivalent of an object or document. Imagine a User table linked to an Orders table linked to a Line Items table. Many web applications, especially those using an ORM, tend to access all of these tables together. Loading an “order history” page, for example, will need to look up all of the Line Items for each of the Orders. With a traditional row-oriented SQL database, this join operation will involve many slow random database IOs. Akiban’s technology instead stores the data for this table group together, interleaved in its natural hierarchical order, requiring a simple database scan in lieu of lots of random IO. Joins within table groups become free!
Cool technology, but what does it have to do with FoundationDB—a NoSQL database? Well, we think of FoundationDB as more of a storage substrate than a fully featured database. As explained on our layers page, we rely on higher-level tools to provide FoundationDB with traditional database capabilities such as indexing and query languages. Akiban had the foresight to create a strong abstraction between their own underlying storage substrate and their database’s API and capabilities. Amazingly, Akiban chose the abstraction of an ordered, transactional, key-value store for its storage layer—exactly what FoundationDB provides! (OK, it’s not that amazing, we’re seeing more and more new systems built using this abstraction.) What this means is that Akiban’s technology is a natural fit as a powerful layer on top of the FoundationDB engine, and, like all layers, will inherit the exceptional properties of FoundationDB including scalability, fault tolerance, and true ACID transactions.
It would be an understatement to say that we are excited about the possibilities of bringing together these technologies into a SQL layer for our users!
If you want to be the first to get access to this upcoming SQL layer, please register below.
A few weeks ago we posted our Fault Tolerance Demo Video and it got a lot of interest, making it to the top of Hacker News for most of a day. We figured since our demo cluster video was interesting to people, we’d show off an actual fault tolerance testing tool we use here at FoundationDB - the worst, least reliable database cluster we could put together, aptly named Quicksand.
In the video below one of our founders, Nick Lavezzo, shows off the Quicksand cluster and explains how we use it to continuously test FoundationDB in the real world.
A Quick Overview:
a cluster made up of eight consumer grade machines
each of which is running a different brand of SSD and a different flavor of Linux
connected via nine network switches to a “Quicksand Manager” server
The system uses the network topology above, deliberately arranged so that the Quicksand Manager server’s test cycle can cause many different types of failures by powering off outlets on the power strips. These power interruptions are performed randomly, which allows a limitless number of different failure scenarios to occur. Some common ones:
Powering off any of the database servers.
Powering off any of the “top” network switches - this causes network unavailability for each of the servers connected to it.
Powering off any of the “bottom” network switches - this creates a partition which allows the servers connected to the top switch to still communicate with each other, but not with the rest of the servers.
Combinations of the above.
The Quicksand Testing Cycle
During each testing cycle, the Quicksand Manager server:
Ensures all machines are turned on.
Performs a clean install of FoundationDB on each machine, and configures them all as one cluster.
Starts pushing client read / write requests to the database, simulating a real-life workload, for 10 minutes.
Randomly “perturbs” the Quicksand cluster machines, for 10 minutes.
Shuts down the cluster, and collects and analyzes the log files.
What does “perturb” mean in this context? The Quicksand Manager server randomly does these things to the cluster during each 10 minute test run:
Power a server machine off and on (randomized duration).
Power a network switch off and on (randomized duration).
Freeze the fdbserver process.
Directly allocate all free disk space on a server (making a server run out of disk space).
Quicksand performs several of these actions, at random, during each testing run.
Analysis & Validation
After each 10 minute testing run, the Quicksand Manager server analyzes its log files, and the contents of the FoundationDB database cluster, to validate that FoundationDB kept its guarantees in two primary areas:
Availability - Based upon its knowledge of which components of the system were un-perturbed at each point in the testing cycle, the Quicksand Manager server builds a model of when the FoundationDB database should have been available to process client requests. The actual availability of the database during the testing cycle is compared to the expected availability.
ACID Compliance - Based upon its knowledge of the transactions that were acknowledged by the FoundationDB database as “committed”, the Quicksand Manager server builds a model of what each key and value in FoundationDB should be. This model is compared to the actual contents of the database.
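The ACID check can be sketched as follows (a toy checker we wrote for illustration; function names are invented): replay only the acknowledged commits into a model, then diff the model against what the cluster actually contains. A real checker must also account for in-doubt transactions whose acknowledgment was lost along with a failed client.

```python
def expected_contents(acknowledged_commits):
    """Replay writes from commits the database acknowledged, in commit order,
    to build a model of what every key and value should be."""
    model = {}
    for commit in acknowledged_commits:
        for key, value in commit:
            model[key] = value
    return model

commits = [
    [("a", 1), ("b", 2)],  # first acknowledged transaction
    [("a", 3)],            # a later transaction overwrites "a"
]
model = expected_contents(commits)
actual = {"a": 3, "b": 2}  # what we read back from the cluster
assert model == actual     # the guarantee held for this run
```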
Results Thus Far
Let’s get to what matters - what the results of our testing with Quicksand have been:
Zookeeper Deemed Unreliable - In our first ever test run of Quicksand, about a year ago, a bug in Zookeeper (which we were using for cluster coordination) caused the cluster to become permanently unavailable. We hunted down the bug that caused the failure, but realized that we didn’t want to have such a critical component untested by our simulation test runs. So…
We built our own in-house Paxos cluster coordinator in Flow, which allows us to subject it to tens of thousands of tests per night, with the rest of our code base. Since implementing this new tool, we have not encountered any bugs with it in the real world.
No violations of our ACID guarantees have been encountered.
No violations of our Availability guarantees have been encountered since replacing Zookeeper.
Overall, these results have validated for us the incredible usefulness of our deterministic simulation testing. Since the only bugs we encounter in simulation occur roughly once in 5 million runs (or are bugs in freshly committed code that are quickly identified), it makes sense that we would not have encountered any bugs in the real world. This gives us, and hopefully our users, great peace of mind about the strength of FoundationDB’s guarantees and its overall quality of engineering.
Let’s take a look at the string interning layer, one of our example layers available on GitHub. Each of these simple layers supports new functionality with a small amount of code that is easy to inspect, extend, or modify.
Layers add a wide variety of capabilities to FoundationDB, such as new data models, high contention data structures, and compatibility with legacy systems. The string interning layer is an example of a utility, a lightweight layer that solves a specific task with a simple API.
String interning allows string values to be associated with unique short aliases. For example, “Washington, DC” might map to “aa” and “San Francisco, CA” might map to “ab”. Instead of storing the full string everywhere, you store the alias, along with a record of which strings map to which representations. This is very similar to how you would use an extra table in a relational database to normalize data.
Interning can be a valuable compression technique by allowing applications to de-duplicate strings while storing values that are still comparable. It can also yield a performance improvement for applications that frequently compare lengthy strings. In this example layer, by storing the interned strings in the database, multiple clients can concurrently share and access a common set of strings.
The layer randomly generates the interned representation of a new string by trying small random representations. If it can’t find a free representation, it tries longer and longer alternatives. The layer maintains both a map from strings to identifiers and a map from identifiers to strings, storing both maps as key-value pairs in the database.
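That allocation strategy might look roughly like this (a hedged sketch: the function name and probing constants are our guesses at the idea, not the layer's exact code):

```python
import os

def find_free_uid(taken, start_len=2, tries_per_len=5):
    """Try short random identifiers first; lengthen after repeated collisions.
    `taken` is the set of identifiers already assigned."""
    length = start_len
    while True:
        for _ in range(tries_per_len):
            candidate = os.urandom(length).hex()[:length]
            if candidate not in taken:
                return candidate
        length += 1  # crowded at this length, so try longer identifiers
```

Short identifiers keep the stored aliases small, while the fallback to longer ones keeps allocation fast even as the space fills up.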
Because a string’s interned representation can never change, the layer can also keep a local cache of these maps that is completely consistent. The most recently used strings are cached in memory, and the cache is kept within a size limit (CACHE_LIMIT_BYTES).
Here’s the logic to add a known string/representation pair (s/u) to the cache:
def _add_to_cache(self, s, u):
    # Evict entries until the cache is back under its size limit
    while self.bytes_cached > StringIntern.CACHE_LIMIT_BYTES:
        self._evict_cache()
    if u not in self.uid_string_cache:
        self.string_uid_cache[s] = u
        self.uid_string_cache[u] = s
        size = (len(s) + len(u)) * 2
        self.bytes_cached += size
FoundationDB makes it easy to maintain cache coherence. You just update the database in a transaction and update the cache only after the transaction succeeds.
def _intern_in_db(self, tr, s):
    u = tr[self.string_key(s)]
    if u != None:
        # Another client already interned this string; reuse its identifier
        return u
    newU = self._find_uid(tr)
    tr[self.uid_key(newU)] = s
    tr[self.string_key(s)] = newU
    return newU

def intern(self, tr, s):
    if s in self.string_uid_cache:
        return self.string_uid_cache[s]
    u = self._intern_in_db(tr, s)
    self._add_to_cache(s, u)
    return u
Multiple clients can concurrently intern strings and look up identifiers, and FoundationDB’s transactional properties guarantee that the same values are always returned for a given string or identifier. Caches for all clients will remain consistent with both the database and each other.
FoundationDB’s layer concept allows us to implement multiple-client string interning with caching in a small amount of code. Our example layers are written using our Python API, but you can also write layers in Ruby, Node, Java, or C. Next, we’ll take a look at the blob layer.