By mapping documents, graphs, and relational tables to a collection of keys and values, a single data store can support multiple data models.
NoSQL entered the scene nearly six years ago as an alternative to
traditional relational databases. The offerings from the major
relational vendors couldn’t deliver the cost, scalability, and fault
tolerance that developers need to build reliable, modern Web
applications.
Flash forward to today, and now vendors everywhere tout their NoSQL
solutions. Open source projects have sprouted all over the place with
thousands of developers contributing to them. In fact, more than 200
NoSQL products and companies are vying for developers' attention.
Beyond SQL
To understand the NoSQL boom, it helps to take a quick look at how we
got here. The relational data model has been around since the early
1970s, and it became popular for good reasons. In the form of SQL, it
offers a general query language based on a rigorous data model. Query
planners can usually optimize queries without requiring detailed
knowledge of the physical data layout on the part of the user.
Since that time, many data formats that go beyond the relational model
have gained popularity. For example, JSON is a common format used in
software development and for document-oriented data. Some SQL vendors
allow you to store JSON as a serialized string, but it’s not a
first-class citizen in terms of querying or indexing. You can decompose a
JSON object into multiple tables, but you’ll have to use multiple joins
to query the data, paying a large performance penalty.
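To make that concrete, here is a minimal sketch in Python using the standard library's sqlite3 module; the users/phones schema is invented for illustration:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phones (user_id INTEGER, kind TEXT, number TEXT);
""")

# One JSON document gets shredded across two tables.
doc = json.loads(
    '{"id": 1, "name": "Ada", '
    '"phones": [{"kind": "home", "number": "555-0100"}]}')
conn.execute("INSERT INTO users VALUES (?, ?)", (doc["id"], doc["name"]))
for phone in doc["phones"]:
    conn.execute("INSERT INTO phones VALUES (?, ?, ?)",
                 (doc["id"], phone["kind"], phone["number"]))

# Reassembling the document requires a join, and each additional level
# of nesting adds another table and another join.
print(conn.execute("""
    SELECT users.name, phones.kind, phones.number
    FROM users JOIN phones ON phones.user_id = users.id
""").fetchall())  # [('Ada', 'home', '555-0100')]
```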
Graphs, based on nodes and links, are another popular data model. Graphs
are often used to store data structured as networks, such as social
networks. As with JSON, there are straightforward ways to translate a
graph into relational tables, but the relevant graph structure is lost,
and the resulting queries require expensive, iterated joins. As a
result, a shortest-path query, which is natural and straightforward in a
graph data model, becomes extremely complex in SQL.
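To see the pain, here is a shortest-path (hop count) query over a toy edges table, written as a recursive common table expression in SQLite; the data is invented for illustration. Each level of recursion performs another join against the edges table, which is exactly the iterated-join cost described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT);
    INSERT INTO edges VALUES ('a','b'), ('b','c'), ('a','c'), ('c','d');
""")

# Shortest hop count from 'a' to every reachable node. A graph database
# expresses this as a short traversal; in SQL it takes a recursive CTE
# that re-joins against edges once per path length.
print(conn.execute("""
    WITH RECURSIVE walk(node, depth) AS (
        SELECT 'a', 0
        UNION
        SELECT edges.dst, walk.depth + 1
        FROM edges JOIN walk ON edges.src = walk.node
        WHERE walk.depth < 10  -- bound the recursion on cyclic graphs
    )
    SELECT node, MIN(depth) FROM walk GROUP BY node ORDER BY node
""").fetchall())  # [('a', 0), ('b', 1), ('c', 1), ('d', 2)]
```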
Other data models, like time series, blobs, and geospatial data, pose
similar challenges. Many SQL vendors support proprietary add-ons for
these data types, but there is no uniform and efficient way to represent
them in a relational model.
The NoSQL response
The proliferation of NoSQL databases is a response to the needs of
modern applications, which work with different types of data with
different storage requirements. Not all data can be shoehorned into a
particular model, whether relational or otherwise. That’s why there are
so many database options on the market: the need for multiple data
models in modern applications is our reality.
While developers need multiple data models, they shouldn’t have to adopt
different databases to get them. It’s not uncommon to hear that an
application has multiple databases in its back end. Martin Fowler has
advocated an architectural pattern of “polyglot persistence,” meaning
the application stores data in separate databases of different types.
Polyglot persistence with separate databases responds to a real need,
but it can turn into an operational nightmare: running multiple data
silos creates as many problems as it solves, beginning with operational
complexity.
One back end, multiple data models
A multimodel database provides a single back end that supports multiple
data models. It’s all about being able to choose the best data model for
the job with a single storage substrate. Multimodel databases eliminate
the back-end fragmentation described above and provide two key benefits:
Easing operational complexity. The fragmented
environments caused by running different databases side by side increase
the complexity of both operations and development. For example, a
polyglot application stack might include Redis as a caching layer,
MongoDB for collecting logs, Postgres for metadata, and Elasticsearch
for indexing and search. The goal is to use the best component for the
job.
However commendable the intention, polyglot persistence means you end up
with multiple databases, each with its own storage and operational
requirements; integrating them for scalability and fault tolerance is up
to you. Assuring that a system with many such components is
fault-tolerant is challenging, to say the least. The need to integrate
multiple databases imposes significant engineering and operational
costs. Your team needs to have experts in each database technology. For
the application to stay online, all of the databases need to remain up.
This renders the fault tolerance of the application equal to the weakest
link in the stack.
Consistency. Even worse, there is no support for
transactions across different databases, so there is no good way to
maintain consistency among different models. Suppose your application
receives a stream of data on user activity, and you decide to store
related data elements in time series, graph, and document stores. You
usually require these elements to reflect a consistent state, but
without ACID transactions, this requirement can be difficult if not
impossible to achieve.
A new approach
Can we somehow keep the good parts of polyglot persistence but lose the
bad parts? It turns out, we can. The main idea is to keep all state in a
single store that supports multiple data models by mapping the
higher-level models to a lower-level representation. To pull off this
trick, the storage substrate needs to have some important properties. At
a minimum, it needs to support true, multikey ACID transactions in a
performant manner. It turns out that ordering among keys is another
important tool for efficient data modeling. These considerations lead to
an ordered, transactional key-value store as the basic storage substrate we’ll need.
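As a rough illustration of that contract (a toy in-memory sketch, not any particular product's API), an ordered, transactional key-value store keeps keys sorted and applies a batch of writes all-or-nothing:

```python
import bisect

class OrderedKVStore:
    """Toy sketch of an ordered key-value store with atomic batches."""

    def __init__(self):
        self._keys = []   # kept sorted, so range scans are cheap
        self._data = {}

    def commit(self, writes):
        # Materialize the batch first so a bad input mutates nothing;
        # all writes then land together (the multikey "transaction").
        writes = dict(writes)
        for key, value in writes.items():
            if key not in self._data:
                bisect.insort(self._keys, key)
            self._data[key] = value

    def get_range(self, start, end):
        # Ordering makes "every key between start and end" one scan.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]
```

A production substrate adds durability, concurrency control, and distribution on top of this; the sketch only shows the interface shape.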
With these building blocks in place, supporting new data models becomes a
matter of mapping the higher-level representation to a collection of
keys and values. JSON documents, graphs, and relational tables can all
be efficiently mapped to key-value pairs. By taking advantage of the
ordering property among keys, we can even design optimizations that let
us avoid joins in many cases where a traditional SQL database would
require them.
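Continuing the sketch, suppose keys are tuples (Python tuples sort lexicographically, mimicking an ordered key space; the "doc" and "edge" prefixes are invented for illustration). One document becomes a handful of adjacent keys, and a single prefix scan reads it back with no joins:

```python
store = {}  # stand-in for the ordered, transactional substrate

def prefix_scan(prefix):
    # In an ordered store this is one contiguous range read, because
    # every key sharing the prefix is adjacent in key order.
    return [(k, v) for k, v in sorted(store.items())
            if k[:len(prefix)] == prefix]

# Document model: one key per JSON path.
store[("doc", "user42", "name")] = "Ada"
store[("doc", "user42", "address", "city")] = "London"

# Graph model: one key per directed edge, in the same key space.
store[("edge", "user42", "follows", "user7")] = ""

# The whole document comes back from one range read; the relational
# decomposition shown earlier would need a join per level of nesting.
print(prefix_scan(("doc", "user42")))
```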
This new approach gives you several big wins. All of the state that
your application stores can be kept in a single component. You can have
transactions across your data models. Best of all, you can use the data
models your application really needs. ACID transactions give you the
“glue” to keep all your data synchronized across different models.
Of course, to meet the demand of modern applications, these features
need to be delivered in a way that retains the advantages of a NoSQL
database running on a distributed cluster, especially horizontal
scalability and fault tolerance. ACID transactions are especially
powerful in combination with those capabilities.
A multimodel database
Rather than having to integrate multiple databases, it’s much simpler if
your development team can build the data models you need on a single
back end. That’s the approach FoundationDB has taken. FoundationDB is a
multimodel database that combines scalability, fault tolerance, and high
performance with multikey ACID transactions. The “secret sauce” that
enables this approach is making those transactions performant: building
a custom data model that supports concurrent updates usually requires
synchronizing disparate data elements, and such synchronization is easy
with ACID transactions and very difficult without them.
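For instance, with FoundationDB's Python bindings (a sketch assuming a running cluster and an installed client; the key layout and the record_follow helper are invented for illustration), a document-model counter and a graph-model edge can be updated in a single ACID transaction:

```python
import fdb

fdb.api_version(620)  # assumed API version; match your installed client
db = fdb.open()       # assumes a reachable FoundationDB cluster

@fdb.transactional
def record_follow(tr, follower, followee):
    # Document model: bump a counter stored under the follower's record.
    count_key = fdb.tuple.pack(("doc", follower, "following_count"))
    current = tr[count_key]
    count = fdb.tuple.unpack(current)[0] if current.present() else 0
    tr[count_key] = fdb.tuple.pack((count + 1,))

    # Graph model: record the edge in the same key space. Both writes
    # commit atomically, or neither does.
    tr[fdb.tuple.pack(("edge", follower, "follows", followee))] = b""

record_follow(db, "user42", "user7")
```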
This is where the database market is heading: toward ACID-compliant,
multimodel databases that can meet an application’s requirements for
fault tolerance, scalability, and performance.
Source: http://www.infoworld.com