In part 2 of the series “how to model data”, we answer the question “What are the different types of data models?” Take a look at several logic models, examples of data models, their strengths and weaknesses, and their pros and cons.
Part 1 of this series covered the three steps of data modeling: conceptual, logical, and physical.
Common Data Models
After defining a conceptual model, the next step is to start creating a logical data model. This is still too early to choose a specific technology or vendor.
The logical data models I find most common and popular include: relational, JSON (document), key-value, chart, wide column, and text search.
The relational
data model was proposed in 1970 by E.F. Codd. Broadly speaking, the relational data model consists of tables that contain rows and columns, which contain data that follows predefined schemas and constraints.
The term “relational” does not refer to the relationships between tables and entities, but rather to the theoretical concept of a set of tuples that are members of a data domain, each of which contains attributes.
Real relationships are not part of the relational model (or any other model except Graph, which will be discussed later), but are usually implemented with multiple tables and foreign key constraints.
Relational is the world’s most popular type of logical data model.
SQL was not part of the original relational proposition, but is strongly (but not exclusively) associated with relational databases. In fact, SQL is generally the only way to interact with data in relational databases.
JSON Data Model
Most document databases store data as JSON (XML and other representations are much less popular
).
Highly scalable NoSQL implementations bypass relational constraints in favor of separate data that can be distributed across multiple servers (across clusters and partitions). JSON is a popular format for this type of distributed data because:
It can
-
- provide a complex structure and a “schema in read”.
- It is human-readable
- There are (de)serializers for JSON in almost every programming language and platform
- It can support SQL++, a backward-compatible implementation of SQL for JSON data models.
.
.
If you come from a relational environment, but want to explore the benefits of “NoSQL” (scaling, flexibility, performance, familiarity), then
a JSON document model is a great place to start.
Data access methods for document databases typically include (but are not limited to) a query language (a standard such as SQL++ or a proprietary provider implementation such as MQL) that you can filter based on individual JSON values.
A key-value
data
model associates each piece of data with a key. The actual value can be in any format. It could be JSON, XML, binary, text, or anything else.
These types of database models are great, with examples of certain use cases like caching, session storage, user profile, shopping carts.
Beyond key-value
reads and writes, functionalities such as “subdocument” and “map/reduce” add more data access options, but using a pure key-value logical model on data with complex relationships can be very difficult.
Chart Data
Model
A chart data model is the only model that has built-in data concepts and relationships. In a graph model, a piece of data is called a “node” and a relationship between two nodes is called an “edge”.
A graph data model can be useful for modeling a complex system of relationships. For example, modeling a network of physical computers and the computers that are on the network. It is often used for specialized use cases such as fraud detection and purchase recommendations.
For queries, graph databases use a specialized graph query language such as Gremlin or Cypher.
Note: GraphQL and the chart data model are not related.
A
wide column data model has tables (sometimes called “key families”) that are similar to relational tables, but are based on columns rather than
rows. This means that any given row can have an arbitrary number of columns, different from other rows. In addition, columns can be continuously added to existing rows.
Each row has a key, so you can also think of this model as a 2D evolution of a key value store.
Query languages for wide-column data models are database-specific. A popular example is CQL (Cassandra Query Language), which looks like SQL, but is a very limited subset.
Full-text search data
model
A full-text search data model is optimized for storing and searching text. This includes facets and blurring, but can also include geographic search.
Creating indexes is a key part of interacting with full-text search. Searches provide parameters to active indexes and are often performed using REST requests or a specialized SDK. The types of queries that can be used include string, match, phrase, compound, range, and geospatial.
Data modeling examples
Multi-model databases are those that can support multiple types of the data models listed above with the same dataset. That said, here are some examples of physical data modeling examples with deployment tools (databases) for each model
:
-
- Relational: SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, SQLite, Microsoft Access, Snowflake
- Key-value: There are very few pure key-value databases, but examples that have good key-value support: Memcached, Redis
- Wide column: Cassandra, HBase,
- CosmosDB*.
- Text/Search: Elasticsearch and Solr (Lucene), Couchbase (Bleve)
. JSON document: Couchbase, CosmosDB*, DynamoDB, CouchDB (similar name, but different and separate from Couchbase), MongoDB.
, Couchbase. Graphic: Neo4j, CosmosDB*, ArangoDB.
*Please note that you must choose a data model in advance with CosmosDB.
Data
modeling tools Data modeling
doesn’t require any tools, and can be done on a dry erase board or with pen and paper (with the exception of physical data modeling, of course). However, there are many data modeling tools that can make your team’s work easier. Here are some of the most popular:
JetBrains DataGrip provides a database diagramming tool for building a model from an existing database. These diagrams can be saved in UML or PNG. This is ideal for modeling physical data.
Erwin Data Modeler is a very popular tool that supports conceptual, logical, and physical data models. Quest, the creators of Erwin, have partnered with Couchbase for modeling and migration use cases.
Hackolade is another data modeling tool that supports a variety of data models (including all the physical models mentioned in this post) as well as data model migration.
Idera has a complete set of tools for modeling. For example, ER/Studio Business Architect provides conceptual modeling and ER/Studio Data Architect provides logical and physical modeling.
If these tools seem excessive to begin with, remember that you can use a more general-purpose diagramming tool, such as Diagrams.net for your team to work collaboratively.
Next steps and resources
Conceptual, logical and physical models were introduced in the previous publication. This post took a closer look at logical data model options. The next (and final) post in this series will dive specifically into physical modeling of JSON data.
In the meantime, check out these resources to learn more:
Comparison of document databases and relational
-
- databases
- End the relational vs NoSQL debate, once and for all
- More blog posts on data modeling