This is a series of articles dedicated to the optimal choice between different systems in a real project or an architectural interview.
In this series of articles, I will not talk about the deep beginnings of these technologies, I will not analyze every detail, but I will focus on how to correctly submit these technologies to an interview and choose the right and shortest path that convinces the interviewee that he has made the best decision in his particular situations.
This topic seemed relevant to me because such tasks can be found both at work and in an interview for the system design interview and you will have to choose between these two types of DBMS. I dove into this topic and will tell you what and how. What is better in each case, what are the advantages and disadvantages of these systems and which one to choose, I will show with several examples at the end of the article.
To understand the topic, you need to dive a little into the structure of these databases.
High-level differences between SQL and NoSQL
Storage:
SQL stores data in tables where each row represents an entity and
each column represents information about that entity.
The most popular SQL databases are PostgreSQL (most likely, you will choose it in an interview in 80% of cases, because it is one of the most common and easy to learn), Microsoft SQL, Oracle Database (both are ideal for large enterprise systems, such as banks) and MySQL.
NoSQL has different data storage models, such as key-value, chart, document, columnar, etc.
The most popular NoSQL by storage type that you are likely to choose in an interview and about which you should read at least a couple of articles
:
Key-value: Redis and Amazon DynamoDB
.
Document oriented: MongoDB.
Graphic: Neo4j, GraphDB
Columnar: Apache HBase, Apache Cassandra
Most likely, the interview will have such a task in which you will need to choose the first 2 types (unless, of course, the preference is NoSQL), so you should study them in more detail and write some small project using, for example, Redis and MongoDB.
Different metrics for different non-relational databases
Schema:
A schema in a
SQL database refers to the organization or structure of data in the database. It defines the tables, fields, relationships, and constraints that make up the database and serves as a model for how data should be organized and stored. Schemas can be used to enforce data integrity and consistency, and they can also help manage access to data by defining which users or roles have permission to perform certain actions.
In a
NoSQL database, the schema concept is often less rigid and structured compared to a SQL database. NoSQL databases can be schemaless, which means that the structure of the data being stored can change dynamically and does not need to be defined in advance.
However, some
NoSQL databases have a schema form, but it can be more flexible and dynamic than in SQL databases. For example, in a document-oriented NoSQL database like MongoDB, each document can have its own unique set of fields, and the data structure can change from document to document. In a key-value store such as Redis, each key-value pair can have a different data type, so the structure of the data is implicit and does not need to be explicitly defined.
The specific schema of a NoSQL database depends on the type of database and the design choices made by the developer or architect.
Consultations:
For queries to SQL databases, respectively, the SQL language is used with various modifications of database
developers, for convenient linking of programming languages with SQL databases, ORM is used, in the case of Java, this is Hibernate, Spring Data
There is no specific language for queries in the NoSQL database, it all depends on the type of database described above, use of specialized APIs, structured declarative queries, and Query-by-Example
queries
Examples of queries in SQL and NoSQL databases MongoDB:
In this query, we are looking for contacts who have at least one work phone number or who have been hired after a certain date. Again, we see that the MongoDB equivalent is pretty simple.
Note the use of dollar signs in traders’ names (‘$or’, ‘$gt’) as syntactic sugar. Also note that in both examples it is important to use an actual date in our comparison, not a string.
Scalability:
In most cases, SQL databases are vertical and this can be costly to the budget. You can scale such databases horizontally, but this requires some skill, knowledge, and details of the database you work with, otherwise there is a risk of losing data. On this occasion, you can read in detail about partitioning, partitioning, and replication.
But with NoSQL databases, scaling out works best. NoSQL databases scale horizontally, which means we can easily add additional servers to our NoSQL database infrastructure to handle heavy traffic.
Many NoSQL
technologies also automatically distribute data between servers. Why is NoSQL easier to scale? Because there is no JOIN operation. Therefore, JOIN operations scale poorly, and this is the fundamental problem with the relational approach. However, this (in most cases) also lacks ACID properties, which can lead to data loss, inconsistencies, and other disadvantages of non-ACID systems.
Reliability:
Most relational databases use ACID properties. This means that when it comes to data integrity and transaction reliability, databases are the best choice.
On the other hand, non-relational databases sacrifice ACID for the sake of rapid scale and high performance, so such databases should be used in places where data security and transactionality should not be 99.9%.
Let’s summarize what and when to use.
SQL if you need:-
Structured
data with strict schema- Relational data- The need for complex
joins (use of subqueries, join
)-
Transactions
–
Search by
NoSQL index if you need:
–
Dynamic or flexible schema
–
Non-relational
data-
No complex data connections needed
– High
intensity work with data
–
High performance for IOPS (standard IOPS)
4 examples of what is optimal to choose under different conditions in the problem
Example 1: Library Management System
Imagine you’re building a library management system where you need to keep track of books, authors, and library members, as well as boxes, returns, and fines. In this case, it would be more appropriate to use a SQL database because
:Data structure
:
Data can be easily structured into tables for books, authors, library members,
-
boxes, chargebacks, and fines, with relationships between them. Books, authors, and library members can be represented in tables with columns
-
for relevant information. For example, the book table might have columns for the book’s title, ISBN, author, and publication date.
-
Relationships between tables can be established through foreign keys. For example, the book table might have a foreign key to the authors table, which would link each book to its author.
-
Payments, chargebacks and penalties can also be represented in tables with columns for relevant information. For example, the payment table might have columns for the library member, book, and payment date.
Data consistency and accuracy
:
-
Data consistency and accuracy are important, as the library needs to keep track of accurate information about its books, authors, and members, as well as boxes, returns, and fines. SQL databases provide ACID transactions to ensure that all database updates complete successfully or are rolled back.
-
ACID transactions ensure that all database updates complete successfully or are rolled back in the event of a failure. This means that the library management system can handle multiple requests to check out the same book at the same time without data loss or corruption.
Complex queries:
-
SQL databases provide a rich query language, enabling the library management system to generate reports and analyze data. For example, the library management system can generate a report of all boxes for a particular author in the last year using a query such as
:SELECT books.title, checkouts.checkout_date FROM books JOIN checkouts ON books.book_id = checkouts.book_id WHERE books.author = “Jane Austen” AND checkouts.checkout_date >= DATE_SUB(NOW(), INTERVAL 1 YEAR)
Data integrity:
-
Data integrity is crucial, and the system must enforce rules to ensure that every payment is recorded against the correct book, and every fine is recorded against the correct library member. SQL databases have constraints and triggers that these rules can apply.
-
Constraints and triggers can be used to apply rules and ensure that data is entered correctly. For example, a restriction can be used to ensure that a library member cannot check out more than 5 books at a time. A trigger can be used to automatically update the penalty table whenever a late payment is returned.
Moderate
volume of writes and reads: The library management system should handle a moderate volume of writes and reads, and SQL databases are
-
adequate to handle this type of workload. SQL databases are suitable for handling a moderate volume of writes and reads
-
. The library management system can handle multiple requests to check out and return books, as well as generate reports, without any performance issues.
In conclusion, in this scenario, a SQL database such as MySQL or PostgreSQL would provide a robust, scalable, and flexible solution for the library management system.
Example 2: Social Media App
Imagine you’re building a social media app where users can post updates, comment on posts, and like posts. In this case, it would be more appropriate to use a NoSQL, let’s extend the example to see how a NoSQL database would work
:Data structure
:
-
Data for each user, post, comment and the like can be stored as separate documents. For example, the publishing document might contain the text of the publication, the user who posted it, and the date and time it was published. The comment document could contain the text of the comment, the user who made the comment, and the date and time it was made. The Like document might contain the user who liked the post and the date and time they liked it.
-
Relationships between posts, comments, and likes can be stored as references or embedded in documents. For example, the post document might contain a number of references to the comments and likes associated with that post.
Data consistency and accuracy
: The
-
NoSQL database can handle final consistency, meaning it can take some time for data to be updated across all nodes in the cluster. This is acceptable in a social media app, where consistency of real-time data is not critical.
Complex queries:
-
NoSQL databases may not have as rich a query language as SQL databases, but they are designed for fast data retrieval. For example, the social media app can retrieve all of a particular user’s posts with a query like
:d b.posts.find({user:”Jane Doe”})
-
NoSQL databases can also use MapReduce for complex data analysis and processing.
Data integrity
: the databases
-
NoSQL may not have the same application level of data integrity as SQL databases, but they can still apply data constraints through application code and logic. For example, the social media app can apply a rule that a user can’t like their own post by checking the user ID in the similar document before allowing it to be saved.
High volume writes and reads: NoSQL
databases
-
are optimized to handle a high volume of writes and reads, making them a good choice for social media applications. The social media app can handle multiple updates, comments, and likes in real-time without any performance issues.
Scalability:
-
NoSQL databases are designed for horizontal scaling, allowing the application to add more nodes as needed to handle the increasing load. The social media app can handle an increasing number of users and posts without any performance degradation.
In conclusion, in this scenario, a NoSQL database such as MongoDB, Cassandra, Redis or Aerospike would provide a scalable, fast and flexible solution for social media application.
Example 3: Records Management System
Imagine that you are building a records management system where you need to collect and process large volumes of log data from multiple sources. In this case, it would be more appropriate to use a NoSQL database because:
-
The data is unstructured and can change frequently, as new record sources are added or log data formats change. NoSQL databases have a flexible schema that can easily accommodate these changes.
-
The system needs to handle a large volume of writes, as log data is continuously generated. NoSQL databases are designed to handle high write loads.
-
The system must be scalable to accommodate growing volumes of log data, and NoSQL databases can easily scale out by adding more nodes to the cluster.
A log management system may use a NoSQL database such as Apache Cassandra or Apache HBase, which are designed to handle high write loads and have a flexible schema to accommodate changing log data formats.
Example 4: Banking System
Imagine you’re building a banking system where you need to keep track of customer accounts, transactions, and loans. In this case, it would be more appropriate to use a SQL database because:
-
The data is easily structured into tables for customers, accounts, transactions, and loans, with relationships between them
-
Data consistency and accuracy is important, as financial transactions must be recorded correctly. SQL databases provide ACID transactions to ensure that all database updates complete successfully or are rolled back.
-
The system needs to support complex queries to generate reports or analyze data. For example, you might want to generate a report of all transactions for a particular customer in the last year.
-
Data integrity is crucial, and the system must enforce the rules to ensure that every transaction is recorded in the correct customer account.
.
In this scenario, a SQL database such as Oracle or PostgreSQL would be a good choice as they are widely used and well established for managing structured data with complex relationships.
In the next article I will tell you how to choose between different message brokers.