Loads

Load parameters depend on the system. Examples include:

  • Requests per second to a web server
  • Ratio of reads to writes in a database
  • Number of simultaneous users in a chat room
  • Hit rate on a cache

Database

The read/write performance of a database depends heavily on the data structure it uses to store data. For this discussion, assume every record is a key/value pair; the sections below compare the data structures that databases use to store such records.

Log-based (Append Only)

Some databases use an append-only mechanism: every write appends a record to the end of a log file. Writes are very fast because appending is a sequential operation. Reads are slow because, without an index, the log has to be scanned to find the most recent value for a key, so the cost of a read grows with the size of the log.
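The trade-off above can be illustrated with a minimal sketch. The class name AppendOnlyLog, the "key,value" line format, and the file name are assumptions made for this example, not the format of any particular database. It also assumes keys contain no commas and that neither keys nor values contain newlines.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy key/value store backed by a single append-only log file."""

    def __init__(self, path):
        self.path = path
        # Create the log file if it does not exist yet.
        open(self.path, "a", encoding="utf-8").close()

    def set(self, key, value):
        # A write only appends one line; nothing already on disk is modified.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key},{value}\n")

    def get(self, key):
        # A read scans the whole log; the last matching entry wins.
        result = None
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(",")
                if k == key:
                    result = v
        return result


log_path = os.path.join(tempfile.mkdtemp(), "data.log")
log_db = AppendOnlyLog(log_path)
log_db.set("user:1", "alice")
log_db.set("user:2", "bob")
log_db.set("user:1", "carol")  # "overwrites" by appending a newer entry
print(log_db.get("user:1"))    # carol
```

Note that the older entry for user:1 is still in the file after the third write. That leftover data is what makes the log grow over time.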

Disk space is also a concern with a log-based database: every write to an existing key adds another entry, so the log keeps growing. One solution is compaction. A compaction service goes through the log, finds the entries that share a key, keeps the most recent value, and discards the earlier ones, which reclaims disk space.

Compaction

Compaction can run on a separate thread without interrupting the database's read/write operations. Each log file can be viewed as a data segment. An existing segment is never modified; compaction writes new segments instead.

  1. Existing data segments can be loaded into an in-memory hash map.
  2. Entries with the same key in different segments can be looked up efficiently (using the hash map) and merged.
  3. The merged keys are written to new segments. Meanwhile, reads and writes continue to be served from the existing segments.
  4. Once merging is complete, reads and writes switch to the new segments, and the old segments can be deleted.
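The steps above can be sketched in a few lines. To keep the example short, segments are modelled as in-memory lists of (key, value) entries rather than files, and the function name compact is an assumption for illustration. A real implementation would read and write segment files and run in the background.

```python
def compact(segments):
    """Merge segments (oldest first) into one new segment.

    Each segment is a list of (key, value) entries in write order.
    """
    latest = {}
    for segment in segments:        # oldest segment first
        for key, value in segment:  # oldest entry first
            latest[key] = value     # a later write replaces an earlier one
    # Build a brand-new segment; the input segments are left untouched.
    return list(latest.items())


old_segments = [
    [("a", "1"), ("b", "2"), ("a", "3")],
    [("b", "4"), ("c", "5")],
]
new_segment = compact(old_segments)
print(new_segment)  # [('a', '3'), ('b', '4'), ('c', '5')]

# Reads and writes keep using old_segments until this switch happens.
active_segments = [new_segment]
```

Because compact never mutates its input, the old segments stay valid for readers for as long as the merge takes.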

Index

Relational databases maintain indexes on the key. An index makes reads fast, but it slows writes down because every write also has to update the index.

  • Hash Index: Some databases keep a hash map in memory that maps each key to the location of its value on disk, for example a byte offset in the log file. The main requirement for this approach is that all keys fit in memory.
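One common form of hash index pairs an append-only log with an in-memory dictionary that records the byte offset of the latest entry for each key. The sketch below assumes that design. The class name HashIndexedLog and the "key,value" line format are hypothetical, and the index is not rebuilt after a restart, which a real engine would have to handle.

```python
import os
import tempfile


class HashIndexedLog:
    """Append-only log plus an in-memory hash index (key -> byte offset)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # every key must fit in memory
        open(self.path, "ab").close()

    def set(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            # Position at the end of the file and remember where the record starts.
            offset = f.seek(0, os.SEEK_END)
            f.write(record)
        # The extra work on every write: update the index.
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record, no scan
            line = f.readline().decode("utf-8").rstrip("\n")
        _, _, value = line.partition(",")
        return value


index_path = os.path.join(tempfile.mkdtemp(), "indexed.log")
indexed_db = HashIndexedLog(index_path)
indexed_db.set("user:1", "alice")
indexed_db.set("user:2", "bob")
indexed_db.set("user:1", "carol")
print(indexed_db.get("user:1"))  # carol
```

A read now costs one dictionary lookup and one seek instead of a full scan, while each write does slightly more work to keep the index current.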

Replication

Replication means keeping a copy of the same data on multiple nodes that are connected via a network. Databases generally store data on multiple replicas. When a write is performed, it has to be applied on every replica; otherwise the replicas would no longer hold the same data. The most common solution is leader-based replication.