Hôm trước mình bất ngờ gặp được câu hỏi từ một a.e DEV "DynamoDB thuộc Key-value stores hở, tưởng giống Document databases như mongodb ?". Cũng hơi giật mình phân vân, nên mò mẫn tìm hiểu được 1 chút, share để mọi người được nắm rõ hơn 😁
Document databases
Example : MongoDB
Trong Document databases, dữ liệu được lưu trữ dưới dạng các tài liệu có cấu trúc tương tự như JSON, với mỗi tài liệu có khả năng chứa một tập hợp các cặp key-value.
Document databases have the following key features:
- Document model: Dữ liệu được lưu trữ trong các documents (khác với các cơ sở dữ liệu khác lưu trữ dữ liệu trong các cấu trúc như tables hoặc graphs). Documents tương ứng với các object trong hầu hết các ngôn ngữ lập trình phổ biến, cho phép các nhà phát triển phát triển ứng dụng của họ một cách nhanh chóng.
- Flexible schema: Document databases have a flexible schema, meaning that not all documents in a collection need to have the same fields. Note that some document databases support schema validation, so the schema can be optionally locked down.
- Distributed and resilient: Document databases are distributed, which allows for horizontal scaling (typically cheaper than vertical scaling) and data distribution. Document databases provide resiliency through replication.
- Querying through an API or query language: Document databases have an API or query language that allows developers to execute the CRUD operations on the database. Developers have the ability to query for documents based on unique identifiers or field values.
Key-value stores
Concept
A key-value database is a type of non-relational database, also known as NoSQL database, that uses a simple key-value method to store data. It stores data as a collection of key-value pairs in which a key serves as a unique identifier. Both keys and values can be anything, ranging from simple objects to complex compound objects. Key-value databases (or key-value stores) are highly partitionable and allow horizontal scaling at a level that other types of databases cannot achieve.
This type of NoSQL database implements a hash table to store unique keys along with the pointers to the corresponding data values.
💡 Hash Function:
h(key) -> location
Given a key, the hash function h
calculates the location within the data structure where the corresponding value is stored. This process is usually O(1) in complexity.
def search_in_kvs(kvs, key): location = h(key) return kvs[location]
Ordered Key-Value Store
An OKVS will keep the key-value pairs sorted by the key lexicographic order
Ordered Key-Value Stores typically have a fixed schema for keys but can have flexible values. The ordering of keys provides structure and allows for efficient range queries
Some usecase is aws dynamodb.
In a typical Key-Value Store, data is stored without any inherent order. To search for a specific key, a simple hash function is often used to compute the location of the key within the data structure. Let's represent this process mathematically:
💡 Binary Search Algorithm
Given that the data is sorted, a binary search algorithm can be used to find the location of the key within the data structure. The binary search algorithm operates by repeatedly dividing the search interval in half until the key is found or the interval is empty
def binary_search(okvs, key): left = 0 right = len(okvs) - 1 while left <= right: mid = (left + right) // 2 if okvs[mid].key == key: return okvs[mid].value elif okvs[mid].key < key: left = mid + 1 else: right = mid - 1 return None
Here, the binary search algorithm has a time complexity of O(log n), where n is the number of elements in the OKVS. Since the data is sorted based on keys, this algorithm efficiently finds the desired key or determines its absence.
Difference
- Data Model:
- Document databases store semi-structured or structured data in documents, usually using formats like JSON or BSON. Each document contains key-value pairs, where values can be simple data types, arrays, or nested documents.
- Key-value stores store data as a collection of key-value pairs, where the value is typically opaque to the database.
- Ordered Key-Value Stores maintain data in a sorted order based on the keys, allowing for efficient range queries in addition to key-based retrieval.
- Querying:
- Document databases typically support advanced querying capabilities, allowing for querying based on the contents of the documents, including filtering, sorting, and aggregation.
- Key-value stores usually support basic operations like PUT, GET, and DELETE based on the key. They lack advanced querying capabilities beyond simple key-based retrieval.
- Ordered Key-Value Stores support range queries in addition to key-based retrieval, enabling efficient retrieval of data within a specified range of keys.
- Schema Flexibility:
- Document databases offer schema flexibility, allowing documents within the same collection to have different structures.
- Key-value stores provide the highest level of schema flexibility since they do not enforce any structure on the stored values.
- Ordered Key-Value Stores typically have a fixed schema for keys but can have flexible values. The ordering of keys provides structure and allows for efficient range queries.
- Performance:
- Key-value stores prioritize high-throughput, low-latency operations, making them suitable for scenarios requiring fast read and write operations.
- Document databases may have slightly more overhead due to indexing and querying capabilities but still offer good performance for many use cases.
- Ordered Key-Value Stores provide efficient range queries, making them suitable for scenarios where data needs to be retrieved within a specific range of keys.
- Use Cases:
- Document databases are suitable for applications where data has a hierarchical structure or where flexible schemas are required.
- Key-value stores are commonly used for simple data storage and retrieval tasks such as caching, session management, and storing user preferences.
- Ordered Key-Value Stores are useful in scenarios where data needs to be accessed in a sorted order or retrieved within specific ranges, such as time-series data, logs, and leaderboard systems.