A one-to-few relationship where the related data is bounded, small, and frequently read together.
→Embed the related data as a nested object or array within the main document.
Why: Optimizes read performance by retrieving all necessary data in a single point read, minimizing RU cost and latency. Avoids client-side joins.
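A minimal sketch of the embedded shape, assuming a hypothetical customer document (`cust-001` and the `customerId` partition key are illustrative, not from the source):

```python
# Hypothetical customer document with a small, bounded set of addresses
# embedded directly, so one point read returns everything.
customer = {
    "id": "cust-001",
    "customerId": "cust-001",   # assumed partition key
    "name": "Ada Lovelace",
    "addresses": [              # one-to-few: bounded, always read with the customer
        {"type": "home", "city": "London"},
        {"type": "work", "city": "Cambridge"},
    ],
}
```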
A one-to-many relationship where the "many" side grows unboundedly or is updated independently of the "one" side.
→Store related items as separate documents and use the parent document's ID as a reference.
Why: Prevents documents from exceeding the 2 MB size limit and avoids high RU costs for updates on large embedded arrays.
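A sketch of the referenced shape, with a client-side stand-in for the query (the order/customer names are hypothetical):

```python
# Hypothetical order documents stored separately and linked back to the
# customer by ID, so the order list can grow without bloating the customer doc.
orders = [
    {"id": "order-1", "customerId": "cust-001", "total": 19.99},
    {"id": "order-2", "customerId": "cust-001", "total": 5.49},
]

def orders_for(customer_id, docs):
    """Client-side stand-in for: SELECT * FROM c WHERE c.customerId = @id"""
    return [d for d in docs if d["customerId"] == customer_id]
```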
A document contains an array that can grow unboundedly over time, risking the 2 MB document size limit (e.g., event logs, comments).
→Split the array across multiple "bucket" documents. When a bucket reaches a size/item threshold, create a new one.
Why: Keeps individual document sizes manageable while maintaining the logical grouping of related data.
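A minimal sketch of the bucket pattern, assuming a hypothetical comments feed (the threshold here counts items for brevity; in practice you would size buckets well under the 2 MB limit):

```python
BUCKET_LIMIT = 3  # illustrative item threshold per bucket document

def add_comment(buckets, post_id, comment):
    """Append to the latest bucket, starting a new bucket at the threshold."""
    if not buckets or len(buckets[-1]["comments"]) >= BUCKET_LIMIT:
        buckets.append({
            "id": f"{post_id}-bucket-{len(buckets) + 1}",
            "postId": post_id,  # shared key ties buckets back to the post
            "comments": [],
        })
    buckets[-1]["comments"].append(comment)
    return buckets
```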
Modeling a many-to-many relationship, such as students and courses, or articles and tags.
→For bounded relationships, duplicate relationship data on both sides (e.g., embed course IDs in student doc, student IDs in course doc). For unbounded, use a separate "join" or "edge" document container.
Why: Denormalization optimizes both query directions (students in a course, courses for a student) without requiring joins; a join container keeps writes bounded when either side can grow without limit.
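A sketch of the bounded case, duplicating the relationship on both sides (student/course IDs are hypothetical; note that writes must now update both documents):

```python
# Bounded many-to-many: each side embeds the other's IDs, so each
# query direction is a single read at the cost of dual writes.
student = {"id": "stu-1", "name": "Kim", "courseIds": ["math-101", "cs-101"]}
course = {"id": "cs-101", "title": "Intro CS", "studentIds": ["stu-1", "stu-2"]}
```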
Modeling hierarchical data (e.g., organizational chart, product categories) and needing to query for all descendants of a node.
→Store an array of all ancestor IDs or names (the path) in each document.
Why: Enables efficient subtree queries with a single `ARRAY_CONTAINS` filter, avoiding costly recursive lookups.
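A sketch of the ancestor-path shape for a hypothetical category tree, with a client-side stand-in for the `ARRAY_CONTAINS` filter:

```python
# Each category stores its full ancestor path, so "all descendants of X"
# is a flat filter rather than a recursive walk.
categories = [
    {"id": "electronics", "path": []},
    {"id": "computers",   "path": ["electronics"]},
    {"id": "laptops",     "path": ["electronics", "computers"]},
]

def descendants_of(node_id, docs):
    """Client-side stand-in for:
    SELECT * FROM c WHERE ARRAY_CONTAINS(c.path, @nodeId)"""
    return [d for d in docs if node_id in d["path"]]
```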
A document has an unbounded array (e.g., blog comments), but the most common query only needs the most recent N items.
→Embed a subset of recent items in the main document and store all items as separate referenced documents.
Why: Optimizes the primary read path for performance and cost, while still allowing access to the full dataset when needed.
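A sketch of the hybrid pattern, assuming a hypothetical blog post (`RECENT_LIMIT` and the field names are illustrative):

```python
RECENT_LIMIT = 2  # how many comments stay embedded on the post

def post_comment(post, all_comments, comment):
    """Write the full comment as its own document, and keep only the
    most recent few embedded on the post for the hot read path."""
    all_comments.append({**comment, "postId": post["id"]})
    post["recentComments"] = (post["recentComments"] + [comment])[-RECENT_LIMIT:]
```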
Storing a sequence of immutable events for an entity and needing to query for current state or analytical aggregates.
→Store events in a single container partitioned by the entity ID. Use Change Feed or Synapse Link to compute and store materialized views or aggregates.
Why: Provides a complete audit trail and decouples the write model from various read models, offering high flexibility.
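A sketch of deriving current state from an immutable event stream, assuming a hypothetical account entity (this fold is what a Change Feed consumer would materialize into a read-side view):

```python
# Immutable events partitioned by entity ID; current state is a fold
# over the stream, never an in-place update.
events = [
    {"accountId": "acct-1", "type": "deposit",  "amount": 100},
    {"accountId": "acct-1", "type": "withdraw", "amount": 30},
    {"accountId": "acct-1", "type": "deposit",  "amount": 5},
]

def current_balance(stream):
    signs = {"deposit": 1, "withdraw": -1}
    return sum(signs[e["type"]] * e["amount"] for e in stream)
```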
Need to preserve the state of related data at a specific point in time (e.g., a customer's address on an order).
→Embed a copy (snapshot) of the related data in the document, rather than referencing it.
Why: Ensures historical accuracy by decoupling the document from future changes to the referenced data.
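A sketch of the snapshot pattern for a hypothetical order, showing that a later change to the source data does not rewrite history:

```python
import copy

customer_address = {"street": "1 Main St", "city": "Springfield"}

# Snapshot: copy the address into the order instead of referencing it.
order = {"id": "order-9", "shippingAddress": copy.deepcopy(customer_address)}

# Later the customer moves; the historical order is unaffected.
customer_address["city"] = "Shelbyville"
```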
Ingesting high-frequency time-series data (e.g., IoT sensor readings) and querying by device over time ranges.
→Use the device ID as the partition key. Aggregate readings into time-bucketed documents (e.g., one document per device per hour or per minute) instead of one document per reading.
Why: Drastically reduces document count and write RUs, while co-locating data for efficient time-range queries within a partition.
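A sketch of hourly bucketing for hypothetical IoT readings (the ID scheme and field names are assumptions):

```python
from datetime import datetime, timezone

def bucket_id(device_id, ts):
    """Hourly bucket ID: one document per device per hour."""
    return f"{device_id}:{ts.strftime('%Y-%m-%dT%H')}"

def record(buckets, device_id, ts, value):
    key = bucket_id(device_id, ts)
    doc = buckets.setdefault(key, {
        "id": key,
        "deviceId": device_id,  # assumed partition key
        "readings": [],
    })
    doc["readings"].append({"ts": ts.isoformat(), "value": value})
```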
Need to perform multiple create, update, or delete operations as a single atomic transaction.
→Use the SDK's TransactionalBatch feature. All operations must target the same logical partition key.
Why: Provides ACID guarantees for up to 100 operations within a single partition, ensuring that either all operations succeed or all fail together.
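An illustrative pre-check, not the SDK itself: a helper that enforces the two constraints above (single logical partition key, at most 100 operations) before a batch would be submitted:

```python
MAX_BATCH_OPS = 100  # Cosmos DB transactional batch operation limit

def validate_batch(operations):
    """Illustrative client-side check mirroring the batch constraints:
    all operations must share one partition key, and count <= 100."""
    if len(operations) > MAX_BATCH_OPS:
        raise ValueError("batch exceeds 100 operations")
    keys = {op["partitionKey"] for op in operations}
    if len(keys) != 1:
        raise ValueError("all operations must share one partition key")
    return True
```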
Documents should be automatically deleted from a container after a specific period (e.g., 30 days).
→Enable Time to Live (TTL) on the container and set the default `ttl` value in seconds (e.g., 2592000 for 30 days). A `ttl` of -1 on an individual document overrides the default and prevents expiration.
Why: TTL deletions run as a background task that consumes leftover RUs, so expiry doesn't compete with your provisioned throughput, providing an efficient, hands-off way to manage data lifecycle.
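A small sketch of the arithmetic and the per-document override (the `audit_record` document is hypothetical):

```python
DAY = 24 * 60 * 60
default_ttl = 30 * DAY  # container-level default: expire after 30 days

# Hypothetical document that opts out of expiry despite the container default.
audit_record = {"id": "audit-1", "ttl": -1}
```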
Need to store large binary objects (images, videos, documents > 2 MB) associated with Cosmos DB metadata.
→Store the binary object in Azure Blob Storage. Store the URI to the blob in the Cosmos DB document along with the metadata.
Why: Cosmos DB is optimized for structured metadata and has a 2 MB document limit. Blob Storage is a cost-effective and scalable service for large object storage.
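A sketch of the metadata document, with a placeholder Blob Storage URI (the storage account and container names are illustrative):

```python
# Metadata document in Cosmos DB; the large binary lives in Blob Storage
# and is linked by URI rather than stored inline.
video_doc = {
    "id": "video-42",
    "title": "Launch recording",
    "contentType": "video/mp4",
    "sizeBytes": 734_003_200,  # ~700 MB, far over the 2 MB document limit
    "blobUri": "https://examplestorage.blob.core.windows.net/videos/video-42.mp4",
}
```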