Need to build, schedule, and monitor complex data integration workflows that move and transform data from various on-premises and cloud sources.
→Use Azure Data Factory (ADF).
Why: ADF is a managed cloud orchestration service for building and managing ETL/ELT pipelines at scale, with extensive connectivity and monitoring capabilities.
An Azure Data Factory pipeline needs to access a data source located on-premises behind a corporate firewall.
→Install a Self-hosted Integration Runtime (IR) on a machine within the on-premises network.
Why: The Self-hosted IR acts as a secure gateway, enabling ADF in the cloud to connect to and move data from on-premises sources without exposing them to the public internet.
Need a single, integrated platform for data warehousing (SQL), big data analytics (Spark), data exploration (serverless SQL), and data integration.
→Use Azure Synapse Analytics.
Why: Synapse provides a unified workspace (Synapse Studio) that brings together these different analytical engines, reducing complexity and integration overhead.
Choosing a SQL query engine within Synapse Analytics.
→Use the Serverless SQL pool for ad-hoc, exploratory queries on data in the data lake, billed per TB of data processed. Use the Dedicated SQL pool for high-performance, predictable data warehousing workloads with provisioned (and separately billed) resources.
Why: Serverless is for unpredictable exploration and discovery. Dedicated is for production BI and reporting with performance SLAs.
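As a sketch of the serverless pattern, a serverless SQL pool can query files in the data lake directly with OPENROWSET, with no tables to load and no compute to provision. The storage path and column names below are hypothetical:

```sql
-- Ad-hoc exploration straight over Parquet files in the lake.
-- Storage account, container, and schema are illustrative.
SELECT TOP 100
    r.OrderId,
    r.OrderDate,
    r.Amount
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/sales/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS r
WHERE r.OrderDate >= '2024-01-01';
```

A dedicated SQL pool would instead load this data into distributed tables ahead of time, trading ingestion effort for consistent query performance.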
Need to process and analyze high-volume streaming data in real time from sources like IoT Hub or Event Hubs to power live dashboards or trigger alerts.
→Use Azure Stream Analytics.
Why: Stream Analytics is a real-time event processing engine that uses a simple SQL-like query language to analyze data in motion with low latency.
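A minimal Stream Analytics job query using a tumbling window; the input/output aliases and event fields here are illustrative:

```sql
-- Average temperature per device over non-overlapping 60-second windows.
-- [iothub-input] and [powerbi-output] are aliases defined on the job.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd
INTO [powerbi-output]
FROM [iothub-input] TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(second, 60)
```

TIMESTAMP BY makes windowing use the event's own timestamp rather than its arrival time; Hopping and Sliding windows are available when windows need to overlap.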
A data science team needs a collaborative, notebook-based environment for large-scale data engineering and machine learning using Apache Spark.
→Use Azure Databricks.
Why: Databricks provides an optimized Spark runtime, collaborative notebooks, and integrated ML capabilities (MLflow), making it a leading platform for advanced analytics and ML on Azure.
Need to ingest millions of events per second from sources like mobile apps, web telemetry, or IoT devices for real-time processing.
→Use Azure Event Hubs.
Why: Event Hubs is a big data streaming platform designed for high-throughput event ingestion. It acts as the "front door" for streaming data, decoupling producers from consumers.
An organization wants a single, unified SaaS analytics platform that combines data engineering, data science, data warehousing, and BI with minimal infrastructure management.
→Use Microsoft Fabric.
Why: Fabric provides an end-to-end, SaaS-based analytics experience built on a single data lake (OneLake). It simplifies the architecture and reduces integration overhead compared to building with separate PaaS services.
Within Microsoft Fabric, need a single artifact to store data in open Delta Lake format that can be accessed by both Spark engines (for data engineering) and SQL engines (for BI).
→Use a Microsoft Fabric Lakehouse.
Why: The Lakehouse is the core architectural pattern in Fabric. It combines the scalability and flexibility of a data lake with the transactional guarantees and SQL querying capabilities of a data warehouse.
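As a sketch of the dual-engine pattern (assuming a Fabric notebook, where a `spark` session is preconfigured; the file path and table name are hypothetical):

```python
# Spark side: land raw files as a managed Delta table in the Lakehouse.
# `spark` is provided by the Fabric notebook environment.
df = spark.read.csv("Files/raw/sales.csv", header=True, inferSchema=True)
df.write.format("delta").mode("overwrite").saveAsTable("sales")
```

The same `sales` table then becomes queryable from the Lakehouse's SQL analytics endpoint (e.g., `SELECT SUM(Amount) FROM sales`) without copying or moving the data.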
A Power BI report in Microsoft Fabric needs to query large volumes of data directly from OneLake with the performance of import mode but the data freshness of DirectQuery.
→Use Direct Lake mode in Power BI.
Why: Direct Lake is a unique Fabric feature that loads Parquet/Delta files directly into the Power BI engine memory on demand, avoiding data duplication and query latency while providing near-real-time data access.
Business users need to connect to various data sources, create interactive dashboards and reports, and share insights across the organization.
→Use Power BI.
Why: Power BI is Microsoft's business analytics service for building interactive data visualizations. Use Power BI Desktop for authoring and Power BI Service for sharing and collaboration.
Differentiating between a multi-page interactive analysis and a single-page, high-level overview in Power BI.
→A Report is a multi-page collection of detailed, interactive visuals built from a single dataset. A Dashboard is a single canvas of tiles pinned from one or more reports, providing an at-a-glance view.
Why: Reports are for deep-dive analysis. Dashboards are for monitoring key metrics.
A single Power BI report must be shared with multiple users, but each user should only see the data relevant to them (e.g., a sales manager sees only their region's data).
→Implement Row-Level Security (RLS).
Why: RLS defines filter rules based on user roles, enforcing data security at the data model level so users accessing the same report see different subsets of data.
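RLS rules are DAX filter expressions attached to roles. A common pattern maps the signed-in user to rows via their user principal name; the table and column names here are hypothetical:

```dax
-- Filter expression on the Sales table for a role such as "Regional Manager":
-- keep only rows whose manager email matches the signed-in user.
[ManagerEmail] = USERPRINCIPALNAME()
```

Roles are defined in Power BI Desktop (Modeling > Manage roles), members are assigned to each role in the Power BI Service, and "View as" lets the author test what each role sees.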
Need to generate highly formatted, pixel-perfect reports (like invoices or financial statements) that are optimized for printing or PDF export.
→Use Power BI Paginated Reports.
Why: Paginated reports are designed for print-ready layouts with precise control over headers, footers, and page breaks, unlike standard interactive Power BI reports which are for on-screen exploration.
A Power BI dataset containing billions of rows takes too long to refresh. Only the last few days of data change frequently.
→Configure incremental refresh on the dataset.
Why: Incremental refresh partitions the data (usually by date) and only refreshes the most recent partitions, dramatically reducing refresh time and resource usage for large datasets.
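Incremental refresh hinges on the reserved `RangeStart`/`RangeEnd` datetime parameters filtering the source query in Power Query; the table and column below are illustrative:

```m
// Filter step in Power Query: the service substitutes RangeStart/RangeEnd
// per partition, so only recent partitions are re-queried on refresh.
// >= on RangeStart and < on RangeEnd avoids duplicate boundary rows.
FilteredRows = Table.SelectRows(
    Source,
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
)
```

With the filter in place, the refresh policy on the table (e.g., "archive 5 years, refresh last 7 days") drives how partitions are created and which ones get refreshed.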
A single Power BI report needs to combine pre-loaded, high-performance data (Import mode) with real-time data from an operational source (DirectQuery mode).
→Use Power BI composite models.
Why: Composite models allow a single dataset to mix tables with different storage modes, providing the flexibility to balance performance and data freshness.
An organization needs to discover, classify, and catalog all data assets across their hybrid data estate to enable data governance and discovery.
→Use Microsoft Purview.
Why: Purview is a unified data governance service that provides automated data scanning, a business glossary, data classification, and end-to-end data lineage visualization.