Implementing a medallion architecture (Bronze, Silver, Gold) and needing to access data across layers without physical data duplication.
→Use OneLake shortcuts to reference data in other lakehouses or layers.
Why: Shortcuts are symbolic links in OneLake. They provide a unified namespace and allow access to data without copying, which is ideal for a logical data mesh or medallion architecture.
Migrating an existing T-SQL-heavy analytics workload from Azure Synapse to Fabric.
→Use a Fabric Data Warehouse.
Why: The Fabric Warehouse offers full T-SQL compatibility, making it the ideal target for migrating existing SQL scripts, stored procedures, and analyst queries with minimal changes. By contrast, the Lakehouse SQL analytics endpoint is read-only for T-SQL; writes to a Lakehouse must go through Spark.
Ingesting and querying high-volume, high-velocity streaming data (e.g., IoT telemetry) with sub-second latency.
→Use Fabric Eventstream for ingestion and a KQL Database for storage and analysis.
Why: This is the purpose-built streaming analytics stack in Fabric. KQL (Kusto Query Language) is optimized for time-series analysis on streaming data, offering much lower latency than batch-oriented lakehouses or warehouses.
Implementing Slowly Changing Dimension (SCD) Type 2 to maintain a full history of dimension changes in a lakehouse.
→Use a `MERGE INTO` statement in a Spark notebook or pipeline. Match on the business key; `WHEN MATCHED` (and the tracked attributes have changed) expire the old record (set `IsCurrent` to false and `EndDate` to now); `WHEN NOT MATCHED` insert the new record. Because a matched key cannot both update and insert in a single clause, the new version of a changed record is typically added in a follow-up insert or by staging the changed rows with a null merge key.
Why: Delta Lake's `MERGE` operation provides atomic upsert capabilities, making it the standard and most efficient way to implement SCD logic in a Fabric lakehouse.
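A minimal PySpark sketch of the two-step pattern, assuming a `dim_customer` table keyed on `customer_id` with `IsCurrent`/`StartDate`/`EndDate` columns; all names are illustrative and surrogate-key generation is omitted:

```python
from pyspark.sql import functions as F

# Incoming snapshot of dimension records (illustrative staging path).
updates = spark.read.format("delta").load("Files/staging/customers")
updates.createOrReplaceTempView("updates")

# Step 1: expire the current row of any record whose tracked attributes changed.
spark.sql("""
    MERGE INTO dim_customer AS tgt
    USING updates AS src
      ON tgt.customer_id = src.customer_id AND tgt.IsCurrent = true
    WHEN MATCHED AND tgt.customer_name <> src.customer_name THEN
      UPDATE SET tgt.IsCurrent = false, tgt.EndDate = current_timestamp()
""")

# Step 2: insert a new current row for changed records and brand-new keys,
# both of which now have no open row in the dimension. Assumes the staged
# columns match the dimension's business attributes.
open_rows = spark.table("dim_customer").filter("IsCurrent = true")
new_rows = (
    updates.join(open_rows, "customer_id", "left_anti")
    .withColumn("IsCurrent", F.lit(True))
    .withColumn("StartDate", F.current_timestamp())
    .withColumn("EndDate", F.lit(None).cast("timestamp"))
)
new_rows.write.format("delta").mode("append").saveAsTable("dim_customer")
```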
Replicating data in near real-time from an operational database (e.g., Azure SQL DB) to a Fabric lakehouse for analytics.
→Use Fabric Mirroring.
Why: Mirroring is a low-latency, low-impact change data capture (CDC) solution built into Fabric. It automatically replicates data and schema changes to OneLake as Delta tables, eliminating the need for complex ETL pipelines.
Ingesting and transforming complex, nested JSON data from an API into a flattened, structured Delta table.
→Use a PySpark notebook. Use `from_json` with a defined schema to parse the JSON strings, and `explode` to flatten arrays into rows.
Why: PySpark provides the most powerful and flexible tools for handling complex and evolving JSON structures programmatically, far beyond the capabilities of a standard copy activity.
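A sketch of the approach, assuming newline-delimited JSON order documents with a nested `customer` struct and an `items` array; the schema, paths, and table names are illustrative:

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Illustrative schema: an order document with a nested customer struct
# and an array of line items.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("customer", StructType([
        StructField("id", StringType()),
        StructField("country", StringType()),
    ])),
    StructField("items", ArrayType(StructType([
        StructField("sku", StringType()),
        StructField("qty", StringType()),
    ]))),
])

# One JSON document per line, landed by the API extract (path is illustrative).
raw = spark.read.text("Files/raw/orders")

flat = (
    raw.select(F.from_json("value", schema).alias("o"))
    .select("o.*")
    .withColumn("item", F.explode("items"))  # one output row per line item
    .select(
        "order_id",
        F.col("customer.id").alias("customer_id"),
        F.col("customer.country").alias("country"),
        F.col("item.sku").alias("sku"),
        F.col("item.qty").alias("qty"),
    )
)
flat.write.format("delta").mode("overwrite").saveAsTable("orders_flat")
```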
Ingesting data into Fabric from an on-premises SQL Server database that is behind a corporate firewall.
→Install and configure an on-premises data gateway on a server within the local network. Add the gateway as a data source in Fabric.
Why: The gateway acts as a secure bridge, relaying queries and data between Fabric cloud services and on-premises data sources without requiring inbound firewall ports to be opened.
Query performance on a large, frequently updated Delta table has degraded due to an accumulation of many small data files.
→Run the `OPTIMIZE` command to compact small files into larger ones. Optionally use `ZORDER BY` on frequently filtered columns to co-locate related data.
Why: Fewer, larger files are significantly more efficient for Spark to read. Z-ordering improves data skipping, allowing queries to read even less data. This is a critical maintenance task for Delta tables.
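A short sketch run from a notebook, assuming a table named `orders` that is filtered most often on `customer_id` and `order_date` (both names illustrative):

```python
# Check how fragmented the table currently is: Delta's DESCRIBE DETAIL
# reports the file count and total size.
spark.sql("DESCRIBE DETAIL orders").select("numFiles", "sizeInBytes").show()

# Compact small files and co-locate rows by the columns queries filter on most.
spark.sql("OPTIMIZE orders ZORDER BY (customer_id, order_date)")
```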
Aggregating streaming time-series data into fixed, non-overlapping time intervals (e.g., average temperature per sensor every 5 minutes).
→Use a KQL query with the `summarize` operator and the `bin()` function. Example: `SensorData | summarize avg(temperature) by sensor_id, bin(timestamp, 5m)`.
Why: The `bin()` function is the standard, highly optimized way in KQL to group events into fixed time buckets (tumbling windows) for aggregation.
A Dataflow Gen2 refresh is slow. The data source is a relational database like Azure SQL.
→Review the transformation steps in the Power Query editor to ensure query folding is active. Reorder or modify steps to maximize folding.
Why: Query folding pushes transformation logic back to the source database to be executed as a single native query. This is vastly more efficient than pulling all raw data into the dataflow engine and transforming it in memory.
A Spark notebook is performing a slow join between a very large fact table (billions of rows) and a small dimension table (thousands of rows).
→Use a broadcast join, either by applying the `pyspark.sql.functions.broadcast` hint to the small table or by letting the optimizer broadcast it automatically when it falls under the broadcast size threshold.
Why: Broadcasting sends the entire small table to every executor node. This avoids a costly "shuffle" operation where the large table's data must be repartitioned and sent across the network, dramatically improving performance.
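A minimal sketch, assuming illustrative table names `fact_sales` and `dim_product` joined on `product_id`:

```python
from pyspark.sql.functions import broadcast

fact = spark.table("fact_sales")    # billions of rows
dim = spark.table("dim_product")    # thousands of rows

# Hint that the small dimension should be shipped to every executor,
# so the large fact table is joined in place without a shuffle.
enriched = fact.join(broadcast(dim), on="product_id", how="left")
enriched.write.format("delta").mode("overwrite").saveAsTable("fact_sales_enriched")
```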
A data pipeline orchestrates multiple activities. One activity might fail, but subsequent, independent activities should still run, and the overall failure should be logged.
→Configure activity dependencies: give activities that must run regardless of the prior outcome an "On completion" dependency on the previous activity.
Why: This allows for building robust, parallel execution paths. You can create separate branches for "Succeeded" and "Failed" conditions to implement custom logging or notification logic.
A pipeline to incrementally load data from a source with a `last_modified` timestamp.
→Implement a watermark pattern. Store the `max(last_modified)` from the last successful run. In the next run, query the source for records where `last_modified` is greater than the stored watermark.
Why: This is the most efficient pattern for incremental loads from sources that provide a modification timestamp, ensuring only new or updated data is processed, minimizing data transfer and compute.
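A hedged PySpark sketch of the pattern, assuming the watermark lives in a small Delta table `etl_watermarks` and the source is reachable over JDBC; all table and column names and the placeholder connection string are assumptions:

```python
from pyspark.sql import functions as F

# Read the watermark recorded by the last successful run
# (one row per source table in a small Delta control table).
wm = (
    spark.table("etl_watermarks")
    .filter("source_table = 'sales.orders'")
    .collect()[0]["last_value"]
)

# Pull only rows modified since that watermark (placeholder connection string).
incremental = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>;databaseName=<db>")
    .option("query", f"SELECT * FROM sales.orders WHERE last_modified > '{wm}'")
    .load()
)
incremental.write.format("delta").mode("append").saveAsTable("bronze_orders")

# Advance the watermark only if new rows were actually loaded.
new_wm = incremental.agg(F.max("last_modified")).first()[0]
if new_wm is not None:
    spark.sql(
        f"UPDATE etl_watermarks SET last_value = '{new_wm}' "
        "WHERE source_table = 'sales.orders'"
    )
```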
Analyze a real-time stream of IoT data to detect unusual spikes or dips in sensor readings.
→Use the `series_decompose_anomalies()` function in a KQL query within an Eventhouse/KQL Database.
Why: This built-in KQL function is specifically designed for time-series anomaly detection. It automatically decomposes the series into seasonal, trend, and residual components to identify statistically significant outliers, requiring minimal manual configuration.
Need to join data from a Warehouse, a Lakehouse, and a mirrored Azure SQL Database in a single T-SQL query without moving data.
→Use three-part naming conventions (`database.schema.table`) in a query run from the Warehouse or Lakehouse SQL endpoint. Use shortcuts to reference the mirrored database.
Why: Fabric provides a unified query engine that can access data across different Fabric items within the same workspace using a single SQL statement, enabling data virtualization.
A dataflow needs to process a file where some rows may be invalid. The entire flow should not fail; valid rows should be loaded, and invalid rows should be logged.
→In Power Query, add a step to validate rows and create an "IsValid" column. Then create two reference queries from that point: one that filters for `IsValid = true` to load to the destination, and another that filters for `IsValid = false` to load to an error log.
Why: This pattern provides robust error handling by splitting the data stream. It prevents a few bad rows from halting the entire process and provides a clear mechanism for auditing data quality issues.