Need to build, schedule, and monitor complex data integration workflows that move and transform data from various on-premises and cloud sources.
→Use Azure Data Factory (ADF).
Why: ADF is a managed cloud orchestration service for building and managing ETL/ELT pipelines at scale, with extensive connectivity and monitoring capabilities.
An Azure Data Factory pipeline needs to access a data source located on-premises behind a corporate firewall.
→Install a Self-hosted Integration Runtime (IR) on a machine within the on-premises network.
Why: The Self-hosted IR acts as a secure gateway, enabling ADF in the cloud to connect to and move data from on-premises sources without exposing them to the public internet.
Need a single, integrated platform for data warehousing (SQL), big data analytics (Spark), data exploration (serverless SQL), and data integration.
→Use Azure Synapse Analytics.
Why: Synapse provides a unified workspace (Synapse Studio) that brings together these different analytical engines, reducing complexity and integration overhead.
Choosing a SQL query engine within Synapse Analytics.
→Use the Serverless SQL pool for ad-hoc, exploratory queries on data in the data lake, billed per TB of data processed. Use the Dedicated SQL pool for high-performance, predictable data warehousing workloads with provisioned (and separately billed) resources.
Why: Serverless is for unpredictable exploration and discovery. Dedicated is for production BI and reporting with performance SLAs.
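As a sketch of the serverless pattern, a serverless SQL pool can query files in the data lake directly with OPENROWSET, with no tables to load and no compute to provision. The storage path and column names below are hypothetical:

```sql
-- Ad-hoc exploration straight over Parquet files in the lake.
-- Storage account, container, and schema are illustrative.
SELECT TOP 100
    r.OrderId,
    r.OrderDate,
    r.Amount
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/sales/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS r
WHERE r.OrderDate >= '2024-01-01';
```

A dedicated SQL pool would instead load this data into distributed tables ahead of time, trading ingestion effort for consistent query performance.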
Need to process and analyze high-volume streaming data in real time from sources like IoT Hub or Event Hubs to power live dashboards or trigger alerts.
→Use Azure Stream Analytics.
Why: Stream Analytics is a real-time event processing engine that uses a simple SQL-like query language to analyze data in motion with low latency.
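A minimal Stream Analytics job query using a tumbling window; the input/output aliases and event fields here are illustrative:

```sql
-- Average temperature per device over non-overlapping 60-second windows.
-- [iothub-input] and [powerbi-output] are aliases defined on the job.
SELECT
    deviceId,
    AVG(temperature) AS avgTemperature,
    System.Timestamp() AS windowEnd
INTO [powerbi-output]
FROM [iothub-input] TIMESTAMP BY eventTime
GROUP BY deviceId, TumblingWindow(second, 60)
```

TIMESTAMP BY makes windowing use the event's own timestamp rather than its arrival time; Hopping and Sliding windows are available when windows need to overlap.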
A data science team needs a collaborative, notebook-based environment for large-scale data engineering and machine learning using Apache Spark.
→Use Azure Databricks.
Why: Databricks provides an optimized Spark runtime, collaborative notebooks, and integrated ML capabilities (MLflow), making it a leading platform for advanced analytics and ML on Azure.
Need to ingest millions of events per second from sources like mobile apps, web telemetry, or IoT devices for real-time processing.
→Use Azure Event Hubs.
Why: Event Hubs is a big data streaming platform designed for high-throughput event ingestion. It acts as the "front door" for streaming data, decoupling producers from consumers.
An organization wants a single, unified SaaS analytics platform that combines data engineering, data science, data warehousing, and BI with minimal infrastructure management.
→Use Microsoft Fabric.
Why: Fabric provides an end-to-end, SaaS-based analytics experience built on a single data lake (OneLake). It simplifies the architecture and reduces integration overhead compared to building with separate PaaS services.
Within Microsoft Fabric, need a single artifact to store data in open Delta Lake format that can be accessed by both Spark engines (for data engineering) and SQL engines (for BI).
→Use a Microsoft Fabric Lakehouse.
Why: The Lakehouse is the core architectural pattern in Fabric. It combines the scalability and flexibility of a data lake with the transactional guarantees and SQL querying capabilities of a data warehouse.
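As a sketch of the dual-engine pattern (assuming a Fabric notebook, where a `spark` session is preconfigured; the file path and table name are hypothetical):

```python
# Spark side: land raw files as a managed Delta table in the Lakehouse.
# `spark` is provided by the Fabric notebook environment.
df = spark.read.csv("Files/raw/sales.csv", header=True, inferSchema=True)
df.write.format("delta").mode("overwrite").saveAsTable("sales")
```

The same `sales` table then becomes queryable from the Lakehouse's SQL analytics endpoint (e.g., `SELECT SUM(Amount) FROM sales`) without copying or moving the data.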
A Power BI report in Microsoft Fabric needs to query large volumes of data directly from OneLake with the performance of import mode but the data freshness of DirectQuery.
→Use Direct Lake mode in Power BI.
Why: Direct Lake is a unique Fabric feature that loads Parquet/Delta files directly into the Power BI engine memory on demand, avoiding data duplication and query latency while providing near-real-time data access.
Business users need to connect to various data sources, create interactive dashboards and reports, and share insights across the organization.
→Use Power BI.
Why: Power BI is Microsoft's business analytics service for building interactive data visualizations. Use Power BI Desktop for authoring and Power BI Service for sharing and collaboration.
Differentiating between a multi-page interactive analysis and a single-page, high-level overview in Power BI.
→A Report is a multi-page collection of detailed, interactive visuals built from a single dataset. A Dashboard is a single canvas of tiles pinned from one or more reports, providing an at-a-glance view.
Why: Reports are for deep-dive analysis. Dashboards are for monitoring key metrics.
A single Power BI report must be shared with multiple users, but each user should only see the data relevant to them (e.g., a sales manager sees only their region's data).
→Implement Row-Level Security (RLS).
Why: RLS defines filter rules based on user roles, enforcing data security at the data model level so users accessing the same report see different subsets of data.
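RLS rules are DAX filter expressions attached to roles. A common pattern maps the signed-in user to rows via their user principal name; the table and column names here are hypothetical:

```dax
-- Filter expression on the Sales table for a role such as "Regional Manager":
-- keep only rows whose manager email matches the signed-in user.
[ManagerEmail] = USERPRINCIPALNAME()
```

Roles are defined in Power BI Desktop (Modeling > Manage roles), members are assigned to each role in the Power BI Service, and "View as" lets the author test what each role sees.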
Need to generate highly formatted, pixel-perfect reports (like invoices or financial statements) that are optimized for printing or PDF export.
→Use Power BI Paginated Reports.
Why: Paginated reports are designed for print-ready layouts with precise control over headers, footers, and page breaks, unlike standard interactive Power BI reports which are for on-screen exploration.
A Power BI dataset containing billions of rows takes too long to refresh. Only the last few days of data change frequently.
→Configure incremental refresh on the dataset.
Why: Incremental refresh partitions the data (usually by date) and only refreshes the most recent partitions, dramatically reducing refresh time and resource usage for large datasets.
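Incremental refresh hinges on the reserved `RangeStart`/`RangeEnd` datetime parameters filtering the source query in Power Query; the table and column below are illustrative:

```m
// Filter step in Power Query: the service substitutes RangeStart/RangeEnd
// per partition, so only recent partitions are re-queried on refresh.
// >= on RangeStart and < on RangeEnd avoids duplicate boundary rows.
FilteredRows = Table.SelectRows(
    Source,
    each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
)
```

With the filter in place, the refresh policy on the table (e.g., "archive 5 years, refresh last 7 days") drives how partitions are created and which ones get refreshed.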
A single Power BI report needs to combine pre-loaded, high-performance data (Import mode) with real-time data from an operational source (DirectQuery mode).
→Use Power BI composite models.
Why: Composite models allow a single dataset to mix tables with different storage modes, providing the flexibility to balance performance and data freshness.
An organization needs to discover, classify, and catalog all data assets across their hybrid data estate to enable data governance and discovery.
→Use Microsoft Purview.
Why: Purview is a unified data governance service that provides automated data scanning, a business glossary, data classification, and end-to-end data lineage visualization.