Beginning Azure Synapse Analytics - Transition from Data Warehouse to Data Lakehouse  
by: Bhadresh Shiyal
Apress, 2021
ISBN: 9781484270615
263 pages, download: 6331 KB

Format: PDF
Suitable for: Apple iPad, Android tablet PCs, online reading on PC, Mac, laptop

Type: A (simple access)

 

 
Table of Contents

  Table of Contents 5  
  About the Author 14  
  About the Technical Reviewer 15  
  Acknowledgments 16  
  Introduction 17  
  Chapter 1: Core Data and Analytics Concepts 19  
     Core Data Concepts 19  
        What Is Data? 20  
        Structured Data 20  
        Semi-structured Data 21  
        Unstructured Data 21  
     Data Processing Methods 22  
        Batch Data Processing 22  
        Streaming or Real-Time Data Processing 23  
     Relational Data and Its Characteristics 24  
     Non-Relational Data and Its Characteristics 26  
     Core Data Analytics Concepts 28  
        What Is Data Analytics? 28  
        Data Ingestion 28  
        Data Exploration 29  
        Data Processing 30  
        ETL 30  
        ELT 31  
        ELT / ETL Tools 32  
        Data Visualization 32  
        Data Analytics Categories 33  
           Descriptive Analytics 34  
           Diagnostic Analytics 34  
           Predictive Analytics 35  
           Prescriptive Analytics 35  
           Cognitive Analytics 36  
     Summary 36  
  Chapter 2: Modern Data Warehouses and Data Lakehouses 38  
     What Is a Data Warehouse? 39  
     Core Data Warehouse Concepts 40  
        Data Model 40  
        Model Types 41  
        Schema Types 41  
        Metadata 42  
     Why Do We Need a Data Warehouse? 42  
        Efficient Decision-Making 42  
        Separation of Concerns 42  
        Single Version of the Truth 43  
        Data Restructuring 43  
        Self-Service BI 43  
        Historical Data 44  
        Security 44  
        Data Quality 44  
        Data Mining 45  
        More Revenues 45  
     What Is a Modern Data Warehouse? 45  
     Difference Between Traditional & Modern Data Warehouses 46  
        Cloud vs. On-Premises 46  
        Separation of Compute and Storage Resources 46  
        Cost 47  
        Scalability 47  
        ETL vs. ELT 48  
        Disaster Recovery 48  
        Overall Architecture 48  
     Data Lakehouse 49  
        What Is a Data Lake? 49  
        What Is Delta Lake? 50  
        What Is Apache Spark? 51  
        What Is a Data Lakehouse? 52  
           Characteristics of a Data Lakehouse 53  
              Various Data Types 53  
              AI 53  
              Decoupled Compute and Storage Resources 54  
              Open Source Storage Format 54  
              Data Analytics and BI Tools 54  
              ACID Properties 54  
           Differences Between a Data Warehouse and a Data Lakehouse 55  
              Architecture 55  
              Access to Raw Data 55  
              Open Source vs. Proprietary 56  
              Workloads 56  
              Query Engines 56  
              Data Processing 57  
              Real-Time Data 57  
        Examples of Data Lakehouses 58  
           Azure Synapse Analytics 58  
           Databricks 59  
        Benefits of Data Lakehouse 60  
           Support for All Types of Data 60  
           Time to Market 61  
           More Cost Effective 61  
           AI 61  
           Reduction in ETL/ELT Jobs 62  
           Usage of Open Source Tools and Technologies 62  
           Efficient and Easy Data Governance 62  
        Drawbacks of Data Lakehouse 63  
           Monolithic Architecture 63  
           Technical Infancy 63  
           Migration Cost 64  
           Lack of Many Products/Options 64  
           Scarcity of Skilled Technical Resources 64  
     Summary 65  
  Chapter 3: Introduction to Azure Synapse Analytics 66  
     What Is Azure Synapse Analytics? 66  
     Azure Synapse Analytics vs. Azure SQL Data Warehouse 68  
     Why Should You Learn Azure Synapse Analytics? 69  
     Main Features of Azure Synapse Analytics 70  
        Unified Data Analytics Experience 70  
        Powerful Data Insights 71  
        Unlimited Scale 72  
        Security, Privacy, and Compliance 72  
        HTAP 73  
     Key Service Capabilities of Azure Synapse Analytics 73  
        Data Lake Exploration 74  
        Multiple Language Support 75  
        Deeply Integrated Apache Spark 76  
        Serverless Synapse SQL Pool 77  
        Hybrid Data Integration 78  
        Power BI Integration 79  
        AI Integration 80  
        Enterprise Data Warehousing 81  
        Seamless Streaming Analytics 82  
        Workload Management 82  
        Advanced Security 84  
     Summary 85  
  Chapter 4: Architecture and Its Main Components 86  
     High-Level Architecture 87  
     Main Components of Architecture 90  
        Synapse SQL 90  
           Compute Layer 90  
           Dedicated Synapse SQL Pool 90  
           Serverless Synapse SQL Pool 91  
           Storage Layer 93  
        Synapse Spark or Apache Spark 94  
        Synapse Pipelines 96  
        Synapse Studio 98  
        Synapse Link 100  
     Summary 102  
  Chapter 5: Synapse SQL 104  
     Synapse SQL Architecture Components 105  
        Massively Parallel Processing Engine 106  
        Distributed Query Processing Engine 107  
        Control Node 107  
        Compute Nodes 108  
        Data Movement Service 109  
        Distribution 109  
           Hash Distribution 111  
           Round-Robin Distribution 112  
           Replication-based Distribution 112  
        Azure Storage 114  
     Dedicated or Provisioned Synapse SQL Pool 114  
     Serverless or On-Demand Synapse SQL Pool 116  
     Synapse SQL Feature Comparison 117  
        Database Object Types 117  
        Query Language 119  
        Security 120  
        Tools 123  
        Storage Options 124  
        Data Formats 125  
     Resource Consumption Model for Synapse SQL 125  
     Synapse SQL Best Practices 126  
        Best Practices for Serverless Synapse SQL Pool 127  
        Best Practices for Dedicated Synapse SQL Pool 128  
     How-To’s 129  
        Create a Dedicated Synapse SQL Pool 129  
        Create a Serverless or On-Demand Synapse SQL Pool 132  
        Load Data Using COPY Statement in Dedicated Synapse SQL Pool 132  
        Ingest Data into Azure Data Lake Storage Gen2 133  
     Summary 134  
  Chapter 6: Synapse Spark 136  
     What Is Apache Spark? 137  
     What Is Synapse Spark in Azure Synapse Analytics? 139  
     Synapse Spark Features & Capabilities 140  
        Speed 140  
        Faster Start Time 140  
        Ease of Creation 140  
        Ease of Use 141  
        Security 141  
        Automatic Scalability 141  
        Separation of Concerns 142  
        Multiple Language Support 142  
        Integration with IDEs 142  
        Pre-loaded Libraries 143  
        REST APIs 143  
     Delta Lake and Its Importance in Synapse Spark 144  
     Synapse Spark Job Optimization 145  
        Data Format 145  
        Memory Management 146  
        Data Serialization 146  
        Data Caching 147  
        Data Abstraction 147  
        Join and Shuffle Optimization 148  
        Bucketing 149  
        Hyperspace Indexing 149  
     Synapse Spark Machine Learning 149  
        Data Preparation and Exploration 150  
        Build Machine Learning Models 150  
        Train Machine Learning Models 150  
        Model Deployment and Scoring 151  
     How-To’s 151  
        How to Create a Synapse Spark Pool 151  
        How to Create and Submit Apache Spark Job Definition in Synapse Studio Using Python 157  
        How to Monitor Synapse Spark Pools Using Synapse Studio 163  
     Summary 166  
  Chapter 7: Synapse Pipelines 168  
     Overview of Azure Data Factory 169  
     Overview of Synapse Pipelines 171  
     Activities 172  
     Pipelines 173  
     Linked Services 173  
     Dataset 174  
     Integration Runtimes (IR) 175  
        Azure Integration Runtime (Azure IR) 175  
        Self-Hosted Integration Runtimes (SHIR) 176  
        Azure SSIS Integration Runtimes (Azure SSIS IR) 177  
     Control Flow 177  
     Parameters 178  
     Data Flow 178  
     Data Movement Activities 178  
        Category: Azure 179  
        Category: Database 180  
        Category: NoSQL 181  
        Category: File 181  
        Category: Generic 182  
        Category: Services and Applications 182  
     Data Transformation Activities 184  
     Control Flow Activities 185  
     Copy Pipeline Example 186  
     Transformation Pipeline Example 188  
     Pipeline Triggers 189  
     Summary 190  
  Chapter 8: Synapse Workspace and Studio 192  
     What Is a Synapse Analytics Workspace? 193  
     Synapse Analytics Workspace Components and Features 194  
        Azure Data Lake Storage Gen2 Account and File System 194  
        Serverless Synapse SQL Pool 195  
        Shared Metadata Management 195  
        Code Artifacts 196  
     What Is Synapse Studio? 197  
     Main Features of Synapse Studio 199  
        Home Hub 199  
        Data Hub 199  
        Develop Hub 200  
        Integrate Hub 201  
        Monitor Hub 202  
           Integration 203  
           Activities 204  
        Manage Hub 204  
           Analytics Pools 204  
           External Connections 205  
           Integration 205  
           Security 206  
     Synapse Studio Capabilities 206  
        Data Preparation 206  
        Data Management 207  
        Data Exploration 207  
        Data Warehousing 207  
        Data Visualization 208  
        Machine Learning 208  
     Power BI in Synapse Studio 209  
     How-To’s 210  
        How to Create or Provision a New Azure Synapse Analytics Workspace Using Azure Portal 210  
        How to Launch Azure Synapse Studio 212  
        How to Link Power BI with Azure Synapse Studio 213  
     Summary 215  
  Chapter 9: Synapse Link 217  
     OLTP vs. OLAP 218  
     What Is HTAP? 219  
     Benefits of HTAP 219  
        No-ETL Analytics 219  
        Instant Insights 220  
        Reduced Data Duplication 220  
        Simplified Technical Architecture 220  
     What Is Azure Synapse Link? 221  
        Azure Cosmos DB 222  
     Azure Cosmos DB Analytical Store 222  
        Columnar Storage 224  
        Decoupling of Operational Store 224  
        Automatic Data Synchronization 225  
        SQL API and MongoDB API 225  
        Analytical TTL 225  
        Automatic Schema Updates 226  
        Cost-Effective Archiving 226  
        Scalability 227  
     When to Use Azure Synapse Link for Cosmos DB 227  
     Azure Synapse Link Limitations 228  
     Azure Synapse Link Use Cases 229  
        Industrial IOT 230  
           Predictive Maintenance Pipeline 231  
           Operational Reporting 231  
           Real-Time Applications 232  
        Real-Time Personalization for E-Commerce Users 232  
     How-To’s 233  
        How to Enable Azure Synapse Link for Azure Cosmos DB 233  
        How to Create an Azure Cosmos DB Container with Analytical Store Using Azure Portal 235  
        How to Connect to Azure Synapse Link for Azure Cosmos DB Using Azure Portal 236  
     Summary 237  
  Chapter 10: Azure Synapse Analytics Use Cases and Reference Architecture 240  
     Where Should You Use Azure Synapse Analytics? 241  
        Large Volume of Data 241  
        Disparate Sources of Data 241  
        Data Transformation 241  
        Batch or Streaming Data 242  
     Where Should You Not Use Azure Synapse Analytics? 242  
     Use Cases for Azure Synapse Analytics 243  
        Financial Services 243  
        Manufacturing 244  
        Retail 245  
        Healthcare 245  
     Reference Architectures for Azure Synapse Analytics 246  
        Modern Data Warehouse Architecture 246  
        Real-Time Analytics on Big Data Architecture 251  
     Summary 254  
  Index 257  


