ciando eBooks - ein Service Ihrer Bibliothek

	Table of Contents	5
	About the Author	14
	About the Technical Reviewer	15
	Acknowledgments	16
	Introduction	17
	Chapter 1: Core Data and Analytics Concepts	19
	Core Data Concepts	19
	What Is Data?	20
	Structured Data	20
	Semi-structured Data	21
	Unstructured Data	21
	Data Processing Methods	22
	Batch Data Processing	22
	Streaming or Real-Time Data Processing	23
	Relational Data and Its Characteristics	24
	Non-Relational Data and Its Characteristics	26
	Core Data Analytics Concepts	28
	What Is Data Analytics?	28
	Data Ingestion	28
	Data Exploration	29
	Data Processing	30
	ETL	30
	ELT	31
	ELT / ETL Tools	32
	Data Visualization	32
	Data Analytics Categories	33
	Descriptive Analytics	34
	Diagnostic Analytics	34
	Predictive Analytics	35
	Prescriptive Analytics	35
	Cognitive Analytics	36
	Summary	36
	Chapter 2: Modern Data Warehouses and Data Lakehouses	38
	What Is a Data Warehouse?	39
	Core Data Warehouse Concepts	40
	Data Model	40
	Model Types	41
	Schema Types	41
	Metadata	42
	Why Do We Need a Data Warehouse?	42
	Efficient Decision-Making	42
	Separation of Concerns	42
	Single Version of the Truth	43
	Data Restructuring	43
	Self-Service BI	43
	Historical Data	44
	Security	44
	Data Quality	44
	Data Mining	45
	More Revenues	45
	What Is a Modern Data Warehouse?	45
	Difference Between Traditional & Modern Data Warehouses	46
	Cloud vs. On-Premises	46
	Separation of Compute and Storage Resources	46
	Cost	47
	Scalability	47
	ETL vs. ELT	48
	Disaster Recovery	48
	Overall Architecture	48
	Data Lakehouse	49
	What Is a Data Lake?	49
	What Is Delta Lake?	50
	What Is Apache Spark?	51
	What Is a Data Lakehouse?	52
	Characteristics of a Data Lakehouse	53
	Various Data Types	53
	AI	53
	Decoupled Compute and Storage Resources	54
	Open Source Storage Format	54
	Data Analytics and BI Tools	54
	ACID Properties	54
	Differences Between a Data Warehouse and a Data Lakehouse	55
	Architecture	55
	Access to Raw Data	55
	Open Source vs. Proprietary	56
	Workloads	56
	Query Engines	56
	Data Processing	57
	Real-Time Data	57
	Examples of Data Lakehouses	58
	Azure Synapse Analytics	58
	Databricks	59
	Benefits of Data Lakehouse	60
	Support for All Types of Data	60
	Time to Market	61
	More Cost Effective	61
	AI	61
	Reduction in ETL/ELT Jobs	62
	Usage of Open Source Tools and Technologies	62
	Efficient and Easy Data Governance	62
	Drawbacks of Data Lakehouse	63
	Monolithic Architecture	63
	Technical Infancy	63
	Migration Cost	64
	Lack of Many Products/Options	64
	Scarcity of Skilled Technical Resources	64
	Summary	65
	Chapter 3: Introduction to Azure Synapse Analytics	66
	What Is Azure Synapse Analytics?	66
	Azure Synapse Analytics vs. Azure SQL Data Warehouse	68
	Why Should You Learn Azure Synapse Analytics?	69
	Main Features of Azure Synapse Analytics	70
	Unified Data Analytics Experience	70
	Powerful Data Insights	71
	Unlimited Scale	72
	Security, Privacy, and Compliance	72
	HTAP	73
	Key Service Capabilities of Azure Synapse Analytics	73
	Data Lake Exploration	74
	Multiple Language Support	75
	Deeply Integrated Apache Spark	76
	Serverless Synapse SQL Pool	77
	Hybrid Data Integration	78
	Power BI Integration	79
	AI Integration	80
	Enterprise Data Warehousing	81
	Seamless Streaming Analytics	82
	Workload Management	82
	Advanced Security	84
	Summary	85
	Chapter 4: Architecture and Its Main Components	86
	High-Level Architecture	87
	Main Components of Architecture	90
	Synapse SQL	90
	Compute Layer	90
	Dedicated Synapse SQL Pool	90
	Serverless Synapse SQL Pool	91
	Storage Layer	93
	Synapse Spark or Apache Spark	94
	Synapse Pipelines	96
	Synapse Studio	98
	Synapse Link	100
	Summary	102
	Chapter 5: Synapse SQL	104
	Synapse SQL Architecture Components	105
	Massively Parallel Processing Engine	106
	Distributed Query Processing Engine	107
	Control Node	107
	Compute Nodes	108
	Data Movement Service	109
	Distribution	109
	Hash Distribution	111
	Round-Robin Distribution	112
	Replication-based Distribution	112
	Azure Storage	114
	Dedicated or Provisioned Synapse SQL Pool	114
	Serverless or On-Demand Synapse SQL Pool	116
	Synapse SQL Feature Comparison	117
	Database Object Types	117
	Query Language	119
	Security	120
	Tools	123
	Storage Options	124
	Data Formats	125
	Resource Consumption Model for Synapse SQL	125
	Synapse SQL Best Practices	126
	Best Practices for Serverless Synapse SQL Pool	127
	Best Practices for Dedicated Synapse SQL Pool	128
	How-To’s	129
	Create a Dedicated Synapse SQL Pool	129
	Create a Serverless or On-Demand Synapse SQL Pool	132
	Load Data Using COPY Statement in Dedicated Synapse SQL Pool	132
	Ingest Data into Azure Data Lake Storage Gen2	133
	Summary	134
	Chapter 6: Synapse Spark	136
	What Is Apache Spark?	137
	What Is Synapse Spark in Azure Synapse Analytics?	139
	Synapse Spark Features & Capabilities	140
	Speed	140
	Faster Start Time	140
	Ease of Creation	140
	Ease of Use	141
	Security	141
	Automatic Scalability	141
	Separation of Concerns	142
	Multiple Language Support	142
	Integration with IDEs	142
	Pre-loaded Libraries	143
	REST APIs	143
	Delta Lake and Its Importance in Synapse Spark	144
	Synapse Spark Job Optimization	145
	Data Format	145
	Memory Management	146
	Data Serialization	146
	Data Caching	147
	Data Abstraction	147
	Join and Shuffle Optimization	148
	Bucketing	149
	Hyperspace Indexing	149
	Synapse Spark Machine Learning	149
	Data Preparation and Exploration	150
	Build Machine Learning Models	150
	Train Machine Learning Models	150
	Model Deployment and Scoring	151
	How-To’s	151
	How to Create a Synapse Spark Pool	151
	How to Create and Submit Apache Spark Job Definition in Synapse Studio Using Python	157
	How to Monitor Synapse Spark Pools Using Synapse Studio	163
	Summary	166
	Chapter 7: Synapse Pipelines	168
	Overview of Azure Data Factory	169
	Overview of Synapse Pipelines	171
	Activities	172
	Pipelines	173
	Linked Services	173
	Dataset	174
	Integration Runtimes (IR)	175
	Azure Integration Runtime (Azure IR)	175
	Self-Hosted Integration Runtimes (SHIR)	176
	Azure SSIS Integration Runtimes (Azure SSIS IR)	177
	Control Flow	177
	Parameters	178
	Data Flow	178
	Data Movement Activities	178
	Category: Azure	179
	Category: Database	180
	Category: NoSQL	181
	Category: File	181
	Category: Generic	182
	Category: Services and Applications	182
	Data Transformation Activities	184
	Control Flow Activities	185
	Copy Pipeline Example	186
	Transformation Pipeline Example	188
	Pipeline Triggers	189
	Summary	190
	Chapter 8: Synapse Workspace and Studio	192
	What Is a Synapse Analytics Workspace?	193
	Synapse Analytics Workspace Components and Features	194
	Azure Data Lake Storage Gen2 Account and File System	194
	Serverless Synapse SQL Pool	195
	Shared Metadata Management	195
	Code Artifacts	196
	What Is Synapse Studio?	197
	Main Features of Synapse Studio	199
	Home Hub	199
	Data Hub	199
	Develop Hub	200
	Integrate Hub	201
	Monitor Hub	202
	Integration	203
	Activities	204
	Manage Hub	204
	Analytics Pools	204
	External Connections	205
	Integration	205
	Security	206
	Synapse Studio Capabilities	206
	Data Preparation	206
	Data Management	207
	Data Exploration	207
	Data Warehousing	207
	Data Visualization	208
	Machine Learning	208
	Power BI in Synapse Studio	209
	How-To’s	210
	How to Create or Provision a New Azure Synapse Analytics Workspace Using Azure Portal	210
	How to Launch Azure Synapse Studio	212
	How to Link Power BI with Azure Synapse Studio	213
	Summary	215
	Chapter 9: Synapse Link	217
	OLTP vs. OLAP	218
	What Is HTAP?	219
	Benefits of HTAP	219
	No-ETL Analytics	219
	Instant Insights	220
	Reduced Data Duplication	220
	Simplified Technical Architecture	220
	What Is Azure Synapse Link?	221
	Azure Cosmos DB	222
	Azure Cosmos DB Analytical Store	222
	Columnar Storage	224
	Decoupling of Operational Store	224
	Automatic Data Synchronization	225
	SQL API and MongoDB API	225
	Analytical TTL	225
	Automatic Schema Updates	226
	Cost-Effective Archiving	226
	Scalability	227
	When to Use Azure Synapse Link for Cosmos DB	227
	Azure Synapse Link Limitations	228
	Azure Synapse Link Use Cases	229
	Industrial IOT	230
	Predictive Maintenance Pipeline	231
	Operational Reporting	231
	Real-Time Applications	232
	Real-Time Personalization for E-Commerce Users	232
	How-To’s	233
	How to Enable Azure Synapse Link for Azure Cosmos DB	233
	How to Create an Azure Cosmos DB Container with Analytical Store Using Azure Portal	235
	How to Connect to Azure Synapse Link for Azure Cosmos DB Using Azure Portal	236
	Summary	237
	Chapter 10: Azure Synapse Analytics Use Cases and Reference Architecture	240
	Where Should You Use Azure Synapse Analytics?	241
	Large Volume of Data	241
	Disparate Sources of Data	241
	Data Transformation	241
	Batch or Streaming Data	242
	Where Should You Not Use Azure Synapse Analytics?	242
	Use Cases for Azure Synapse Analytics	243
	Financial Services	243
	Manufacturing	244
	Retail	245
	Healthcare	245
	Reference Architectures for Azure Synapse Analytics	246
	Modern Data Warehouse Architecture	246
	Real-Time Analytics on Big Data Architecture	251
	Summary	254
	Index	257