Big Data and Azure

Friday Jun 21st 2019 by Hannes DuPreez

Learn to integrate Big Data into your Azure endeavors.

Big Data

Big Data describes the large volume of data, either structured or unstructured, that inundates a business on a daily basis. The field of Big Data covers ways to analyze, extract information from, or otherwise deal with data sets that are too large or complex for conventional data-processing software.

Big Data has the following characteristics:

  • Volume: The quantity of generated and stored data
  • Variety: The type and nature of the data
  • Velocity: The speed at which data is generated and processed
  • Veracity: The quality and value of the data

The Influence of Azure on Big Data

Microsoft Azure provides analytics and machine learning tools for turning data into actionable insights. It lets you combine data from many sources, and build and deploy machine learning models, at scale.

With the following Azure products, advanced analytics can be performed on Big Data:

  • SQL Data Warehouse
  • Data Factory
  • Azure BLOB Storage
  • Azure Databricks
  • Azure Cosmos DB
  • Power BI

Let's have a look at each of them individually.

SQL Data Warehouse

SQL Data Warehouse is a cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP), in which a large number of processors perform a set of computations in parallel, to run complex queries across petabytes of data.

You import Big Data into SQL Data Warehouse with PolyBase T-SQL queries (PolyBase lets T-SQL queries read external data, such as data stored in Hadoop), then use the power of MPP to run high-performance analytics. The data warehouse then becomes the single version of the truth that you can count on for insights.
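To make the MPP idea more concrete, here is a minimal Python sketch of the general scatter-and-gather pattern: rows are split into partitions, each worker aggregates its own partition, and the partial results are merged. This is only a toy illustration, not how SQL Data Warehouse is implemented. The sample rows and the function names (split, partial_sum, mpp_sum_by_region) are made up for the example, and Python threads are used purely to show the structure rather than to gain real speed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows: (region, amount). Not real data.
ROWS = [
    ("east", 10), ("west", 5), ("east", 7),
    ("north", 3), ("west", 8), ("north", 4),
]


def split(rows, n_parts):
    """Distribute rows round-robin across n_parts partitions."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum(partition):
    """Each worker aggregates only the rows in its own partition."""
    totals = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def mpp_sum_by_region(rows, n_workers=3):
    """Scatter the rows, aggregate in parallel, then merge the partial results."""
    partitions = split(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds counts together
    return dict(merged)


if __name__ == "__main__":
    print(mpp_sum_by_region(ROWS))
```

The key point is that no single worker ever needs to see the whole data set; only the small partial totals are brought together at the end.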

Data Factory

Data Factory is a cloud-based, hybrid data integration service that composes data storage, movement, and processing services into automated data pipelines. It allows you to create, schedule, and orchestrate ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) workflows.
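As a rough illustration of what an ETL workflow does, the following Python sketch runs the three stages in order using only the standard library. It does not use Data Factory or any Azure API. The sample CSV text, the sales table, and the function names are all assumptions made for the example, and an in-memory SQLite database stands in for the target data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a source system.
RAW_CSV = """id,name,amount
1, alice ,10.50
2,BOB,7
3,carol,not-a-number
"""


def extract(text):
    """Extract: read raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: normalize names, parse amounts, skip rows that fail to parse."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop rows whose amount is not numeric
        clean.append((int(rec["id"]), rec["name"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run the three stages in order, as an orchestrated pipeline would."""
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, conn)
    print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
```

In an ELT workflow the order of the last two stages is swapped: the raw rows are loaded first and then transformed inside the target system.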

Azure BLOB Storage

Azure BLOB storage is massively scalable object storage for unstructured documents, images, videos, and audio. It is optimized for storing massive amounts of unstructured data (data that does not adhere to a particular data model or definition), such as text or binary data.

Azure BLOB storage is designed for the following uses:

  • Serving documents or images directly to a browser
  • Storing files for distributed access
  • Streaming audio and video
  • Writing to log files
  • Storing data for disaster recovery, backup and restore, and archiving

Azure Databricks

Azure Databricks is an easy, fast, and collaborative analytics platform based on Apache Spark, an open-source, distributed, general-purpose cluster-computing framework that provides an interface for programming clusters with implicit data parallelism.
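"Implicit data parallelism" means that you describe what to do with each item and the framework decides how to spread the work out. The Python sketch below mimics that style with a made-up ToyDataset class. It is not the Spark or Databricks API, and it uses local threads in place of a cluster. The caller's code contains no partitioning or threading logic at all.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class ToyDataset:
    """Hypothetical stand-in for a distributed dataset; not a Spark API."""

    def __init__(self, items, n_partitions=4):
        items = list(items)
        # Partitioning is decided here, not in the caller's transformation code.
        self.partitions = [items[i::n_partitions] for i in range(n_partitions)]

    def map(self, fn):
        """Apply fn to every item, handling one partition per worker."""
        def run(part):
            return [fn(x) for x in part]

        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            mapped = list(pool.map(run, self.partitions))
        result = ToyDataset([], n_partitions=len(self.partitions))
        result.partitions = mapped
        return result

    def reduce(self, fn):
        """Reduce each partition, then combine the per-partition results.

        fn should be associative and commutative, because the original
        item order is not preserved across partitions.
        """
        partials = [reduce(fn, part) for part in self.partitions if part]
        return reduce(fn, partials)


if __name__ == "__main__":
    # Sum of squares of 1..10, written without any explicit parallel code.
    total = ToyDataset(range(1, 11)).map(lambda x: x * x).reduce(lambda a, b: a + b)
    print(total)
```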

Azure Cosmos DB

Azure Cosmos DB is a globally distributed database service. It is designed to provide low latency, elastic scalability of throughput, well-defined semantics for data consistency, and high availability.

Power BI

Power BI is a suite of business analytics tools that deliver insights. Power BI enables you to connect to scores of data sources, simplify data preparation, drive ad hoc analysis, and produce reports to be consumed on the Web and across mobile devices.

Conclusion

Big Data has evolved and keeps evolving. With the help of Azure tools, it is becoming more and more manageable.
