Microsoft has announced the general availability of two new Azure analytics services: Azure Data Lake Storage Gen2 (ADLS) and Azure Data Explorer (ADX). Microsoft also announced the preview of Azure Data Factory Mapping Data Flow.
The new ADLS Gen2 service combines scalability, cost-effectiveness, and a security model with rich analytics capabilities through compatibility with the Hadoop Distributed File System (HDFS). Customers can store both structured and unstructured data, and an Azure Blob File System (ABFS) driver allows files and folders to be distinctly addressed on the server side, eliminating the need for a complex client-side driver and ensuring high-fidelity file system transactions.
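As a rough illustration of how files and folders are addressed through ABFS, the sketch below builds a URI in the `abfss://<file system>@<account>.dfs.core.windows.net/<path>` form that the driver uses. The helper name `abfs_uri` and the account, file system, and path values are hypothetical and chosen only for this example; the helper is not part of any Azure SDK.

```python
def abfs_uri(account: str, file_system: str, path: str, secure: bool = True) -> str:
    """Build an ABFS URI for a file or folder in an ADLS Gen2 account.

    Illustrative helper only; the name and signature are assumptions.
    """
    # "abfss" addresses the store over TLS, "abfs" without it.
    scheme = "abfss" if secure else "abfs"
    # Drop leading slashes so the path joins cleanly after the host name.
    clean_path = path.lstrip("/")
    return f"{scheme}://{file_system}@{account}.dfs.core.windows.net/{clean_path}"


if __name__ == "__main__":
    # Hypothetical account ("contosolake") and file system ("raw").
    print(abfs_uri("contosolake", "raw", "/sales/2019/02/orders.csv"))
```

An HDFS-compatible analytics engine would typically take such a URI wherever it accepts a file system path.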
Jurgen Willis, director of product management, Azure Engineering, explained in a blog post how Microsoft further boosts analytics performance for ADLS:
We implemented a hierarchical namespace (HNS) which supports atomic file and folder operations. This is important because it reduces the overhead associated with processing big data on blob storage. This speeds up job execution and lowers cost because fewer compute operations are required. The ABFS driver and HNS significantly improve ADLS’ performance, removing scale and performance bottlenecks.
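The overhead Willis describes can be illustrated with a toy model. The sketch below is a simplification under stated assumptions, not Azure code: it treats a flat namespace as a dictionary of full object names, where a "folder" is only a shared prefix, and a hierarchical namespace as nested dictionaries. The function names and the operation counts (one copy plus one delete per object) are assumptions made for illustration.

```python
from typing import Dict


def rename_flat(blobs: Dict[str, bytes], old_prefix: str, new_prefix: str) -> int:
    """Rename a 'folder' in a flat namespace; returns the operation count."""
    ops = 0
    # Snapshot the matching keys first, since the dict is modified in the loop.
    for key in [k for k in blobs if k.startswith(old_prefix)]:
        blobs[new_prefix + key[len(old_prefix):]] = blobs.pop(key)
        ops += 2  # assumed cost: one copy and one delete per object
    return ops


def rename_hierarchical(parent: dict, old_name: str, new_name: str) -> int:
    """Rename a directory entry in a tree namespace; one metadata change."""
    parent[new_name] = parent.pop(old_name)
    return 1


if __name__ == "__main__":
    flat = {f"logs/2019/part-{i}.csv": b"" for i in range(1000)}
    tree = {"logs": {"2019": {f"part-{i}.csv": b"" for i in range(1000)}}}
    print(rename_flat(flat, "logs/2019/", "logs/archive/"))    # 2000
    print(rename_hierarchical(tree["logs"], "2019", "archive"))  # 1
```

In this model the flat rename grows with the number of objects and can be observed half-finished, while the hierarchical rename is a single step regardless of folder size.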
In addition to the performance boost, Microsoft offers the same robust data security capabilities built into Azure Blob Storage, such as:
- Encryption of data in transit (via TLS 1.2) and at rest
- Storage account firewalls
- Virtual network integration
- Role-based access security
Currently, ADLS is available in almost all Azure regions except for US DOD Central and US DOD East. Pricing details for ADLS are available on the pricing page.
With the new ADX, customers can leverage a fully managed data analytics service for real-time analysis of large volumes of streaming data. According to the blog post by Willis, the service can query 1 billion records in under a second with no modification of the data or metadata required. ADX also includes native connectors to Azure Data Lake Storage, Azure SQL Data Warehouse, and Power BI, and comes with an intuitive query language that allows customers to obtain insights in minutes.
Microsoft designed ADX with speed and simplicity in mind; it combines two distinct services that work in tandem:
- The Engine, a service responsible for processing the incoming raw data and serving user queries, and
- A Data Management (DM) service, which allows the ingestion of various types of raw data. The DM is also responsible for managing failures, backpressure, and data grooming tasks when necessary.
Note that both services are deployed as clusters of compute nodes (virtual machines) in Azure.
ADX is currently available in 41 Azure regions, and pricing details are available on the pricing page.
With the two new services, customers gain greater flexibility in managing unstructured data or data generated from interactions on the web, software-as-a-service apps, social media, mobile apps, and internet of things devices. John Chirapurath, general manager of Azure data, blockchain, and AI at Microsoft, said in a VentureBeat article:
We always strive to make it very easy for IT staff to adopt analytics and for line-of-business people to utilize and deliver powerful insights using beautiful products.
Lastly, Microsoft released a preview of a new Mapping Data Flow capability in Azure Data Factory (ADF), a hybrid cloud-based data integration service for orchestrating and automating data movement and transformation. With the new capability, customers can visually design, build, and manage data transformation processes without learning Spark or having a deep understanding of their distributed infrastructure. Currently, ADF is available in 21 regions, and pricing details are available on the pricing page.