HDInsight enterprise customers work with some of the most sensitive data in the world, and they want to lock down access to this data at the networking layer as well. Although service endpoints have been available for Azure data sources, HDInsight customers couldn't use this additional layer of security for their big data pipelines because of the lack of interoperability between HDInsight and those data stores. As we recently announced, HDInsight now supports service endpoints for Azure Blob Storage, Azure SQL databases, and Azure Cosmos DB.
With this enhanced level of security at the networking layer, customers can now lock down their big data storage accounts to their specified Virtual Networks (VNETs) and still use HDInsight clusters seamlessly to access and process that data.
In the rest of this post, we will explore how to enable service endpoints and point out important HDInsight configurations for Azure Blob Storage, Azure SQL DB, and Azure Cosmos DB.
Azure Blob Storage
When using Azure Blob Storage with HDInsight, you can configure selected VNETs in the storage account's firewall settings. This ensures that only traffic from the allowed subnets can access the storage account.
It is important to check the "Allow trusted Microsoft services to access this storage account" option. This ensures that the HDInsight service has access to the storage account and can provision the cluster seamlessly.
If the storage account is in a different subscription than the HDInsight cluster, make sure that the HDInsight resource provider is registered with the storage account's subscription. To learn more about how to register or re-register resource providers on a subscription, see additional resource providers and types. If the HDInsight resource provider is not registered properly, you might get an error message, which can be resolved by registering the resource provider.
NOTE: The HDInsight cluster must be deployed into one of the subnets allowed in the blob storage firewall, or the VNETs must be peered. This ensures that traffic from the cluster VMs can reach the storage account.
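The firewall settings described above can also be expressed in an ARM template. The following is a minimal sketch, assuming you manage the storage account through a template: it builds the networkAcls fragment of a Microsoft.Storage/storageAccounts resource, where "bypass": "AzureServices" corresponds to the "Allow trusted Microsoft services" checkbox. The helper name storage_network_acls and the placeholder subnet ID are illustrative only and not part of any SDK.

```python
import json


def storage_network_acls(subnet_ids):
    """Build the networkAcls fragment for a storage account (illustrative helper)."""
    return {
        # Deny everything that is not explicitly allowed below.
        "defaultAction": "Deny",
        # Equivalent of "Allow trusted Microsoft services to access this storage account".
        "bypass": "AzureServices",
        # One rule per allowed subnet; the HDInsight cluster subnet should be among them.
        "virtualNetworkRules": [{"id": s, "action": "Allow"} for s in subnet_ids],
        "ipRules": [],
    }


# Placeholder resource ID (assumption): replace with the subnet your cluster is deployed into.
HDI_SUBNET = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<subnet-name>"
)

acls = storage_network_acls([HDI_SUBNET])
print(json.dumps(acls, indent=2))
```

The subnet itself also needs the Microsoft.Storage service endpoint enabled before the rule takes effect.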
Azure SQL DB
If you are using an external SQL DB for the Hive or Oozie metastore, you can configure service endpoints for it. Enabling "Allow access to Azure services" is not required from HDInsight's point of view, because these databases are accessed only after the cluster is created and the VMs are injected into the VNET.
NOTE: The HDInsight cluster must be deployed into one of the subnets allowed in the SQL DB firewall, or the VNETs must be peered. This ensures that traffic from the cluster VMs can reach the SQL DB.
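For a metastore database, the allowed subnet is declared as a virtual network rule on the SQL server. Here is a minimal sketch of that rule as an ARM template resource, assuming template-based deployment. The helper sql_vnet_rule, the server and rule names, and the placeholder subnet ID are hypothetical, and the apiVersion field that a full template requires is left out on purpose.

```python
import json


def sql_vnet_rule(server_name, rule_name, subnet_id):
    """Build a Microsoft.Sql virtual network rule resource (illustrative helper)."""
    return {
        "type": "Microsoft.Sql/servers/virtualNetworkRules",
        # Child resources are named "<parent>/<child>".
        "name": f"{server_name}/{rule_name}",
        # A real template also needs an "apiVersion" entry here.
        "properties": {"virtualNetworkSubnetId": subnet_id},
    }


# Placeholder resource ID (assumption): the subnet hosting the HDInsight cluster.
SUBNET = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<subnet-name>"
)

rule = sql_vnet_rule("my-sql-server", "allow-hdinsight", SUBNET)
print(json.dumps(rule, indent=2))
```

The subnet referenced by the rule needs the Microsoft.Sql service endpoint enabled.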
Azure Cosmos DB
If you are using the Spark connector for Azure Cosmos DB, you can enable service endpoints in the Cosmos DB firewall settings and seamlessly connect to it from an HDInsight cluster.
NOTE: The HDInsight cluster must be deployed into one of the VNETs allowed in the Cosmos DB firewall, or the VNETs must be peered. This ensures that traffic from the cluster VMs can reach Cosmos DB.
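The Cosmos DB firewall setting can be sketched the same way. The snippet below builds the relevant properties of a Microsoft.DocumentDB/databaseAccounts resource for an ARM template, assuming template-based deployment. The helper cosmos_vnet_filter and the placeholder subnet ID are illustrative, and the other properties a complete account definition needs are omitted.

```python
import json


def cosmos_vnet_filter(subnet_ids):
    """Build the VNET-filter properties for a Cosmos DB account (illustrative helper)."""
    return {
        # Turns on filtering so only the listed subnets can reach the account.
        "isVirtualNetworkFilterEnabled": True,
        "virtualNetworkRules": [{"id": s} for s in subnet_ids],
    }


# Placeholder resource ID (assumption): the subnet hosting the HDInsight cluster.
SUBNET = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Network/virtualNetworks/<vnet-name>/subnets/<subnet-name>"
)

cosmos = cosmos_vnet_filter([SUBNET])
print(json.dumps(cosmos, indent=2))
```

The subnet needs the Microsoft.AzureCosmosDB service endpoint enabled for the rule to apply.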
Try HDInsight now
We hope you take full advantage of today’s announcements and we are excited to see what you will build with Azure HDInsight. Read the developer guide and follow the quick start guide to learn more about implementing these pipelines and architectures on Azure HDInsight. Stay up-to-date on the latest Azure HDInsight news and features by following us on Twitter #HDInsight and @AzureHDInsight. For questions and feedback, please reach out to AskHDInsight@microsoft.com.
About HDInsight
Azure HDInsight is Microsoft’s premium managed offering for running open source workloads on Azure. Today, we are excited to announce several new capabilities across a wide range of OSS frameworks.
Azure HDInsight powers mission-critical applications for some of our top customers across a wide variety of sectors, including manufacturing, retail, education, nonprofit, government, healthcare, media, banking, telecommunication, insurance, and many more, with use cases ranging from ETL to data warehousing and from machine learning to IoT.