Introduction

Many organizations today work with big data. The huge volume and variety of data, along with the speed at which it's generated, require systems that help you manage and control it. In the past, organizations used relational database-management systems to control data. Now, however, many organizations want the functionality of open-source software combined with the benefits of a hosted platform. Azure HDInsight, a managed cloud service for open-source analytics frameworks, is an example of this combination. HDInsight lets you process big data in many scenarios, using either historical or real-time data.

The following graphic shows an overview of how you might use HDInsight. It depicts several data sources, including Internet of Things (IoT) sensors, databases, and several Azure data stores. HDInsight processes data from these sources and then makes the results available in long-term storage for real-time apps and further analysis.

Diagram of the architecture of HDInsight in a typical organization, depicting several data sources from which it manages big data.

Example scenario

Let's imagine you work for an organization that builds workloads to ingest data for historical reporting and advanced analytics. Perhaps you also have streaming data that requires analysis. In this situation, you might want to consider HDInsight, which lets you ingest all of your data into a single Data Lake location. You can then use HDInsight to run the following workloads:

  • Batch processing
  • Data warehousing
  • Data-science operations
  • Streaming

What will we be doing?

By the end of this module, you'll be able to evaluate whether HDInsight can help your organization process big data. You'll also be able to describe how HDInsight uses popular open-source frameworks that support many data scenarios.

What is the main goal?

The main goal is to determine whether HDInsight is a suitable choice for your big-data processing requirements.