Data Factory

Documentation

Service Description

Data processing in today's companies involves heterogeneous data stores (SQL, NoSQL, unstructured data, etc.) and processing components (databases, Big Data processors, etc.). Data often follows complex paths: from generation or receipt, through various processing components, to storage or distribution to various recipients. With Data Factory, on-premises data, such as data from SQL Server, can be processed together with cloud data from Azure SQL Database, Blobs, and Tables. These data processing streams can be created, run, and monitored through simple, highly available data pipelines. Data sources and data recipients can be defined, and the movement of data through the company can be traced and monitored from a central location.
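The idea of defining sources and recipients and tracing data movement from one place can be illustrated with a small conceptual sketch. This is not the Data Factory SDK or its JSON schema; the `Dataset`, `CopyStep`, and `Pipeline` classes and all dataset names below are hypothetical, chosen only to model the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A named data store (hypothetical model); location is 'on-premises' or 'cloud'."""
    name: str
    location: str


@dataclass(frozen=True)
class CopyStep:
    """One movement of data from a source dataset to a sink dataset."""
    source: Dataset
    sink: Dataset


@dataclass
class Pipeline:
    """An ordered list of copy steps that can be inspected from one place."""
    name: str
    steps: list = field(default_factory=list)

    def add_copy(self, source, sink):
        # Register a source -> sink movement and allow chained calls.
        self.steps.append(CopyStep(source, sink))
        return self

    def trace(self, dataset_name):
        """Return the names of all datasets downstream of `dataset_name`."""
        reached, frontier = [], [dataset_name]
        while frontier:
            current = frontier.pop(0)
            for step in self.steps:
                if step.source.name == current and step.sink.name not in reached:
                    reached.append(step.sink.name)
                    frontier.append(step.sink.name)
        return reached


# Hypothetical datasets: one on-premises source, two cloud recipients.
sql_server = Dataset("OnPremSqlOrders", "on-premises")
blob = Dataset("StagingBlob", "cloud")
azure_sql = Dataset("AzureSqlOrders", "cloud")

pipeline = Pipeline("CopyOrders").add_copy(sql_server, blob).add_copy(blob, azure_sql)
print(pipeline.trace("OnPremSqlOrders"))
```

In this sketch, `trace` walks the declared steps breadth-first, so the path of a dataset through the pipeline can be read from the pipeline definition alone.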

Getting Started

  1. 10/7/2015, Video, 0:59:31
    The data landscape is more varied than ever with unstructured and structured data originating from many cloud and on-premises sources. Data Factory enables you to process...
  2. 2/25/2016, MVA
    Exploring data orchestration concepts? Check out this course on the basic capabilities of Azure Data Factory (ADF). Get an overview of advanced analytics, and see how Azure...
  3. 1/16/2017, MVA
    If you’d like to learn how to architect solutions in Cortana Intelligence Suite and how to build intelligence into your applications, don’t miss this workshop! Build an...
  4. 10/7/2015, Video, 0:31:34
    This video takes a deeper dive into the features and functions of the Azure Data Factory orchestration engine. Learn more: http://aka.ms/g7hlpt
  5. 10/7/2015, Video, 1:11:26
    This video takes a deeper dive into how Azure Data Factory can be used to build hybrid big data analytics pipelines through the lens of an automated risk processing use case...




Web Content

Data Factory Documentation

1. Switch to version 1 documentation
2. Overview
     2.1. Introduction to Data Factory
     2.2. Compare current version to version 1
3. Quickstarts
     3.1. Create data factory - User interface (UI)
     3.2. Create data factory - Copy Data tool
     3.3. Create data factory - Azure PowerShell
     3.4. Create data factory - .NET
     3.5. Create data factory - Python
     3.6. Create data factory - REST
     3.7. Create data factory - Resource Manager template
     3.8. Create data flow
4. Tutorials
     4.1. Copy data in cloud
          4.1.1. Copy Data tool
          4.1.2. User interface (UI)
          4.1.3. .NET
     4.2. Copy on-premises data to cloud
          4.2.1. Copy Data tool
          4.2.2. User interface (UI)
          4.2.3. Azure PowerShell
     4.3. Copy data in bulk
          4.3.1. User interface (UI)
          4.3.2. Azure PowerShell
     4.4. Copy data incrementally
          4.4.1. 1 - Copy from one table
               4.4.1.1. User interface (UI)
               4.4.1.2. Azure PowerShell
          4.4.2. 2 - Copy from multiple tables
               4.4.2.1. User interface (UI)
               4.4.2.2. Azure PowerShell
          4.4.3. 3 - Use change tracking feature
               4.4.3.1. User interface (UI)
               4.4.3.2. Azure PowerShell
          4.4.4. 4 - Copy new files by lastmodifieddate
               4.4.4.1. Copy Data tool
          4.4.5. 5 - Copy new files by time partitioned file name
               4.4.5.1. Copy Data tool
     4.5. Transform data in cloud
          4.5.1. HDInsight Spark
               4.5.1.1. User interface (UI)
               4.5.1.2. Azure PowerShell
          4.5.2. Databricks Notebook
               4.5.2.1. User interface (UI)
     4.6. Transform data in virtual network
          4.6.1. User interface (UI)
          4.6.2. Azure PowerShell
     4.7. Add branching and chaining
          4.7.1. User interface (UI)
          4.7.2. .NET
     4.8. Run SSIS packages in Azure
          4.8.1. User interface (UI)
          4.8.2. Azure PowerShell
5. Samples
     5.1. Code samples
     5.2. Azure PowerShell
6. Concepts
     6.1. Pipelines and activities
     6.2. Datasets and linked services
     6.3. Pipeline execution and triggers
     6.4. Integration runtime
     6.5. Mapping Data Flows
          6.5.1. Mapping Data Flow concepts
          6.5.2. Data flow datasets
          6.5.3. Debug Mode
          6.5.4. Schema Drift
          6.5.5. Inspect Pane
          6.5.6. Column Patterns
          6.5.7. Data Flow Monitoring
          6.5.8. Move Nodes
          6.5.9. Optimize Tab
          6.5.10. Expression Builder
          6.5.11. Reference Nodes
          6.5.12. Expression language
     6.6. Roles and permissions
     6.7. Understanding pricing
     6.8. Naming rules
7. How-to guides
     7.1. Author
          7.1.1. Visually author data factories
          7.1.2. Continuous integration and delivery
          7.1.3. Iterative development and debugging
     7.2. Connectors
          7.2.1. Amazon Marketplace Web Service
          7.2.2. Amazon Redshift
          7.2.3. Amazon S3
          7.2.4. Azure Blob Storage
          7.2.5. Azure Cosmos DB SQL API
          7.2.6. Azure Cosmos DB MongoDB API
          7.2.7. Azure Data Explorer
          7.2.8. Azure Data Lake Storage Gen1
          7.2.9. Azure Data Lake Storage Gen2
          7.2.10. Azure Database for MariaDB
          7.2.11. Azure Database for MySQL
          7.2.12. Azure Database for PostgreSQL
          7.2.13. Azure File Storage
          7.2.14. Azure Search
          7.2.15. Azure SQL Database
          7.2.16. Azure SQL Database Managed Instance
          7.2.17. Azure SQL Data Warehouse
          7.2.18. Azure Table Storage
          7.2.19. Cassandra
          7.2.20. Common Data Service for Apps
          7.2.21. Concur
          7.2.22. Couchbase
          7.2.23. DB2
          7.2.24. Drill
          7.2.25. Dynamics 365
          7.2.26. Dynamics AX
          7.2.27. Dynamics CRM
          7.2.28. File System
          7.2.29. FTP
          7.2.30. Google AdWords
          7.2.31. Google BigQuery
          7.2.32. Google Cloud Storage
          7.2.33. Greenplum
          7.2.34. HBase
          7.2.35. HDFS
          7.2.36. Hive
          7.2.37. HTTP
          7.2.38. HubSpot
          7.2.39. Impala
          7.2.40. Informix
          7.2.41. Jira
          7.2.42. Magento
          7.2.43. MariaDB
          7.2.44. Marketo
          7.2.45. Microsoft Access
          7.2.46. MongoDB
               7.2.46.1. MongoDB (legacy)
          7.2.47. MySQL
          7.2.48. Netezza
          7.2.49. OData
          7.2.50. ODBC
          7.2.51. Office 365
          7.2.52. Oracle
          7.2.53. Oracle Eloqua
          7.2.54. Oracle Responsys
          7.2.55. Oracle Service Cloud
          7.2.56. PayPal
          7.2.57. Phoenix
          7.2.58. PostgreSQL
          7.2.59. Presto
          7.2.60. QuickBooks Online
          7.2.61. REST
          7.2.62. Salesforce
          7.2.63. Salesforce Service Cloud
          7.2.64. Salesforce Marketing Cloud
          7.2.65. SAP Business Warehouse Open Hub
               7.2.65.1. Load SAP BW data
          7.2.66. SAP Business Warehouse MDX
          7.2.67. SAP Cloud for Customer
          7.2.68. SAP ECC
          7.2.69. SAP HANA
          7.2.70. ServiceNow
          7.2.71. SFTP
          7.2.72. Shopify
          7.2.73. Spark
          7.2.74. SQL Server
          7.2.75. Square
          7.2.76. Sybase
          7.2.77. Teradata
          7.2.78. Vertica
          7.2.79. Web Table
          7.2.80. Xero
          7.2.81. Zoho
     7.3. Move data
          7.3.1. Copy data using Copy Activity
          7.3.2. Delete files using Delete Activity
          7.3.3. Copy Data tool
          7.3.4. Load Data Lake Storage Gen2
               7.3.4.1. Copy from Data Lake Storage Gen1
          7.3.5. Load SQL Data Warehouse
          7.3.6. Load Data Lake Storage Gen1
          7.3.7. Load SAP BW data
          7.3.8. Load Office 365 data
          7.3.9. Read or write partitioned data
          7.3.10. Format and compression support
          7.3.11. Schema and type mapping
          7.3.12. Fault tolerance
          7.3.13. Performance and tuning
     7.4. Transform data
          7.4.1. HDInsight Hive Activity
          7.4.2. HDInsight Pig Activity
          7.4.3. HDInsight MapReduce Activity
          7.4.4. HDInsight Streaming Activity
          7.4.5. HDInsight Spark Activity
          7.4.6. ML Batch Execution Activity
          7.4.7. ML Update Resource Activity
          7.4.8. Stored Procedure Activity
          7.4.9. Data Lake U-SQL Activity
          7.4.10. Databricks Notebook Activity
          7.4.11. Databricks Jar Activity
          7.4.12. Databricks Python Activity
          7.4.13. Custom activity
          7.4.14. Compute linked services
     7.5. Control flow
          7.5.1. Append Variable Activity
          7.5.2. Azure Function Activity
          7.5.3. Execute Data Flow Activity
          7.5.4. Execute Pipeline Activity
          7.5.5. Filter Activity
          7.5.6. For Each Activity
          7.5.7. Get Metadata Activity
          7.5.8. If Condition Activity
          7.5.9. Lookup Activity
          7.5.10. Set Variable Activity
          7.5.11. Until Activity
          7.5.12. Wait Activity
          7.5.13. Web Activity
     7.6. Data flow transformations
          7.6.1. Aggregate
          7.6.2. Alter row
          7.6.3. Conditional split
          7.6.4. Derived column
          7.6.5. Exists
          7.6.6. Filter
          7.6.7. Join
          7.6.8. Lookup
          7.6.9. New branch
          7.6.10. Pivot
          7.6.11. Select
          7.6.12. Sink
          7.6.13. Sort
          7.6.14. Source
          7.6.15. Surrogate key
          7.6.16. Union
          7.6.17. Unpivot
          7.6.18. Window
     7.7. Parameterize
          7.7.1. Parameterize linked services
          7.7.2. Expression Language
          7.7.3. System variables
     7.8. Security
          7.8.1. Data movement security considerations
          7.8.2. Store credentials in Azure Key Vault
          7.8.3. Encrypt credentials for self-hosted integration runtime
          7.8.4. Managed identity for Data Factory
     7.9. Monitor and manage
          7.9.1. Monitor visually
          7.9.2. Monitor with Azure Monitor
          7.9.3. Monitor with SDKs
          7.9.4. Monitor integration runtime
          7.9.5. Monitor Azure-SSIS integration runtime
          7.9.6. Reconfigure Azure-SSIS integration runtime
          7.9.7. Copy or clone a data factory
     7.10. Create integration runtime
          7.10.1. Azure integration runtime
          7.10.2. Self-hosted integration runtime
          7.10.3. Azure-SSIS integration runtime
          7.10.4. Shared self-hosted integration runtime
     7.11. Run SSIS packages in Azure
          7.11.1. Run SSIS packages with Execute SSIS Package activity
          7.11.2. Run SSIS packages with Stored Procedure activity
          7.11.3. Schedule Azure-SSIS integration runtime
          7.11.4. Join Azure-SSIS IR to a virtual network
          7.11.5. Enable Azure AD authentication for Azure-SSIS IR
          7.11.6. Provision Enterprise Edition for Azure-SSIS IR
          7.11.7. Customize setup for Azure-SSIS IR
          7.11.8. Install licensed components for Azure-SSIS IR
          7.11.9. Configure high performance for Azure-SSIS IR
          7.11.10. Configure disaster recovery for Azure-SSIS IR
          7.11.11. Clean up SSISDB logs with Elastic Database Jobs
     7.12. Create triggers
          7.12.1. Create an event-based trigger
          7.12.2. Create a schedule trigger
          7.12.3. Create a tumbling window trigger
     7.13. Templates
          7.13.1. Overview of templates
          7.13.2. Copy files from multiple containers
          7.13.3. Copy new files by LastModifiedDate
          7.13.4. Bulk copy with control table
          7.13.5. Delta copy with control table
          7.13.6. Transform data with Databricks
8. Reference
     8.1. .NET
     8.2. PowerShell
     8.3. REST API
     8.4. Resource Manager template
     8.5. Python
9. Resources
     9.1. Ask a question - MSDN forum
     9.2. Ask a question - Stack Overflow
     9.3. Request a feature
     9.4. FAQ
     9.5. Roadmap
     9.6. Pricing
     9.7. Availability by region
     9.8. Support options

Online Training Content

Date Title
5/24/2017 Orchestrating Big Data with Azure Data Factory
1/16/2017 Cortana Intelligence Suite End-to-End
7/4/2016 Design and Implement Big Data & Advanced Analytics Solutions
2/25/2016 Orchestrating Data and Services with Azure Data Factory

Videos

Date Title Length
10/5/2018 Get cloud-scale analytics of Office 365 data with Azure Data Factory | Azure Friday 0:09:21
10/2/2018 Azure data integration: Choosing between SSIS Azure Data Factory and Azure Databricks - THR1171 0:20:44
10/2/2018 Real-world data movement and orchestration patterns using Azure Data Factory V2 - BRK2279 0:44:59
10/2/2018 Thousands of Azure data warehousing success stories - BRK2408 0:39:39
10/2/2018 Automating Azure SQL Data Warehouse - THR2192 0:17:19
10/2/2018 Making AI real with SQL Server Azure databases and Azure big data analytics services - GS005 1:19:44
10/2/2018 Azure Databricks for data engineers and data developers - BRK3313 1:17:59
10/2/2018 Azure Data Factory - Enabling modern data integration in the cloud - BRK2204 1:08:49
9/7/2018 Enhanced productivity using Azure Data Factory visual tools | Azure Friday 0:14:41
8/16/2018 Monitor Data Factory pipelines using Azure Monitor and Log Analytics | Azure Friday 0:07:17
