In the on-premises world, tape storage is a system in which magnetic tape is used as the recording medium to store data. With data volumes growing rapidly, tape storage is among the most suitable systems for on-premises data storage that requires large capacity. Tape storage is used not only for backup in case of system failure, but also for archiving data for long-term retention.
Tape backup is often regarded as an old or outdated technology, but tapes are still used to back up data. Many large enterprises, such as banks, hospitals, and telecommunication organizations, rely on tapes to keep important data securely stored for long-term use. Storing data on tapes is time-consuming: reading from and writing to tape is slow, it involves human intervention, and it requires a storage facility.
Customer Scenario: The customer’s existing enterprise data warehouse is connected with tape systems –
The recovery process is largely manual and requires coordination between the data centre team (tape mounting) and the DBAs (recovery). The tapes hold data in the Teradata native format, ARC. No viable alternative tool or solution is available to read, recover, or convert this format. The Teradata platform is capacity constrained, so a physical recovery to the database must be done in chunks, with each chunk exported immediately to alternate storage.
Teradata Archive format: The archive format used by Teradata is a proprietary format called ARC, which stands for Archive.
The recovery will use the as-is recovery process from tape and can be performed in chunks of 5 TB.
Once the Teradata tape data is exported to the NFS drive, the options below were considered for moving the tape data to Azure.
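To make the chunked recovery concrete, here is a minimal planning sketch. It assumes you already know the approximate size of each archived object; the object names and sizes are hypothetical, and the greedy largest-first grouping is just one simple way to keep each batch within the 5 TB limit, not the process the customer necessarily used.

```python
from typing import Dict, List

CHUNK_LIMIT_TB = 5.0  # chunk size stated in the text


def plan_chunks(object_sizes_tb: Dict[str, float],
                limit_tb: float = CHUNK_LIMIT_TB) -> List[List[str]]:
    """Group objects into recovery batches whose total size stays within limit_tb."""
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0.0
    # Largest objects first, so big items are placed before small ones.
    ordered = sorted(object_sizes_tb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > limit_tb:
            raise ValueError(f"{name} ({size} TB) exceeds the {limit_tb} TB chunk limit")
        # Close the current batch if this object would push it over the limit.
        if current and current_size + size > limit_tb:
            chunks.append(current)
            current, current_size = [], 0.0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical object sizes in TB, for illustration only.
sizes = {"sales_2015": 3.0, "sales_2016": 2.5, "clickstream": 4.0, "ref_data": 0.5}
plan = plan_chunks(sizes)
for number, batch in enumerate(plan, start=1):
    print(f"chunk {number}: {batch}")
```

Each batch would be restored, exported to the NFS drive, and dropped from the database before the next batch starts, which keeps the capacity-constrained platform within its limits.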
The choice of approach for uploading the Teradata archive files to Azure (Azure Data Box, transfer over the internet, or Azure ExpressRoute) depends largely on how long the transfer would take. For example, a 100 TB export over a 70 Mbps connection, with 100% of the bandwidth available at all times, would take ~140 days to upload. The decision point can be based on the estimated transfer time: for instance, if the upload would take more than 10 days, use the Azure Data Box service. The threshold in days depends on how much time the customer can allocate for the data movement.
The focus of this blog is the offline data transfer mode using the Azure Data Box family, chosen because of the large volume of tape archives to be migrated to the cloud.
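The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes the link speed is 70 megabits per second, decimal units (1 TB = 10^12 bytes), and no protocol overhead. With those assumptions the result is about 132 days; using binary terabytes gives about 145 days, so the quoted figure of roughly 140 days is in the same range. The function names and the 10-day default threshold are illustrative.

```python
def transfer_days(size_tb: float, link_mbps: float, utilisation: float = 1.0) -> float:
    """Estimate upload time in days for size_tb terabytes over a link_mbps link."""
    size_bits = size_tb * 1e12 * 8                 # decimal terabytes to bits
    effective_bps = link_mbps * 1e6 * utilisation  # usable bits per second
    return size_bits / effective_bps / 86_400      # seconds to days


def suggest_mode(size_tb: float, link_mbps: float, max_days: float = 10.0) -> str:
    """Apply the rule of thumb: go offline if the upload exceeds max_days."""
    if transfer_days(size_tb, link_mbps) > max_days:
        return "offline (Data Box)"
    return "online"


days = transfer_days(100, 70)
print(f"100 TB over 70 Mbps: {days:.0f} days -> {suggest_mode(100, 70)}")
```

Lowering `utilisation` below 1.0 models a link that is shared with other traffic, which only lengthens the estimate.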
These are the options for offline data transfer.
Azure ExpressRoute allows customers to create a private connection between their on-premises infrastructure and Azure infrastructure. Customers can transfer data over a secure, dedicated connection because their data does not travel over the public internet. This is an online data transfer option with high throughput, offering speeds of up to 10 Gbps when connecting to Azure. With ExpressRoute Direct, the customer can get speeds of up to 100 Gbps.
An Azure Import job can be used to securely import data from tape backup to Azure Files or Azure Blob Storage. In this import option,
Below is an example of the Import Job Flow.
Azure Data Box was chosen based on the requirements. The customer copies on-premises data onto the storage device in their own data centre, in offline mode. The copy can be done over 10-Gbps network interfaces from the customer data centre to the Data Box. The device is then shipped back to an Azure data centre, where the data is uploaded to Azure Data Lake Storage (ADLS) Gen2 accounts.
An import order typically consists of the following steps:
Step | Description |
Order | · Create a new import order from the Azure portal. · If a Data Box is available, it is shipped to the customer. · Tutorial link |
Receive and Set Up | · Once the Data Box is received at the customer data centre, connect its power cable and data cables. · Unlock the Data Box with the password and connect to it. · Tutorial link |
Copy Data | · Copy data to the Data Box shares from the host computer. · Tutorial link – Copy Via SMB · Tutorial link – Copy Via NFS · Tutorial link – Copy Via REST · Tutorial link – Copy Via data copy service · Tutorial link – To managed disks |
Return | · Prepare, turn off, and ship the Data Box back to the Azure data centre. · Tutorial link |
Upload | · At the Azure data centre, data from the Data Box is copied to Azure storage accounts. · Customers verify the data in the Azure storage accounts. · The Data Box disks are securely erased as per the National Institute of Standards and Technology (NIST) guidelines. · Tutorial link |
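For the verification step, one possible approach is to record file sizes and checksums for the exported files before they are copied to the Data Box, then compare them with the same values computed after upload. The sketch below uses only the Python standard library. It is an independent check under the assumption that the export lives in a local or NFS-mounted directory, and it is not part of the Data Box tooling; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple

Manifest = Dict[str, Tuple[int, str]]  # relative path -> (size in bytes, SHA-256 hex)


def build_manifest(root: str, block_bytes: int = 8 * 1024 * 1024) -> Manifest:
    """Walk root and record the size and SHA-256 digest of every file."""
    root_path = Path(root)
    manifest: Manifest = {}
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            # Read in blocks so large archive files do not need to fit in memory.
            for block in iter(lambda: handle.read(block_bytes), b""):
                digest.update(block)
        key = path.relative_to(root_path).as_posix()
        manifest[key] = (path.stat().st_size, digest.hexdigest())
    return manifest


def find_mismatches(before: Manifest, after: Manifest) -> List[str]:
    """Return files that are missing or differ in the 'after' manifest."""
    return sorted(name for name, entry in before.items() if after.get(name) != entry)
```

A typical use would be to call `build_manifest` on the NFS export directory before the copy, build a second manifest from the uploaded data, and treat any name returned by `find_mismatches` as a file to re-check.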
For the customer requirement, once the tape data has been uploaded into the Azure Storage account, the access tier of the uploaded blobs will be changed to the ‘Archive’ tier. The ‘Archive’ tier is an offline tier intended for rarely accessed data; while a blob is in this tier, its data cannot be read or modified.
To read data that is in the ‘Archive’ tier, the offline data must first be rehydrated to an online tier, either the ‘Hot’ or the ‘Cool’ tier. Two rehydration options are:
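Azure documents two ways to rehydrate an archived blob: change its tier in place, or copy it to a new blob in an online tier. The sketch below shows both, assuming the `azure-storage-blob` Python SDK and a `BlobClient` created elsewhere (for example with `BlobClient.from_connection_string`). The helper names are hypothetical, the tier and priority are passed as plain strings, and a copy source URL typically needs to carry its own authorization such as a SAS token.

```python
def rehydrate_in_place(blob_client, target_tier: str = "Hot",
                       priority: str = "Standard") -> None:
    """Option 1: change the archived blob's tier; it stays offline until rehydration finishes."""
    # set_standard_blob_tier is a BlobClient method in azure-storage-blob.
    blob_client.set_standard_blob_tier(target_tier, rehydrate_priority=priority)


def rehydrate_by_copy(dest_blob_client, source_url: str, target_tier: str = "Hot",
                      priority: str = "Standard"):
    """Option 2: copy the archived blob to a new blob in an online tier; the source stays archived."""
    # start_copy_from_url is a BlobClient method; it returns copy properties for tracking.
    return dest_blob_client.start_copy_from_url(
        source_url,
        standard_blob_tier=target_tier,
        rehydrate_priority=priority,
    )
```

Rehydration is asynchronous and can take hours, so a caller would normally poll the blob's properties or subscribe to storage events rather than wait on these calls.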