Storage and data transfer options for VM

From Session #3, Cloud Storage

Context

When using a cloud machine or service, a typical workflow is:

  1. Provision: create and prepare resources and services (VM, storage, software installation, etc.)
  2. Stage: move code, software, and input data to the VM that will do the computations (execution)
  3. Execute: set the parameters and run the computation, which writes output data to storage
  4. Transfer: move the output to the next phase in your workflow, often your own 'local' computer

Input and output may be files, but they may also be records in a database.
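
The four phases above can be sketched in a few lines of Python. This is only a local stand-in, not a real cloud deployment: ordinary directories play the role of the VM disk and the 'local' computer, and the function names (`provision`, `stage`, `execute`, `transfer`, `run_workflow`) and file names are hypothetical, chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def provision(root: Path) -> Path:
    """Provision: create the working area (a directory stands in for a VM disk)."""
    vm_disk = root / "vm_disk"
    vm_disk.mkdir(parents=True, exist_ok=True)
    return vm_disk


def stage(input_file: Path, vm_disk: Path) -> Path:
    """Stage: copy input data to the storage the computation will read from."""
    return Path(shutil.copy(input_file, vm_disk / input_file.name))


def execute(staged_input: Path, vm_disk: Path) -> Path:
    """Execute: run the computation (here, a sum) and write output to VM storage."""
    numbers = [float(token) for token in staged_input.read_text().split()]
    output_file = vm_disk / "result.txt"
    output_file.write_text(f"{sum(numbers)}\n")
    return output_file


def transfer(output_file: Path, local_dir: Path) -> Path:
    """Transfer: copy the output to the next phase, here a 'local' directory."""
    local_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(output_file, local_dir / output_file.name))


def run_workflow(root: Path) -> Path:
    """Run all four phases in order under one root directory."""
    local = root / "local"
    local.mkdir(parents=True, exist_ok=True)
    input_file = local / "input.txt"
    input_file.write_text("1\n2\n3.5\n")  # toy input data

    vm_disk = provision(root)
    staged = stage(input_file, vm_disk)
    output = execute(staged, vm_disk)
    return transfer(output, local / "results")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run_workflow(Path(tmp))
        print(result.read_text().strip())
```

In a real workflow each function would be replaced by the matching cloud step, for example a data transfer tool for `stage` and `transfer`.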

Here are 5 popular options for combining storage with a VM, each with a short description of how to connect and transfer data.
They form a gradient in the amount of work needed to alter your software or code, from option 1 (no changes) to option 5 (many changes).

Full size PDF

Storage Types

These links are provided and described in other parts of Session 3.

Azure storage services overview (includes many that are not included in the diagram)

As with VM CPUs, there are far too many choices for a VM disk (primary or auxiliary). Determine the size you need and find the cheapest 'ssd' type that meets it.
If you are not concerned about performance at all, find the cheapest option of any type for the size you need.
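
The selection rule above amounts to a filter followed by a minimum. Here is a minimal sketch, assuming you have gathered the candidate disks into a list yourself. The `DiskOption` type, the disk names, and the prices below are made-up placeholders, not real Azure offerings or prices.

```python
from typing import List, NamedTuple, Optional


class DiskOption(NamedTuple):
    name: str            # hypothetical label, not a real Azure disk name
    kind: str            # "ssd" or "hdd"
    size_gib: int
    monthly_cost: float  # placeholder value in arbitrary currency units


def cheapest_disk(options: List[DiskOption], needed_gib: int,
                  require_ssd: bool = True) -> Optional[DiskOption]:
    """Return the cheapest disk that is large enough, or None if none fits."""
    candidates = [
        option for option in options
        if option.size_gib >= needed_gib
        and (option.kind == "ssd" or not require_ssd)
    ]
    return min(candidates, key=lambda option: option.monthly_cost, default=None)


# Made-up catalog for illustration only.
catalog = [
    DiskOption("hdd-small", "hdd", 128, 5.0),
    DiskOption("ssd-small", "ssd", 128, 10.0),
    DiskOption("hdd-large", "hdd", 512, 20.0),
    DiskOption("ssd-large", "ssd", 512, 35.0),
    DiskOption("ssd-fast", "ssd", 512, 70.0),
]

if __name__ == "__main__":
    print(cheapest_disk(catalog, needed_gib=200))
    print(cheapest_disk(catalog, needed_gib=200, require_ssd=False))
```

Passing `require_ssd=False` covers the case where performance does not matter and any disk type will do.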

Data Transfer Methods

  • Desktop Application: Azure Storage Explorer, a user application for moving data down from and up to Azure cloud storage, including disks.
  • Command Line Application: azcopy, a command-line utility for moving data between your computer and the cloud, or from cloud to cloud. To access Azure Storage accounts you must create and use a special URL that includes a security token (a shared access signature, or "SAS", in Azure terms).
  • Python: Quickstart: Azure Blob Storage client library for Python.
  • R: AzureStor. Unfortunately, the R libraries that work with various Azure services have not been maintained for several years, and there is no guarantee they will work.
  • Database Application: Azure Data Studio, a cross-platform application for interacting with databases. It is designed for Microsoft's branded "SQL Server" but works with many open source databases, and it requires an existing database in the cloud or elsewhere. There are many open source versions of this kind of database user-interface application: List of database GUIs.
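
To make the azcopy item above more concrete, here is a minimal sketch of how the special URL is put together and passed to `azcopy copy`. The account name, container name, blob path, and SAS token are placeholders, and the helper names `blob_sas_url` and `azcopy_upload_command` are hypothetical. The sketch only builds the command; it does not contact Azure.

```python
import shlex
from typing import List
from urllib.parse import quote


def blob_sas_url(account: str, container: str, blob_path: str, sas_token: str) -> str:
    """Build a Blob Storage URL with the SAS token appended as the query string."""
    token = sas_token.lstrip("?")        # tokens are sometimes copied with a leading '?'
    path = quote(blob_path.strip("/"))   # percent-encode spaces etc., keep '/'
    base = f"https://{account}.blob.core.windows.net/{container}"
    if path:
        base = f"{base}/{path}"
    return f"{base}?{token}"


def azcopy_upload_command(local_path: str, destination_url: str,
                          recursive: bool = False) -> List[str]:
    """Return the argument list for uploading with 'azcopy copy'."""
    command = ["azcopy", "copy", local_path, destination_url]
    if recursive:
        command.append("--recursive")    # needed when copying a whole directory
    return command


if __name__ == "__main__":
    # All values below are placeholders, not working credentials.
    url = blob_sas_url("myaccount", "data", "run1/output.csv",
                       "?sv=PLACEHOLDER&sig=PLACEHOLDER")
    command = azcopy_upload_command("output.csv", url)
    print(shlex.join(command))
    # To actually run it (requires azcopy installed and a valid SAS token):
    #   import subprocess; subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems, since the `&` characters in a SAS token would otherwise be interpreted by the shell. Treat the full URL like a password: anyone who has it can use the access the token grants until it expires.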