Start via Cloud Partners
Using PyTorch with AWS
To gain the full experience of what PyTorch has to offer, a machine with at least one dedicated NVIDIA GPU is necessary. While it is not always practical to have your own machine with these specifications, there are cloud-based solutions that allow you to test and use PyTorch’s full features.
AWS provides both:
- Deep Learning AMIs: dedicated, pre-built machine learning instances, complete with PyTorch
- Deep Learning Base AMI: bare Linux and Windows instances for you to do a custom install of PyTorch.
Quick Start on Deep Learning AMI
If you want to get started with a Linux AWS instance that has PyTorch already installed and that you can log in to from the command-line, this step-by-step guide will help you do that.
- Sign into your AWS console. If you do not have an AWS account, see the primer below.
- Click on Launch a virtual machine.
- Select Deep Learning AMI (Ubuntu). This gives you an instance with a pre-defined version of PyTorch already installed. If you wanted a bare AWS instance that required PyTorch to be installed, you could choose the Deep Learning Base AMI (Ubuntu), which will have the hardware, but none of the software already available.
- Choose a GPU compute p3.2xlarge instance type. You can choose any of the available instances to try PyTorch, even the free-tier, but it is recommended for best performance that you get a GPU compute or Compute Optimized instance. Other instance options include the Compute Optimized c5-series (e.g., c5.2xlarge) or the General Compute t2-series or t3-series (e.g., t2.2xlarge). It is important to note that if you choose an instance without a GPU, PyTorch will only be running in CPU compute mode, and operations may take much, much longer.
- Click on Review and Launch.
- Review the instance information and click Launch.
- You will want to Create a new key pair if you do not have one already to use. Pick a name and download it locally via the Download Key Pair button.
- Now click on Launch Instances. You now have a live instance to use for PyTorch. If you click on View Instances, you will see your running instance.
- Take note of the Public DNS, as this will be used to ssh into your instance from the command-line.
- Open a command-line prompt.
- Ensure that your key-pair has the proper permissions, or you will not be able to log in. Type: chmod 400 path/to/downloaded/key-pair.pem
- Type: ssh -i path/to/downloaded/key-pair.pem ubuntu@<Public DNS that you noted above> (e.g., ssh -i ~/Downloads/aws-quick-start.pem ubuntu@ec2-55-181-112-129.us-west-2.compute.amazonaws.com). If asked to continue connection, type yes.
- You should now see a prompt similar to ubuntu@ip-100-30-20-95. If so, you are now connected to your instance.
- Verify that PyTorch is installed by running the verification steps below.
If you chose the Deep Learning Base AMI (Ubuntu) instead of the Deep Learning AMI (Ubuntu), then you will need to install PyTorch. Follow the Linux getting started instructions in order to install it.
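The key-permission and ssh steps in this quick start can also be scripted. The following is a minimal sketch using only the Python standard library. The helper names restrict_key_permissions and build_ssh_command are hypothetical, and the key path and host are the example values from this guide, not real resources.

```python
import os
import shlex
import stat


def restrict_key_permissions(key_path):
    """Equivalent of `chmod 400`: owner read-only, no access for anyone else."""
    os.chmod(key_path, stat.S_IRUSR)


def build_ssh_command(key_path, public_dns, user="ubuntu"):
    """Return the ssh argument list for logging in to the instance."""
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{public_dns}"]


if __name__ == "__main__":
    cmd = build_ssh_command(
        "~/Downloads/aws-quick-start.pem",
        "ec2-55-181-112-129.us-west-2.compute.amazonaws.com",
    )
    # Print the command rather than running it; pass `cmd` to
    # subprocess.run to actually open the connection.
    print(shlex.join(cmd))
```

Building the command as a list, rather than a single string, avoids shell-quoting problems if the key path contains spaces.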
Quick Start Verification
To ensure that PyTorch was installed correctly, we can verify the installation by running sample PyTorch code. Here we will construct a randomly initialized tensor.
import torch
x = torch.rand(5, 3)
print(x)
The output should be something similar to:
tensor([[0.3380, 0.3845, 0.3217],
[0.8337, 0.9050, 0.2650],
[0.2979, 0.7141, 0.9069],
[0.1449, 0.1132, 0.1375],
[0.4675, 0.3947, 0.1426]])
Additionally, to check whether your GPU driver and CUDA are enabled and accessible by PyTorch, run the following commands, which return whether or not the CUDA driver is enabled:
import torch
print(torch.cuda.is_available())
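Both checks can be combined into one script. This is a minimal sketch, not an official verification tool. The function name check_pytorch and the keys of the dictionary it returns are hypothetical.

```python
def check_pytorch():
    """Report whether PyTorch imports and whether it can see a CUDA device."""
    report = {"installed": False, "version": None, "cuda": False, "device": "cpu"}
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment

    report["installed"] = True
    report["version"] = torch.__version__
    report["cuda"] = torch.cuda.is_available()
    if report["cuda"]:
        # Name of the first GPU visible to PyTorch
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in check_pytorch().items():
        print(f"{key}: {value}")
```

On an instance without a GPU, the cuda entry is False and PyTorch runs in CPU compute mode.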
AWS Primer
Generally, you will be using Amazon Elastic Compute Cloud (or EC2) to spin up your instances. Amazon has various instance types, each of which is configured for specific use cases. For PyTorch, it is highly recommended that you use the accelerated computing instances that feature GPUs or custom AI/ML accelerators, as they are tailored for the high compute needs of machine learning.
In order to use AWS, you need to set up an AWS account, if you do not have one already. You will create a username (your email address), password and an AWS account name (since you can create multiple AWS accounts for different purposes). You will also provide contact and billing information. The billing information is important because while AWS does provide what they call “free-tier” instances, to use PyTorch you will want more powerful, paid instances.
Once you are logged in, you will be brought to your AWS console. You can even learn more about AWS through a set of simple tutorials.
AWS Inferentia-based instances
AWS Inferentia is a chip custom built by AWS to provide higher performance and low cost machine learning inference in the cloud. Amazon EC2 Inf1 instances feature up to 16 AWS Inferentia chips, the latest second generation Intel Xeon Scalable processors, and up to 100 Gbps networking to enable high throughput and lowest cost inference in the cloud. You can use Inf1 instances with Amazon SageMaker for a fully managed workflow, or use the AWS Neuron SDK directly which is integrated with PyTorch.
GPU-based instances
Amazon EC2 P4d instances deliver the highest performance for machine learning training on AWS. They are powered by the latest NVIDIA A100 Tensor Core GPUs and feature first-in-the-cloud 400 Gbps instance networking. P4d instances are deployed in hyperscale clusters called EC2 UltraClusters that are composed of more than 4,000 NVIDIA A100 GPUs, Petabit-scale non-blocking networking, and scalable low-latency storage with FSx for Lustre. Each EC2 UltraCluster provides supercomputer-class performance to enable you to solve the most complex multi-node ML training tasks.
For ML inference, AWS Inferentia-based Inf1 instances provide the lowest cost inference in the cloud. Additionally, Amazon EC2 G4dn instances featuring NVIDIA T4 GPUs are optimized for GPU-based machine learning inference and small scale training that leverage NVIDIA libraries.
Creating and Launching an Instance
Once you have decided on your instance type, you will need to create, optionally configure, and launch your instance. You can connect to your instance from the web browser or a command-line interface. AWS provides instance launch guides for various platforms.
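Launching an instance can also be scripted. The sketch below assumes the boto3 AWS SDK for Python is installed and that AWS credentials are already configured. The AMI ID, key-pair name, and region are placeholders that you must replace, and the helper names are hypothetical. The parameters mirror the console choices: an AMI, an instance type, and a key pair.

```python
def build_launch_params(ami_id, key_name, instance_type="p3.2xlarge"):
    """Build keyword arguments for the EC2 RunInstances call (one instance)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_instance(params, region="us-west-2"):
    """Launch the instance and return its ID. This starts a billable resource."""
    import boto3  # imported here so the rest of the sketch runs without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # "ami-PLACEHOLDER" and "my-key-pair" are placeholders, not real resources.
    params = build_launch_params("ami-PLACEHOLDER", "my-key-pair")
    # Call launch_instance(params) only once real values are filled in.
    print(params)
```

Keeping parameter construction separate from the API call lets you inspect what will be requested before any paid instance is started.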
Amazon SageMaker
With Amazon SageMaker, AWS provides a fully managed service that allows developers and data scientists to build, train, and deploy machine learning models.
See AWS documentation to learn how to configure Amazon SageMaker with PyTorch.
Pre-Built AMIs
AWS provides instances (called AWS Deep Learning AMIs) pre-built with a modern version of PyTorch. The available AMIs are:
- Ubuntu
- Amazon Linux
- Windows 2016
Amazon has written a good blog post on getting started with a pre-built AMI.
Installing PyTorch From Scratch
You may prefer to start with a bare instance to install PyTorch. Once you have connected to your instance, setting up PyTorch is the same as setting up locally for your operating system of choice.
Using PyTorch with Google Cloud
To gain the full experience of what PyTorch has to offer, a machine with at least one dedicated NVIDIA GPU is necessary. While it is not always practical to have your own machine with these specifications, there are cloud-based solutions that allow you to test and use PyTorch’s full features.
Google Cloud provides both:
- dedicated, pre-built machine learning platforms, complete with PyTorch
- bare Linux and Windows virtual machines for you to do a custom install of PyTorch.
Google Cloud Primer
In order to use Google Cloud, you need to set up a Google account, if you do not have one already. You will create a username (typically an @gmail.com email address) and password. Afterwards, you will be able to try Google Cloud. You will also provide contact and billing information. The billing information is initially used to prove you are a real person. Then, after your trial, you can choose to upgrade to a paid account.
Once you are logged in, you will be brought to your Google Cloud console. You can even learn more about Google Cloud through a set of simple tutorials.
Cloud Deep Learning VM Image
Google Cloud provides pre-configured virtual machines that require no setup to help you build your deep learning projects. Cloud Deep Learning VM Image is a set of Debian-based virtual machines that allow you to build and run PyTorch-based machine learning applications.
GPU-based Virtual Machines
For custom virtual machines, you will generally want to use Compute Engine virtual machine instances, with GPU enabled, to build with PyTorch. Google has various virtual machine types and pricing options, with both Linux and Windows, all of which can be configured for specific use cases. For PyTorch, it is highly recommended that you use GPU-enabled virtual machines. They are tailored for the high compute needs of machine learning.
The expense of your virtual machine is directly correlated to the number of GPUs that it contains. One NVIDIA Tesla P100 virtual machine, for example, can actually be suitable for many use cases.
Deep Learning Containers
Google Cloud also offers pre-configured and optimized Deep Learning Containers. They provide a consistent environment across Google Cloud services, making it easy to scale in the cloud or shift from on-premises. You have the flexibility to deploy on Google Kubernetes Engine (GKE), AI Platform, Cloud Run, Compute Engine, Kubernetes, and Docker Swarm.
Installing PyTorch From Scratch
You may prefer to start with a bare instance to install PyTorch. Once you have connected to your instance, setting up PyTorch is the same as setting up locally for your operating system of choice.
Using PyTorch with Azure
To gain the full experience of what PyTorch has to offer, a machine with at least one dedicated NVIDIA GPU is necessary. While it is not always practical to have your own machine with these specifications, there are cloud-based solutions that allow you to test and use PyTorch’s full features.
Azure provides:
- a machine learning service with a robust Python SDK to help you train and deploy PyTorch models at cloud scale.
- dedicated, pre-built machine learning virtual machines, complete with PyTorch.
- bare Linux and Windows virtual machines for you to do a custom install of PyTorch.
PyTorch Enterprise on Azure
Microsoft is one of the founding members and also the inaugural participant of the PyTorch Enterprise Support Program. Microsoft offers PyTorch Enterprise on Azure as a part of Microsoft Premier and Unified Support. The PyTorch Enterprise support service includes long-term support for selected versions of PyTorch for up to 2 years, prioritized troubleshooting, and the latest integration with Azure Machine Learning and other PyTorch add-ons, including ONNX Runtime for faster inference.
To learn more and get started with PyTorch Enterprise on Microsoft Azure, visit here.
For documentation, visit here.
Azure Primer
In order to use Azure, you need to set up an Azure account, if you do not have one already. You will use a Microsoft-recognized email address and password. You will also verify your identity by providing contact and billing information. The billing information is necessary because while Azure does provide free usage credits and free services, you may need or want higher-end services as well.
Once you are logged in, you will be brought to your Azure portal. You can even learn more about Azure through a set of simple video tutorials.
Azure Machine Learning Service
The Azure Machine Learning service is a cloud-based service you can use to accelerate your end-to-end machine learning workflows, from training to production. Azure Machine Learning allows you to easily move from training PyTorch models on your local machine to scaling out to the cloud. Using Azure ML’s CLI or Python SDK, you can leverage the service’s advanced functionality for distributed training, hyperparameter tuning, run history tracking, and production-scale model deployments.
See the documentation to learn how to use PyTorch with Azure Machine Learning.
Pre-Configured Data Science Virtual Machines
Azure provides pre-configured data science and machine learning virtual machines. PyTorch is available on many of these; for example, here is the documentation for how to set up an Azure virtual machine on Ubuntu Linux.
GPU-based Virtual Machines
Microsoft has various virtual machine types and pricing options, with both Linux and Windows, all of which are configured for specific use cases. For PyTorch, it is highly recommended that you use the GPU-optimized virtual machines. They are tailored for the high compute needs of machine learning.
The expense of your virtual machine is directly correlated to the number of GPUs that it contains. The NC6 virtual machine is, for example, one of the smallest, cheapest virtual machines and can actually be suitable for many use cases.
Installing PyTorch From Scratch
You may prefer to start with a bare virtual machine to install PyTorch. Once you have connected to your virtual machine, setting up PyTorch is the same as setting up locally for your operating system of choice.