Amazon Web Services (AWS) – Scaling on Public Cloud
The networking industry has seen quite a few changes over the past decades. Many users have turned their backs on hardware appliances provided by the usual big networking vendors, chasing substantial savings in CAPEX and OPEX and escaping vendor lock-in.
These users have found huge flexibility and a higher return on investment with software-based appliances running on COTS servers. Cloud computing services have accelerated this transformation by giving users on-demand access to infrastructure from a pool of resources.
One cloud computing provider has been a pioneer, offering public cloud services since August 2006: Amazon Web Services (AWS), with its EC2 service.
This blog focuses on the challenges of scaling a software-based network appliance on AWS EC2.
Unlimited Resources / Limited Bandwidth
AWS EC2 is a cloud computing environment providing flexible, on-demand computing resources to any user who wants to run software on the public cloud infrastructure provided by AWS.
To run a network appliance on AWS, a user needs to:
- Select a region in the world to host the environment
- Choose a computing environment, called an instance type
- Use a software image prepared to run on AWS, called an AMI – Amazon Machine Image
- Configure networking elements such as Network Adapters, Network Subnets/Routes, Security Groups, etc.
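The choices above map fairly directly onto the parameters of an EC2 launch request. The sketch below only assembles such a request as a plain dictionary and does not contact AWS. The key names follow the boto3 `run_instances` request, while the region, AMI ID, subnet ID and security group ID are placeholders chosen for illustration, not real values.

```python
def build_launch_request(ami_id, instance_type, subnet_id, security_group_ids):
    """Assemble keyword arguments for an EC2 launch request.

    Nothing is sent to AWS; the function only builds a dictionary.
    """
    return {
        "ImageId": ami_id,                # the AMI prepared to run on AWS
        "InstanceType": instance_type,    # the computing environment
        "MinCount": 1,
        "MaxCount": 1,
        "NetworkInterfaces": [
            {
                "DeviceIndex": 0,         # first network adapter
                "SubnetId": subnet_id,
                "Groups": list(security_group_ids),
            }
        ],
    }


# All identifiers below are placeholders, not real AWS resources.
REGION = "eu-west-1"
request = build_launch_request(
    ami_id="ami-0123456789abcdef0",
    instance_type="c5.xlarge",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
)
print(REGION, request)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("ec2", region_name=REGION).run_instances(**request)`. Routes and any additional network adapters would still need to be configured separately.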
AWS provides plenty of information to help with instance type selection, including details and comparisons between the different options. Instance types are identified by code names made up of letters and numbers. Each instance type offers a different combination of vCPU, RAM, storage and network performance.
Instance Name | Use Case | vCPU | RAM (GB) | Network Bandwidth (Gbps) |
c5.xlarge | Compute Optimized | 4 | 8 | up to 10 |
c5n.metal | Compute Optimized | 72 | 192 | 100 |
c6g.16xlarge | Compute Optimized | 64 | 128 | 25 |
r6a.32xlarge | Memory Optimized | 128 | 1024 | 50 |
m6id.12xlarge | General Purpose | 48 | 192 | 18.75 |
Table 1: AWS Instance Type Examples
A quick look at the different instance types on AWS could lead anyone to think that resources are unlimited. Indeed, AWS provides a great variety of CPU types, including:
- Intel Xeon processors
- AMD EPYC processors
- Arm-based AWS Graviton processors
The number of vCPUs and the amount of RAM on offer provide enough horsepower for any network appliance. The advertised network bandwidth ranges from low speeds up to 100 Gbps.
However, if you read between the lines, AWS strongly recommends measuring the performance of any network application running on its infrastructure, regardless of the selected instance type. This recommendation can be confusing for users who are presented with a table giving an exact network bandwidth figure for each instance type. AWS users expect network performance close to the one advertised in the instance type details, yet they often end up chasing packet drops that are not happening at the application level. In reality, like many other public cloud vendors, AWS has implemented a per-instance network allowance that limits the number of packets per second that can transit through a network adapter. Because of this limit, the underlying infrastructure processes only a certain number of packets per second, which in turn caps the application running on top.
This limitation may go unnoticed when running TCP-based network test tools such as iperf, which typically send large packets. An application whose load depends on the number of packets, a virtual router for instance, will be directly impacted.
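The effect of a packets-per-second allowance can be illustrated with a little arithmetic. The sketch below is a simplified model, not a description of how AWS enforces its limits: the allowance value is purely hypothetical, AWS publishes no such number in the table above, and Ethernet framing overhead is ignored.

```python
def throughput_gbps(pps, packet_size_bytes):
    """Throughput in Gbps carried by `pps` packets of the given size."""
    return pps * packet_size_bytes * 8 / 1e9


def achievable_gbps(advertised_gbps, pps_allowance, packet_size_bytes):
    """Throughput once both the bandwidth and the packet-rate caps apply."""
    return min(advertised_gbps, throughput_gbps(pps_allowance, packet_size_bytes))


# Hypothetical figures chosen for illustration only (not AWS numbers).
ADVERTISED_GBPS = 10.0
HYPOTHETICAL_PPS_ALLOWANCE = 1_000_000

for size in (64, 350, 1500):
    gbps = achievable_gbps(ADVERTISED_GBPS, HYPOTHETICAL_PPS_ALLOWANCE, size)
    print(f"{size:>5}-byte packets: {gbps:.3f} Gbps")
```

Under these assumed numbers, a test sending 1500-byte packets still reaches the advertised 10 Gbps, while traffic averaging 350 bytes per packet is held to 2.8 Gbps by the packet-rate cap. This is why a bulk TCP test can look healthy while a router forwarding mixed traffic sees drops.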
6WIND Virtual Service Router (VSR) on AWS
6WIND VSR is a product suite of software-based routers running on COTS servers, serving use cases such as:
- Security Gateway Router
- Provider Edge Router
- Border Router Gateway
- Cell Site Router
- Carrier Grade NAT Router
- Customer Premise Equipment (CPE) Router
It supports different deployment models including bare metal, virtualized, containerized and Cloud-Native. Based on the Data Plane Development Kit (DPDK) framework, it supports a broad range of CPUs and NICs.
Recent performance measurements show 46 Gbps of IPv4 forwarding per CPU core with a traffic pattern that mixes packets of different sizes (IMIX traffic), with an average packet size of 350 bytes. This performance scales linearly with the number of cores allocated to the software. So far, we have measured up to 800 Gbps of IPv4 forwarding throughput using IMIX traffic on a single COTS server.
Figure 1: 6WIND VSR performance on Intel® Xeon® processor
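The linear scaling claim lends itself to a quick back-of-the-envelope model. The sketch below uses only the two figures quoted above (46 Gbps per core, 350-byte average packet) and assumes perfectly linear scaling with no framing overhead. It is an idealised calculation, not a sizing tool, and it says nothing about how many cores were used in the actual 800 Gbps measurement.

```python
import math

PER_CORE_GBPS = 46.0     # per-core IPv4 forwarding figure quoted in the text
AVG_PACKET_BYTES = 350   # average IMIX packet size quoted in the text


def forwarding_gbps(cores):
    """Throughput predicted by an idealised linear scaling model."""
    return cores * PER_CORE_GBPS


def cores_needed(target_gbps):
    """Smallest whole number of cores whose modelled throughput meets the target."""
    return math.ceil(target_gbps / PER_CORE_GBPS)


def mpps_per_core():
    """Per-core packet rate, in millions of packets per second."""
    return PER_CORE_GBPS * 1e9 / (AVG_PACKET_BYTES * 8) / 1e6


print(f"Per-core packet rate: {mpps_per_core():.2f} Mpps")
print(f"Cores for 800 Gbps in this model: {cores_needed(800)}")
```

The per-core packet rate matters as much as the bit rate: at a 350-byte average, 46 Gbps corresponds to roughly 16 million packets per second per core, which is the kind of figure a packets-per-second allowance acts on.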
The 6WIND VSR software is also supported on AWS with any type of AWS instance, including Intel, AMD EPYC and Graviton based instances. Typical 6WIND VSR users who are familiar with the linear scaling model on bare metal, virtual or containerized environments should take the AWS bandwidth allowance into account when dimensioning their network. While the 6WIND VSR appliance would have no issue processing many millions of packets per second, the AWS infrastructure caps the packet rate to comply with AWS fair usage policies.
How to scale 6WIND VSR on AWS
6WIND VSR software is capable of reaching high throughput thanks to its unique architecture, which scales linearly with the number of vCPUs allocated to the data plane. However, this linear scaling does not apply on AWS because of the network bandwidth limitations described above. The question is how to scale on AWS.
There is no single answer to the scaling challenge on AWS, and many users have adapted their use case to fit the public cloud infrastructure model. The simple answer is to scale horizontally: use several smaller instances together, with a load balancer splitting the traffic between them.
If we take the example of a Provider Edge router that needs to achieve 100 Gbps of MPLS on AWS, we can split the router into 10 smaller Provider Edge routers, each capable of achieving 10 Gbps. A load balancer such as the one provided by AWS, Elastic Load Balancing (ELB), could help distribute the traffic and provide some redundancy between the Provider Edge routers.
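The horizontal scaling arithmetic, and the idea of splitting traffic per flow, can be sketched as follows. This is an illustration under stated assumptions, not a description of how ELB distributes traffic internally: the `spares` parameter, the CRC32 hash and the example flow tuple are all choices made for this sketch.

```python
import math
import zlib


def instances_needed(target_gbps, per_instance_gbps, spares=0):
    """Number of routers to deploy, plus optional spare instances for redundancy."""
    return math.ceil(target_gbps / per_instance_gbps) + spares


def pick_instance(flow, instance_count):
    """Map a flow tuple to an instance index with a deterministic hash.

    Packets of the same flow always land on the same router.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % instance_count


count = instances_needed(target_gbps=100, per_instance_gbps=10)
# Example flow: (source IP, destination IP, protocol, source port, destination port)
flow = ("192.0.2.10", "198.51.100.20", "tcp", 40000, 443)
print(f"{count} instances; example flow goes to instance {pick_instance(flow, count)}")
```

Hashing per flow rather than per packet keeps each flow on one router and avoids reordering, at the cost of uneven load when a few flows dominate the traffic. Adding one spare (`spares=1`) gives a simple N+1 arrangement for the redundancy mentioned above.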
At 6WIND, we understand that each use case comes with its own requirements. We are happy to discuss your public cloud network projects in detail and help you achieve your goals, taking into account the scaling challenges that a public cloud infrastructure adds.
If you want to learn more, do feel free to contact us for more information. You can also schedule a free trial of our VSR solutions.