A Quick Bad Take on AWS PrivateLink Service Endpoints

Back when I was a kid…

Image by Quion Al from Unsplash

For those of us who have been using AWS since before there were VPCs, the array of network services now available to connect regions and AWS accounts is dizzying.

A decade ago, your best bet (especially if you didn’t want to deal with the hell of Linux IPsec, oh the days I spent getting FreeS/WAN working in the late 90s) was to spin up an EC2 instance with OpenVPN to connect VPCs, or to allow connectivity to/from on-prem pfSense boxes.

There was that one weird EC2 interface setting that I don’t remember (except that I often forgot it) that allowed you to route traffic through an EC2 instance. And if it wasn’t that, it was forgetting to enable IP forwarding, which cost you hours of troubleshooting.

Now there is an AWS service for that, although why you would use OpenVPN in the WireGuard (and Tailscale) age is beyond me. Things have definitely gotten easier as many of us start our second decade in the cloud. Or have they?

Consumers and Service Providers, Endpoints and Endpoint Services, Oh My!

Leave it to AWS to introduce complex terms that make simple concepts, well, cloudy. I would love to see the “one-pager” on PrivateLink Endpoints that set all of this in motion, but let’s dig in.

If you look inside your VPC you’ll see no shortage of “gateways,” and that doesn’t even count DNS Firewalls, Network Firewalls, VPNs, and Transit Gateways.

Just a few of your VPC networking options these days

I guess this is what happens if you let thousands of two-pizza teams run wild, but for today we’ll just talk about “Endpoints” and “Endpoint Services.”

Think of endpoints as “sockets” you open inside a VPC to connect with a variety of external services, so traffic does not have to traverse the public Internet or even “leave your VPC.” Well, at least in theory.

These services you communicate with could be in another VPC of yours, in a VPC in your customer’s account, or even with your vendors. (Endpoints do resolve to “virtual hosts,” so it is possible to have multiple TCP sockets on a single endpoint if you have multiple TCP listeners on the NLB configured in the endpoint service, but I’m getting ahead of myself.)

If you click on “Create Endpoint” you will see the following. For this blog, I’m focusing on “Other Endpoint Services.”

On the other end of the phone line are endpoint services. These are how you expose services (that could be web services or TCP-based protocols) inside your VPC to the outside. This could be your customers. This could be other VPCs (if you don’t want routed paths) or other accounts within your organization.

So, to dumb it down: the Endpoint Service is the “server” that front-ends your service, and the Endpoint is the “client.”

These days, if you create a VPC with the wizard you should already see an S3 endpoint available. In a bit, we’ll see the endpoint that we will end up creating.

    $ aws ec2 describe-vpc-endpoints
    VpcEndpoints:
    - CreationTimestamp: '2022-08-14T15:24:59+00:00'
      DnsEntries: []
      Groups: []
      NetworkInterfaceIds: []
      OwnerId: 'XXXXXXXX'
      PolicyDocument: '{"Version":"2008-10-17","Statement":[{"Effect":"Allow","Principal":"*","Action":"*","Resource":"*"}]}'
      PrivateDnsEnabled: false
      RequesterManaged: false
      RouteTableIds:
      - rtb-0e037e2cf5058ed47
      - rtb-0d0a3ea7b021f42b0
      ServiceName: com.amazonaws.us-east-2.s3
      State: available
      SubnetIds: []
      Tags:
      - Key: Name
        Value: oh-5053-vpce-s3
      VpcEndpointId: vpce-05865e6d1ce836835
      VpcEndpointType: Gateway
      VpcId: vpc-098c8673801ff2c1d

(I’m sure you’ll be just as annoyed by my constant switching between YAML and JSON as I was, but deal with it!)

Sometimes it is Easier to Start with Code

In this example, I’m doing something very silly. The goal is to securely expose an SSH server from one VPC to another without putting it on the Internet. Just stop with the SSM before you get started, because I want to avoid IAM if at all possible and use the tools we had in the late 1990s when I started my career. The old tools are the best tools.

SSH Connectivity from one Account/VPC to Another (Same Region)

Don’t bother looking at GitHub for PrivateLink Terraform modules, or even at the documentation, for now. It will over-complicate things.

From left to right, we’ll start by defining the endpoint in the “Client” or “Consumer” VPC.

    resource "aws_vpc_endpoint" "service_consumer" {
      vpc_id             = var.vpcid
      subnet_ids         = [var.subnet_a]
      security_group_ids = [aws_security_group.endpoint_sg.id]
      service_name       = var.service_provider_name
      vpc_endpoint_type  = "Interface"
    }
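The endpoint above references aws_security_group.endpoint_sg, which is not shown. A minimal sketch of what that group could look like follows, assuming clients only need SSH on port 22. The client_cidr variable is hypothetical and not part of the original code; var.vpcid is the same variable the endpoint resource uses.

```hcl
# Hypothetical variable: the CIDR range of the clients inside the consumer VPC
# that should be allowed to reach the endpoint.
variable "client_cidr" {
  description = "CIDR range allowed to connect to the interface endpoint (assumed)"
  type        = string
}

# Security group attached to the endpoint's network interface.
# The name endpoint_sg matches the reference in the aws_vpc_endpoint resource.
resource "aws_security_group" "endpoint_sg" {
  name_prefix = "plink-endpoint-"
  vpc_id      = var.vpcid

  # Inbound rules decide which clients may open connections to the endpoint.
  ingress {
    description = "SSH to the PrivateLink endpoint"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.client_cidr]
  }
}
```

Keeping the rule scoped to one port and one CIDR is in the spirit of the "minimal security groups" approach; widen it only if you add more listeners on the NLB.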

Because I hate myself, I used “service_consumer” and “service_provider_name” but those names don’t really matter. Pick a name that is meaningful. The most important argument is vpc_endpoint_type, which you set to “Interface” if you are connecting to an arbitrary non-AWS service in the remote VPC. Now to the “Server side,” where you define the endpoint service.

    resource "aws_vpc_endpoint_service" "ssh_endpoint_svc" {
      network_load_balancer_arns = [aws_lb.ssh_nlb.arn]
      allowed_principals         = [var.allowed_principals]
      acceptance_required        = false
    }

This is really it.

For an interface endpoint like this one you are constrained to using an NLB, but you do not have to define multiple subnets and availability zones, as you saw above. I’ve only tested allowed_principals for an account, but apparently permissions can be more granular, down to an IAM role or user.
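For reference, here is a sketch of how the allowed_principals variable might be declared. The account ID and role name are placeholders, and only the account-level form matches what was tested above; the role form follows the documented ARN pattern but is untested here.

```hcl
# The endpoint service wraps this single string in a list:
#   allowed_principals = [var.allowed_principals]
variable "allowed_principals" {
  description = "ARN of the principal allowed to create endpoints against the service"
  type        = string

  # Whole consumer account (placeholder account ID).
  default = "arn:aws:iam::111122223333:root"

  # More granular alternative (untested assumption, hypothetical role name):
  # default = "arn:aws:iam::111122223333:role/example-consumer-role"
}
```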

Here is where I would have talked about NLBs

When I was getting this working for the first time through the AWS Console, getting the NLB healthy was the most tedious and time-consuming bit, but it was a good reminder that having a foundation in the services that were released in the early 2010s is necessary despite all the magic of Serverless and Containers. Understanding the abstractions in EC2 and ELB is still very important. Damn important. There are tons of examples of setting up the target group and listener for NLB, ALB, and ELB, so I won’t get into that here except to show what it looks like. And to rant about why NLBs take so long to create.
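For readers who want the shape of it anyway, here is a minimal sketch of an internal NLB forwarding TCP 22 to one instance. The resource names mirror the ones that show up in the Terraform state for this project, but the arguments are assumptions rather than the original code, and the three variables are hypothetical.

```hcl
# Hypothetical inputs for the provider ("server") side.
variable "provider_vpc_id" {
  type = string
}

variable "provider_subnet_id" {
  type = string
}

# In the original project this would likely be aws_instance.plink_target.id.
variable "target_instance_id" {
  type = string
}

# Internal NLB in a single subnet; no public exposure.
resource "aws_lb" "ssh_nlb" {
  name               = "ssh-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = [var.provider_subnet_id]
}

# TCP target group with a TCP health check on the traffic port.
resource "aws_lb_target_group" "ssh_nlb_target" {
  name     = "ssh-nlb-target"
  port     = 22
  protocol = "TCP"
  vpc_id   = var.provider_vpc_id

  health_check {
    protocol = "TCP"
    port     = "traffic-port"
  }
}

# Register the SSH instance with the target group.
resource "aws_lb_target_group_attachment" "test" {
  target_group_arn = aws_lb_target_group.ssh_nlb_target.arn
  target_id        = var.target_instance_id
  port             = 22
}

# Listener that forwards TCP 22 to the target group.
resource "aws_lb_listener" "mylistener" {
  load_balancer_arn = aws_lb.ssh_nlb.arn
  port              = 22
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ssh_nlb_target.arn
  }
}
```

If the target never turns healthy, the usual suspect is the instance security group not admitting the health check traffic.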

Building the Infrastructure

When it comes to running the Terraform, you have to start with the Endpoint Service, because you need to pass the created service name to your Endpoint code.

After you run that, you should see the following resources created by the service_provider code. This includes an EC2 instance with no public IP and an NLB with minimal security groups. It will copy your local SSH key to the remote instance as well.

    $ terraform state list
    aws_instance.plink_target
    aws_key_pair.mykey
    aws_lb.ssh_nlb
    aws_lb_listener.mylistener
    aws_lb_target_group.ssh_nlb_target
    aws_lb_target_group_attachment.test
    aws_security_group.instance_sg
    aws_vpc_endpoint_service.ssh_endpoint_svc
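The instance_sg in that list is worth a second look. As far as I understand it, traffic arriving through an endpoint service reaches the target with the NLB node's private IP as its source, so the instance only has to admit SSH from the subnet the NLB lives in. A sketch under that assumption, with hypothetical variable names:

```hcl
# Hypothetical inputs: the provider VPC and the CIDR of the subnet
# that holds the NLB nodes.
variable "instance_vpc_id" {
  type = string
}

variable "nlb_subnet_cidr" {
  type = string
}

# Security group for the private SSH target behind the NLB.
resource "aws_security_group" "instance_sg" {
  name_prefix = "plink-target-"
  vpc_id      = var.instance_vpc_id

  # Admit SSH (and the NLB's TCP health checks on the same port)
  # only from the NLB's subnet.
  ingress {
    description = "SSH and health checks from the NLB subnet"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.nlb_subnet_cidr]
  }
}
```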

And then you can see any of your own Endpoint Services with the following:

    $ aws ec2 describe-vpc-endpoint-services --output json | jq '.ServiceDetails|.[]|select(.Owner != "amazon")'
    {
      "ServiceName": "com.amazonaws.vpce.us-east-2.vpce-svc-09f2d84e33bdc5fa7",
      "ServiceId": "vpce-svc-09f2d84e33bdc5fa7",
      "ServiceType": [
        {
          "ServiceType": "Interface"
        }
      ],
      "AvailabilityZones": [
        "us-east-2a"
      ],
      "Owner": "XXXXXX",
      "BaseEndpointDnsNames": [
        "vpce-svc-09f2d84e33bdc5fa7.us-east-2.vpce.amazonaws.com"
      ],
      "VpcEndpointPolicySupported": false,
      "AcceptanceRequired": false,
      "ManagesVpcEndpoints": false,
      "Tags": [],
      "SupportedIpAddressTypes": [
        "ipv4"
      ]
    }

The jq select statement removes the built-in Amazon endpoints, which would otherwise clutter up your CLI output. The ServiceName maps to the Terraform output, which you’ll need for the consumer (client-side) code to work.

    Outputs:

    service_provider_name = "com.amazonaws.vpce.us-east-2.vpce-svc-09f2d84e33bdc5fa7"
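The service_provider_name output has to come from somewhere on the provider side and land in a variable on the consumer side. A sketch of that wiring, assuming two separate Terraform configurations (the file names in the comments are hypothetical):

```hcl
# --- provider side, e.g. service_provider/outputs.tf ---
# Exposes the generated service name (com.amazonaws.vpce.<region>.vpce-svc-...).
output "service_provider_name" {
  description = "Service name the consumer endpoint must point at"
  value       = aws_vpc_endpoint_service.ssh_endpoint_svc.service_name
}

# --- consumer side, e.g. service_consumer/variables.tf ---
# Receives the value printed by the provider's terraform apply.
variable "service_provider_name" {
  description = "Endpoint service name produced by the provider configuration"
  type        = string
}
```

On the consumer side the value can then be supplied with something like terraform apply -var 'service_provider_name=com.amazonaws.vpce.us-east-2.vpce-svc-09f2d84e33bdc5fa7'.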

Trust but Verify

To keep our jq skills fresh, let’s query the two different AWS accounts and get the IPs of our SSH servers that you’ll see below. Stack Overflow is littered with bad jq advice, but this was pretty straightforward.

    $ aws --profile original --region us-east-2 ec2 describe-instances | jq -r '.Reservations[].Instances[] | [.PrivateIpAddress,.PublicIpAddress, .InstanceId ]| @csv'
    "","","i-0a393a6077d1a9488"
    $ aws --region us-east-2 ec2 describe-instances --output json | jq -r '.Reservations[].Instances[] | [.PrivateIpAddress,.PublicIpAddress, .InstanceId ]| @csv'
    "",,"i-01dc568e7c5a4c8ed"

Next, SSH to the “client” SSH server, which has a public IP address. Our remote SSH server on the other side of the PrivateLink connection and NLB does not; it sits in a private subnet of the VPC.

From this server I can first verify with Netcat. We’ll see the TCP connection flows in the CloudWatch metrics below.

    $ nc vpce-01568976ead95ca31-9gcn6mk0.vpce-svc-09f2d84e33bdc5fa7.us-east-2.vpce.amazonaws.com 22
    SSH-2.0-OpenSSH_8.9p1 Ubuntu-3
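Rather than digging that long hostname out of the console, the consumer configuration could print it. A sketch, assuming the first DNS entry is the one you want (the ordering of entries is an assumption here, and the output name is hypothetical):

```hcl
# Prints a DNS name of the interface endpoint after terraform apply.
# dns_entry is a list; taking element 0 is an assumption about ordering.
output "endpoint_dns_name" {
  description = "DNS name to hand to nc or ssh on the client side"
  value       = aws_vpc_endpoint.service_consumer.dns_entry[0].dns_name
}
```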

And after copying the private key from my laptop, I can SSH through the endpoint and land on the private instance, the one whose IP address we saw above.

Because it is an NLB, you get the normal L3/L4 metrics you would expect in CloudWatch.