If you were curious what AWS Transfer Family is, I’ve already spoiled it in the title. But what is it really, why does it exist, and when would you use it?
A Historical Gap
AWS used to have this small problem – it offered some awesome and powerful storage solutions like EBS (which needs to be attached to an EC2 instance) and S3 (which you could only interact with via the console, SDK, or CLI). When I started using AWS, one of the first things I can remember searching for was “Can I FTP files to S3?”, and the answer was no.
Naturally the community responded: s3fs-fuse is a utility that allows you to mount S3 as a filesystem on your EC2 instance. With a development history going all the way back to 2008, it’s been impressively and consistently maintained, with a steady stream of new features and bug fixes.
What it can’t do, however, is get away from the fact that S3 is not a filesystem, and not all POSIX operations are supported. Those that are supported are mapped onto S3 operations that emulate the equivalent behaviour. There’s also the risk of racking up significant costs – filesystem operations translate to S3 API requests, which cost $0.005 per 1000 PUT, POST, and LIST requests, and $0.0004 per 1000 GET, SELECT, and other requests. This might not sound like much, and on a normal disk you wouldn’t think twice about a background process that is read/write intensive. With s3fs, every one of those operations becomes a billable request, and if you inadvertently trigger such a process the charges can start to mount up. Maybe not a dealbreaker, but be aware.
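To get a feel for how those per-request prices add up, here is a minimal sketch. The workload figures (requests per second) are purely hypothetical, and the prices are the ones quoted above, which vary by region and over time.

```python
# Per-1,000-request prices quoted above (region-dependent, subject to change).
PUT_LIST_PER_1000 = 0.005   # PUT, POST, and LIST requests
GET_PER_1000 = 0.0004       # GET, SELECT, and other requests


def s3_request_cost(writes: int, reads: int) -> float:
    """Return the S3 request charge in USD for the given request counts."""
    return writes / 1000 * PUT_LIST_PER_1000 + reads / 1000 * GET_PER_1000


# Hypothetical workload: a background job issuing 50 PUT/LIST and 200 GET
# requests per second against an s3fs mount, around the clock for 30 days.
seconds = 30 * 24 * 60 * 60
monthly = s3_request_cost(writes=50 * seconds, reads=200 * seconds)
print(f"Estimated request charges: ${monthly:,.2f} per month")
```

This only counts request charges; storage and data transfer are billed separately.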
But with it mounted to an EC2 instance, you could install your favourite FTP server and upload ‘directly’ into S3.
Now I need to point out I’m being glib with my casual use of the term ‘FTP’. Unsecured FTP on port 21 has been a big no-no for a very long time, so really when I say FTP I mean some secured method, which is one of: SFTP (SSH File Transfer Protocol, which runs over SSH and, despite the name, is a separate protocol from FTP) or FTPS (FTP over SSL/TLS).
EFS has entered the chat
Then along comes EFS in June 2016 – AWS’ version of the classic Network File System (NFS) for Linux only (look for Amazon FSx for a Windows solution). At a whopping $0.30 per GB/month, it’s more expensive than EBS (which is $0.08 per GB/month with the new gp3 volume). But where EBS charges you for storage provisioned (regardless of whether you fill it or not), EFS only charges you for what you use and could end up being far cheaper depending on how much headroom you’re putting into your EBS volumes – and of course EFS can be mounted to multiple different instances, so overall you’ll get some good mileage out of it for the cost.
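The provisioned-versus-used trade-off can be put into numbers with a rough sketch like the one below. It uses the two per-GB prices quoted above and ignores everything else (IOPS, throughput, EFS storage classes), so treat it as an illustration only. The volume sizes are made up.

```python
EBS_GP3_PER_GB = 0.08  # USD per GB-month, charged on the size you provision
EFS_PER_GB = 0.30      # USD per GB-month, charged on the data you store


def monthly_storage_cost(provisioned_gb: float, used_gb: float) -> dict:
    """Compare a single EBS volume against EFS holding the same data."""
    return {
        "ebs": provisioned_gb * EBS_GP3_PER_GB,
        "efs": used_gb * EFS_PER_GB,
    }


# EFS works out cheaper while used/provisioned stays below this fraction.
break_even_utilisation = EBS_GP3_PER_GB / EFS_PER_GB

# Hypothetical example: a 500 GB volume holding only 100 GB of data.
costs = monthly_storage_cost(provisioned_gb=500, used_gb=100)
print(costs)
print(f"Break-even utilisation: {break_even_utilisation:.1%}")
```

On these prices, EFS only wins on cost when a volume is less than roughly a quarter full, or when the same data would otherwise be duplicated across several EBS volumes.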
Still the same issue though, you can’t FTP into it directly. It still needs to be mounted to an EC2 instance (or latterly, an ECS container or Lambda function), but it at least is a fully POSIX-compliant file system. At the time of writing EFS also doesn’t have a front-end file manager in the AWS Console. I’m sure it’ll be a feature that comes along eventually, but in the meantime I feel a bit blind not knowing what’s on my EFS unless I mount it somewhere to inspect it!
So clearly, we needed more…
AWS Transfer… Orphan?
AWS Transfer Family started out as AWS Transfer for SFTP in November 2018, and was rebranded and expanded to include FTPS and plain FTP in April 2020.
The offering is essentially a managed FTP service with S3 as the endpoint for the data. At the time of writing in January 2021, AWS Transfer Family for EFS is red hot off the press as the latest endpoint available, and finally we have a way to examine and transfer files into our EFS volumes without having to mount them somewhere first!
But, as with all managed services this comes at a price. What is it?
Firstly, let’s remind ourselves about S3 and EFS. Uploading into S3 from the internet is free. As mentioned EFS needs to be mounted onto something, but assuming that’s an EC2 instance then uploading data into EC2 from the internet is also free.
At present the price for AWS Transfer Family for SFTP, FTPS, and FTP is the same for all protocols:
| Charge | Price |
|---|---|
| Time protocol is enabled on your endpoint | $0.30 per hour (and charged by hour) |
| Data uploads | $0.04 per gigabyte (GB) transferred |
| Data downloads | $0.04 per gigabyte (GB) transferred |
Ouch – just turning it on and not using it will cost you around $219 a month (730 hours at $0.30). That’s roughly the same as an r5.xlarge EC2 instance on demand, a pretty chunky beast with 4 vCPUs, 32 GiB of RAM, and up to 10 Gbit of networking. BIG CAVEAT: Stopping an AWS Transfer Family endpoint does not affect billing. So unlike EC2, you will be charged for the service even if it is stopped – the only way to stop being charged is to delete it completely.
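The standing charge is easy to reproduce. In this small sketch the `protocols` parameter reflects my reading of the pricing table (the hourly rate applies to each protocol enabled), so treat that multiplier as an assumption and check the current pricing page.

```python
HOURLY_RATE = 0.30      # USD per hour a protocol is enabled, from the table above
HOURS_PER_MONTH = 730   # 8,760 hours in a year divided by 12


def endpoint_standing_charge(protocols: int = 1, hours: float = HOURS_PER_MONTH) -> float:
    """Cost in USD of keeping an endpoint in existence, before any data moves."""
    return protocols * hours * HOURLY_RATE


print(f"One protocol:  ${endpoint_standing_charge():,.2f} per month")
print(f"Two protocols: ${endpoint_standing_charge(protocols=2):,.2f} per month")
```

Because a stopped endpoint is still billed, `hours` here means hours the endpoint exists, not hours it is online.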
Then for the data transfer – that actually compares quite favourably for downloads against S3 and EC2, which start at $0.09 per GB and only go as low as $0.05 if you’re downloading more than 150TB a month.
Let’s do a full cost comparison with the following assumption: You upload data into your FTP server, and download the same data again – a full up/down transfer cycle.
AWS Transfer Family Data Transfer Cost Comparison
| GiB | S3/EC2 Up/Down | Transfer Family Up/Down |
Then I decided I wanted a more granular comparison, so I spent far too long making this graph:
The point at which EC2/S3 data transfer becomes cheaper than AWS Transfer Family (ATF) is exactly 80TB. The lack of bulk data pricing on ATF starts to hurt it when the volume is high enough!
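For anyone who wants to check the crossover themselves, here is a rough sketch of the calculation. The $0.09 and $0.05 rates come from the text above. The two middle tiers ($0.085 and $0.07) and the tier boundaries are my assumption, taken from the public S3/EC2 egress price list of the time. It also assumes 1 TB = 1,000 GB and ignores any free allowance.

```python
# S3/EC2 internet egress tiers as (tier size in GB, USD per GB).
# Middle rates and boundaries are assumptions; None marks the open-ended tier.
EGRESS_TIERS = [(10_000, 0.09), (40_000, 0.085), (100_000, 0.07), (None, 0.05)]
ATF_PER_GB = 0.04  # Transfer Family, charged on uploads and downloads alike


def s3_ec2_cycle_cost(gb: float) -> float:
    """Upload is free; the download is billed through the egress tiers."""
    cost, remaining = 0.0, gb
    for size, rate in EGRESS_TIERS:
        chunk = remaining if size is None else min(remaining, size)
        cost += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return cost


def atf_cycle_cost(gb: float) -> float:
    """One upload plus one download of the same data through Transfer Family."""
    return gb * ATF_PER_GB * 2


for tb in (1, 10, 50, 80, 100, 200):
    gb = tb * 1000
    print(f"{tb:>4} TB  S3/EC2 ${s3_ec2_cycle_cost(gb):>10,.2f}  "
          f"ATF ${atf_cycle_cost(gb):>10,.2f}")
```

Under those assumptions the two costs meet at 80TB per month. Below that Transfer Family is cheaper on transfer alone, and above it the tiered S3/EC2 pricing pulls ahead.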
But I stress this is purely data transfer. If we assume you’ve mounted S3 as a filesystem on your EC2 you’re still paying for those S3 API operations too. There are a few commercial offerings for an “SFTP to S3” product that would seem to do exactly this.
When would you use it?
Quite simply, when you need to use FTP and have no other choice:
- Legacy systems – and we know there are still plenty of those in wide operation. They will have no notion of cloud and can only do exports and data transfer with something as simplistic as FTP. Giving them a familiar protocol to speak to will in some cases be the only way to integrate something ancient with the Cloud.
- Client preference – you’ve got a client you want to exchange data with, and setting them up with an IAM user or role and walking them through how to do a cross-account S3 upload just isn’t going to fly a lot of the time. Some clients will want the familiarity of good old FTP, particularly if they’re used to dealing with data transfer in this way. Many of them will have performed expensive and time-consuming risk and compliance analysis of SFTP or FTPS and won’t want to go through the process for another methodology if they don’t absolutely have to. Having said that, popular FTP clients such as FileZilla do support the S3 protocol, but it’s still so unfamiliar to many that they won’t even consider it.
Looking at the AWS Customer stories, these appear to be the two main cases where companies are delighted that AWS Transfer Family exists. Then you have my own case – I want to poke around my EFS drives without the hassle of spinning up my own instance to mount them!
Other Features of AWS Transfer Family
- Use your own identity provider. A big one – being able to connect up Active Directory or similar to grant users access to FTP comes with all the benefits of a single source of truth for identity.
- Use your own domain name – a simple CNAME to the service endpoint will let you brand your FTP endpoint as desired.
- Fixed IP (including BYO IP) – allowing you to have external parties whitelist your FTP endpoint, in line with their own security policies. Naturally, you can whitelist incoming connections yourself via Security Groups.
- FTPS integrates with AWS Certificate Manager – keep all your SSL certificate management in one place.
- Cross-account support. You can allow access to the service across AWS accounts with cross-account IAM roles.
- CloudTrail and CloudWatch support – monitor user activity with all of the possibilities of integration with GuardDuty.
- Rock-solid Compliance: AWS Transfer Family is PCI-DSS and GDPR compliant, and HIPAA eligible. The service is also SOC 1, 2, and 3 compliant. Are you going to get that with your self-hosted FTP on EC2? I doubt it!
File Exchange Protocol (FXP)
Does AWS Transfer Family support FXP? Well this is an interesting one, because I couldn’t find it covered in the documentation anywhere. If you don’t know what this is, I’ll borrow from Wikipedia’s definition:
File eXchange Protocol (FXP or FXSP) is a method of data transfer which uses FTP to transfer data from one remote server to another (inter-server) without routing this data through the client’s connection.
Or in other words, FTP to FTP transfers. This is really convenient if you have two remote systems and don’t want to have to pass the data through yourself as the slow proxy in the middle.
So I tested this out for myself – using an FTP server set up on an EC2 instance, I set up an AWS Transfer Family SFTP endpoint and attempted an FXP transfer between the two. So does AWS Transfer Family support FXP? Yes! I was able to connect and transfer my files across using FXP with no problem.
Helping you to not be stupid
As we’ve seen often with S3, the fact you have the ability to control access permissions with a high degree of granularity doesn’t mean you’ll use them properly. Whilst AWS maintains the precepts of the Shared Responsibility Model, they’re also adding in more features to services to stop you being stupid.
Consequently you can deploy AWS Transfer Family as the plain, old, horribly insecure vanilla FTP – but only within a VPC. You can’t make it internet-facing and public by default. You can’t connect it to your Active Directory identity provider (too insecure), and if you’re absolutely determined to expose it on the public internet you need to put a Network Load Balancer (NLB) in front of it.
Basically you’re not going to trip and do any of that by accident, but if you absolutely want to, you still can and the risks are shouted in your face at every step.
AWS Transfer Family is in line with most other AWS managed services. Nice features, nice integrations with the rest of the platform and other software. But like most managed offerings it’s going to cost you a pretty reasonable premium over rolling your own cheaper albeit less-elegant solution.
So you need to be sure your requirement is great enough to justify the running and transfer costs, particularly given that there’s no way to power the service down out of hours to save money as you would with EC2 – and that’s a grave disappointment. Hourly billing is also a tad regressive – we’re getting quite used to per-minute and per-second charging these days, and moving to it feels like an obvious improvement to make.
Have you used ATF for anything? If so I’d be interested in your comments below!