Keeping AWS secrets secret with ECS

Remind
Remind is a messaging app for teachers, students, and parents to safely and easily communicate with each other. With more than 30 million users, we are one of the fastest-growing companies in edtech. We're hiring team members inspired by technology's potential to transform education, energized about solving the communication challenges that teachers face every day, and passionate about our vision of connecting every teacher, student, and parent in the world to improve education.

By Eric Holmes, Engineer at Remind.


One of the most important parts of the Operations Engineering Team's security strategy is mitigating the risk of leaked AWS credentials. Even if you follow the AWS best practice of putting your infrastructure inside a VPC, leaked AWS credentials hand over the keys to the castle, because the AWS APIs they authorize are reachable from outside your VPC.

This post describes the strategy we use to reduce the probability that AWS credentials are leaked, and to reduce the damage if they are.

Background

In the beginning, there were long-lived credentials.

The most straightforward way to give an application access to AWS APIs is to create an IAM user, generate an access key, and pass the access key ID and secret access key to the application. However, this approach has a number of problems:

  1. Trust: How do you pass these values to an application securely, so that unauthorized parties can't access them and they aren't stored in plaintext? In ECS, putting AWS access keys in a task definition as plaintext environment variables is bad practice: any application or service with the ecs:DescribeTaskDefinition permission can read every secret stored that way.
  2. Credential Lifetime: IAM access keys don't expire automatically. The longer an access key lives, the more likely it is to have been leaked, accidentally or maliciously. Accidental leaks caused by human error are a surprisingly common source of incidents.
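
To make the trust problem concrete, here is a minimal sketch of an audit that scans a task definition for plaintext AWS keys. It assumes the input has the shape of an ecs:DescribeTaskDefinition response (taskDefinition, then containerDefinitions, then environment name/value pairs); the function name and the "AKIA" prefix heuristic for long-term access key IDs are illustrative choices, not part of any AWS tool.

```python
from typing import Any, Dict, List, Tuple

# Environment variable names that conventionally hold static AWS credentials.
CREDENTIAL_VAR_NAMES = {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}


def find_plaintext_aws_keys(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (container name, variable name) pairs that look like plaintext AWS keys.

    `response` is assumed to be shaped like an ecs:DescribeTaskDefinition result,
    so anyone allowed to call that API can read every value flagged here.
    """
    findings = []
    task_definition = response.get("taskDefinition", {})
    for container in task_definition.get("containerDefinitions", []):
        for var in container.get("environment", []):
            name = var.get("name", "")
            value = var.get("value", "")
            # Long-term IAM user access key IDs start with "AKIA".
            if name in CREDENTIAL_VAR_NAMES or value.startswith("AKIA"):
                findings.append((container.get("name", "?"), name))
    return findings


# Hypothetical task definition with a key baked into the environment.
example = {
    "taskDefinition": {
        "containerDefinitions": [
            {"name": "web", "environment": [
                {"name": "AWS_ACCESS_KEY_ID", "value": "AKIAEXAMPLE"},
                {"name": "PORT", "value": "8080"},
            ]},
        ]
    }
}
print(find_plaintext_aws_keys(example))  # [('web', 'AWS_ACCESS_KEY_ID')]
```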

Instance Profiles

An alternative is to use EC2 Instance Profiles. An instance profile attaches an IAM role to an EC2 instance, and applications running on the host can request temporary AWS credentials from the “instance metadata” endpoint. This solves both trust (the EC2 instance authenticates itself with AWS) and credential lifetime (credentials obtained from instance metadata are short-lived and rotated automatically, which greatly limits the impact of a leak).
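
The instance profile flow amounts to two HTTP GET requests with no authentication, which is what makes the second problem below possible. Here is a minimal sketch using only the Python standard library, assuming the unauthenticated (IMDSv1-style) requests this post describes. The helper names are hypothetical, and fetch_instance_profile_credentials only works on an EC2 instance with an instance profile attached; in practice, AWS SDKs perform this lookup for you.

```python
import json
import urllib.request

# Instance metadata path whose listing contains the role name from the instance profile.
IMDS_CREDENTIALS_PATH = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def role_credentials_url(role_name: str) -> str:
    """URL that returns temporary credentials for the given instance profile role."""
    return IMDS_CREDENTIALS_PATH + role_name


def fetch_instance_profile_credentials(timeout: float = 1.0) -> dict:
    """Fetch credentials with two plain GET requests; no authentication is involved."""
    # Step 1: the listing returns the role name (one per line).
    with urllib.request.urlopen(IMDS_CREDENTIALS_PATH, timeout=timeout) as resp:
        role_name = resp.read().decode().splitlines()[0].strip()
    # Step 2: the role path returns JSON with AccessKeyId, SecretAccessKey, Token and Expiration.
    with urllib.request.urlopen(role_credentials_url(role_name), timeout=timeout) as resp:
        return json.loads(resp.read().decode())
```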

However, in the context of ECS, instance profiles have their own set of problems:

  1. Granularity: In ECS, you'll likely have many different services and applications running on the same host. Using credentials from an instance profile would require giving the host the union of the permissions that every service needs, instead of giving each service its own granular permissions. This is a clear violation of the principle of least privilege.
  2. Unprotected Instance Metadata: It may not seem obvious at first, but instance metadata can be an incredibly dangerous feature. Unauthenticated GET requests to http://169.254.169.254/ are all it takes to obtain the instance profile's AWS credentials. What happens when software running on the host introduces a feature that downloads files from a user-supplied URL? This class of exploit is described well in http://www.daemonology.net/blog/2016-10-09-EC2s-most-dangerous-feature.html.
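
One application-level complement, sketched below under our own assumptions (the is_metadata_safe_url helper is hypothetical), is for any feature that fetches user-supplied URLs to refuse hosts in link-local, loopback, or private ranges. Treat it as a partial defense only: a check that resolves a hostname separately from the request that later fetches it can be bypassed with DNS rebinding, which is one reason to also block credentials at the network layer.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_metadata_safe_url(url: str) -> bool:
    """Return False if the URL's host is, or resolves to, a link-local, loopback or private address.

    169.254.169.254 (instance metadata) and 169.254.170.2 (the ECS task credentials
    endpoint) both fall in the link-local range 169.254.0.0/16.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Not an IP literal: resolve the name and check every address it maps to.
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return False
        # Strip any IPv6 zone suffix such as "%eth0" before parsing.
        addresses = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    return not any(
        addr.is_link_local or addr.is_loopback or addr.is_private for addr in addresses
    )
```

A "download from URL" feature would call this before fetching and reject the request when it returns False.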

Enter ECS Task Roles

ECS introduced Task Roles, which, like Instance Profiles, let you attach an IAM role, but to an individual ECS task rather than to a host. This seems to solve all of our problems:

  1. Granularity: Each ECS task gets only the permissions it needs.
  2. Credential Lifetime: Credentials are temporary and rotated automatically.
  3. Unprotected Instance Metadata: As with instance profiles, the client inside the container fetches credentials over HTTP from a link-local endpoint (http://169.254.170.2), which the host routes to the ECS agent. Unlike instance metadata, however, the path at which credentials are served is generated dynamically and provided to the container in the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable. An attacker who could make arbitrary GET requests would also need to know the value of that variable inside the container.
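
AWS SDKs pick up task role credentials automatically, but the lookup is simple enough to show directly. The sketch below assumes the standard container credentials flow: the ECS agent serves task role credentials at the link-local address 169.254.170.2, under the per-task path in AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. The function names are hypothetical, and the example relative URI in the note after the code is made up.

```python
import json
import os
import urllib.request
from typing import Mapping

# Link-local address at which the ECS agent serves task role credentials.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def ecs_credentials_url(environ: Mapping[str, str] = os.environ) -> str:
    """Build the per-task credentials URL from the variable ECS injects into the container."""
    relative_uri = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        raise RuntimeError("not running in an ECS task with a task role")
    return ECS_CREDENTIALS_HOST + relative_uri


def fetch_task_role_credentials(timeout: float = 1.0) -> dict:
    """Return JSON with AccessKeyId, SecretAccessKey, Token, Expiration and RoleArn."""
    with urllib.request.urlopen(ecs_credentials_url(), timeout=timeout) as resp:
        return json.loads(resp.read().decode())
```

With a made-up value such as "/v2/credentials/abc", ecs_credentials_url returns "http://169.254.170.2/v2/credentials/abc". Without the variable it raises, which is the property that keeps a blind GET request from finding the credentials.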

At Remind, we use Stacker to manage all of our infrastructure, and then we run our services and applications with Empire. Through Stacker, we have a base “blueprint” for each Empire app which (among other things):

  1. Creates an IAM role that can be assumed by ECS.
  2. Creates a KMS key that the app can use to encrypt/decrypt SSM parameters (we may talk more about this in a future post).
  3. Passes the role to the Empire application.

This gives us a common starting point and a consistent convention for how all of our applications and services access AWS APIs.
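
The first two blueprint steps can be sketched as plain CloudFormation. Stacker blueprints are normally written in Python with troposphere; to stay dependency-free, this sketch builds the equivalent resource dictionaries directly. It is not Remind's actual blueprint: the logical-ID scheme, the key policy statements, and the app_resources name are illustrative assumptions, and passing the role to Empire is left out.

```python
import json
from typing import Any, Dict


def logical_id(app_name: str) -> str:
    """Turn an app name like 'my-app' into a CloudFormation-safe prefix like 'MyApp'."""
    return "".join(part.capitalize() for part in app_name.split("-"))


def app_resources(app_name: str) -> Dict[str, Any]:
    """CloudFormation resources for one app: an ECS task role plus a KMS key it may use."""
    prefix = logical_id(app_name)
    role_id = f"{prefix}Role"
    return {
        role_id: {
            "Type": "AWS::IAM::Role",
            "Properties": {
                # Only ECS tasks may assume this role.
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                        "Action": "sts:AssumeRole",
                    }],
                },
            },
        },
        f"{prefix}Key": {
            "Type": "AWS::KMS::Key",
            "Properties": {
                "Description": f"Encrypts SSM parameters for {app_name}",
                "KeyPolicy": {
                    "Version": "2012-10-17",
                    "Statement": [
                        {
                            # Keep the account in control of the key.
                            "Sid": "AccountAdmin",
                            "Effect": "Allow",
                            "Principal": {"AWS": {"Fn::Sub": "arn:aws:iam::${AWS::AccountId}:root"}},
                            "Action": "kms:*",
                            "Resource": "*",
                        },
                        {
                            # The app's task role may encrypt and decrypt with this key.
                            "Sid": "AppUse",
                            "Effect": "Allow",
                            "Principal": {"AWS": {"Fn::GetAtt": [role_id, "Arn"]}},
                            "Action": ["kms:Encrypt", "kms:Decrypt"],
                            "Resource": "*",
                        },
                    ],
                },
            },
        },
    }


print(json.dumps({"Resources": app_resources("my-app")}, indent=2))
```

Keeping the role and its key in one function means every app gets the same pairing by construction, which is the point of a shared blueprint.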

Combining the two

Not everything that we run on our ECS container instances runs under ECS/Docker. We wanted to keep using instance profiles for software running outside of Docker (generally infrastructure processes, like the Amazon SSM/ECS agents), while being assured that our Docker containers (user-facing applications) could not access user data or the instance profile's IAM credentials.

To do this, we re-route any request from a Docker container to the instance metadata endpoint to an nginx proxy on the host. The proxy denies containers access to the user data and IAM credential paths of instance metadata, and passes all other requests through:

daemon off;
error_log stderr;
events {
  worker_connections 1024;
}

http {
  log_format request 'method=$request_method uri="$request_uri" host=$host ua="$http_user_agent" remote=$remote_addr status=$status';
  access_log /dev/stdout request;

  server {
    listen 127.0.0.1:7823;

    # Disallow access to credentials obtained from the instance profile.
    location ~ ^\/.*\/meta-data\/iam\/security-credentials\/.* {
      return 403;
    }

    # Disallow access to user data. In general, we shouldn't be putting secrets
    # in user data, but just in case...
    location ~ ^\/.*\/user-data.* {
      return 403;
    }

    # Proxy all other requests through to instance metadata.
    location / {
      proxy_pass http://169.254.169.254:80;
    }
  }
}

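Then, an iptables rule on the host rewrites the destination of TCP traffic bound for the instance metadata endpoint so that it reaches the proxy instead. Because the rule is in the nat table's PREROUTING chain, it applies only to packets arriving on a network interface, such as the Docker bridge; traffic generated by processes on the host itself (including nginx's own upstream requests) goes through the OUTPUT chain and still reaches instance metadata directly. Note that a DNAT to 127.0.0.1 from PREROUTING only takes effect when the kernel's route_localnet setting is enabled; the ECS agent's documented setup already enables net.ipv4.conf.all.route_localnet for its own credentials endpoint.

$ iptables -t nat -I PREROUTING -p tcp -d 169.254.169.254 --dport 80 \
  -j DNAT --to-destination 127.0.0.1:7823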

As an added benefit, every request to instance metadata that originates in a Docker container is logged and forwarded to our log aggregation service.
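
Because the request log_format in the nginx config writes key=value pairs, with free-form fields quoted (nginx escapes double quotes inside variables as \x22 by default), the logs are easy to parse once they reach an aggregation service. The sketch below, with hypothetical function names and made-up sample lines, pulls out requests the proxy refused, a useful signal that something in a container tried to read credentials or user data.

```python
import re
from typing import Dict, Iterable, Iterator

# Matches key=value pairs, where the value is either "quoted" or a bare token.
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S*))')


def parse_request_line(line: str) -> Dict[str, str]:
    """Parse one line written with the `request` log_format."""
    return {key: quoted if quoted else bare for key, quoted, bare in FIELD.findall(line)}


def denied_requests(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield requests the proxy refused with a 403."""
    for line in lines:
        fields = parse_request_line(line)
        if fields.get("status") == "403":
            yield fields


sample = [
    'method=GET uri="/latest/meta-data/instance-id" host=169.254.169.254 ua="curl/7.47.0" remote=172.17.0.2 status=200',
    'method=GET uri="/latest/meta-data/iam/security-credentials/" host=169.254.169.254 ua="curl/7.47.0" remote=172.17.0.3 status=403',
]
for request in denied_requests(sample):
    print(request["remote"], request["uri"])  # 172.17.0.3 /latest/meta-data/iam/security-credentials/
```

Since the DNAT rule only rewrites the destination, remote is the container's own address, which makes it straightforward to trace a denied request back to a task.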

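With the above in place, if an application or service that we run on ECS were to introduce a vulnerability that let an attacker make arbitrary GET requests, we can be more confident that the attacker won't be able to obtain AWS credentials: the proxy blocks the instance profile's credentials, and the task role's credentials are only served at a per-task path that the attacker would also need to discover.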
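
nginx checks regex locations in the order they appear and uses the first match, falling back to the prefix location / only when no regex matches. That makes it easy to model the proxy's decisions and check edge cases before rolling the config out. A minimal sketch, assuming paths have already been normalized the way nginx normalizes URIs before location matching (percent-decoding, resolving "." and "..", merging adjacent slashes); proxy_decision is a hypothetical name.

```python
import re

# The two regex locations from the nginx config, in the order they appear.
# nginx uses the first regex location that matches; the prefix location "/"
# (which proxies to instance metadata) applies only if none of them match.
DENY_PATTERNS = [
    re.compile(r"^/.*/meta-data/iam/security-credentials/.*"),
    re.compile(r"^/.*/user-data.*"),
]


def proxy_decision(uri: str) -> str:
    """Return "403" if the proxy would deny the (already normalized) URI, else "proxy"."""
    for pattern in DENY_PATTERNS:
        if pattern.search(uri):
            return "403"
    return "proxy"


for path in [
    "/latest/meta-data/iam/security-credentials/web",
    "/latest/user-data",
    "/latest/meta-data/instance-id",
    "/latest/meta-data/iam/security-credentials",
]:
    print(path, "->", proxy_decision(path))
```

A model like this surfaces cases worth a second look: the bare .../security-credentials path with no trailing slash does not match the first pattern, so it is proxied, while the credentials themselves live one level deeper, at paths that are denied.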
