IAM abuse is a real threat, and incidents like “Tesla Hackers Hijacked Amazon Cloud Account to Mine Cryptocurrency” show how damaging it can be for any company using a cloud provider. Getting this aspect of cloud account management right can be the difference between successfully managing your cloud resources and losing them all.
As a DevOps Engineer, learning AWS Identity and Access Management (IAM) was my first step into the AWS space. I’ve been hearing about the pain of configuring IAM the right way for years now.
What's causing this pain?
That's a bit tricky. There are two sides to the IAM “coin”:
On the one hand, looser IAM policies make it easier for developers to do their jobs, but they also pose a serious security risk. On the other hand, while solving the security risks, overly strict IAM rules will make things difficult for developers on a daily basis.
A careful balance needs to be struck between enabling a good experience for the development team and minimizing security risks. This fine balance is unique for every company, and finding it and adjusting it over time is the key.
In this post, I aim to help you better understand this balance and set down a few best practices for achieving it. To do so, I’ll cover some core scenarios for using roles, monitoring role usage, and using OIDC (OpenID Connect) with Kubernetes or GitHub, as well as a few other tips.
Before we dive in, let’s go over some of the basics of managing IAM on AWS.
AWS IAM: A basic introduction
With AWS’s Identity and Access Management, we, the developers (and users), can decide what resources a user, a group, or a role can view or change. In other words: who can perform which action on which resource? We'll start with the smallest building block of IAM: policies.
Policy JSON documents are composed of the following elements:
Effect - Whether the statement allows or denies access (Allow/Deny).
Action - The service-specific operations the statement covers (like “iam:CreateUser”).
Resource - The resources the statement applies to, named by ARN (like “arn:aws:s3:::conf-*”).
Condition (Optional) - Conditions under which the statement applies (like “aws:RequestedRegion”: “ap-south-1”).
Here’s an example of a JSON policy document:
Note: This policy allows changing record sets for all Route53 zones; however, “create” permissions do not entitle the user to “read” or “list,” so I added the ability to list all hosted zones and record sets to the policy.
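A policy matching the note above could look roughly like the sketch below. It is an illustration rather than the exact original document: it builds the policy as a Python dictionary and prints it as JSON, and the choice of resource ARNs is an assumption.

```python
import json

# Sketch of a Route53 policy: change record sets in any hosted zone,
# plus the list permissions needed to discover zones and records.
route53_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["route53:ChangeResourceRecordSets"],
            # Assumption: every hosted zone in the account is in scope.
            "Resource": ["arn:aws:route53:::hostedzone/*"],
        },
        {
            "Effect": "Allow",
            "Action": [
                "route53:ListHostedZones",
                "route53:ListResourceRecordSets",
            ],
            "Resource": ["*"],
        },
    ],
}

# Emit the document in the JSON form that IAM expects.
print(json.dumps(route53_policy, indent=2))
```

Each statement carries its own Effect, Action, and Resource, which is why the "change" and "list" permissions can be scoped differently.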
Each policy can be attached to a user, a group, or a role.
Policies attached to users and groups are quite basic and easy to understand: each AWS user or group can have many policies (it's also helpful to note that policies are reusable!), while groups are simply collections of users that share the same policies.
The same goes for roles: a role can have many policies attached to it. What makes roles a bit trickier is that unlike users, who sign in with a password or access keys, roles have no long-term credentials of their own and only work through a role assumption mechanism.
AWS IAM: 5 Best Practices
As mentioned before, achieving a balance between flexibility and security when managing your IAM solution is crucial in keeping our application secure while enabling infrastructure for our engineers. This can, however, be quite challenging.
Let’s dig in with some best practices for leveraging the power of IAM and Roles in particular.
1. Assuming Roles - Avoiding Premature Privilege Escalations
Roles are a critical part of AWS IAM because they allow us to manage permissions between services and resources and automate actions in AWS.
Roles basically function as a policy administration layer: many policies can be attached to each role, and roles are assumed by various entities. This extra layer allows us to control and manage our IAM much more efficiently using the tools that AWS IAM provides:
“Trust Relationship” policies let us restrict who can use the role, CloudTrail lets us monitor who is assuming which role, and OIDC lets us delegate AWS permissions to 3rd party services. In the following sections, we will dive into how to use these tools to enhance our organization’s IAM rules and make sure they are secure and flexible.
In the absence of roles as an extra administrative layer for managing policies, premature privilege and permission escalations are likely to occur. These can manifest as unmanaged policies and users, unmanaged 3rd party services policies, and many more potentially harmful administrative gaps, resulting in serious security breaches for your organization.
When talking about assuming roles, we will be using the AWS Security Token Service (STS), a web service that lets you request temporary, limited-privilege credentials.
By using AWS STS, we can invoke the AssumeRole action, which returns a set of temporary security credentials that can be used to access AWS resources you would not normally have access to. Put simply, this action places you ‘in the guise’ of the role you choose to assume. So if I choose to "assume" my new "ExampleRole," I act with the permissions granted by the policies attached to "ExampleRole" for as long as those credentials are valid.
In order to use the role you created, you must use the STS API to "assume" it.
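As a minimal sketch of that flow, the helper below calls AssumeRole on an STS client and reshapes the response into the keyword arguments that boto3 clients accept. The function name `assume_role`, the session name, and the role ARN in the usage comment are illustrative assumptions. The client is passed in rather than created, so the block itself needs only the standard library.

```python
from typing import Any, Dict


def assume_role(sts_client: Any, role_arn: str, session_name: str) -> Dict[str, str]:
    """Call STS AssumeRole and return the temporary credentials as keyword
    arguments suitable for boto3.client(...) or boto3.Session(...)."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # recorded in CloudTrail for auditing
        DurationSeconds=900,  # 15 minutes, the shortest lifetime STS allows
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Usage (requires boto3 and valid AWS credentials; the ARN is a placeholder):
#   import boto3
#   creds = assume_role(
#       boto3.client("sts"),
#       "arn:aws:iam::111122223333:role/ExampleRole",
#       "demo-session",
#   )
#   ec2 = boto3.client("ec2", **creds)
```

Keeping the session short limits how long a leaked set of temporary credentials stays useful.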
Fortunately, most 3rd party tools and services, such as GitHub, Kubernetes, and Terraform, offer an easy way to assume roles. This makes roles much more flexible and convenient to use, which is essential for developer experience.
2. Using Roles in Real-Life Scenarios
When roles should be used is a tricky question. We can better demonstrate the usage of roles by understanding why roles should be used in the first place. There are several reasons to use roles. We’ll divide them into 4 parts, each with examples of using roles in the real world.
Easier Policy Management, Auditing, and Tracking
Let’s say, for example, that we need to share the ability to delete EC2 instances. We can easily create a proper policy and attach it to a user, or even better, to a group. In this made-up scenario, we'll suppose the demand for deleting EC2 instances is very common.
The thing is, deleting EC2 instances is a very sensitive permission to have. We can’t just give it to any user. So how can we solve this problem?
We can create a role called "EC2DeleteRole" (with the policy for deleting EC2 instances attached), and make sure that assuming this role is the only way to get this permission.
We, the IAM administrators, will be able to track and manage our identities' permissions much more effectively by assigning this permission to a specific role. Imagine a situation where we want to remove the EC2 instance ‘delete policy’ from every identity on our AWS account. Rather than going through every identity and detaching this policy, we can simply delete the “EC2DeleteRole” we created earlier.
Roles are basically a toolbox for managing policies in a very flexible way, so don’t be afraid to use them!
De-Risking Automations and 3rd party services
Occasionally, we will need to delegate permissions to third-party services or automation to operate on AWS. There are high risks associated with this (such as supply-chain attacks), which is why we must ensure that all the components of our infrastructure are highly secure so that an attacker cannot gain access to our infrastructure through a weak or unsecured component.
When it comes to delegating permissions for automation or third-party services, the only available options are providing authentication tokens and passwords per dedicated user or using Roles.
The option of sharing authentication tokens and passwords is the easiest to implement; just throw in your password as a variable, and that’s it. Sadly, this can expose the organization to a lot of maintenance issues and security risks, as passwords can be accidentally shared or accessed by an unauthorized party.
This is exactly where the tension between flexible and secure IAM shows up, and we need a solution that delivers both without compromise.
To achieve this, the best option is to use roles!
We can easily provide our 3rd party automation or service a role ARN to assume in order to get the permissions it needs. The same goes for managing permissions for infrastructure-as-code tools such as Terraform, Pulumi, and Terragrunt, and even for more robust systems such as Kubernetes or HashiCorp Nomad.
This can be achieved using an OpenID Connect provider, which we will discuss later in this post.
Delegating permissions between accounts
Say our organization has several AWS accounts, one for each SDLC environment (development, staging, and production, for example), and for some reason, we are asked to give a service located in the staging account the ability to reach an S3 bucket in our development AWS account.
Roles enable us to accomplish that, as roles can be assumed between accounts.
When creating a new role, we can set the “trusted entity” of the role to a different AWS account; this way, identities in that account can assume the role.
There is only one other option, which is to create a dedicated user and pass the password as a raw variable, which is highly insecure.
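The cross-account setup described above needs permissions on both sides: the role in the development account needs a permissions policy for the bucket, and the calling identity in the staging account needs its own policy allowing it to assume that role (in addition to the role's trust relationship naming the staging account). The sketch below builds both documents. The account ID, role name, and bucket name are hypothetical placeholders.

```python
import json
from typing import Any, Dict


def allow_assume_role(role_arn: str) -> Dict[str, Any]:
    """Identity policy for the caller (staging account): may assume one role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}
        ],
    }


def s3_read_only(bucket: str) -> Dict[str, Any]:
    """Permissions policy for the role (development account): read one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Hypothetical names: the development account ID, role name, and bucket name.
DEV_ROLE_ARN = "arn:aws:iam::444455556666:role/StagingReadsDevBucket"

caller_policy = allow_assume_role(DEV_ROLE_ARN)
role_policy = s3_read_only("dev-artifacts-example")

print(json.dumps(caller_policy, indent=2))
print(json.dumps(role_policy, indent=2))
```

Scoping the caller's policy to a single role ARN keeps the staging service from assuming any other role in the development account.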
Service Roles
Many AWS services require you to use roles in order to control access. A role that a service assumes to perform actions is called a Service Role. When a role serves a specialized purpose for a service, it can be categorized as a service role for EC2 instances or a service-linked role.
It’s useful to clearly distinguish between internal and external services, as each one of them will require a different set of roles. For example, AWS core services such as Amazon Data Lifecycle Manager will require a specific set of policies in order to operate properly, so we would keep this “Service Role” static as we don’t want to break the service’s operations.
In contrast, managing roles for external services such as Kubernetes will require much more maintenance and monitoring from our side, as the needs of these roles change more often, which may lead to security issues and potential human error.
Consider marking the external services differently than the internal ones in your cloud architectural design, then design each of them separately. This will help you better understand which roles require more attention, as external service roles may be unsafe for our organization in some cases.
3. Monitoring role usage using AWS CloudTrail
When a large number of roles accumulates, it can be difficult to manage them effectively. In this case, AWS CloudTrail can come in handy.
Imagine a situation where a user gets access to a role that you don’t even remember exists, or a scenario where multiple roles have the same policies. IAM and effective permissions on AWS should not be managed that way.
In order to avoid such scenarios (and worse ones), we can use AWS CloudTrail.
With this great AWS service, we can monitor and capture the activity of our organization’s users and the API usage across AWS regions and accounts on a single centralized interface.
AWS CloudTrail records detailed audit data about the activity in our cloud accounts, and by monitoring this data, we can watch for unwanted role assumptions and much more.
For example, CloudTrail is useful for monitoring who is reading secrets from AWS Secrets Manager. There’s no doubt that reading secrets should be a well-monitored action on our account since we want to ensure that only very specific users and groups can read secrets.
We can monitor who’s reading which secret, which role the identity is assuming, its IP, the event time, and much more.
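To make this concrete, here is a minimal sketch that scans CloudTrail event records (the entries of the "Records" array in a CloudTrail log file) and summarizes every Secrets Manager read. The function name `secret_reads` and the sample record are illustrative assumptions; the field names follow the CloudTrail record format.

```python
from typing import Any, Dict, Iterable, List, Optional


def secret_reads(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Optional[str]]]:
    """Return one summary row per GetSecretValue event in the given records."""
    rows = []
    for record in records:
        if record.get("eventSource") != "secretsmanager.amazonaws.com":
            continue
        if record.get("eventName") != "GetSecretValue":
            continue
        identity = record.get("userIdentity") or {}
        issuer = (identity.get("sessionContext") or {}).get("sessionIssuer") or {}
        rows.append({
            "secret": (record.get("requestParameters") or {}).get("secretId"),
            "caller": identity.get("arn"),
            # Only assumed-role sessions carry the role they were issued from.
            "role": issuer.get("arn") if identity.get("type") == "AssumedRole" else None,
            "ip": record.get("sourceIPAddress"),
            "time": record.get("eventTime"),
        })
    return rows


# Made-up sample data for illustration only.
SAMPLE_RECORDS = [
    {
        "eventSource": "secretsmanager.amazonaws.com",
        "eventName": "GetSecretValue",
        "eventTime": "2023-01-01T12:00:00Z",
        "sourceIPAddress": "203.0.113.10",
        "requestParameters": {"secretId": "prod/db-password"},
        "userIdentity": {
            "type": "AssumedRole",
            "arn": "arn:aws:sts::111122223333:assumed-role/ExampleRole/demo-session",
            "sessionContext": {
                "sessionIssuer": {"arn": "arn:aws:iam::111122223333:role/ExampleRole"}
            },
        },
    },
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
]

rows = secret_reads(SAMPLE_RECORDS)
for row in rows:
    print(row)
```

The same filtering idea applies to AssumeRole events from "sts.amazonaws.com" when the goal is to audit role assumptions themselves.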
4. Restrict role assumption with Trust Relationship policies
Should all roles be available to everyone? Clearly not. Roles are important entities that should always be kept safe from potential threats. Every role has a special policy called a “trust relationship policy”. This policy looks very similar to any other, and we can specify who can assume the role using it.
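As a sketch, a trust relationship policy that trusts a single AWS account might look like the following. The helper name `account_trust_policy` is an assumption made for illustration; the account ID is the sample ID used in this post.

```python
import json
from typing import Any, Dict


def account_trust_policy(account_id: str) -> Dict[str, Any]:
    """Trust policy that lets principals from one AWS account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The ":root" principal refers to the account as a whole.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy = account_trust_policy("111122223333")
print(json.dumps(trust_policy, indent=2))
```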
In the above example, we attached a trust-relationship policy to our role that specifies that only the "111122223333" AWS account can assume this role, as indicated in the Action statement: "sts:AssumeRole". Note that naming the account's root as the principal trusts the account as a whole rather than only its root user: identities in that account still need their own permission to call "sts:AssumeRole" on this role.
Trust Relationship policies are like whitelists or blacklists for roles. Imagine a situation where we create a role that has the permission to delete EC2 instances, and everyone in the organization can assume this role. Obviously, this is not a good idea, so we must specify which entities are allowed to use this role and which ones aren’t.
5. Streamlining permissions with 3rd party entities: OpenID Connect
What is OpenID Connect (OIDC)?
OIDC is an open authentication protocol that profiles and extends OAuth 2.0 to add an identity layer. It allows clients to confirm an end user’s identity using authentication by an identity management server.
The OIDC standard is used by many well-known services to delegate access and authentication securely. Luckily for us, AWS IAM supports OIDC as a valid identity provider.
As we mentioned previously, flexibility is extremely important when talking about IAM, as we want to ensure that our engineers are able to have a frictionless working experience with the infrastructure. Fortunately, using OIDC as an identity provider for AWS makes the authentication for AWS much more flexible and secure, as it allows us to integrate our 3rd party services with AWS IAM seamlessly.
In my day-to-day usage, I use the OIDC identity provider for GitHub Actions. This gives GitHub Actions the ability to interact with AWS without any stored passwords or long-lived tokens.
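A role meant for GitHub Actions typically carries a trust relationship that names GitHub's OIDC provider and restricts which repository and branch may assume it. The sketch below builds such a policy. The repository name is hypothetical, and the "sts.amazonaws.com" audience is an assumption that matches the common setup with the official AWS credentials action.

```python
import json
from typing import Any, Dict

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_actions_trust_policy(
    account_id: str, repo: str, branch: str = "main"
) -> Dict[str, Any]:
    """Trust policy allowing workflows on one repo branch to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Assumed audience; adjust if your workflow sets another.
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# "my-org/my-repo" is a placeholder repository.
gha_policy = github_actions_trust_policy("111122223333", "my-org/my-repo")
print(json.dumps(gha_policy, indent=2))
```

Pinning the "sub" claim matters: without it, any repository on GitHub could present a valid token for the same provider.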
Using OIDC with Kubernetes is also great, as it helps integrate K8s’ service accounts with AWS IAM. For example, using OIDC allows us to give our Kubernetes service accounts the ability to assume roles from AWS.
My favorite live example is configuring ExternalDNS in EKS (Elastic Kubernetes Service powered by AWS) to work with our Route53 DNS records using OIDC integration. In short, ExternalDNS synchronizes exposed Kubernetes Services and Ingresses with DNS providers.
AWS provides the OpenID Connect issuer URL out-of-the-box when creating an EKS cluster, so in order to integrate our Kubernetes cluster with AWS IAM through OIDC, we just need to create the OIDC provider in IAM, and the following resources:
1. Kubernetes Service Account
2. AWS Role with the following trust relationship
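The trust relationship for such a role can be sketched as follows. It ties the role to one Kubernetes service account through the cluster's OIDC issuer. The issuer URL, namespace, and service account name below are placeholders, and the helper name `service_account_trust_policy` is an assumption for illustration.

```python
import json
from typing import Any, Dict


def service_account_trust_policy(
    account_id: str, issuer_url: str, namespace: str, service_account: str
) -> Dict[str, Any]:
    """Trust policy letting one Kubernetes service account assume the role."""
    # IAM refers to the provider by its issuer URL without the scheme.
    prefix = "https://"
    issuer = issuer_url[len(prefix):] if issuer_url.startswith(prefix) else issuer_url
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:aud": "sts.amazonaws.com",
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


# Placeholder issuer URL, namespace, and service account name.
k8s_policy = service_account_trust_policy(
    "111122223333",
    "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    "kube-system",
    "external-dns",
)
print(json.dumps(k8s_policy, indent=2))
```

On the Kubernetes side, the service account is usually annotated with the role's ARN so that pods using it receive the temporary credentials.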
For more information, see “Creating an IAM role and policy for your Kubernetes service account”.
Not using OIDC may require engineers to use raw passwords and tokens, which are extremely insecure and inflexible, and mostly add a lot of friction to the process.
Good IAM is a must
Managing IAM for our organization can be a big deal. As we showed above, loose IAM can be a real danger to the whole company and expose it to many risks, while IAM policies that are too strict can make our engineers’ lives much harder.
By following the best practices mentioned here, such as assuming roles, using AWS CloudTrail to monitor users and API activity, restricting role assumptions using trust relationship policies, and using OpenID Connect to integrate your 3rd parties with AWS, you can make your IAM administration and operations much easier, more flexible, and more secure. In this age of cloud computing, that’s a must!
Want to discuss more policy permission models? Hit me up on Permit’s Slack community.