Securing Cloud Infrastructure Services (3 -> 3 -> 3 design pattern)
Adoption of cloud infrastructure services, that is, moving workloads to AWS, Azure or another cloud infrastructure provider, is ubiquitous and inevitable. Organizations have either already moved to the cloud, are in the process of moving, or are planning to move.
Cloud infrastructure does not consist only of services from AWS, Azure or other infrastructure providers; it also includes DevOps tools and processes such as Chef, Puppet, GitHub and Jenkins, and orchestration tools such as Ansible, Swarm, Kubernetes and Mesos.
The implication is simple: securing cloud infrastructure services means more than securing the services of AWS, Azure or other cloud infrastructure providers. Securing the overall ecosystem requires breadth of coverage rather than a set of point solutions.
Based on several of Saviynt’s cloud infrastructure security implementations, this blog post explains our approach, what we learned and the design pattern that evolved from that work.
Principle 1 - A “silo” approach that secures only the cloud services is not adequate and is not the right way to go.
Principle 2 - The key aspect of the design pattern is its leading nature (it is 3 -> 3 -> 3 and not 3-3-3). This simply means that starting with the first element of the design pattern should, and will, lead to the last.
Now let’s take a deep dive into this design pattern (3 -> 3 -> 3).
The first “3” of the design pattern stands for the number of security boundaries that need to be integrated with security products/tools.
1. Integrate with 3 different security boundaries
a. Cloud services (AWS/Azure etc.) – Securing cloud services starts with collecting metadata about the infrastructure objects to create a raw data baseline, which is then combined with other sets of information to identify the vulnerabilities that exist in the cloud ecosystem. Integration with cloud services means retrieving the metadata of the various cloud infrastructure objects through their APIs.
For ex. AWS objects such as EC2, S3 and RDS, Docker containers, and Azure objects such as VMs and Security Groups expose their metadata via APIs, so details such as the EC2 virtualization type, Docker network metadata or Azure security group inbound rules can be collected.
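As a rough illustration of what can be done with such metadata once it has been collected, the Python sketch below scans security group records for inbound rules that are open to the whole internet. The record layout (`IpPermissions`, `IpRanges`, `CidrIp`, `FromPort`, `ToPort`) is modeled on the EC2 DescribeSecurityGroups response; the sample data, the allowed-port list and the `find_open_inbound_rules` helper are hypothetical and only meant to show the idea.

```python
# Hypothetical sample, shaped like security group metadata returned by a cloud API.
SAMPLE_GROUPS = [
    {
        "GroupId": "sg-web",
        "IpPermissions": [
            {"FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            {"FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        ],
    },
    {
        "GroupId": "sg-db",
        "IpPermissions": [
            {"FromPort": 5432, "ToPort": 5432, "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        ],
    },
]

# Assumption for this sketch: only web ports may face the internet.
ALLOWED_PUBLIC_PORTS = {80, 443}


def find_open_inbound_rules(groups):
    """Return (group_id, from_port, to_port) for world-open rules on non-allowed ports."""
    findings = []
    for group in groups:
        for rule in group.get("IpPermissions", []):
            open_to_world = any(
                ip_range.get("CidrIp") == "0.0.0.0/0"
                for ip_range in rule.get("IpRanges", [])
            )
            from_port, to_port = rule.get("FromPort"), rule.get("ToPort")
            # A rule is acceptable only if it covers exactly one allowed port.
            single_allowed_port = (
                from_port == to_port and from_port in ALLOWED_PUBLIC_PORTS
            )
            if open_to_world and not single_allowed_port:
                findings.append((group["GroupId"], from_port, to_port))
    return findings


if __name__ == "__main__":
    for finding in find_open_inbound_rules(SAMPLE_GROUPS):
        print(finding)
```

In a real integration the records would come from the provider's API rather than a hard-coded list, and the same baseline could be checked against many more rules than this one.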
b. DevOps tools (Chef/Puppet, Jenkins, GitHub etc.) – Concepts such as “Infrastructure as Code”, “Immutable Infrastructure”, “Phoenix Servers” and “Drift Management” are on the rise and are being widely adopted by organizations moving to the cloud. CI/CD and DevOps processes and tools help organizations realize these concepts. With infrastructure represented as “code templates” and not as “physical entities”, it becomes imperative to integrate with the CI/CD and DevOps systems to secure cloud infrastructure. API-based integration with the DevOps tools is therefore the second of the 3 security boundaries.
For ex. The production infrastructure for a mission-critical service/application running on AWS can be represented as a “CloudFormation template” stored in GitHub. Unauthorized access to, or tampering with, this template can bring down the entire production infrastructure without any attempt to breach AWS itself. Similarly, the configuration for the base image could be a Chef cookbook, and unauthorized access to it should be prevented at all costs.
c. Enterprise Systems (HRMS etc.) – Identities cut across the enterprise and the cloud, so securing identity across both of these security boundaries is critical. Lack of governance around identity can very easily lead to residual access in cloud systems, which is why correlating identities across these boundaries is a must.
For ex. A user could be terminated in the HR system, but if that identity is not correlated with the user’s account in AWS, the user may still have access to the AWS APIs/AWS Console, with potentially catastrophic results.
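The identity correlation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the record fields (`email`, `status`, `user_name`) and the choice of email as the correlation key are hypothetical, and real HR and cloud systems will need their own matching rules.

```python
# Hypothetical HR records and cloud accounts; field names are assumptions.
HR_RECORDS = [
    {"employee_id": "E100", "email": "alice@example.com", "status": "active"},
    {"employee_id": "E101", "email": "bob@example.com", "status": "terminated"},
]

CLOUD_ACCOUNTS = [
    {"user_name": "alice", "email": "alice@example.com", "console_access": True},
    {"user_name": "bob", "email": "Bob@example.com", "console_access": True},
    {"user_name": "ci-deployer", "email": None, "console_access": False},
]


def correlate(hr_records, cloud_accounts):
    """Return (residual, orphaned) cloud account names.

    residual: the matching HR identity is terminated but the account still exists.
    orphaned: no HR identity could be matched to the account at all.
    """
    hr_by_email = {record["email"].lower(): record for record in hr_records}
    residual, orphaned = [], []
    for account in cloud_accounts:
        email = (account.get("email") or "").lower()
        owner = hr_by_email.get(email)
        if owner is None:
            orphaned.append(account["user_name"])
        elif owner["status"] == "terminated":
            residual.append(account["user_name"])
    return residual, orphaned


if __name__ == "__main__":
    residual, orphaned = correlate(HR_RECORDS, CLOUD_ACCOUNTS)
    print("residual access:", residual)
    print("uncorrelated accounts:", orphaned)
```

Accounts that cannot be correlated at all (service accounts, for instance) are reported separately, because they need an owner assigned before any HR-driven control can apply to them.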
The second “3” of the design pattern represents the number of information/metadata sets that need to be retrieved from the 3 security boundaries identified above.
2. Retrieve/Pull 3 sets of information/metadata from each of the 3 security boundaries
a. Configuration Information (Network Config, Security Config, Containers Config etc.) – Configuration information is an extremely important piece to collect from the various security boundaries, and it is not limited to port configurations, route table entries and VPC (Virtual Private Cloud) configurations.
For ex. The configurations of Chef/Puppet servers, including whether these servers allow access from unknown jump boxes, and S3 buckets that hold production code templates but allow anonymous access from the internet, are equally important security configurations to retrieve.
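To make the S3 case concrete, here is a small Python sketch that inspects a bucket policy document for statements granting access to an anonymous principal. The policy layout (`Statement`, `Effect`, `Principal`) follows the AWS JSON policy format; the sample policies and the `allows_anonymous_access` helper are hypothetical. The check is deliberately simplified and ignores `Condition` blocks, so a statement restricted by a condition would still be flagged.

```python
import json

# Hypothetical bucket policies used only to exercise the check below.
PUBLIC_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::prod-templates/*",
        }
    ],
})

PRIVATE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/deployer"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::prod-templates/*",
    },
})


def allows_anonymous_access(policy_json):
    """Return True if any Allow statement names the wildcard principal."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if principal == "*":
            return True
        if isinstance(principal, dict):
            aws = principal.get("AWS")
            values = [aws] if isinstance(aws, str) else (aws or [])
            if "*" in values:
                return True
    return False


if __name__ == "__main__":
    print(allows_anonymous_access(PUBLIC_POLICY))
    print(allows_anonymous_access(PRIVATE_POLICY))
```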
b. Access Information (Permissions, Rights, Roles, Policies, Privileges etc.) – Access to the various cloud infrastructure objects is granted via roles, JSON-based policies, ACLs etc. (To understand more on this, read my previous blog on AWS Privileged Access.) Retrieving fine-grained access information from all of the security boundaries is critical to determining unauthorized, privileged, inherited or explicit access on the various infrastructure components.
For ex. A user could have read-only access to an Azure virtual machine in the Azure Portal but full control over the Chef cookbook that creates those Azure virtual machines, resulting in elevated access to the infrastructure overall.
c. Usage Information (Activity Logs, Network Logs, Container Logs etc.) – Usage information in the form of activity logs, network logs and infrastructure setup logs is crucial to collect, crunch and sift in order to determine the security controls that need to be enforced on the various security boundaries. The key challenge with logs is their sheer volume: the data can easily run to terabytes or petabytes. Crunching this volume of logs requires big data technologies such as Apache Spark, Hadoop or Elasticsearch/Kibana-based implementations.
For ex. A small AWS account with 20-25 workloads and 5-8 IAM users could generate anywhere from 1 million to 5 million CloudTrail log records in a week.
The last “3” of the design pattern represents the answers to three key questions that need to be derived once the “integration” and “retrieval/pull” steps have been completed.
These 3 questions and their answers form the foundation of “Securing the Cloud Services Ecosystem”. The more detailed the answers are, the better an organization’s chances of gaining visibility, understanding the “security state” of its cloud services ecosystem and enforcing the necessary “security controls” across it.
3. Derive/Deduce the answers to the following 3 questions in the form of mapped risk controls/signatures
a. “Who” has “Access to What” – Pulling “access” information in the form of fine-grained policies, roles and role assignments helps determine how many highly privileged roles, users, groups or policies exist within the security boundaries. These can then be subjected to review/certification processes, which can be instrumental in removing “extra” and “unnecessary” access.
For ex. Developers having full access to start and terminate instances, developers having full control over Chef Dev and Prod environments, a tester group having the owner role on the Chef production environment, or a developer group having explicit access to a SharePoint VM.
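One simple way to start answering “who has access to what” is to flag principals whose policies grant every action on every resource. The Python sketch below does this for policy documents in the AWS JSON policy format (`Statement`, `Effect`, `Action`, `Resource`). The `ASSIGNMENTS` mapping and the helper names are hypothetical, and the check only catches the plain `"*"` wildcard, not service-level wildcards such as `ec2:*`.

```python
def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def grants_full_access(policy):
    """Return True if any Allow statement covers all actions on all resources."""
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        all_actions = "*" in _as_list(statement.get("Action"))
        all_resources = "*" in _as_list(statement.get("Resource"))
        if all_actions and all_resources:
            return True
    return False


def highly_privileged(assignments):
    """Return the sorted names of principals with at least one full-access policy."""
    return sorted(
        principal
        for principal, policies in assignments.items()
        if any(grants_full_access(policy) for policy in policies)
    )


# Hypothetical principal-to-policies mapping, as it might look after retrieval.
ASSIGNMENTS = {
    "dev-group": [
        {"Version": "2012-10-17",
         "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]},
    ],
    "tester-group": [
        {"Version": "2012-10-17",
         "Statement": {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"}},
    ],
}

if __name__ == "__main__":
    print(highly_privileged(ASSIGNMENTS))
```

The resulting list is the kind of input a review/certification campaign could start from.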
b. “What is being done with the access?” – Collecting activity information (for users as well as the network) is critical for assigning the necessary security controls within a cloud ecosystem. These logs can be analyzed and fed to machine-learning algorithms to determine baseline user behavior or network behavior patterns. Deviations from these baselines can then help organizations detect rare events, perform outlier analysis etc.
For ex. Identifying a workload that is vulnerable because of its ports or other network configuration is extremely important, but it is equally important to know which users, CloudFormation (CF) templates or other code-based infrastructure templates are generating such workloads. Tracking source IP addresses and Tor browser based console sign-ons is also important so that organizations can enforce preventive mechanisms against them.
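The baseline-and-deviation idea can be illustrated with plain frequency counting. The Python sketch below is a stand-in for the machine-learning approaches mentioned above, not an implementation of them: it builds a baseline of (user, event) pairs from historical activity and flags new events whose pair was rarely or never seen. The flat record layout (`user`, `eventName`, `sourceIPAddress`) is a simplified, hypothetical shape loosely based on CloudTrail field names.

```python
from collections import Counter

# Hypothetical historical and new activity records.
HISTORY = [
    {"user": "alice", "eventName": "DescribeInstances", "sourceIPAddress": "10.0.0.5"},
    {"user": "alice", "eventName": "DescribeInstances", "sourceIPAddress": "10.0.0.5"},
    {"user": "alice", "eventName": "DescribeInstances", "sourceIPAddress": "10.0.0.5"},
    {"user": "bob", "eventName": "GetObject", "sourceIPAddress": "10.0.0.9"},
]

NEW_EVENTS = [
    {"user": "alice", "eventName": "DescribeInstances", "sourceIPAddress": "10.0.0.5"},
    {"user": "alice", "eventName": "DeleteTrail", "sourceIPAddress": "203.0.113.7"},
    {"user": "bob", "eventName": "ConsoleLogin", "sourceIPAddress": "198.51.100.4"},
]


def build_baseline(events):
    """Count how often each (user, eventName) pair occurs in past activity."""
    return Counter((event["user"], event["eventName"]) for event in events)


def rare_events(baseline, new_events, min_seen=1):
    """Return new events whose (user, eventName) pair was seen fewer than min_seen times."""
    return [
        event
        for event in new_events
        if baseline[(event["user"], event["eventName"])] < min_seen
    ]


if __name__ == "__main__":
    baseline = build_baseline(HISTORY)
    for event in rare_events(baseline, NEW_EVENTS):
        print(event["user"], event["eventName"], event["sourceIPAddress"])
```

At the log volumes discussed earlier, the same counting would run on a distributed engine rather than in memory, but the logic of comparing new activity against a baseline stays the same.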
c. “What does that access secure?” – Understanding the scope of access on infrastructure objects is extremely important. It helps security administrators define well-scoped access rules and policy statements, and in turn prevents “unwanted” and “unauthorized” access objects from being created or assigned. Scope can be “user based”, “infrastructure resource based” or even “action based”.
For ex. Assigning an Administrator role to a user on an Azure Resource Group means that the user also gets full control permissions on all of the underlying Azure objects within that resource group. Resource-based policies such as S3 bucket policies ensure that access to the S3 bucket remains the same irrespective of the users accessing it.
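The inheritance of scope in the resource group example can be sketched as a prefix match over hierarchical resource IDs. The Python sketch below assumes Azure-style IDs of the form `/subscriptions/.../resourceGroups/.../providers/...`; the assignment records, role names and helper functions are hypothetical and only illustrate how a role assigned at a parent scope reaches the objects beneath it.

```python
def scope_covers(scope, resource_id):
    """Return True if resource_id is the scope itself or sits underneath it."""
    scope = scope.rstrip("/").lower()
    resource_id = resource_id.rstrip("/").lower()
    # The trailing "/" keeps "rg-prod" from matching "rg-prod2".
    return resource_id == scope or resource_id.startswith(scope + "/")


def effective_roles(resource_id, assignments):
    """Return the sorted (principal, role) pairs that apply to resource_id."""
    return sorted(
        {
            (assignment["principal"], assignment["role"])
            for assignment in assignments
            if scope_covers(assignment["scope"], resource_id)
        }
    )


# Hypothetical role assignments at different scopes.
RG_PROD = "/subscriptions/sub-1/resourceGroups/rg-prod"
VM_1 = RG_PROD + "/providers/Microsoft.Compute/virtualMachines/vm-1"

ASSIGNMENTS = [
    {"principal": "alice", "role": "Owner", "scope": RG_PROD},
    {"principal": "bob", "role": "Reader", "scope": VM_1},
    {"principal": "carol", "role": "Owner",
     "scope": "/subscriptions/sub-1/resourceGroups/rg-prod2"},
]

if __name__ == "__main__":
    for principal, role in effective_roles(VM_1, ASSIGNMENTS):
        print(principal, role)
```

Here the assignment made at the resource group level shows up on the virtual machine even though nothing was assigned on the machine directly, which is exactly the kind of inherited access a scope review needs to surface.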