UNCLASSIFIED - NO CUI

Commit 40b7330c authored by joshwolf


update pre-requisites with more distro agnostic requirements, update rke2 iac to use stig'd rhel8 ami airgapped  deployment by default
parent 035884aa
Merge request !343: iac updates
@@ -158,14 +158,14 @@ upgrade:
stage: network up
rules:
# Run on scheduled jobs OR when `test-ci` label is assigned
- if: '($CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master") || $CI_MERGE_REQUEST_LABELS =~ "test-ci::infra"'
- if: '($CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master") || $CI_MERGE_REQUEST_LABELS =~ /(^|,)test-ci::infra(,|$)/'
allow_failure: false
# Abstract for jobs responsible for creating infrastructure
.infra create:
rules:
# Run on scheduled jobs OR when `test-ci` label is assigned
- if: '($CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master") || $CI_MERGE_REQUEST_LABELS =~ "test-ci::infra"'
- if: '($CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master") || $CI_MERGE_REQUEST_LABELS =~ /(^|,)test-ci::infra(,|$)/'
# skip job when branch name starts with "hotfix" or "patch"
- if: '$CI_MERGE_REQUEST_SOURCE_BRANCH_NAME =~ /^(hotfix|patch)/'
when: never
@@ -174,7 +174,7 @@ upgrade:
.infra cleanup:
rules:
# Run on scheduled jobs
- if: '($CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master") || $CI_MERGE_REQUEST_LABELS =~ "test-ci::infra"'
- if: '($CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "master") || $CI_MERGE_REQUEST_LABELS =~ /(^|,)test-ci::infra(,|$)/'
allow_failure: true
when: always
@@ -238,6 +238,14 @@ aws/rke2/bigbang up:
- cp ${CI_PROJECT_DIR}/rke2.yaml ~/.kube/config
# Deploy a default storage class for aws
- kubectl apply -f ${CI_PROJECT_DIR}/.gitlab-ci/jobs/rke2/dependencies/k8s-resources/aws/default-ebs-sc.yaml
- echo "Patching default rke2 PSPs to be less restrictive so OPA Gatekeeper can successfully deploy"
- |
kubectl --kubeconfig rke2.yaml patch psp global-unrestricted-psp -p '{"metadata": { "annotations": { "seccomp.security.alpha.kubernetes.io/allowedProfileNames": "*" } } }'
- |
kubectl --kubeconfig rke2.yaml patch psp system-unrestricted-psp -p '{ "metadata": { "annotations": { "seccomp.security.alpha.kubernetes.io/allowedProfileNames": "*" } } }'
- |
kubectl --kubeconfig rke2.yaml patch psp global-restricted-psp -p '{ "metadata": { "annotations": { "seccomp.security.alpha.kubernetes.io/allowedProfileNames": "*" } } }'
script:
- *deploy_bigbang
environment:
# rke2
This folder contains _one example_ of deploying `rke2`, tuned specifically to run BigBang CI. While it can be used as a reference for deployments, please ensure you take your own needs into consideration.
## What's deployed
* `rke2` cluster
  * sized according to BigBang CI needs as non-HA
    * if HA is desired, simply set `servers = 3` in the installation or upgrade
  * aws govcloud (`us-gov-west-1`)
  * stig'd rhel8 (90-95% depending on user configuration)
  * airgap
* single autoscaling generic agent nodepool
  * sized according to BigBang CI needs as 2 `m5a.4xlarge` instances
  * if additional nodes are needed, simply add more nodepools
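For reference, an HA install could be sketched against the rke2-aws-terraform module like this (illustrative; the remaining required inputs are unchanged from the `env/dev` setup):

```hcl
# Sketch: HA control plane via the rke2-aws-terraform module (values illustrative)
module "rke2" {
  source  = "git::https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform.git?ref=v1.1.7"
  servers = 3 # three control plane nodes instead of the CI default of one
  # ...remaining required inputs (cluster_name, vpc_id, subnets, etc.) omitted
}
```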
## How's it deployed
The `rke2` terraform modules used can be found on repo1 [here](https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform).
Both `ci` and `dev` setups exist; the example below can be run locally for development workflows where local clusters may not suffice:
```bash
# ensure BigBang's CI network exists
cd .gitlab-ci/jobs/networking/aws/dependencies/terraform/env/dev
terraform init
terraform apply
# deploy rke2
cd .gitlab-ci/jobs/rke2/dependencies/terraform/env/dev
terraform init
terraform apply
# kubeconfig will be copied locally after terraform completes in ~5m
kubectl --kubeconfig rke2.yaml get no,all -A
```
\ No newline at end of file
# RKE2 Packer
An _extremely_ simple packer script to pre-load rke2 dependencies for airgapped deployment.
This packer script is __not__ intended to be used as a standard for airgapped rke2 deployments; it is simply a quick and dirty way to enable airgap deployments in the context of BigBang's CI.
## Future Work
This is currently baselined off of a vanilla RHEL 8.3 AMI; it should instead be based off of a P1 gold standard stig'd AMI.
{
"variables": {
"aws_region": "us-gov-west-1",
"rke2_version": "v1.18.12+rke2r1",
"rke2_url": "https://github.com/rancher/rke2/releases/download",
"ami_name": "rhel8",
"ami_description": "An RKE2 base image based on RHEL 8 Build Date: {{ isotime }}",
"source_ami_name": "RHEL-8.3*",
"source_ami_owner": "309956199498",
"source_ami_owner_govcloud": "219670896067",
"source_ami_ssh_user": "ec2-user"
},
"builders": [
{
"type": "amazon-ebs",
"region": "{{user `aws_region`}}",
"ami_regions": ["us-gov-west-1"],
"source_ami_filter": {
"filters": {
"name": "{{user `source_ami_name`}}",
"root-device-type": "ebs",
"state": "available",
"virtualization-type": "hvm",
"architecture": "x86_64"
},
"owners": [ "{{user `source_ami_owner`}}", "{{user `source_ami_owner_govcloud`}}" ],
"most_recent": true
},
"instance_type": "m5.large",
"ssh_username": "{{user `source_ami_ssh_user`}}",
"subnet_id": "{{user `subnet_id`}}",
"kms_key_id": "{{user `kms_key_id`}}",
"launch_block_device_mappings": [
{
"device_name": "/dev/sda1",
"volume_size": 25,
"volume_type": "gp2",
"delete_on_termination": true
}
],
"tags": {
"Name": "rke2-{{user `ami_name`}}-{{ timestamp }}",
"BuildDate": "{{ isotime }}",
"RKE2-Version": "{{user `rke2_version`}}"
},
"ami_name": "rke2-{{user `ami_name`}}-{{ timestamp }}",
"ami_description": "{{user `ami_description` }}",
"ami_virtualization_type": "hvm",
"run_tags": {
"Name": "packer-builder-rke2-{{user `ami_name`}}-ami"
}
}
],
"provisioners": [
{
"type": "shell",
"environment_vars": [
"RKE2_VERSION={{ user `rke2_version` }}",
"RKE2_URL={{ user `rke2_url` }}"
],
"script": "./setup.sh",
"execute_command": "chmod +x {{ .Path }}; sudo {{ .Vars }} {{ .Path }}"
}
]
}
\ No newline at end of file
#!/bin/bash
set -o pipefail
set -o errexit
# Bare minimum dependency collection
yum install -y unzip
yum update -y
cd /usr/local/bin
# RKE2
curl -sL https://get.rke2.io -o rke2.sh
curl -OLs "${RKE2_URL}/${RKE2_VERSION}/{rke2.linux-amd64,rke2.linux-amd64.tar.gz,rke2-images.linux-amd64.txt,rke2-images.linux-amd64.tar.gz,sha256sum-amd64.txt}"
# Verify checksums (excluding e2e artifacts); wrap the pipeline in the
# conditional so errexit doesn't abort before the error message prints
if ! grep -v "e2e-*" sha256sum-amd64.txt | sha256sum -c /dev/stdin; then
  echo "[ERROR] checksum of rke2 files doesn't match"
  exit 1
fi
rm -f sha256sum-amd64.txt
chmod 755 rke2*
# Install rke2 components (with yum so selinux components are fetched)
INSTALL_RKE2_METHOD='yum' ./rke2.sh
INSTALL_RKE2_METHOD='yum' INSTALL_RKE2_TYPE="agent" ./rke2.sh
# Move and decompress images to pre-load dir
mkdir -p /var/lib/rancher/rke2/agent/images/ && zcat rke2-images.linux-amd64.tar.gz > /var/lib/rancher/rke2/agent/images/rke2-images.linux-amd64.tar
# AWS CLI
curl -sL https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip -o /tmp/awscliv2.zip && unzip -qq -d /tmp /tmp/awscliv2.zip && /tmp/aws/install --bin-dir /usr/bin
rm -rf /tmp/aws*
# WARN: This sets the default region to the current region that packer is building from
aws configure set default.region $(curl -s http://169.254.169.254/latest/meta-data/placement/region)
cat <<EOF >> /etc/environment
HISTTIMEFORMAT="%F %T "
KUBECONFIG=/etc/rancher/rke2/rke2.yaml
EOF
cat <<EOF >> /root/.bash_aliases
alias k='rke2 kubectl'
EOF
# Clean up build instance history
rm -rf \
/etc/hostname \
/home/ec2-user/.ssh/authorized_keys \
/root/.ssh/authorized_keys \
/var/lib/cloud/data \
/var/lib/cloud/instance \
/var/lib/cloud/instances \
/var/lib/cloud/sem \
/var/log/cloud-init-output.log \
/var/log/cloud-init.log \
/var/log/secure \
/var/log/wtmp \
/var/log/apt
> /etc/machine-id
> /var/log/wtmp
> /var/log/btmp
yum clean all -y
df -h; date
history -c
@@ -26,8 +26,4 @@ module "ci" {
ci_pipeline_url = var.ci_pipeline_url
vpc_id = data.terraform_remote_state.networking.outputs.vpc_id
subnets = data.terraform_remote_state.networking.outputs.intra_subnets
download = false
server_ami = "ami-00aab2121681e4a31"
agent_ami = "ami-00aab2121681e4a31"
}
\ No newline at end of file
locals {
name = "umbrella-${var.env}"
# Bigbang specific OS tuning
os_prep = <<EOF
# Configure aws cli default region to the current region; it'd be great if the aws cli did this on install
aws configure set default.region $(curl -s http://169.254.169.254/latest/meta-data/placement/region)
# Tune vm sysctl for elasticsearch
sysctl -w vm.max_map_count=262144
# Preload kernel modules required by istio-init, required for selinux enforcing instances using istio-init
modprobe xt_REDIRECT
modprobe xt_owner
modprobe xt_statistic
# Persist modules after reboots (RHEL loads these via systemd from /etc/modules-load.d)
printf "xt_REDIRECT\nxt_owner\nxt_statistic\n" > /etc/modules-load.d/istio-iptables.conf
EOF
tags = {
"project" = "umbrella"
"env" = var.env
@@ -10,7 +26,7 @@ locals {
}
module "rke2" {
source = "git::https://github.com/rancherfederal/rke2-aws-tf.git"
source = "git::https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform.git?ref=v1.1.7"
cluster_name = local.name
vpc_id = var.vpc_id
@@ -25,21 +41,13 @@ module "rke2" {
enable_ccm = var.enable_ccm
download = var.download
# TODO: These need to be set in pre-baked ami's
pre_userdata = <<-EOF
# Temporarily disable selinux enforcing due to missing policies in containerd
# The change is currently being upstreamed and can be tracked here: https://github.com/rancher/k3s/issues/2240
setenforce 0
# Tune vm sysctl for elasticsearch
sysctl -w vm.max_map_count=262144
EOF
pre_userdata = local.os_prep
tags = merge({}, local.tags, var.tags)
}
module "generic_agents" {
source = "git::https://github.com/rancherfederal/rke2-aws-tf.git//modules/agent-nodepool"
source = "git::https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform.git//modules/agent-nodepool?ref=v1.1.7"
name = "generic-agent"
vpc_id = var.vpc_id
@@ -56,14 +64,7 @@ module "generic_agents" {
download = var.download
# TODO: These need to be set in pre-baked ami's
pre_userdata = <<-EOF
# Temporarily disable selinux enforcing due to missing policies in containerd
# The change is currently being upstreamed and can be tracked here: https://github.com/rancher/k3s/issues/2240
setenforce 0
# Tune vm sysct for elasticsearch
sysctl -w vm.max_map_count=262144
EOF
pre_userdata = local.os_prep
# Required data for identifying cluster to join
cluster_data = module.rke2.cluster_data
@@ -39,8 +39,7 @@ variable "ssh_authorized_keys" {
variable "download" {
type = bool
default = true
# TODO: Probably makes the most sense to set this to false and just use the ami for everything
default = false
description = "Toggle dependency downloading"
}
@@ -48,7 +47,8 @@ variable "download" {
# Server variables
#
variable "server_ami" {
default = "ami-57ecd436" # RHEL 8.3
# RHEL 8 RKE2 STIG: https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-image-builder
default = "ami-09d02b6cbe719f221"
}
variable "server_instance_type" {
default = "m5a.large"
@@ -64,7 +64,8 @@ variable "rke2_version" {
# Generic agent variables
#
variable "agent_ami" {
default = "ami-57ecd436" # RHEL 8.3
# RHEL 8 RKE2 STIG: https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-image-builder
default = "ami-09d02b6cbe719f221"
}
variable "agent_instance_type" {
default = "m5a.4xlarge"
@@ -4,14 +4,77 @@
BigBang is built to work on all the major kubernetes distributions. However, since some distributions are configured out of the box with settings incompatible with BigBang, this document serves as a checklist of pre-requisites for any distribution that may need it.
> Clusters are sorted _alphabetically_
## All Clusters
The following apply as prerequisites for all clusters
* A default `StorageClass` capable of resolving `ReadWriteOnce` `PersistentVolumeClaims` must exist
### Storage
BigBang assumes the cluster you're deploying to supports [dynamic volume provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/), which ultimately puts the burden on the cluster distro provider to ensure appropriate setup. In many cases, this is as simple as using the in-tree CSI drivers. Please refer to each supported distro's documentation for further details.
In the future, BigBang plans to support the provisioning and management of a cloud agnostic container attached storage solution, but until then, on-prem deployments require more involved setup, typically supported through the vendor.
#### Default `StorageClass`
A default `StorageClass` capable of resolving `ReadWriteOnce` `PersistentVolumeClaims` must exist. An example suitable for basic production workloads on aws, supporting a highly available cluster across multiple availability zones, is provided below:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ebs
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
- debug
volumeBindingMode: WaitForFirstConsumer
```
It is up to the user to ensure the default storage class's performance is suitable for their workloads, or to specify different `StorageClasses` when necessary.
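As a quick check, a claim like the following (names illustrative) should bind once a default class exists:

```yaml
# Example PVC; with a default StorageClass installed, storageClassName may be omitted
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName: ebs   # uncomment to target a non-default class explicitly
```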
### `selinux`
Additional pre-requisites are needed for istio on systems with `selinux` set to `Enforcing`.
By default, BigBang will deploy istio configured to use `istio-init` (read more [here](https://istio.io/latest/docs/setup/additional-setup/cni/)). To ensure istio can properly initialize envoy sidecars without privileged container escalation permissions, several kernel modules must be pre-loaded before installing BigBang:
```bash
modprobe xt_REDIRECT
modprobe xt_owner
modprobe xt_statistic
```
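Note that `modprobe` does not persist across reboots; on systemd-based distros the same modules can be persisted with a drop-in file such as the following (path illustrative):

```
# /etc/modules-load.d/istio-iptables.conf
xt_REDIRECT
xt_owner
xt_statistic
```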
### Load Balancing
BigBang by default assumes the cluster you're deploying to supports dynamic load balancer provisioning. This applies specifically to the creation of istio and its ingress gateways, which map to a "physical" load balancer usually provisioned by the cloud provider.
In almost all cases, the distro provides this capability through in-tree cloud providers, appropriately configured through the IAC on repo1. For on-prem environments, please consult the vendor's support for the recommended way of handling automatic load balancing configuration.
If automatic load balancer provisioning is not supported or not desired, the default BigBang configuration can be modified to expose istio's ingressgateway through `NodePorts`, which can then be mapped manually (or via separate IAC) to a pre-existing load balancer.
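As a rough sketch of the `NodePort` approach (the selector label and ports are assumptions; verify them against your istio install):

```yaml
# Illustrative NodePort service fronting the istio ingress gateway
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway-nodeport
  namespace: istio-system
spec:
  type: NodePort
  selector:
    app: istio-ingressgateway   # assumed label; verify on your cluster
  ports:
    - name: https
      port: 443
      targetPort: 8443          # assumed gateway container port; verify
      nodePort: 30443           # register this port with your external load balancer
```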
### Elasticsearch
Elasticsearch deployed by BigBang uses memory mapping by default. In most cases, the default memory map limit is too low and must be raised.
To avoid relying on unnecessary privileged escalation containers, this kernel setting should be applied before BigBang is deployed:
```bash
sysctl -w vm.max_map_count=262144
```
More information can be found in elasticsearch's documentation [here](https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-virtual-memory.html#k8s-virtual-memory).
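Note that `sysctl -w` does not survive reboots; a drop-in like the following (path illustrative) makes the setting permanent:

```
# /etc/sysctl.d/99-elasticsearch.conf
vm.max_map_count=262144
```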
## OpenShift
1) When deploying BigBang, set the OpenShift flag to true.
```
# inside a values.yaml being passed to the command installing bigbang
openshift: true
@@ -19,12 +82,16 @@ openshift: true
# OR inline with helm command
helm install bigbang chart --set openshift=true
```
2) Patch the istio-cni daemonset to allow containers to run privileged (AFTER the istio-cni daemonset exists).
Note: applying this setting via modifications to the helm chart was attempted without success; patching the live daemonset succeeded.
```
kubectl get daemonset istio-cni-node -n kube-system -o json | jq '.spec.template.spec.containers[] += {"securityContext":{"privileged":true}}' | kubectl replace -f -
```
3) Modify the OpenShift cluster(s) with the following scripts, based on https://istio.io/v1.7/docs/setup/platform-setup/openshift/
```
# Istio Openshift configurations Post Install
oc -n istio-system expose svc/istio-ingressgateway --port=http2
@@ -46,12 +113,34 @@ oc -n monitoring create -f NetworkAttachmentDefinition.yaml
## RKE2
Since BigBang makes several assumptions about volume and load balancer provisioning by default, it's vital that the rke2 cluster is properly configured. The easiest way to do this is through the in-tree cloud providers, which can be configured through the `rke2` configuration file, such as:
```yaml
# aws, azure, gcp, etc...
cloud-provider-name: aws
# additionally, set below configuration for private AWS endpoints, or custom regions such as (T)C2S (us-iso-east-1, us-iso-b-east-1)
cloud-provider-config: ...
```
For example, if using the aws terraform modules provided [on repo1](https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform), setting the variable `enable_ccm = true` will ensure all the necessary resource tags are applied.
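For illustration, the flag is just another module input (a sketch; required inputs such as `vpc_id` and `subnets` are omitted):

```hcl
# Sketch: enabling the in-tree AWS cloud provider via the repo1 module
module "rke2" {
  source     = "git::https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform.git?ref=v1.1.7"
  enable_ccm = true # tags AWS resources so the cloud provider can manage volumes and load balancers
  # ...remaining required inputs omitted
}
```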
In the absence of an in-tree cloud provider (such as on-prem), the requirements can be met through the instructions outlined in the [storage](#storage) and [load balancing](#load-balancing) prerequisites section above.
### OPA Gatekeeper
Default PSP configurations for RKE2 prevent OPA Gatekeeper from coming up correctly. See the [RKE2 Issue](https://repo1.dso.mil/platform-one/distros/rancher-federal/rke2/rke2-aws-terraform/-/issues/2) and [Big Bang Issue](https://repo1.dso.mil/platform-one/big-bang/bigbang/-/issues/10). Patching the PSPs in the cluster allows OPA Gatekeeper to start correctly.
A core component of BigBang is OPA Gatekeeper, which operates as an elevated [validating admission controller](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) to audit and enforce various [constraints](https://github.com/open-policy-agent/frameworks/tree/master/constraint) on all requests sent to the kubernetes api server.
By default, `rke2` deploys with [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) that disallow this type of deployment. However, since we trust BigBang (and OPA Gatekeeper), we can patch the default `rke2` PSPs to allow OPA.
Given a freshly installed `rke2` cluster, run the following commands _once_ before attempting to install BigBang.
```bash
kubectl patch psp system-unrestricted-psp -p '{"metadata": {"annotations":{"seccomp.security.alpha.kubernetes.io/allowedProfileNames": "*"}}}'
kubectl patch psp global-unrestricted-psp -p '{"metadata": {"annotations":{"seccomp.security.alpha.kubernetes.io/allowedProfileNames": "*"}}}'
kubectl patch psp global-restricted-psp -p '{"metadata": {"annotations":{"seccomp.security.alpha.kubernetes.io/allowedProfileNames": "*"}}}'
```
### Istio
By default, BigBang will use `istio-init`, and `rke2` clusters will come with `selinux` in `Enforcing` mode; please see the [`selinux`](#selinux) section above for pre-requisites and warnings.