Deploy High Availability Manager with EKS

This guide walks through deploying Fluent Manager in High Availability mode on AWS EKS.

What gets created:

  • EKS cluster with managed node groups across 3 availability zones
  • RDS PostgreSQL with Multi-AZ failover (replaces the in-cluster DB)
  • EFS shared storage for WebDAV (allows multiple backend replicas to share files)
  • All required IAM roles, security groups, and Kubernetes resources

Prerequisites

Tool        Version
Terraform   >= 1.0
AWS CLI     v2 (configured with aws configure)
kubectl     any
Helm        >= 3.0
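
You can confirm each tool is installed and meets these versions from your shell:

terraform version
aws --version
kubectl version --client
helm version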

Verify your AWS access before starting:

aws sts get-caller-identity

Accessing the Terraform Scripts

The Manager Terraform scripts needed for the High Availability deployment to EKS are available on our downloads page, under the Download Manager Deployment Scripts section.

Before You Apply

The Terraform scripts require IAM permissions for EKS, EC2, EFS, RDS, IAM, CloudWatch Logs, and Auto Scaling.

info

If you are unsure whether your AWS user/role has the required permissions, ask your AWS administrator to attach AdministratorAccess before proceeding. For production accounts, a tighter policy can be scoped down after initial deployment.
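
If you want to see which managed policies are currently attached, a quick check (assuming you authenticate as an IAM user rather than an assumed role; take the user name from the ARN returned by aws sts get-caller-identity):

aws iam list-attached-user-policies --user-name <your-user-name>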

Check permissions before applying

terraform plan only calls read/describe APIs and will succeed even if your user lacks the create permissions needed for terraform apply. Run this pre-flight check first — it asks IAM directly whether your identity is allowed to perform each required action:

Linux/macOS
aws iam simulate-principal-policy \
--policy-source-arn $(aws sts get-caller-identity --query Arn --output text) \
--action-names iam:CreatePolicy iam:CreateRole iam:AttachRolePolicy rds:CreateDBSubnetGroup rds:CreateDBInstance eks:CreateCluster ec2:CreateVpc elasticfilesystem:CreateFileSystem kms:CreateKey \
--query 'EvaluationResults[?EvalDecision!=`allowed`].[EvalActionName,EvalDecision]' \
--output table
Windows PowerShell
$arn = aws sts get-caller-identity --query Arn --output text
aws iam simulate-principal-policy --policy-source-arn $arn --action-names iam:CreatePolicy iam:CreateRole iam:AttachRolePolicy rds:CreateDBSubnetGroup rds:CreateDBInstance eks:CreateCluster ec2:CreateVpc elasticfilesystem:CreateFileSystem kms:CreateKey --query "EvaluationResults[?EvalDecision!='allowed'].[EvalActionName,EvalDecision]" --output table
Windows CMD
for /f "tokens=*" %i in ('aws sts get-caller-identity --query Arn --output text') do set ARN=%i
aws iam simulate-principal-policy --policy-source-arn %ARN% --action-names iam:CreatePolicy iam:CreateRole iam:AttachRolePolicy rds:CreateDBSubnetGroup rds:CreateDBInstance eks:CreateCluster ec2:CreateVpc elasticfilesystem:CreateFileSystem kms:CreateKey --query "EvaluationResults[?EvalDecision!='allowed'].[EvalActionName,EvalDecision]" --output table

If the output is empty, you have all required permissions. If any actions are listed as implicitDeny or explicitDeny, contact your AWS administrator to grant those permissions before proceeding — otherwise terraform apply will fail partway through.

caution

If the command itself returns AccessDenied, your user does not have permission to call iam:SimulatePrincipalPolicy. This is a strong indicator that IAM permissions are insufficient. Ask your AWS administrator to either run the simulation for you (passing your ARN as --policy-source-arn) or attach AdministratorAccess to your user.

Then verify the plan looks correct:

cd terraform/eks
terraform init
terraform plan

Deployment

Step 1 — Configure

Edit terraform/eks/terraform.tfvars with your values:

name       = "fluent-manager"   # Used as name prefix for all AWS resources
region     = "us-east-1"
node_count = 2

Set your database password as an environment variable to keep it out of shell history:

Linux/macOS
export TF_VAR_db_password="YourSecurePassword123!"
Windows PowerShell
$env:TF_VAR_db_password="YourSecurePassword123!"
Windows CMD
set TF_VAR_db_password=YourSecurePassword123!
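
To confirm the variable is set without printing its value to the terminal:

Linux/macOS
[ -n "$TF_VAR_db_password" ] && echo "TF_VAR_db_password is set"
Windows CMD
if defined TF_VAR_db_password echo TF_VAR_db_password is set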

Step 2 — Deploy Infrastructure

cd terraform/eks
terraform init
terraform apply
info

This takes approximately 15–20 minutes and creates the full AWS infrastructure including the EFS StorageClass.
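
Once the apply completes, you can review what was created and the generated outputs (the rds_endpoint output is used again in step 5):

terraform state list
terraform output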

Step 3 — Configure kubectl

aws eks update-kubeconfig --name <cluster-name-from-terraform.tfvars> --region <region-from-terraform.tfvars>

# For example:
aws eks update-kubeconfig --name fluent-manager --region us-east-1
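
Confirm kubectl can now reach the cluster; the node count should match node_count in terraform.tfvars:

kubectl get nodes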

Step 4 — Install EFS CSI Driver

Linux/macOS
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
--namespace kube-system \
--set controller.serviceAccount.create=false \
--set controller.serviceAccount.name=efs-csi-controller-sa \
--set node.serviceAccount.create=false \
--set node.serviceAccount.name=efs-csi-node-sa
Windows CMD
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm repo update
helm install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver --namespace kube-system --set controller.serviceAccount.create=false --set controller.serviceAccount.name=efs-csi-controller-sa --set node.serviceAccount.create=false --set node.serviceAccount.name=efs-csi-node-sa

Step 5 — Deploy Fluent Manager

The namespace and database secret are created automatically by terraform apply in step 2.
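
You can confirm both exist before installing (fluent-manager-secret is the secret name referenced in the troubleshooting section below):

kubectl get namespace fluent
kubectl get secret fluent-manager-secret -n fluent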

Linux/macOS — run from terraform/eks/
cd ../../helm-charts
helm install fluent-manager ./fluent \
-f fluent/values.yaml \
-f fluent/values-ha.yaml \
--set-string manager.envConfigMap.FLUENT_MANAGER_DATABASE_URL="$(terraform -chdir=../terraform/eks output -raw rds_endpoint)" \
--namespace fluent
Windows CMD — run from terraform/eks/
for /f "tokens=*" %i in ('terraform output -raw rds_endpoint') do set RDS_ENDPOINT=%i
cd ..\..\helm-charts
helm install fluent-manager ./fluent -f fluent/values.yaml -f fluent/values-ha.yaml --set-string manager.envConfigMap.FLUENT_MANAGER_DATABASE_URL=%RDS_ENDPOINT% --namespace fluent

Step 6 — Verify

kubectl get pods -n fluent   # All pods should be Running
kubectl get pvc -n fluent    # backend-webdav-pvc should be Bound

Step 7 — Access Fluent Manager

Get the frontend and restful-engine LoadBalancer hostnames:

kubectl get svc frontend restful-engine -n fluent

Look for the EXTERNAL-IP column — these are the AWS ELB hostnames.

Open the frontend in your browser:

http://<frontend EXTERNAL-IP>

When logging in for the first time, Fluent Manager will prompt for the Restful Engine URL. Use:

http://<restful-engine EXTERNAL-IP>:8080
tip

The EXTERNAL-IP may show <pending> for 30–60 seconds while AWS provisions the load balancers. Re-run the command until hostnames appear.
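
Alternatively, watch the services until the hostnames appear:

kubectl get svc frontend restful-engine -n fluent -w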

Upgrading

Linux/macOS
cd helm-charts
helm upgrade fluent-manager ./fluent \
-f fluent/values.yaml \
-f fluent/values-ha.yaml \
--set-string manager.envConfigMap.FLUENT_MANAGER_DATABASE_URL="your-rds-endpoint:5432" \
--namespace fluent
Windows CMD
cd helm-charts
helm upgrade fluent-manager ./fluent -f fluent/values.yaml -f fluent/values-ha.yaml --set-string manager.envConfigMap.FLUENT_MANAGER_DATABASE_URL=your-rds-endpoint:5432 --namespace fluent

Teardown

danger

This permanently deletes all infrastructure including the database.

Linux/macOS
helm uninstall fluent-manager -n fluent
helm uninstall aws-efs-csi-driver -n kube-system

# RDS has deletion protection — disable it first
aws rds modify-db-instance \
--db-instance-identifier fluent-manager-fluent-manager-db \
--no-deletion-protection

cd terraform/eks
terraform destroy
Windows CMD
helm uninstall fluent-manager -n fluent
helm uninstall aws-efs-csi-driver -n kube-system
aws rds modify-db-instance --db-instance-identifier fluent-manager-fluent-manager-db --no-deletion-protection
cd ..\terraform\eks
terraform destroy

Troubleshooting

Terraform failed partway through (partial apply)

If terraform apply fails due to a permissions error midway, some resources will have been created and some won't. Do not manually delete anything in the AWS console. Fix the permissions issue first, then re-run apply — Terraform will skip what already exists and only create what failed:

terraform apply

To see exactly what was created before the failure:

terraform state list

Teardown fails due to permissions

terraform destroy requires the same permissions as apply. If destroy fails, check that you still have the required IAM permissions. RDS also has deletion protection enabled — you must disable it before destroy will succeed:

Linux/macOS
aws rds modify-db-instance \
--db-instance-identifier fluent-manager-fluent-manager-db \
--no-deletion-protection
terraform destroy
Windows CMD
aws rds modify-db-instance --db-instance-identifier fluent-manager-fluent-manager-db --no-deletion-protection
terraform destroy

If destroy fails because you have already manually removed resources in the AWS console, remove them from Terraform state to keep it in sync:

terraform state rm <resource_address>   # e.g. aws_efs_file_system.efs_object

`helm install` fails: ClusterRole/ClusterRoleBinding exists and cannot be imported

Terraform previously created the ClusterRole and ClusterRoleBinding for the EFS CSI driver. Those resources have been removed from Terraform and are now managed by Helm exclusively — but if you already ran terraform apply with the old configuration, the resources will exist in your cluster without Helm's ownership metadata.

Adopt each conflicting resource, then retry the install. Run both sets of commands up front, since Helm may report the ClusterRole conflict first and the ClusterRoleBinding conflict on the next attempt:

Linux/macOS
kubectl label clusterrole efs-csi-external-provisioner-role app.kubernetes.io/managed-by=Helm
kubectl annotate clusterrole efs-csi-external-provisioner-role \
meta.helm.sh/release-name=aws-efs-csi-driver \
meta.helm.sh/release-namespace=kube-system

kubectl label clusterrolebinding efs-csi-provisioner-binding app.kubernetes.io/managed-by=Helm
kubectl annotate clusterrolebinding efs-csi-provisioner-binding \
meta.helm.sh/release-name=aws-efs-csi-driver \
meta.helm.sh/release-namespace=kube-system
Windows CMD
kubectl label clusterrole efs-csi-external-provisioner-role app.kubernetes.io/managed-by=Helm
kubectl annotate clusterrole efs-csi-external-provisioner-role meta.helm.sh/release-name=aws-efs-csi-driver meta.helm.sh/release-namespace=kube-system
kubectl label clusterrolebinding efs-csi-provisioner-binding app.kubernetes.io/managed-by=Helm
kubectl annotate clusterrolebinding efs-csi-provisioner-binding meta.helm.sh/release-name=aws-efs-csi-driver meta.helm.sh/release-namespace=kube-system

Also remove those resources from Terraform state so Terraform no longer tracks them (Helm now owns them):

terraform state rm kubernetes_cluster_role_v1.efs_csi_external_provisioner_role
terraform state rm kubernetes_cluster_role_binding_v1.example

Then retry helm install.

PVC stuck in Pending

kubectl describe pvc backend-webdav-pvc -n fluent

Common causes: EFS CSI driver not running, or StorageClass missing. Check the driver pods:

kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-efs-csi-driver
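
If the driver is healthy, then confirm the EFS StorageClass exists (its exact name is defined by the Terraform scripts):

kubectl get storageclass
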
Backend pods crash with `password authentication failed for user "postgres"`

The Helm chart generates a random database password on first install, which won't match the password set in RDS by Terraform. Patch the secret with the correct password (the same value used for TF_VAR_db_password), then restart the backend:

Linux/macOS
kubectl patch secret fluent-manager-secret -n fluent --type='json' \
-p='[{"op":"replace","path":"/data/FLUENT_MANAGER_DATABASE_PASSWORD","value":"'"$(echo -n 'YourPassword' | base64)"'"},{"op":"replace","path":"/data/POSTGRES_PASSWORD","value":"'"$(echo -n 'YourPassword' | base64)"'"}]'
kubectl rollout restart deployment backend -n fluent
Windows CMD
:: Get the base64 value first using PowerShell, then patch
powershell -Command "[Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes('YourPassword'))"
:: Use the output of that command as BASE64_PASSWORD below
kubectl patch secret fluent-manager-secret -n fluent --type=json -p "[{\"op\":\"replace\",\"path\":\"/data/FLUENT_MANAGER_DATABASE_PASSWORD\",\"value\":\"BASE64_PASSWORD\"},{\"op\":\"replace\",\"path\":\"/data/POSTGRES_PASSWORD\",\"value\":\"BASE64_PASSWORD\"}]"
kubectl rollout restart deployment backend -n fluent
note

On fresh deploys this won't happen: the secret is pre-created with the correct password by terraform apply in step 2, before helm install runs.

Restful-engine PVC stuck Pending (no default StorageClass provisioner)

EKS 1.23+ no longer ships an in-tree EBS provisioner: the default gp2 StorageClass uses kubernetes.io/aws-ebs (the removed in-tree driver) and will not provision volumes. Terraform provisions the EBS CSI addon and a gp2-csi StorageClass backed by ebs.csi.aws.com (see efs-iam.tf and efs-k8s.tf), and values-ha.yaml sets restfulEngine.storageClassName: gp2-csi to target this class explicitly.

For an already-running cluster (without re-deploying), apply just the new resources:

cd terraform/eks
terraform apply \
-target=aws_iam_role.ebs_csi_driver_role \
-target=aws_iam_role_policy_attachment.ebs_csi_driver_policy_attach \
-target=aws_eks_addon.ebs_csi_driver \
-target=kubernetes_storage_class_v1.gp2_csi

Then delete and re-create the stuck PVC so it picks up the new StorageClass:

kubectl delete pvc restful-engine-volume -n fluent
kubectl rollout restart deployment restful-engine -n fluent

Backend crash: `ERROR: schema "manager" does not exist`

On a fresh RDS instance, the manager PostgreSQL schema doesn't exist. Liquibase is configured to write its tracking tables to that schema, so it fails before any changelogs can run.

The backend deployment includes an init container (create-db-schema) that runs CREATE SCHEMA IF NOT EXISTS manager against RDS before the Spring Boot app starts. This is automatic on fresh deploys via the updated Helm chart.

For an already-running deployment, manually create the schema (replace <RDS_HOST> and <DB_PASSWORD> with your values):

kubectl run psql-init --image=postgres:15-alpine --restart=Never --rm -it -- \
sh -c 'PGPASSWORD=<DB_PASSWORD> psql -h <RDS_HOST> -U postgres -d fluent -c "CREATE SCHEMA IF NOT EXISTS manager;"'
kubectl rollout restart deployment backend -n fluent

Backend can't connect to database

kubectl logs -n fluent -l app=backend | grep -i "database\|connection"

# Check FLUENT_MANAGER_DATABASE_URL is set correctly
kubectl get configmap fluent-manager-env -n fluent -o yaml | grep DATABASE_URL

Check the RDS endpoint:

cd terraform/eks && terraform output rds_endpoint
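
To test connectivity from inside the cluster, you can reuse the throwaway psql pod pattern from the schema fix above (replace <RDS_HOST> and <DB_PASSWORD> with your values):

kubectl run psql-test --image=postgres:15-alpine --restart=Never --rm -it -- \
sh -c 'PGPASSWORD=<DB_PASSWORD> psql -h <RDS_HOST> -U postgres -d fluent -c "SELECT 1;"'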