Fleet-Server Unable to start

Summary

When this image is deployed as a fleet-server in a sample k8s deployment like this:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
spec:
  version: 9.0.1
  image: registry1.dso.mil/ironbank/elastic/beats/elastic-agent:9.0.1
  mode: fleet
  fleetServerEnabled: true
  ...

The fleet-server fails to start because of a permissions error. The failure occurs with both the 9.0.1 and 8.17.4 versions of the image.

Steps to reproduce

To reproduce without using a Kubernetes cluster:

These instructions assume an x86 processor. To reproduce on an ARM64 machine, enable Rosetta emulation in Docker.

  1. docker pull registry1.dso.mil/ironbank/elastic/beats/elastic-agent:9.0.1
  2. docker run --env FLEET_SERVER_ENABLE=true --env FLEET_SERVER_ELASTICSEARCH_HOST=https://localhost:9200 --env FLEET_SERVER_SERVICE_TOKEN=abc --rm registry1.dso.mil/ironbank/elastic/beats/elastic-agent:9.0.1
  3. See "message":"Fleet Server - Failed: execution of component prevented: cannot be writeable by group or other". After hanging for 2 minutes, the container will exit.

What is the current bug behavior?

The moment a new fleet-server is spawned, it fails with the above error message. The supplied values (FLEET_SERVER_ELASTICSEARCH_HOST=https://localhost:9200 and FLEET_SERVER_SERVICE_TOKEN=abc) do not point at a real deployment, but because the failure occurs immediately at startup, they are not relevant to demonstrating this particular issue.

What is the expected correct behavior?

Run the same process with the open source version of the container to see the fleet-server progress past spawning. It will still fail to connect to anything, since no Elasticsearch or Kibana containers were started and no tokens were set up, but it will progress past startup.

  1. docker pull elastic/elastic-agent:9.0.1

  2. docker run --env FLEET_SERVER_ENABLE=true --env FLEET_SERVER_ELASTICSEARCH_HOST=https://localhost:9200 --env FLEET_SERVER_SERVICE_TOKEN=abc --rm elastic/elastic-agent:9.0.1

  3. See

    "message":"Spawned new component fleet-server-default: Starting: spawned pid '39'"

    "fleet-server-default","type":"output","state":"STARTING"

    "message":"starting communication connection back to Elastic Agent"

    "message":"Component state changed fleet-server-default (STARTING->HEALTHY)"

Relevant logs and/or screenshots

"message":"Fleet Server - Failed: execution of component prevented: cannot be writeable by group or other"

agent Error: preparing STATE_PATH(/usr/share/elastic-agent/state) failed: mkdir /usr/share/elastic-agent/state/data: permission denied

Error: cannot create folders for config path '/usr/share/elastic-agent/state': mkdir /usr/share/elastic-agent/state: read-only file system
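
The failure and success messages quoted in this report can be told apart mechanically when comparing the two images. The following is a minimal sketch, not part of either image: the function name classify_fleet_log is hypothetical, and it only matches the two message strings quoted in this issue.

```shell
#!/bin/sh
# classify_fleet_log (hypothetical helper): read elastic-agent container
# output on stdin and stop at the first recognizable message.
# Return status:
#   0 - fleet-server reached HEALTHY
#   1 - the permission failure reported in this issue was logged
#   2 - input ended without either message
classify_fleet_log() {
    while IFS= read -r line; do
        case $line in
            *"cannot be writeable by group or other"*) return 1 ;;
            *"(STARTING->HEALTHY)"*) return 0 ;;
        esac
    done
    return 2
}
```

One way to use it is to append `2>&1 | classify_fleet_log; echo $?` to either docker run command from this report. If the logs match the excerpts quoted here, the Iron Bank image would be expected to yield 1 and the open source image 0.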

Possible fixes

Instructions exist for running an agent on a read-only filesystem:

https://www.elastic.co/docs/reference/fleet/elastic-agent-container#_step_4_run_the_elastic_agent_image for Docker, and https://www.elastic.co/docs/deploy-manage/deploy/cloud-on-k8s/configuration-fleet#k8s-elastic-agent-running-as-a-non-root-user for ECK.

docker run --mount type=bind,source="$(pwd)/state",destination=/state -e {STATE_PATH}=/state --read-only --env FLEET_SERVER_ENABLE=true --env FLEET_SERVER_ELASTICSEARCH_HOST=https://localhost:9200 --env FLEET_SERVER_SERVICE_TOKEN=abc --rm registry1.dso.mil/ironbank/elastic/beats/elastic-agent:9.0.1

However, this produces a different error, shown above in the Relevant logs section: Error: ... read-only file system. The same is true for a k8s deployment.

One potential solution, outlined in https://github.com/elastic/cloud-on-k8s/issues/6193 (adding an emptyDir volume to not only the elastic-agents but also the fleet-server resource), was attempted but did not appear to improve the deployment.

Similar issues reported here may offer a potential solution (chmod on some combination of elastic-agent/component directories): https://discuss.elastic.co/t/ironbank-elastic-agent-8-9-0-issues-tinit-group-writeable-components/343274 and https://github.com/elastic/elastic-agent/issues/4539
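
Since the error complains about files that are writable by group or other, a first diagnostic step could be to list such files under the agent's install directory. The sketch below is an illustration under assumptions, not a confirmed fix: the function name is hypothetical, and both the directory to scan (for example /usr/share/elastic-agent, taken from the paths in the logs above) and the availability of find inside the image would need to be verified.

```shell
#!/bin/sh
# list_group_other_writable (hypothetical helper): print every file or
# directory under DIR whose mode has the group-write or other-write bit set.
# Symlinks are skipped because their own mode bits are not meaningful.
list_group_other_writable() {
    find "$1" ! -type l \( -perm -020 -o -perm -002 \) -print
}

# Possible remediation to evaluate (assumption, untested against the image):
#   chmod -R go-w <directory reported by the function>
```

Running `list_group_other_writable /usr/share/elastic-agent` in both the Iron Bank and open source containers and comparing the output may show which component paths differ in permissions.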

Tasks

  • The bug has been identified and corrected within the container, or the project README has been updated with the deployment configuration necessary for the fleet-server to start.

Please read the Iron Bank Documentation for more info
