Commit d7f5503b authored by Brett Charrier, committed by Ryan Garcia

Update Troubleshooting doc with common problems and fixes

@@ -7,7 +7,7 @@
- Log in to Kibana
  - username: elastic
-  - Password : can be obtained by querying kubectl get secret elasticsearch-es-elastic-user -n elastic -o yaml
+  - Password: can be obtained by running `kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}'` (see the sketch below)
- Create an index pattern by selecting the Management icon from the left menu and clicking Index Patterns under Kibana. In Create index pattern, enter `logstash-*` and click Create index pattern. In the next step, click the dropdown and select `@timestamp`.
- To search, click Discover in the side menu
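A convenience sketch for the steps above: capture the decoded password once and reuse it in the curl commands later in this doc (assumes the same secret name and `logging` namespace shown above).
```
# Capture the elastic user's password (secret/namespace from above) for reuse
# as $ES_PW in the later curl examples.
ES_PW="$(kubectl get secrets -n logging logging-ek-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')"
echo "$ES_PW"
```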
@@ -28,4 +28,4 @@ Further filters that can be used are:
#### Elasticsearch Pods
- `kubernetes.pod_name` = `elastic-es-default-#` to get logs from a specific # pod
-- `kubernetes.container_name` = `elasticsearch` or `elastic-internal-init-filesystem` to get logs from a specific container within the pod
\ No newline at end of file
+- `kubernetes.container_name` = `elasticsearch` or `elastic-internal-init-filesystem` to get logs from a specific container within the pod
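As a rough illustration of these filter fields, the same query can be run directly against Elasticsearch with curl. A sketch only: it assumes the `logstash-*` index pattern from above, the port-forward and credentials described later in this doc, and keyword-mapped fields (some mappings need a `.keyword` suffix on the field names).
```
# Hypothetical search: logs from pod elastic-es-default-0, container
# elasticsearch; $ES_PW as captured in the earlier sketch.
curl -XGET -H 'Content-Type: application/json' -ku "elastic:$ES_PW" \
  "https://localhost:9200/logstash-*/_search?pretty" -d '{
    "query": { "bool": { "filter": [
      { "term": { "kubernetes.pod_name": "elastic-es-default-0" } },
      { "term": { "kubernetes.container_name": "elasticsearch" } }
    ] } }
  }'
```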
@@ -42,3 +42,114 @@ kubectl get elasticsearches -A
```
#### Error Failed to Flush Chunk
The Fluent Bit pods on the Release Cluster may have occasional issues reliably sending their 2000+ logs per minute to Elasticsearch because Elasticsearch is not tuned properly.
The warnings/errors look like:
```
[ warn] [engine] failed to flush chunk '1-1625056025.433132869.flb', retry in 257 seconds: task_id=788, input=storage_backlog.2 > output=es.0 (out_id=0)
[error] [output:es:es.0] HTTP status=429 URI=/_bulk, response:
{"error":{"root_cause":[{"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=105667381, replica_bytes=0, all_bytes=105667381, coordinating_operation_bytes=2480713, max_coordinating_and_primary_bytes=107374182]"}]
```
The fix involves increasing `resources.requests`, `resources.limits`, and `heap` for the Elasticsearch data pods in `chart/values.yaml`:
```yaml
logging:
values:
elasticsearch:
data:
resources:
requests:
cpu: 2
memory: 10Gi
limits:
cpu: 3
memory: 14Gi
heap:
min: 4g
max: 4g
```
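To confirm the new values landed once the change rolls out, the applied resources can be read back from a data pod. A sketch, assuming ECK names the data pods `logging-ek-es-data-#` for a cluster named `logging-ek` (matching the secret used elsewhere in this doc); adjust to your cluster's names.
```
# Print the resources actually applied to a data pod's elasticsearch container.
kubectl get pod -n logging logging-ek-es-data-0 \
  -o jsonpath='{.spec.containers[?(@.name=="elasticsearch")].resources}'
```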
#### Error Cannot Increase Buffer
In a heavily utilized production cluster, an intermittent warning that the buffer could not be increased may appear.
Warning:
```
[ warn] [http_client] cannot increase buffer: current=32000 requested=64768 max=32000
```
The fix involves increasing `Buffer_Size` within the Kubernetes filter in `fluentbit/chart/values.yaml`:
```yaml
fluentbit:
values:
config:
filters: |
[FILTER]
Name kubernetes
Match kube.*
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Merge_Log On
Merge_Log_Key log_processed
K8S-Logging.Parser On
K8S-Logging.Exclude Off
Buffer_Size 1M
```
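To check that the larger buffer made it into the running configuration, one option is to grep Fluent Bit's rendered ConfigMap. A sketch; the namespace and ConfigMap name vary by release, so adjust accordingly.
```
# Look for the Buffer_Size entry in whichever ConfigMap holds the filters.
kubectl get configmap -n fluentbit -o yaml | grep -B 2 -A 1 'Buffer_Size'
```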
#### Yellow ES Health Status and Unassigned Shards
After a BigBang `autoRollingUpgrade` job, cluster shard allocation may not have been properly re-enabled, resulting in a yellow health status for the Elasticsearch cluster and unassigned shards.
To check cluster health, run:
```
kubectl get elasticsearch -A
```
To view the status of the shards (requires the port-forward from the fix below), run:
```
curl -XGET -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason"
```
To fix, run the following commands:
```
kubectl port-forward svc/logging-ek-es-http -n logging 9200:9200
curl -XPUT -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cluster/settings" -d '{ "transient" : { "index.routing.allocation.disable_allocation": false } }'
curl -XPUT -H 'Content-Type: application/json' -ku "elastic:$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')" "https://localhost:9200/_cluster/settings" -d '{ "transient" : { "cluster.routing.allocation.enable" : "all" } }'
```
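After running the commands above, recovery can be confirmed by checking that allocation is enabled and watching the cluster health return to green (same port-forward and secret as above):
```
# Re-check allocation settings and overall health; status should move from
# yellow to green as the unassigned shards are allocated.
ES_PW="$(kubectl get secrets -n logging logging-ek-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')"
curl -XGET -ku "elastic:$ES_PW" "https://localhost:9200/_cluster/settings?flat_settings=true&pretty"
curl -XGET -ku "elastic:$ES_PW" "https://localhost:9200/_cluster/health?pretty"
```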
#### CPU/Memory Limits and Heap
CPU/memory requests and limits must match and provide sufficient resources, and the Java heap min and max must be set equal, in `chart/values.yaml`:
```yaml
master:
resources:
limits:
cpu: 1
memory: 4Gi
requests:
cpu: 1
memory: 4Gi
heap:
# -- Elasticsearch master Java heap Xms setting
min: 2g
# -- Elasticsearch master Java heap Xmx setting
max: 2g
```
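To verify the heap settings took effect, the `_cat/nodes` API reports each node's configured heap (reusing the port-forward and `$ES_PW` from the previous section); `heap.max` should match the configured max (2g in the example above):
```
curl -XGET -ku "elastic:$ES_PW" "https://localhost:9200/_cat/nodes?v&h=name,heap.max,ram.max"
```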
#### Crash Due to Too Low Map Count
Elasticsearch will crash if `vm.max_map_count` is not raised, because the default OS limit on mmap counts is too low.
It must be set as root in `/etc/sysctl.conf` and can be verified by running `sysctl vm.max_map_count`.
It is set automatically in `k3d-dev.sh`:
```
sysctl -w vm.max_map_count=262144
```
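The `sysctl -w` form above lasts only until reboot. To make it permanent, persist the setting in `/etc/sysctl.conf` as noted above, then reload and verify:
```
# Run as root: persist the setting, reload, and verify.
echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
sysctl -p
sysctl vm.max_map_count
```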