The current BB logging functionality is provided by Elasticsearch, Fluent Bit, and Kibana (EFK). This stack is very resource heavy, making it suboptimal (if not outright impossible) to use in resource-constrained environments, particularly certain edge deployments. Promtail, Loki, and Grafana (PLG) is an alternative stack that uses significantly fewer resources, making it a better fit for deployments into these types of environments.
Proposed Solution
This proposal is for the creation of a new Big Bang core logging module using the PLG stack, provided as an alternative to the current EFK logging stack. This issue will not be responsible for changing the Jaeger/Elasticsearch relationship and should focus solely on changing the logging solution.
Grafana and Fluent Bit are already part of Big Bang Core: Fluent Bit has its own Helm chart, and Grafana is deployed as part of the monitoring stack alongside Prometheus. We would want to disable those components in Loki's Helm chart and have Loki connect to them as already deployed in Big Bang. That would allow us to switch from EFK to PLG seamlessly.
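For illustration, if the upstream loki-stack umbrella chart and the kube-prometheus-stack Grafana were used (an assumption on my part; Big Bang's actual chart structure and value names may differ), the wiring might look roughly like this:

```yaml
# Sketch of values for the upstream loki-stack chart: deploy only Loki and
# disable the bundled Grafana/Fluent Bit/Promtail so the instances Big Bang
# already ships are reused instead.
loki:
  enabled: true
grafana:
  enabled: false      # already provided by the monitoring package
fluent-bit:
  enabled: false      # already has its own Big Bang chart
promtail:
  enabled: false      # only if Fluent Bit stays on as the log shipper
---
# Sketch of an addition to the monitoring (kube-prometheus-stack) values,
# provisioning Loki as a datasource in the existing Grafana. The service
# URL and namespace here are placeholders.
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki.logging.svc.cluster.local:3100
```

That would keep a single Grafana instance as the query front end for both metrics and logs.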
If a customer or BB has a Kibana dashboard, is there an easy transition to a Grafana dashboard or will this require a rebuild?
Will Loki work with Istio sidecar injection?
Michael McLeroy changed title from "Logging package using PLG stack" to "Logging package using PLG stack (Loki)"
When switching out the logging stack to remove the need for Elasticsearch, we also need to consider Jaeger's use of Elasticsearch as its storage backend. https://www.jaegertracing.io/docs/1.25/deployment/#storage-backends identifies Elasticsearch, Cassandra, and Kafka as acceptable backends. The in-memory backend wouldn't work for long-term storage of traces, and Badger only works with the all-in-one deployment, which probably shouldn't be used for production workloads that need traces.
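For reference, keeping Jaeger on Elasticsearch would mean wiring the two together, e.g. via a Jaeger Operator custom resource along these lines (a sketch; the service URL and deployment strategy are assumptions, not Big Bang's actual configuration):

```yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
spec:
  strategy: production          # assumes a non-all-in-one deployment
  storage:
    type: elasticsearch         # reuses the Elasticsearch that EFK already provides
    options:
      es:
        server-urls: http://elasticsearch-master.logging.svc.cluster.local:9200
```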
The alternative would be to use Tempo, a tracing backend that works with the Grafana stack and is queryable from Grafana. This would result in the following:
So if Jaeger is deployed, we need Elasticsearch/Kibana and the ECK operator, so the proposed solution would be to remove Jaeger altogether as a core BB product and create the two scenarios:
@runyontr appreciate the write-up, going to add a few follow-up questions:
Grafana is currently bundled with the monitoring package. Will it always be safe to assume that it will be deployed as part of that in any given cluster, or would it make sense to split Grafana out into its own separate package?
Should Loki and Promtail (and maybe even Tempo) be bundled together or separately? It is possible to use Fluent Bit as a log shipper; should we support that scenario, or just prescribe Promtail when using Loki?
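If Fluent Bit stays on as the shipper, it could forward logs straight to Loki using its built-in loki output instead of deploying Promtail at all. A minimal sketch, assuming the upstream fluent-bit chart's config.outputs value (the host, match pattern, and labels here are placeholders):

```yaml
# Hypothetical addition to the Fluent Bit chart values: ship logs directly
# to Loki, so Promtail would not need to be deployed.
config:
  outputs: |
    [OUTPUT]
        Name   loki
        Match  kube.*
        Host   loki.logging.svc.cluster.local
        Port   3100
        Labels job=fluent-bit
```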