UNCLASSIFIED - NO CUI

Apache Spark proposal

Project

We will grant permissions to submit the proposal

Name: Apache Spark

Desired Initial Maturity Level (Sandbox, Incubating, Graduated): Sandbox

Problem Statement (i.e., the problem you want to solve): To my knowledge, Big Bang does not currently include any package that supports large-scale, distributed data processing. These workloads include data pipelines, machine-learning model training, and similar tasks. Apache Spark is a widely used, open-source engine built for exactly these workloads.

Description: Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

Initial Members:

Edited by Lucas Rodriguez