Apache Spark


Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters.

Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, along with an optimized engine that supports general execution graphs. It also ships with a rich set of higher-level tools: Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing.
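As a rough illustration of how the Python API and Spark SQL fit together, the sketch below builds a small DataFrame on a single machine and queries it with SQL. The sample rows, the function name `adults_via_spark_sql`, the view name `people`, and the application name are all hypothetical and chosen only for this example. It assumes PySpark and a compatible Java runtime are installed.

```python
# Hypothetical sample data; not taken from the Spark documentation.
PEOPLE = [("Alice", 34), ("Bob", 17), ("Carol", 45)]


def adults_via_spark_sql(rows=PEOPLE):
    """Return (name, age) tuples for rows with age >= 18, sorted by name."""
    # Imported lazily so this module can be loaded where PySpark is absent.
    from pyspark.sql import SparkSession

    # "local[*]" runs Spark on this machine using all available cores.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("spark-sql-sketch")  # assumed name, for illustration only
        .getOrCreate()
    )
    try:
        # Build a DataFrame from plain Python tuples; column types are inferred.
        df = spark.createDataFrame(rows, schema=["name", "age"])

        # Register the DataFrame as a temporary view so SQL can refer to it.
        df.createOrReplaceTempView("people")

        result = spark.sql(
            "SELECT name, age FROM people WHERE age >= 18 ORDER BY name"
        )

        # collect() brings the rows back to the driver as Row objects,
        # which are tuple subclasses and convert cleanly to tuples.
        return [tuple(row) for row in result.collect()]
    finally:
        # Release the local Spark resources even if the query fails.
        spark.stop()
```

Under these assumptions, calling `adults_via_spark_sql()` should return `[("Alice", 34), ("Carol", 45)]`. The same query can also be written with DataFrame methods such as `filter` and `orderBy`; the SQL form is used here only because it shows the Spark SQL component directly.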


  • Scala
  • Java
  • Python
  • HiveQL
  • R