"Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters."


How to create a Spark cluster instance in the KASI Cloud

  1. Choose a Spark cluster template





  2. Use the default settings in most cases




  3. Choose a flavor, set the number of slaves (minions), then Create


Connect to the master node and run some basic scripts
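As a rough illustration of what a basic script might look like, here is a minimal PySpark sketch that estimates pi by Monte Carlo sampling. It assumes the cluster runs in Spark standalone mode with the master listening on Spark's default standalone port 7077, and that PySpark is available on the node where the script runs; neither is confirmed by this page. The function names and the host name `master-node` are hypothetical placeholders.

```python
import random


def master_url(host: str, port: int = 7077) -> str:
    """Build a standalone-mode master URL (assumes Spark's default port 7077)."""
    return f"spark://{host}:{port}"


def inside_unit_circle(_: int) -> int:
    """Return 1 if a random point in the unit square lies inside the unit circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(master: str, samples: int = 1_000_000) -> float:
    """Estimate pi by distributing Monte Carlo samples over the cluster."""
    # Imported here so the helpers above stay usable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master(master).appName("pi-sketch").getOrCreate()
    try:
        # Each element of the range triggers one random sample on a worker.
        hits = (
            spark.sparkContext.parallelize(range(samples))
            .map(inside_unit_circle)
            .sum()
        )
        return 4.0 * hits / samples
    finally:
        # Release cluster resources even if the job fails.
        spark.stop()
```

On the master node, calling `estimate_pi(master_url("master-node"))` with the actual host name would submit the job to the workers; passing `"local[*]"` instead runs the same code on a single machine for a quick check.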






Introduction to Apache Spark

  • Using Apache Spark for Scientific Research: Basic Concepts and Scientific Examples
  • Jupyter Notebooks: csv-to-parquet, SDSS, HR4