
Apache Spark on MOGON

Apache Spark is an open-source framework for big-data applications that runs on the Java Virtual Machine (JVM).

When using it on MOGON II, you typically allocate at least one full node, as Spark is resource-intensive.

Once a node has been allocated, there are two typical use cases.

Job based usage

If you have an already packaged application that should be submitted as a job, the following script can be used to start a Scala application (packaged into myJar.jar in this example) whose entry point is the class Main in the package main.

#!/bin/bash
 
# further SLURM job settings go here
 
# load module
module load devel/Spark
 
# start application
spark-submit --driver-memory 8G --master local[*] --class main.Main myJar.jar

The option --master local[*] lets Spark choose the number of worker threads on its own. The option --driver-memory sets the memory available to the driver; the worker memory typically needs no adjustment.
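The application itself can be an ordinary Scala program with a main entry point matching the --class option above. A minimal sketch of such a program, assuming the Spark dependencies are provided at build time (the word-count workload and the file names are only illustrative assumptions):

package main

import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // Master and driver memory are supplied by spark-submit,
    // so no hard-coded settings are needed here.
    val spark = SparkSession.builder().appName("ExampleApp").getOrCreate()

    // Illustrative workload: count word occurrences in a text file
    // (input and output paths are assumptions for this sketch).
    val counts = spark.sparkContext
      .textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("counts_out")
    spark.stop()
  }
}

Such a program would be compiled and packaged (for example with sbt package) into a jar like myJar.jar before submission.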

Interactive usage

If you prefer a more exploratory, interactive approach, you can instead start the Spark shell with the spark-shell command.
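Inside an interactive allocation, and after loading the devel/Spark module, spark-shell drops you into a Scala REPL in which a SparkSession (spark) and a SparkContext (sc) are already defined. A small illustrative session (the computation is only an example):

// Inside spark-shell; `sc` is the preconfigured SparkContext.
val data = sc.parallelize(1 to 100)   // distribute the numbers 1..100
val total = data.sum()                // total = 5050.0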

  • Last modified: 2020/10/02 15:10
  • by jrutte02