
# Apache Spark on MOGON

Apache Spark is an open-source framework for big-data applications that runs on the Java Virtual Machine (JVM).

Since Spark requires a lot of resources, one typically occupies at least a full node when using it on MOGON II.

Once a node has been allocated, there are two possible use cases: job-based (batch) usage and interactive usage.

#### Job-based usage

If you have an already packaged application that shall be submitted as a job, the following script can be used to start a Scala application (packaged as myJar.jar in this example) with the entry point in the class Main in the package main.

```bash
#!/bin/bash

# further SLURM job settings go here

spark-submit --driver-memory 8G --master local[*] --class main.Main myJar.jar
```

The option `--master local[*]` runs Spark in local mode and lets it adjust the number of worker threads to the available cores on its own. The option `--driver-memory` sets the memory of the driver; the memory of the workers typically requires no changes.
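The script above leaves the SLURM settings open. A minimal sketch of a full-node batch script might look like the following; the partition name, module name, and resource values are assumptions and need to be adjusted to the actual MOGON configuration:

```shell
#!/bin/bash
#SBATCH --job-name=spark-job     # job name shown in the queue
#SBATCH --partition=parallel     # hypothetical partition name, adjust for MOGON II
#SBATCH --nodes=1                # Spark in local mode runs on a single node
#SBATCH --exclusive              # occupy the full node, as recommended above
#SBATCH --time=02:00:00          # walltime limit
#SBATCH --mem=0                  # request all memory of the node

# load the Spark environment (module name is an assumption)
module load Spark

spark-submit --driver-memory 8G --master local[*] --class main.Main myJar.jar
```

Submitting this with `sbatch` then runs the application once the node is allocated.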
#### Interactive usage

If you want to use Spark in a more exploratory manner, starting the interactive Spark shell with the `spark-shell` command is also a possibility.
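One possible pattern for interactive use, sketched here under the same assumptions as above (partition name, module name, and resource values are placeholders), is to allocate a node interactively and start the shell inside the allocation:

```shell
# allocate a full node interactively (partition name is an assumption)
salloc --nodes=1 --exclusive --time=01:00:00 --partition=parallel

# inside the allocation, load Spark and start the interactive shell,
# again letting Spark use all local cores
module load Spark
spark-shell --driver-memory 8G --master local[*]
```

The shell then offers a Scala prompt against the running local Spark instance; exiting the shell and the allocation frees the node.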