Apache Spark on Mogon

Apache Spark is an open-source framework for big data applications that runs on the Java Virtual Machine (JVM).

When using it on Mogon, one typically occupies a full node, as Spark requires a lot of resources.

Once a node has been allocated, there are two possible use cases.

Job based usage

If you have an already packaged application that is to be submitted as a job, the following script can be used to start a Scala application (packaged as myJar.jar) whose entry point is the class Main in the package main.

#!/bin/bash
 
# load module
module load devel/Spark/2.2.0-Hadoop-2.6-Java-1.8.0_162
 
# start application (local[*] is quoted so that the shell cannot expand it as a glob pattern)
spark-submit --driver-memory 8G --master "local[*]" --class main.Main myJar.jar

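The option --master local[*] runs Spark in local mode with as many worker threads as the node has logical cores, so Spark adjusts the number of workers on its own. The option --driver-memory sets the driver memory. The memory of the workers typically requires no changes, because in local mode the workers run as threads inside the driver JVM.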
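If the number of worker threads should be fixed instead of left to Spark, the master URL can also be written as local[N]. The following is a minimal sketch of how a job script might build the command line from parameters. The helper names spark_master and print_submit_cmd are hypothetical and not part of Spark or the Mogon setup; the jar and class names are the ones from the example above.

```shell
#!/bin/bash

# Hypothetical helper: print a Spark master URL for local mode.
# A positive integer N gives local[N] (exactly N worker threads);
# anything else falls back to local[*] (one thread per logical core).
spark_master() {
    local threads
    threads="${1:-}"
    case "$threads" in
        ''|*[!0-9]*|0*) printf 'local[*]\n' ;;
        *)              printf 'local[%s]\n' "$threads" ;;
    esac
}

# Hypothetical helper: print the spark-submit command line instead of
# running it, so it can be checked before the job is submitted.
# $1 = driver memory (e.g. 8G), $2 = optional number of worker threads.
print_submit_cmd() {
    local driver_mem
    local threads
    driver_mem="$1"
    threads="${2:-}"
    printf 'spark-submit --driver-memory %s --master %s --class main.Main myJar.jar\n' \
        "$driver_mem" "$(spark_master "$threads")"
}
```

Calling print_submit_cmd 8G 16 prints the command with --master local[16], while print_submit_cmd 8G keeps local[*].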

Interactive usage

If you want to work with Spark in a more exploratory manner, you can instead start the interactive Spark shell with the spark-shell command.
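spark-shell accepts the same --master and --driver-memory options as spark-submit. The following is a minimal sketch of a wrapper for an interactive session on an already allocated node. The function name start_spark_shell and the variable SPARK_DRY_RUN are hypothetical; the module name is the one from the job script, and 8G is only an example default.

```shell
#!/bin/bash

# Hypothetical wrapper around spark-shell.
# $1 = optional driver memory (assumed default: 8G).
# If SPARK_DRY_RUN is set, the command is only printed, not executed.
start_spark_shell() {
    local driver_mem
    driver_mem="${1:-8G}"
    if [ -n "${SPARK_DRY_RUN:-}" ]; then
        printf 'spark-shell --driver-memory %s --master local[*]\n' "$driver_mem"
        return 0
    fi
    # load the same module as in the job script
    module load devel/Spark/2.2.0-Hadoop-2.6-Java-1.8.0_162
    # start the interactive shell in local mode
    spark-shell --driver-memory "$driver_mem" --master 'local[*]'
}
```

After sourcing the file, start_spark_shell 16G would open the shell with 16G of driver memory; setting SPARK_DRY_RUN=1 beforehand only shows the command.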

spark.txt · Last modified: 2018/11/28 14:20 by kbob01