spark

spark [2018/11/28 14:20]
kbob01 created
-====== Apache Spark on Mogon ====== 
  
-Apache Spark is an open-source framework for big data applications that runs on the Java Virtual Machine (JVM). 
- 
-On Mogon, a Spark job typically occupies a full node, because Spark requires a lot of resources. 
- 
-Once a node has been allocated, there are two possible use cases. 
-=== Job based usage === 
-If you have an already packaged application that should be submitted as a job, the following script can be used to start a Scala application (packaged as ''myJar.jar'') whose entry point is the class ''Main'' in the package ''main''. 
-  
-<code bash> 
-#!/bin/bash 
- 
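-# Load the Spark module (Spark 2.2.0, Hadoop 2.6, Java 1.8.0_162) 
-module load devel/Spark/2.2.0-Hadoop-2.6-Java-1.8.0_162 
- 
-# Submit the application: 8G of driver memory, local mode, entry point main.Main 
-# (the master URL is quoted so that the shell does not treat [*] as a glob pattern) 
-spark-submit --driver-memory 8G --master 'local[*]' --class main.Main myJar.jar 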
- 
-</code> 
- 
- 
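-The option ''--master local[*]'' runs Spark in local mode with as many worker threads as there are logical cores on the node, so the number of workers does not have to be set by hand. 
-The option ''--driver-memory'' sets the memory available to the driver (8G in the example above); the memory of the workers typically requires no changes. 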
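
If the number of worker threads should be pinned instead of left to ''local[*]'', the master URL can be built from a core count. The following is a minimal sketch, not part of the Mogon setup: the helper name ''spark_master'' is hypothetical, and it assumes the job runs under Slurm with ''--cpus-per-task'' requested, so that ''SLURM_CPUS_PER_TASK'' is defined. The script only prints the resulting command so that it can be inspected before it is run.

```shell
#!/bin/bash

# Hypothetical helper: print a Spark master URL for local mode.
# With a positive integer argument N it prints local[N] (N worker threads);
# for anything else it falls back to local[*] (one thread per logical core).
spark_master() {
    case "$1" in
        ''|*[!0-9]*|0*) printf 'local[*]\n' ;;
        *)              printf 'local[%s]\n' "$1" ;;
    esac
}

# Assumption: Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is requested.
master="$(spark_master "${SLURM_CPUS_PER_TASK:-}")"

# Print the command instead of executing it.
echo "spark-submit --driver-memory 8G --master '${master}' --class main.Main myJar.jar"
```

Without a usable core count the helper falls back to ''local[*]'', which matches the behaviour of the script above.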
- 
-=== Interactive usage === 
-If you want to use Spark in a more exploratory manner, you can also start the Spark shell with ''spark-shell''.
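
An interactive session could be started as sketched below. This assumes that the Spark module from the job script has already been loaded on the allocated node; the wrapper name ''start_spark_shell'' is hypothetical. ''spark-shell'' accepts the same ''--driver-memory'' and ''--master'' options as ''spark-submit''.

```shell
#!/bin/bash

# Hypothetical wrapper: start an interactive Spark shell in local mode.
# It first checks that spark-shell is on the PATH, i.e. that the Spark
# module has been loaded, and returns 1 with a hint if it is not.
start_spark_shell() {
    if ! command -v spark-shell >/dev/null 2>&1; then
        echo "spark-shell not found; load the Spark module first" >&2
        return 1
    fi
    # Same options as in the batch example: 8G driver memory, all logical cores.
    spark-shell --driver-memory 8G --master 'local[*]'
}
```

Calling ''start_spark_shell'' then opens the Scala prompt, where Spark 2.x provides the session as ''spark'' and the context as ''sc''.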