====== Julia Programming Language ======
[[https://julialang.org/|Julia]] is a high-level, dynamic programming language designed for high-performance numerical and scientific computing.
<code>
JGU HPC Modules
---------------------------------------------------
   lang/Julia/<version>
</code>

One can use a specific version of the module with:

<code bash>
module load lang/Julia/<version>
</code>

or the current default module with:

<code bash>
module load lang/Julia
</code>

====== Pkg - Julia's Package Manager ======

[[https://pkgdocs.julialang.org/|Pkg]] is Julia's built-in package manager.

==== Pkg - Usage ====

Initially, you need to load a Julia module on a MOGON service node:

<code bash>
module load lang/Julia
</code>

For package installation, start Julia by executing

<code bash>
julia
</code>

from the command line. Now start the Pkg REPL (Pkg also comes with a REPL) by pressing '']'':

<code bash>
(v1.6) pkg>
</code>

We will use the package **Dates** ([[https://docs.julialang.org/en/v1/stdlib/Dates/|documentation]]) as an example. After entering

<code bash>
(v1.6) pkg> add Dates
</code>

you should get an output similar to:

<code julia>
(v1.6) pkg> add Dates
    Updating `[ ... ]/Project.toml`
  [ade2ca70] + Dates
  No Changes to `[ ... ]/Manifest.toml`
</code>

Let's check the successful installation:

<code julia>
(v1.6) pkg> status
</code>

Depending on which packages you have already installed, you should get an output similar to the following:

<code julia>
(v1.6) pkg> status
      Status `[ ... ]/Project.toml`
  [6e4b80f9] BenchmarkTools v0.7.0
  [052768ef] CUDA v3.0.3
  [7a1cc6ca] FFTW v1.3.2
  [da04e1cc] MPI v0.17.2
  [91a5bcdd] Plots v1.11.2
  [d330b81b] PyPlot v2.9.0
  [ade2ca70] Dates
  [de0858da] Printf
  [9a3f8284] Random
</code>

This indicates that ''Dates'' has been installed. We can additionally run the package's test suite:

<code julia>
(v1.6) pkg> test Dates
     Testing Dates
      Status `[ ... ]/Project.toml`
  [ade2ca70] Dates `@stdlib/Dates`
  [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils`
  [de0858da] Printf `@stdlib/Printf`
  [8dfed614] Test `@stdlib/Test`
      Status `[ ... ]/Manifest.toml`
  [2a0f44e3] Base64 `@stdlib/Base64`
  [ade2ca70] Dates `@stdlib/Dates`
  [b77e0a4c] InteractiveUtils `@stdlib/InteractiveUtils`
  [56ddb016] Logging `@stdlib/Logging`
  [d6f4376e] Markdown `@stdlib/Markdown`
  [de0858da] Printf `@stdlib/Printf`
  [9a3f8284] Random `@stdlib/Random`
  [9e88b42a] Serialization `@stdlib/Serialization`
  [8dfed614] Test `@stdlib/Test`
  [4ec0a83e] Unicode `@stdlib/Unicode`
     Testing Running tests...
[ ... ]
Test Summary:               | Pass  Total
Conversions to/from numbers | [ ... ]
     Testing Dates tests passed
</code>

The installation was successful and ''Dates'' is ready to use. Leave the Pkg REPL with the backspace key and load the package with:

<code julia>
using Dates
</code>
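
As a quick check, the loaded package can be used right away. The following brief REPL session is only an illustration (the printed values will of course differ on your system):

<code julia>
julia> using Dates

julia> today()                                    # current date
2021-12-08

julia> Dates.format(now(), "yyyy-mm-dd HH:MM")    # formatted timestamp
"2021-12-08 19:58"
</code>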

You can now add any packages from the Standard Library to Julia, but please note the following:

<callout type="warning">
Some Julia packages require you to load prerequisite dependencies as modules before you can add them via ''Pkg''.
</callout>

This is also illustrated by the following examples for [[start:development:scripting_languages:julia#pkg_-_cuda|CUDA]] and [[start:development:scripting_languages:julia#pkg_-_plots|Plots]].

=== Pkg - Commands ===
In the Pkg REPL you have the following commands available to manage packages:

^ Command ^ Result ^ Comment ^
| ''add <package>'' | installs a package | |
| ''rm <package>'' | removes a package | |
| ''update'' | updates the installed packages | can be restricted to single packages |
| ''status'' | lists the installed packages | |
| ''test <package>'' | runs the tests of a package | |
| ''build <package>'' | runs the build step of a package | |


Installing packages using ''add'' downloads them from the package registry, so it should be done on a MOGON service node with the required modules loaded (see above).

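Packages can also be installed non-interactively, which is convenient in setup scripts. A minimal sketch using Pkg's functional API (the package name is just an example):

<code bash>
module load lang/Julia
# install and list packages without entering the interactive Pkg REPL
julia -e 'using Pkg; Pkg.add("Dates"); Pkg.status()'
</code>
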
==== Pkg - CUDA ====

Julia's GPU programming support is provided by the ''CUDA.jl'' package, which we will add in this section.

Log in to MOGON and load the following modules on a service-node first:

<code bash>
module load system/CUDA
module load lang/Julia
</code>

Now start Julia with the following command

<code bash>
julia
</code>

and then change to the Pkg REPL with '']'':

<code julia>
(v1.6) pkg>
</code>

We are now ready to add CUDA via

<code julia>
(v1.6) pkg> add CUDA
</code>

You are now ready to use ''CUDA.jl'' on MOGON.
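
A minimal sketch to verify the installation, assuming it is run on a GPU node (on nodes without a GPU, ''CUDA.functional()'' simply returns ''false''):

<code julia>
using CUDA

# true only if a CUDA driver and at least one GPU are available,
# i.e. typically only on a GPU node
if CUDA.functional()
    CUDA.versioninfo()            # toolkit, driver and device summary
    @show CUDA.ones(3) .+ 1       # tiny computation on the GPU
end
</code>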

==== Pkg - Plots ====

[[https://docs.juliaplots.org/|Plots]] is a plotting meta-package that provides a unified interface to several plotting backends; we will use ''PyPlot'' as the backend.
Log in to MOGON and load the following modules on a service-node first:

<code bash>
module load lang/Python
module load vis/matplotlib
module load lang/Julia
</code>

Open Julia by executing the following command after the modules have been successfully loaded

<code bash>
julia
</code>

now enter the Pkg REPL by pressing '']'':
<code julia>
(v1.6) pkg>
</code>
First, the actual packages are added and then the backend is configured. Install ''Plots'' with:

<code julia>
(v1.6) pkg> add Plots
</code>

now add ''PyPlot'', which we will later set as the backend:

<code julia>
(v1.6) pkg> add PyPlot
</code>
Afterwards, test the successful installation with:
<code julia>
(v1.6) pkg> test Plots
</code>

and

<code julia>
(v1.6) pkg> test PyPlot
</code>
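
Once both packages are installed, a short test plot could look like the following sketch (the file name and plot contents are just examples):

<code julia>
using Plots
pyplot()                             # select the PyPlot (matplotlib) backend

x = range(0, 2π; length = 200)
p = plot(x, sin.(x); label = "sin(x)", lw = 2)
savefig(p, "sine_test.png")          # writes the plot to the current directory
</code>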

==== The Julia Standard Library ====

An overview of the packages available for Julia can be found in the [[https://docs.julialang.org/|Julia documentation]]. The packages used in the examples on this page are:

^ Package ^ Comment ^
| Random | random number generation (part of the standard library) |
| CUDA | programming NVIDIA GPUs from Julia |
| LinearAlgebra | matrix and vector operations (part of the standard library) |
| Printf | C-style formatted output (part of the standard library) |
| MPI | Julia bindings for the Message Passing Interface |
| BenchmarkTools | benchmarking framework for Julia code |
| FFTW | fast Fourier transforms |

====== Submitting a Serial Julia Job ======

As a simple example, we use the following Julia script:

<file julia hello_mogon.jl>
println("Hello MOGON!")
</file>
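
The corresponding job script could look like the following sketch; partition, memory and runtime are assumptions and should be adapted to your needs:

<file bash julia_serial_job.slurm>
#!/bin/bash
#SBATCH --partition=smp              # assumed partition, adjust as needed
#SBATCH --account=<your account>
#SBATCH --time=0-00:05:00
#SBATCH --mem-per-cpu=512
#SBATCH --ntasks=1
#SBATCH --job-name=serial_julia
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

module purge
module load lang/Julia

julia hello_mogon.jl
</file>

The job is submitted with:

<code bash>
sbatch julia_serial_job.slurm
</code>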

Once the job has finished, the output file should contain:

<code bash>
Hello MOGON!
</code>
====== Submitting a Parallel Julia Job ======
Julia offers [[https://docs.julialang.org/en/v1/manual/parallel-computing/|several levels of parallel programming]]: multi-threading within a single process and distributed processing across multiple processes, both of which are shown below.
===== Multi-Threading =====

The number of threads Julia uses is fixed at startup, either with the ''--threads'' command line option or via the ''JULIA_NUM_THREADS'' environment variable.

But let's explore the basics of Julia's multi-threading capabilities with a small example:

<file julia threaded_julia_example.jl>
Threads.@threads for i=1:20
    println("Iteration $i was handled by thread $(Threads.threadid())")
end
</file>
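
A matching job script might look like this sketch; it requests a single task with four CPUs and passes the CPU count to Julia as the number of threads (partition, memory and runtime are assumptions):

<file bash julia_threaded_job.slurm>
#!/bin/bash
#SBATCH --partition=smp              # assumed partition, adjust as needed
#SBATCH --account=<your account>
#SBATCH --time=0-00:10:00
#SBATCH --mem-per-cpu=1024
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --job-name=threaded_julia
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

module purge
module load lang/Julia

julia --threads $SLURM_CPUS_PER_TASK threaded_julia_example.jl
</file>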
===== Distributed Processing =====

<callout type="info">
Starting Julia with the ''--procs'' (or ''-p'') option launches the requested number of worker processes and implicitly loads the ''Distributed'' standard library, so macros such as ''@everywhere'' and ''@distributed'' are directly available.
</callout>


<file julia distributed_julia_example.jl>
@everywhere begin
    using LinearAlgebra
    a = zeros(200)
end

slurm_cores = parse(Int, ENV["SLURM_CPUS_PER_TASK"])
slurm_tasks = parse(Int, ENV["SLURM_NTASKS"])

println("Number of Slurm tasks: $(slurm_tasks)")
println("CPUs per task: $(slurm_cores)")

println("Number of Julia processes: $(nprocs())")
println("Number of Julia workers: $(nworkers())")

calctime = @elapsed @sync @distributed for i=1:200
    a[i] = maximum(abs.(eigvals(rand(500, 500))))
end

println("With $(slurm_cores) CPUs per Task the calculation took $(calctime) seconds.")

</file>

<file bash julia_distributed_job.slurm>
#!/bin/bash
#SBATCH --partition=smp
#SBATCH --account=<your account>
#SBATCH --time=0-00:30:00
#SBATCH --mem-per-cpu=4096    # 4 GB
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --job-name=dist_julia
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

module purge
module load lang/Julia


julia --procs $SLURM_CPUS_PER_TASK distributed_julia_example.jl
</file>
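
The job is submitted as usual:

<code bash>
sbatch julia_distributed_job.slurm
</code>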

Once the job has finished, you can view the output:

<code bash>
cat dist_julia*.out
</code>

<code bash>
[ ... ]
With 2 CPUs per Task the calculation took 34.029345377 seconds.
</code>

Repeating the job with different values of ''--cpus-per-task'' gives the following runtimes:

^ CPUs per Task ^ Runtime (s) ^
| 2 | 34.03 |
| 4 | 19.27 |
| 6 | 14.58 |
| 8 | 12.02 |

===== Using MPI with Julia =====
Of course, you can also use MPI with Julia on MOGON. This requires you to first carry out the following setup for Julia and the MPI interface ''MPI.jl''.

=== Julia MPI Setup ===

Load the required modules first:

<code bash>
module load mpi/OpenMPI
module load lang/Julia
</code>
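
Then install and build ''MPI.jl'' against the loaded system MPI. A minimal sketch, assuming the ''JULIA_MPI_BINARY'' mechanism of MPI.jl (v0.17) is used to pick up the OpenMPI module:

<code bash>
julia -e 'using Pkg;
          ENV["JULIA_MPI_BINARY"] = "system";   # build against the loaded MPI module
          Pkg.add("MPI");
          Pkg.build("MPI"; verbose = true)'
</code>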

The output should be similar to the following if installation and build was successful:

<code>
[ ... ]
[ Info: using system MPI
│   [ ... ]
│   [ ... ]
└   [ ... ]
┌ Info: MPI implementation detected
│   impl = OpenMPI::[ ... ]
│   [ ... ]
└   abi = "OpenMPI"

1 dependency successfully precompiled in 3 seconds (140 already precompiled, [ ... ])
</code>

Now that MPI and Julia have been set up correctly, we can proceed to the example.

<file julia mpi_julia_example.jl>
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
println("Hello from rank $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm)) processes")
MPI.Finalize()
</file>
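
A matching job script could look like the sketch below; the partition name is an assumption, and launching the MPI processes via ''srun'' presumes Slurm's MPI integration is configured accordingly:

<file bash julia_mpi_job.slurm>
#!/bin/bash
#SBATCH --partition=parallel         # assumed partition, adjust as needed
#SBATCH --account=<your account>
#SBATCH --time=0-00:10:00
#SBATCH --mem-per-cpu=1024
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --job-name=mpi_julia
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

module purge
module load mpi/OpenMPI
module load lang/Julia

srun julia mpi_julia_example.jl
</file>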

====== Submitting a Julia GPU Job ======

Before you start parallelising with Julia on MOGON GPUs, you need to prepare your Julia environment for the usage of GPUs, as explained earlier in the section //Pkg - CUDA//.


=== CPU/GPU Bandwidth and Read+Write Speed ===

The test estimates how fast data can be sent to and read from the GPU. Since the GPU is plugged into a PCI bus, this largely depends on the speed of the PCI bus as well as many other factors. However, there is also some overhead included in the measurements, in particular the overhead of the function calls themselves.

The theoretical bandwidth per lane for **PCIe 3.0** is $0.985 GB/s$. For the GTX 1080Ti (**PCIe3 x16**) used in our MOGON GPU nodes, the 16-lane slot could theoretically give $15.754 GB/s$.((This example was taken from the MATLAB documentation on measuring GPU performance.))

<file julia gpu_rw_perf.jl>
using LinearAlgebra
using Plots
pyplot()
using BenchmarkTools
using Printf
using CUDA
using Random

sizes = 2 .^ (14:30);
timeSend = Array{Float64}(undef, length(sizes));
timeGather = Array{Float64}(undef, length(sizes));
sendBandwidth = Array{Float64}(undef, length(sizes));
gatherBandwidth = Array{Float64}(undef, length(sizes));
memoryTimesCPU = Array{Float64}(undef, length(sizes));
memoryTimesGPU = Array{Float64}(undef, length(sizes));
memoryBandwidthGPU = Array{Float64}(undef, length(sizes));
memoryBandwidthCPU = Array{Float64}(undef, length(sizes));

for i = 1:length(sizes)
    GC.gc(true)
    numElements = convert(Int64, sizes[i] / 8);    # 8 bytes per Float64 element
    cpuData = rand(0:9, (numElements, 1));
    gpuData = CuArray{Float64}(rand(0:9, (numElements, 1)));
    # Time to GPU
    timeSend[i] = CUDA.@elapsed CuArray(cpuData);
    # Time from GPU
    timeGather[i] = CUDA.@elapsed Array(gpuData);
    sendBandwidth[i] = (sizes[i] / timeSend[i] / 1e9);
    gatherBandwidth[i] = (sizes[i] / timeGather[i] / 1e9);
    memoryTimesGPU[i] = CUDA.@elapsed CUDA.@sync gpuData .+ 1;
    memoryBandwidthGPU[i] = 2 * (sizes[i] / memoryTimesGPU[i] / 1e9);
    memoryTimesCPU[i] = @elapsed cpuData .+ 1;
    memoryBandwidthCPU[i] = 2 * (sizes[i] / memoryTimesCPU[i] / 1e9);
end

@printf("Achieved peak send speed of %.1f GB/s\n", maximum(sendBandwidth))
@printf("Achieved peak gather speed of %.1f GB/s\n", maximum(gatherBandwidth))
@printf("Achieved peak read+write speed on the GPU: %.1f GB/s\n", maximum(memoryBandwidthGPU))
@printf("Achieved peak read+write speed on the CPU: %.1f GB/s\n", maximum(memoryBandwidthCPU))

p1 = plot(
    sizes,
    sendBandwidth,
    lw = 2,
    legend = :topleft,
    xaxis = ("Array size (bytes)", :log10),
    xlims = (10^4, 10^9),
    frame = true,
    label = string("Send to GPU (Max: ", round(maximum(sendBandwidth), digits = 1), " GB/s)"),
);
plot!(p1, sizes, gatherBandwidth, lw = 2, label = string("Gather from GPU (Max: ", round(maximum(gatherBandwidth), digits = 1), " GB/s)"));
plot!(p1, yaxis = ("Transfer speed (GB/s)", :log10));
plot!(p1, title = "Data Transfer Bandwidth");
plot!(p1, minorxgrid = true);
scatter!(
    [sizes[argmax(sendBandwidth)], sizes[argmax(gatherBandwidth)]],
    [maximum(sendBandwidth), maximum(gatherBandwidth)],
    label = "",
    marker = (10, 0.3, [:blue, :red]),
);

p2 = plot(
    sizes,
    memoryBandwidthGPU,
    lw = 2,
    legend = :topleft,
    xaxis = ("Array size (bytes)", :log10),
    xlims = (10^4, 10^9),
    frame = true,
    label = string("GPU (Max: ", round(maximum(memoryBandwidthGPU), digits = 1), " GB/s)"),
);
plot!(p2, sizes, memoryBandwidthCPU, lw = 2, label = string("CPU (Max: ", round(maximum(memoryBandwidthCPU), digits = 1), " GB/s)"));
plot!(p2, yaxis = ("Read+write speed (GB/s)", :log10));
plot!(p2, title = "Read+Write Bandwidth");
plot!(p2, minorxgrid = true);
scatter!(
    [sizes[argmax(memoryBandwidthGPU)], sizes[argmax(memoryBandwidthCPU)]],
    [maximum(memoryBandwidthGPU), maximum(memoryBandwidthCPU)],
    label = "",
    marker = (10, 0.3, [:blue, :red]),
);

p3 = plot(p1, p2, layout = grid(2, 1, widths = [1]), size = (800, 800));

savefig(p3, "gpu_rw_perf.png")
</file>

The job script is pretty ordinary. In this example, we use only one GPU and start Julia with four threads. To do this, we request one task with four CPUs for multithreading:

<file bash julia_gpu_rw_job.slurm>
#!/bin/bash
#SBATCH --account=<your account>
#SBATCH --job-name=gpu_rw
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --partition=m2_gpu
#SBATCH --gres=gpu:1
#SBATCH --time=0-00:30:00
#SBATCH --mem=11550
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

module purge
module load lang/Python
module load vis/matplotlib
module load system/CUDA
module load lang/Julia

julia --threads 4 gpu_rw_perf.jl
</file>

The job is submitted with the following command

<code bash>
sbatch julia_gpu_rw_job.slurm
</code>
The job will be finished after a few minutes; you can view the output as follows:
<code bash>
cat gpu_rw*.out
</code>
The output should be similar to the following lines:
<code bash>
Achieved peak send speed of 8.9 GB/s
Achieved peak gather speed of 4.4 GB/s
Achieved peak read+write speed on the GPU: 334.0 GB/s
Achieved peak read+write speed on the CPU: 5.7 GB/s
</code>
The Julia script also generates a plot, which we would like to show here:

//(Figure: send/gather bandwidth between CPU and GPU and read+write speed on CPU and GPU as a function of array size.)//

=== Memory Intensive Operations ===

You might be familiar with this example if you stumbled upon our article on GPU computing with MATLAB before.

<callout type="info">
For operations where the number of floating-point computations performed per element read from or written to memory is high, the memory speed is much less important. In this case the number and speed of the floating-point units is the limiting factor. These operations are said to have high "computational density".

A good test of computational performance is a matrix-matrix multiply. For multiplying two $N \times N$ matrices, the total number of floating-point calculations is
$$ FLOPs(N) = 2N^3 - N^2 $$

Two input matrices are read and one resulting matrix is written, for a total of $3N^2$ elements read or written. This gives a computational density of $(2N - 1)/3$ FLOP per element read or written.
</callout>
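
The stated density follows directly from the two counts above:

$$ \frac{2N^3 - N^2}{3N^2} = \frac{2N - 1}{3} $$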

The difference to our MATLAB article is of course the adaptation to native Julia code. But even so, we have made a few alterations due to the use of the Julia language. When defining vectors and arrays, we have purposely chosen ''Float32'', since the single-precision performance of the GTX 1080Ti is far higher than its double-precision performance.

<file julia gpu_perf.jl>
using LinearAlgebra
using Plots
pyplot()
using BenchmarkTools
using Printf
using CUDA

sizes = 2 .^ (12:2:28);
N = convert(Array{Int128, 1}, sqrt.(sizes));

timeCPU = Vector{Float32}(undef, length(sizes));
timeGPU = Vector{Float32}(undef, length(sizes));
for i = 1:length(sizes)
    # First on the CPU
    An = rand(Float32, N[i], N[i]);
    Bn = rand(Float32, N[i], N[i]);
    timeCPU[i] = @elapsed An * Bn
    GC.gc(true)
    # Now on the GPU
    Ac = CUDA.rand(N[i], N[i]);
    Bc = CUDA.rand(N[i], N[i]);
    timeGPU[i] = CUDA.@elapsed Ac * Bc
    GC.gc(true)
    CUDA.reclaim()
end


gflopsCPU = (2 * N .^ 3 - N .^ 2) ./ timeCPU / 1e9;
gflopsGPU = (2 * N .^ 3 - N .^ 2) ./ timeGPU / 1e9;
@printf(
    "Achieved peak calculation rates of %.1f GFLOPS on CPU, %.1f GFLOPS on GPU\n",
    maximum(gflopsCPU),
    maximum(gflopsGPU)
)

plot(
    sizes,
    gflopsCPU,
    lw = 2,
    legend = :topleft,
    xaxis = ("Matrix size (number of elements)", :log10),
    xlims = (10^3, 10^9),
    frame = true,
    label = string(
        "CPU (Max: ",
        round(maximum(gflopsCPU), digits = 1),
        " GFLOPs @Xeon E5-2650v4)",
    ),
);
plot!(
    sizes,
    gflopsGPU,
    lw = 2,
    label = string(
        "GPU (Max: ",
        round(maximum(gflopsGPU), digits = 1),
        " GFLOPs @GTX 1080Ti)",
    ),
);
plot!(yaxis = ("Calculation rate (GFLOPS)", :log10));
plot!(title = "Single Precision Matrix-Matrix Multiply");
plot!(minorxgrid = true, ylims = :round);
scatter!(
    [sizes[argmax(gflopsCPU)], sizes[argmax(gflopsGPU)]],
    [maximum(gflopsCPU), maximum(gflopsGPU)],
    label = "",
    marker = (10, 0.3, [:blue, :red]),
);
savefig("gpu_perf.png");
</file>

The job script is quite ordinary. In this example, we only use one GPU and start Julia with four threads. For this we request one task with four CPUs for multithreading.

<file bash julia_gpu_perf_job.slurm>
#!/bin/bash
#SBATCH --account=<your account>
#SBATCH --job-name=gpu_perf
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --partition=m2_gpu
#SBATCH --gres=gpu:1
#SBATCH --time=0-00:30:00
#SBATCH --mem=8192
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

module purge
module load lang/Python
module load vis/matplotlib
module load system/CUDA
module load lang/Julia

julia --threads 4 gpu_perf.jl
</file>

You can submit the job by simply executing:

<code bash>
sbatch julia_gpu_perf_job.slurm
</code>

The job will be completed after a couple of minutes and you can view the output with:

<code bash>
cat gpu_perf*.out
</code>

The output should resemble the following lines:

<code bash>
Achieved peak calculation rates of 140.4 GFLOPS on CPU, 10134.1 GFLOPS on GPU
</code>

The graphic generated in the script is shown below:

//(Figure: achieved matrix-multiply calculation rate in GFLOPS on the CPU and on the GPU as a function of matrix size.)//