
How to Install Apache Spark on Ubuntu 20.04


Apache Spark is an open-source framework and a general-purpose cluster computing system. Spark provides high-level APIs in Java, Scala, Python and R that support general execution graphs. It comes with built-in modules for streaming, SQL, machine learning and graph processing. It can analyze large amounts of data, distributing the work across the cluster and processing the data in parallel.

In this tutorial, we will explain how to install the Apache Spark cluster computing stack on Ubuntu 20.04.

Prerequisites

  • A server running Ubuntu 20.04.
  • A root password configured on the server.

Getting Started

First, you will need to update your system's package index. You can do that with the following command:

apt-get update -y
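If you also want to upgrade all installed packages to their latest available versions, run:

apt-get upgrade -y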

Once all the packages are updated, you can proceed to the next step.

Install Java

Apache Spark is a Java-based application, so Java must be installed on your system. You can install it with the following command:

apt-get install default-jdk -y

Once Java is installed, verify the installed version with the following command:

java --version

You should see the following output:

openjdk 11.0.8 2020-07-14
OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu120.04)
OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)

Install Scala

Apache Spark is written in Scala, so you will also need to install Scala on your system. You can install it with the following command:

apt-get install scala -y

After installing Scala, you can verify the Scala version with the following command:

scala -version

You should see the following output:

Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Now, connect to the Scala interface with the following command:

scala

You should get the following output:

Welcome to Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.8).
Type in expressions for evaluation. Or try :help.

Now, test Scala with the following command:

scala> println("Hitesh Jethva")

You should get the following output:

Hitesh Jethva
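
When you are done experimenting, leave the Scala shell with the :quit command:

scala> :quit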

Install Apache Spark

First, you will need to download the latest version of Apache Spark from its official website. At the time of writing this tutorial, the latest version of Apache Spark is 2.4.6. You can download it to the /opt directory with the following command:

cd /opt
wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.7.tgz

Once the download is complete, extract the archive with the following command:

tar -xvzf spark-2.4.6-bin-hadoop2.7.tgz

Next, rename the extracted directory to spark as shown below:

mv spark-2.4.6-bin-hadoop2.7 spark

Next, you will need to configure the Spark environment so you can easily run Spark commands. You can do this by editing the ~/.bashrc file:

nano ~/.bashrc

Add the following lines at the end of the file:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Save and close the file, then activate the new environment with the following command:

source ~/.bashrc
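
You can confirm that the Spark scripts are now on your PATH by printing the Spark version with the bundled spark-submit utility:

spark-submit --version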

Start Spark Master Server

At this point, Apache Spark is installed and configured. Now, start the Spark master server using the following command:

start-master.sh

You should see the following output:

starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-ubuntu2004.out

By default, the Spark master's web interface listens on port 8080. You can check it using the following command:

ss -tpln | grep 8080

You should see the following output:

LISTEN   0        1                               *:8080                *:*      users:(("java",pid=4930,fd=249))   
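
If something else on your server is already using port 8080, you can pass the master a different web UI port when starting it. start-master.sh forwards its arguments to the master process, which accepts a --webui-port option; for example:

start-master.sh --webui-port 8081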

Now, open your web browser and access the Spark web interface using the URL http://your-server-ip:8080. You should see the Spark master dashboard.

Start Spark Worker Process

As you can see in the dashboard, the Spark master service is running at spark://your-server-ip:7077. You can use this address to start the Spark worker process with the following command:

start-slave.sh spark://your-server-ip:7077

You should see the following output:

starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-ubuntu2004.out
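
By default, the worker offers all of the machine's CPU cores and most of its memory to Spark. The worker also accepts --cores and --memory options if you want to cap the resources it advertises; for example:

start-slave.sh spark://your-server-ip:7077 --cores 2 --memory 2G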

Now, go back to the Spark dashboard and refresh it. You should see the Spark worker process listed on the following screen:

Apache Spark Worker

Working with Spark Shell

You can also connect to the Spark server from the command line using the spark-shell command, as shown below:

spark-shell

Once connected, you should see the following output:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.11-2.4.6.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/08/29 14:35:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://ubuntu2004:4040
Spark context available as 'sc' (master = local[*], app id = local-1598711719335).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/
         
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.8)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
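
From the scala> prompt you can run a quick sanity check. For example, this minimal job (a sketch using the sc context the shell provides) distributes the numbers 1 to 100 as an RDD and sums them:

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0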

If you want to use Python with Spark, you can use the pyspark command-line utility.

First, install Python 2 with the following command:

apt-get install python -y

Once installed, you can connect to Spark with the following command:

pyspark

Once connected, you should get the following output:

Python 2.7.18rc1 (default, Apr  7 2020, 12:05:55) 
[GCC 9.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.11-2.4.6.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/08/29 14:36:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.6
      /_/

Using Python version 2.7.18rc1 (default, Apr  7 2020 12:05:55)
SparkSession available as 'spark'.
>>> 
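
As a quick sanity check, you can run the same sample job from the >>> prompt using the sc context that pyspark provides:

>>> sc.parallelize(range(1, 101)).sum()
5050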

If you want to stop the master and worker servers, you can do so with the following commands:

stop-slave.sh
stop-master.sh
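
Alternatively, the stop-all.sh script from Spark's sbin directory (already on your PATH) stops the master together with any workers listed in its conf/slaves file, which defaults to localhost:

stop-all.sh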

Conclusion

Congratulations! You have successfully installed Apache Spark on an Ubuntu 20.04 server. You should now be able to perform basic tests before you start configuring a Spark cluster. Feel free to ask me if you have any questions.
