How to Install Spark On Windows

Apache Spark is one of the most popular data processing tools. It ships with several useful libraries for streaming, machine learning, and more. In this blog we are going to learn how to install Spark on Windows. It is a common misconception that Spark is part of the Hadoop ecosystem and needs a Hadoop installation to work. We will see how easy it is to set up Spark on Windows and use it for practice.

Java Virtual Machine

Before we start, we need to make sure Java is set up on our machine, because Spark needs a JVM to run. We can check whether Java is installed by running java -version in PowerShell.
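If you would rather script this check, here is a minimal Python sketch of the same idea. It looks for a java launcher on the search path, runs it with -version, and pulls out the quoted version string. The function names are made up for this example, and the parsing assumes the usual banner format with the version in double quotes.

```python
import re
import shutil
import subprocess


def parse_java_version(banner):
    """Pull the quoted version string out of `java -version` output."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def installed_java_version(path=None):
    """Return the Java version string, or None if no java launcher is found.

    `path` is an optional search path; by default the PATH variable is used.
    """
    java = shutil.which("java", path=path)
    if java is None:
        return None
    # `java -version` prints its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)
```

Calling installed_java_version() with no arguments returns None when Java cannot be found, which is the case where you need one of the install methods that follow.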

If you do not have Java installed on your Windows machine, you can follow one of the methods below.

Using Chocolatey to Install Java

You can use the Chocolatey package manager for Windows to set up Java on your machine. All you need to do is run the command below in PowerShell.

You can get more details about this at the links below.

Downloading Java from Oracle site

You can also download the latest version of Java from the Oracle website and install it on Windows. You can get Java at https://www.java.com/en/download/. Once you have the installer, just run it and it will set up Java on your machine.

Once installation is complete by either method, check the Java version using the command mentioned above. If you get output with a version number, all is good. If you get no output or an error, check whether JAVA_HOME is set in your environment variables.
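The JAVA_HOME check can also be sketched in Python. This is only an illustration: java_home_problem is a hypothetical helper, and it assumes a standard Java layout where the launcher lives in a bin folder under JAVA_HOME.

```python
import os
from pathlib import Path


def java_home_problem(env=None):
    """Describe what is wrong with JAVA_HOME, or return None if it looks fine."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set"
    bin_dir = Path(java_home) / "bin"
    # The launcher is java.exe on Windows and plain java elsewhere.
    if not any((bin_dir / name).is_file() for name in ("java.exe", "java")):
        return f"no Java launcher found in {bin_dir}"
    return None
```

A result of None means the variable points at a folder that contains a Java launcher; any other result is a short hint about what to fix.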

Download Spark

Now we can download Spark from the Apache Spark website. You can choose which Spark version you need and which pre-built Hadoop package type it comes with.

Spark download options

Setting Up Spark On Windows

Once your download is complete, you will have a compressed archive (the pre-built packages on the Spark site are .tgz files). Extract it to get the Spark files.

Now we can place these files anywhere on our Windows system. I like to create a spark directory on the C drive and place them there.
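If you prefer to do the extraction from a script, the sketch below shows one way to do it with Python's standard library. The helper name extract_spark is invented for this example, and it assumes the archive wraps everything in a single top-level folder, which is how Spark downloads are normally packaged.

```python
import shutil
from pathlib import Path


def extract_spark(archive, dest):
    """Unpack a downloaded Spark archive into `dest` and return the Spark folder.

    shutil.unpack_archive picks the format (.zip, .tgz, ...) from the extension.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), str(dest))
    # If the archive wrapped everything in one top-level folder,
    # that folder is the one that contains bin.
    children = [p for p in dest.iterdir() if p.is_dir()]
    if len(children) == 1 and not (dest / "bin").exists():
        return children[0]
    return dest
```

The returned folder is the one whose bin directory you will later add to Path. If you want everything directly under C:\spark, move the contents of that folder up one level.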

Spark set up on Windows

Setting Up WinUtils for Hadoop version

The next thing we need is the winutils file. It tricks Spark into thinking Hadoop is installed on this machine. You can download “winutils” at this GitHub repository. Pick the build that matches the Hadoop version of your Spark download.

Once you have this exe file, create another directory on the C drive named hadoop, create a bin directory inside it, and put the exe file in C:\hadoop\bin.
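A quick way to confirm the file landed in the right place is a small check like the one below. The helper names are hypothetical, and the default folder is the C:\hadoop location used in this post.

```python
from pathlib import Path


def winutils_location(hadoop_dir=r"C:\hadoop"):
    """Return where winutils.exe is expected for the given Hadoop folder."""
    return Path(hadoop_dir) / "bin" / "winutils.exe"


def winutils_in_place(hadoop_dir=r"C:\hadoop"):
    """True if winutils.exe exists in the bin folder under hadoop_dir."""
    return winutils_location(hadoop_dir).is_file()
```

If you put winutils somewhere else, pass that folder instead of relying on the default.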

Spark winutils set up

Setting Up Environment Variables

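The last thing we need to do is set up environment variables for Spark home and Hadoop home so that we can access Spark from anywhere. In this setup that means SPARK_HOME pointing to C:\spark and HADOOP_HOME pointing to C:\hadoop, plus the two Path entries described below.

To set this up, search for “environment variables” in the Windows Start menu. Once the Environment Variables dialog is open, go to the “Path” variable for your user.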

Spark user Path variable

Select and edit this Path variable and add the two entries below. If you have placed the Spark files and winutils in different directories, change the paths accordingly.

C:\spark\bin

C:\hadoop\bin

Spark Path set up
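To double check that both entries made it into Path, you can compare the variable against the two folders. The sketch below is one way to do that in Python. missing_path_entries is a made-up helper, and the comparison ignores case and trailing slashes because Windows treats such paths as the same folder.

```python
def missing_path_entries(path_value, required=(r"C:\spark\bin", r"C:\hadoop\bin")):
    """Return the required folders that do not appear in a Windows Path string."""

    def normalize(entry):
        # Ignore case, surrounding quotes or spaces, and trailing slashes.
        return entry.strip().strip('"').rstrip("\\/").lower()

    present = {normalize(e) for e in path_value.split(";") if e.strip()}
    return [r for r in required if normalize(r) not in present]
```

In a new terminal, missing_path_entries(os.environ.get("PATH", "")) (after import os) should return an empty list once both folders are on Path. Open a fresh PowerShell window first, since windows that are already open keep the old value.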

Check Installation Status

If you have come this far and done all the steps correctly, you should be able to use Spark from PowerShell. To check, try running “spark-shell” or “pyspark” from Windows PowerShell. If you get output with the Spark version, all is good and you can start working with Spark on your own machine.

PySpark set up check from PowerShell
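If PowerShell cannot find “spark-shell” or “pyspark”, it helps to confirm that the launcher scripts are really in the folder you added to Path. Here is a rough sketch of that check. available_launchers is a hypothetical helper, and it assumes the .cmd launchers that Spark ships in its bin folder for Windows.

```python
from pathlib import Path


def available_launchers(spark_home):
    """List which Spark launcher scripts exist in the bin folder of spark_home."""
    bin_dir = Path(spark_home) / "bin"
    # Spark ships .cmd wrappers for Windows next to the Unix shell scripts.
    names = ("spark-shell.cmd", "pyspark.cmd", "spark-submit.cmd")
    return [name for name in names if (bin_dir / name).is_file()]
```

An empty list for available_launchers(r"C:\spark") usually means the files were extracted one folder deeper than expected, so the Path entry points at the wrong place.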

Once Spark is running, you can check job status in the Spark web UI at http://localhost:4040/.
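If the page does not load, a quick test is to see whether anything is listening on that port at all. The sketch below does this with a plain TCP connection. ui_port_open is an invented name, and a True result only tells you that some process accepts connections there, not that it is Spark.

```python
import socket


def ui_port_open(host="localhost", port=4040, timeout=1.0):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Keep in mind that the UI is only available while a Spark shell or application is running, and that Spark moves to the next free port (4041 and so on) when 4040 is already taken.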

Conclusion

In this article, we have learned how to set up Spark on Windows. This was much easier than expected, and we had Spark running in a few minutes. We will use this Spark setup to learn and practise in future blogs. I hope you have found this useful. See you soon.
