Integrate Spark with Jupyter Notebook and Visual Studio Code

Updated On March 30, 2021 | By Mahesh Mogal

In the last blog, we set up Spark on our machine. We can access Spark from the console or command prompt, but that is not an ideal way to work: it won't save our commands, fixing errors in the console is difficult, and there is no IntelliSense.

That is why, in this blog, we are going to learn how to use Spark with Jupyter notebooks. We can use Jupyter notebooks from Anaconda, or inside Visual Studio Code. I like to use Visual Studio Code because it is lightweight, has a lot of good extensions, and means we do not need another IDE just for working with notebooks. So let's get started.

Setting Up Visual Studio Code

The first thing we will need is Visual Studio Code installed on our machine. It is free to download and easy to set up. You can download it from this link.

Once you have installed Visual Studio Code, open it, search for the Python extension, and install it.

visual studio code - Python extension

Required Python Packages

The next step is to install the required Python packages on our system. For this, we can use pip. There are two packages that we need to install.

  • jupyter - this package lets us use Jupyter notebooks inside Visual Studio Code.
  • findspark - this package helps Jupyter notebooks find the Spark installation on our machine.

We can install both packages using the command below.

Starting Jupyter Notebook In Visual Studio Code

We can now work with notebooks in Visual Studio Code. Open Visual Studio Code and press "CTRL + SHIFT + P". This will open the Command Palette. Search for "create notebook" and select the option to create a new Jupyter notebook.

python-create-notebook

This will start our notebook.

To use Spark inside the notebook, we first need to initialize findspark. We can do that using the code below.

Now we can create a Spark session to use for our work.

python-jupyter-notebook-with-spark

Conclusion

We have set up Visual Studio Code and Jupyter notebooks to use with Spark. This development environment will save you a lot of time and is easy to use when working with Spark. I hope you have found this useful. If you have questions, let me know. See you in the next blog.

Mahesh Mogal

I am passionate about Cloud, Data Analytics, Machine Learning, and Artificial Intelligence. I like to learn and try out new things. I have started blogging about my experience while learning these exciting technologies.


