Setup Python Environment for Data Science and Machine Learning on Visual Studio Code.

A definitive environment for data science and machine learning

Vatsal Bajaj
4 min read · Mar 9, 2021

If you work in data science or machine learning, you probably know that setting up a definitive environment on a PC from scratch is a mess.

I personally feel that beginners are likely to run into problems such as creating a custom virtual environment or configuring a GPU with TensorFlow.

Most of the steps given here can also be done in your favorite editor, but I will stick to VS Code for this.

Data science is a broad field: you can wander from statistics to data analysis to machine learning to AI and much more. This article leans towards machine learning.

Hardware Prerequisites-

If you don’t have a GPU, don’t worry: you can also run your ML algorithms on the CPU alone.

  1. CPU: a laptop or desktop PC with an Intel Core i5 5th Gen (or higher) processor, or one of equivalent power.
  2. RAM: 8 GB is the minimum, but 16 GB or more will benefit you.
  3. (Advanced users) GPU: an NVIDIA GPU with 8 GB or 16 GB of memory, NVIDIA GPU driver 450.0, and the cuDNN SDK.
  4. Operating system: MS Windows 10 (64-bit). Update your OS to the latest version.

Table of Contents-

  1. Setting Up Python or Anaconda.
  2. Installing Visual Studio Code and Extensions.
  3. Making a Custom Virtual Environment and Installing libraries and frameworks.
  4. (Only if you have CUDA installed)- Configuring TensorFlow to use GPU and CUDA

So let’s get started.

Step 1 — Installing Python

The first thing to do is install Python 3. I recommend Python 3.6 or 3.7.

The very latest release may have bugs or may not yet be supported by all of your favorite libraries. Python has an active community, so such gaps usually close quickly, but for now go with 3.6 or 3.7.

While installing, make sure Python is added to PATH by ticking the checkbox “Add Python 3.x to PATH”. If it is not, the python command will not be recognized when you run your scripts or notebooks.

If you have already installed Python but don’t know whether it was added to PATH and need help, go here.

OR manually add Python to PATH-

  1. Open the environment variables dialog through Control Panel.
  2. Find Path in the list and click the Edit button.
  3. Add the folder that contains python.exe (it will look like C:\Users\<username>\AppData\Local\Programs\Python\Python37\) OR your custom install location.
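
Whichever route you took, it may help to confirm the result. The sketch below is one way to do that from Python itself. `find_on_path` is a hypothetical helper name chosen for this example, and it relies only on `shutil.which` from the standard library.

```python
import shutil
import sys


def find_on_path(command):
    """Return the full path the shell would run for `command`, or None."""
    return shutil.which(command)


if __name__ == "__main__":
    # The interpreter that is running this script right now.
    print("Running interpreter:", sys.executable)

    # Commands that should resolve once Python is on PATH.
    for command in ("python", "pip"):
        location = find_on_path(command)
        print(command, "->", location if location else "not found on PATH")
```

If the plain `python` command is not recognized yet, run the script with the full path to python.exe. A "not found on PATH" line means the folder still has to be added.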

All done for the first step. Let’s move forward.

Step 2 — Installing Visual Studio Code and Extensions

Go to Microsoft’s Visual Studio Code website here and download the latest version from the site.

You can leave all the settings at their defaults or, if you are comfortable with the advanced settings, install it according to your preferences.

The extensions which we are using are-

  1. Python (Python Language Server)
  2. Jupyter
  3. Visual Studio IntelliCode
  4. Pylance (available as a preview as of now) for autocompletion and suggestions. Both Python scripts and IPython notebooks are supported.

That’s It…

Step 3 — Creating Custom Virtual Environment and Basic Packages & Modules

Things are going well so far. Now it's time to set up your custom virtual environment through the terminal.

The first thing we are going to do is make a custom virtual environment. It is necessary because it gives each project its own isolated set of packages, so installing or upgrading a dependency for one project does not affect the others.

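  1. Make a new folder and open it in VS Code.
  2. Open the terminal, or just press Ctrl+J.
  3. Type the following command to install virtualenv from Command Prompt or PowerShell.

python -m pip install --user virtualenv

  4. Create the virtual environment. The venv module used here ships with Python 3, so this command works even without virtualenv.

python -m venv /path/to/new/virtual/environment

  5. Activate the environment by running the activate script in its Scripts folder (Scripts\activate.bat in Command Prompt, Scripts\Activate.ps1 in PowerShell).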
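
The environment can also be created from Python instead of the command line. This is only a minimal sketch: `make_env` is a hypothetical helper name, and it wraps `venv.create` from the standard library, which is the same machinery that `python -m venv` uses.

```python
import venv
from pathlib import Path


def make_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir)
    # with_pip=True also installs pip into the new environment,
    # so packages can be installed there right away.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

For example, calling `make_env(".venv")` from the project folder would create the environment in a subfolder named `.venv` (an assumed name; any folder works).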

Basic Libraries and Packages —

  1. NumPy
  2. Pandas
  3. SciPy
  4. scikit-learn
  5. TensorFlow
  6. Keras
  7. PyTorch
  8. Matplotlib
  9. Seaborn
  10. Plotly

There are many more libraries and packages available, but these are the ones I recommend.

It is highly recommended to install them one by one, so that a failed install is easy to trace to a single package.

Alternatively, one command installs all of the above packages at their latest versions. Remember to activate your virtual environment before running it.

pip install -U numpy pandas scipy scikit-learn tensorflow Keras torch matplotlib seaborn plotly
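
After the install finishes, a short script can confirm what actually landed in the environment. The sketch below is one way to do it, not the only one. `check_packages` and `IMPORT_NAMES` are hypothetical names for this example. Note that scikit-learn is imported as `sklearn` and PyTorch as `torch`.

```python
import importlib

# Import names for the packages listed above.
IMPORT_NAMES = [
    "numpy", "pandas", "scipy", "sklearn", "tensorflow",
    "keras", "torch", "matplotlib", "seaborn", "plotly",
]


def check_packages(names):
    """Map each import name to its version string, or None if missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in the active environment.
            report[name] = None
        else:
            # Most packages expose __version__; fall back if one does not.
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Running `print(check_packages(IMPORT_NAMES))` inside the activated environment shows a version for every package that imports cleanly and `None` for any that is missing. Importing TensorFlow and PyTorch can take several seconds.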

Step 4 — Configuring TensorFlow

Disclaimer: Recommended only for those who have a high-end laptop.

This is the last and most exciting part of the whole blog: configuring TensorFlow with an NVIDIA GPU for machine learning and deep learning.

First of all, you must ensure that you have a CUDA-capable GPU with enough memory.

If you don't have the prerequisites, download the drivers and toolkit from the links given below-

  1. GPU Driver
  2. CUDA Toolkit
  3. cuDNN SDK (you have to sign up to download it).

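This is a long and time-consuming process, so if you get stuck at any point, refer to the NVIDIA and TensorFlow websites.

Now comes the part of configuring TensorFlow with your Python environment.

If you installed CUDA with all the defaults, you can set the environment paths by running the following commands in a Command Prompt terminal. SET changes PATH only for the current session, so repeat the commands in each new session or add the same folders through the environment variables dialog. The last line assumes cuDNN was extracted to C:\tools\cuda.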

SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\CUPTI\lib64;%PATH%
SET PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include;%PATH%
SET PATH=C:\tools\cuda\bin;%PATH%
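
To see whether TensorFlow actually picks up the GPU, a quick check like the one below can help. It is a minimal sketch that assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available. `detect_gpus` is a hypothetical helper name.

```python
def detect_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow loaded but found no usable GPU.
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = detect_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif gpus:
        for gpu in gpus:
            print("Found", gpu.name)
    else:
        print("TensorFlow is installed but sees no GPU; recheck the paths.")
```

Run it from the same terminal session in which the SET commands were executed, with the virtual environment activated.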

Now it’s all set.

Happy Machine Learning !! 😉
