A Guide to Python Virtual Environments (2024)

Python virtual environments allow developers to control software dependencies in Python code. Software dependency means that a given piece of software requires one or more other software packages or libraries to run successfully. For example, TensorFlow, a deep learning library in Python, requires another Python package called NumPy to run.

Virtual environments are useful ways of ensuring that the correct package/library versions are consistently used every time the software runs. This helps isolate software projects from potentially conflicting libraries and packages that are installed on an operating system. Conflicting software packages may result in a piece of software failing.

Further, virtual environments ensure that the results from running code are reproducible. Reproducibility is an essential part of software development and software testing because it guarantees that software programs run and work as intended. If a software program’s results can’t be reproduced, it won’t be able to reliably perform the task it was developed to complete.

What Are Python Virtual Environments?

Python virtual environments allow developers to control software dependencies in Python code. They’re useful ways of ensuring that the correct package/library versions are consistently used every time the software runs. Virtual environments also help ensure that the results from running code are reproducible.

More From Sadrach PierrePython Class Inheritance Explained

When to Use a Virtual Environment

Working within a Python virtual environment is generally a good idea when developing Python code because they provide ways to control the behavior of a software program. If code is written to generate a specific output given a known input, the same output should be reliably generated for that input whether the code is run on my computer, a remote machine, or someone else’s laptop. Developing software without a virtual environment defined increases the risk of the code failing or producing undesired results.

Using virtual environments is even more important when you’re working with a larger Python development project that includes many libraries, files, and Python scripts. As code becomes more complex, bookkeeping software dependencies is even more complex as there is an increased risk of package mismanagement and, consequently, the existence of error-producing code.

If you’re a developer working on multiple Python projects in parallel, you should use distinct virtual environments for each project. This practice makes working on multiple projects more organized and also reduces the risks of code execution errors. Also, if you’re collaborating on a large Python development project, using virtual environments will make collaboration easier because collaborators can be confident that they’re using the right versions of each software library that is required for the code to run. Further, it makes debugging easier since it reduces the risk of bugs due to package mismanagement.

Benefits of Virtual Environments

Virtual environments have many benefits including easier dependency management and reduced risk of package conflicts and errors caused by software deprecation. Software dependency management is the process of bookkeeping software versions needed for a piece of software to run. Package conflicts arise when two or more software packages are incompatible, causing a piece of software to fail to run.Software deprecation occurs when a part or all of a software program or library changes or is removed.

To understand the importance of virtual environments, let’s consider the example of software deprecation.The statistical analysis library Pandas used to have a method called ix that was used to locate rows and columns by name or by index. Upon getting updated, the ix method was deprecated and replaced with new methods called iloc and loc. Consider a Python script that uses an old version of Pandas that uses the ix method to locate columns by name. Without the use of virtual environments, if an updated or newer version of Pandas runs this Python script, the code will fail to execute because, in a newer version of Pandas, the ix method no longer exists. With virtual environments, however, you can define the older version of Pandas that allows you to use the ix method without issue. This method means that the behavior of the code won’t change unexpectedly because the version of Pandas is being mismanaged.

Virtual environments also prevent system failures due to package conflicts. For example, certain versions of NumPy and TensorFlow conflict, which can result in system failures if you fail to consider package versions. This canmean unexpected performance issues or compatibility conflicts. Specifically, NumPy 1.20.0 has been reported to conflict with TensorFlow 2.5.0 due to the discrepancies in how each one defines matrix multiplication. If you install these two package versions without considering this conflict, you may get code execution errors. Conversely, with virtual environments, you can define versions of Tensorflow and NumPy that are compatible and avoid issues.

Using virtual environments makes dependency management more straightforward and also reduces the risk of package conflicts. Both of these benefits help ensure the reproducibility of code results. If you specify the exact versions of libraries and packages that the system requires, software updates won’t result in errors and bugs. Further, managing dependencies and minimizing package conflicts makes collaborating on software development much easier because each developer will have full knowledge of existing package requirements, which can be used to reliably inform how new package dependencies are incorporated.

Many packages in Python allow you to create and manage virtual environments. Conda is a commonly used one in data science. It is part of the Anaconda distribution, which is convenient since many data scientists and machine learning engineers likely also use Anaconda.

More in Data ScienceTop 20 Technical Interview Questions with Example Answers

Step-by-Step Guide to Python Virtual Environments

The workflow for creating and managing Python virtual environments is as follows:

Create Python virtual environment.
Install packages in Python virtual environment.
Activate Python virtual environment.
Deactivate Python virtual environment.

An interesting use case is creating and managing Python virtual environments for exploratory data analysis (EDA) tasks. For example, we can create a virtual environment for exploratory data analysis of the Synthetic Credit Card Transaction data available on DataFabrica. The data contains synthetic credit card transaction amounts, credit card information, transaction IDs and more. The free tier is free to download, modify, and share under the Apache 2.0 license.

1. Create Python Virtual Environment

To start, navigate to the Anaconda website and follow the instructions for installing Conda:

A Guide to Python Virtual Environments (1)

Next, follow the instructions for verifying your installer hash here. Once you verify your install hash, double-click the download .pkg file and follow the instructions to complete installation. You can test whether the installation was successful by entering the command conda list in a terminal window.

Next, to create a Conda virtual environment, we use conda create –name myenv, where “myenv” is the name of our environment. Since we will be creating an environment for performing EDA on credit card transaction data, let’s name our environment to reflect. Let’s create an environment called “cc_data_eda_env”:

conda create –name cc_data_eda_env

Upon running this command, we will see a message asking if we would like to proceed:

Proceed ([y]/n)?

Type y and press enter:

Proceed ([y]/n)? y

We should see the following output:

Preparing transaction: doneVerifying transaction: doneExecuting transaction: done## To activate this environment, use## $ conda activate cc_data_eda_env ## To deactivate an active environment, use## $ conda deactivate

2. Install packages in Python Virtual Environment

We see that the output gives instructions on how to activate and deactivate our newly created Conda environment. The environment we created is empty. We can also create the same environment where we specify the specific Python packages we would like installed in our environment. Plotly is one commonly used Python library for data visualization and EDA. Let’s create this environment again with the Plotly library installed. To do this, in the same Conda create command from earlier, we just put the name of the package we’d like to install after the name of our environment:

conda create –name cc_data_eda_env plotly

Upon running this code, we will be asked if we would like to remove the old version of the cc_data_eda environment. Type y and press enter. We will see a long message specifying all of the Plotly dependencies that will be installed in addition to Plotly itself:

A Guide to Python Virtual Environments (2)

To proceed with creating the environment, press y. We should see an output showing the installation progress for each package.

Finally, we can specify an environment.yml file to specify the list of packages we would like installed. This makes managing dependencies easier, especially when we require a large number of package installations. This is because, with an environment.yml file, you have an exhaustive list of all the packages and their versions, needed for your software to run, easily accessible. Without this, you would need to comb through a long list of installed packages on your machine, find the packages that your programs use and their respective versions. On a laptop or personal desktop computer, the packages that a software program uses are most likely a small subset of the long list of packages installed on a personal computer. This makes the task of managing software packages and their versions inefficient and in some cases not possible.

Let’s create an environment.yml file using the following command:

vi environment.yml

And let’s list the following package in our environment.yml file:

name: cc_data_eda_envchannels: - defaultsdependencies: - python=3.11.4 - pandas=1.5.3 - numpy=1.25.2 - matplotlib=3.7.1 - seaborn=0.12.2 - plotly=5.9.0

We can then create an new environment using our environment.yml file:

3. Activate Python Virtual Environment

Once we are finished creating the environment, we can activate it using the following command in terminal:

conda activate cc_data_eda_env

The name of the environment in parentheses should indicate the activated environment:

(cc_data_eda_env) user:user_directory user.name$

Now that our environment is activated, we can define some logic that uses these packages. To proceed, make sure the csv file is in the same directory as your yaml file. Next, let’s create a Python file called cc_eda.py:

vi cc_eda.py

In this file, let’s import each of the packages:

import pandas as pd import numpy as np import seaborn as sns import matplotlib.pyplot as pltimport plotly

Next, let’s read our data into a Pandas DataFrame:

df = pd.read_csv()

Let’s display the first five rows of data:

print(df.head())

Now exit the Python file and run:

python cc_eda.py

A Guide to Python Virtual Environments (4)

Next, let’s perform some logic using NumPy methods. Let’s take the log transform of the transaction_amount and store the values in a new column called “log_transaction_amnt”:

df['log_transaction_amnt'] = np.log(df['transaction_amount'])print(df.head())

Then let’s rerun our Python script:

python cc_eda.py

A Guide to Python Virtual Environments (5)

Next, let’s generate a time series line plot of the transaction_amount overtime. This will require both Pandas operations and Matplotlib methods.

First, let’s format our date column:

df['transaction_date'] = pd.to_datetime(df['transaction_date'])

Filter our data to consider a single merchant:

df= df[df['merchant_name'] == "Cheesecake Factory"]

Remove outliers:

z_scores = np.abs((df['transaction_amount'] - df['transaction_amount'].mean()) / df['transaction_amount'].std())# Define a threshold (e.g., Z-score > 3) to identify outliersthreshold = 3# Remove outliers from the DataFramedf = df[z_scores < threshold]

Define our grouping column:

df['year'] = df['transaction_date'].dt.yeardf['year'] = df['year'].astype(str)df['month'] = df['transaction_date'].dt.monthdf = df[df['month'] <= 12]df = df[df['month'] >= 1]df['month'] = df['month'].astype(str)df['month_year'] = df['year'] + "-"+ df['month']df['month_year'] = pd.to_datetime(df['month_year'])

Perform a groupby sum on month_year:

df_grouped = df.groupby('month_year')['transaction_amount'].sum().reset_index()df_grouped = df_grouped.set_index('month_year').sort_index()df_grouped.index = pd.to_datetime(df_grouped.index)And plot our time series:plt.plot(df_grouped.index, df_grouped['transaction_amount'])

And plot our time series:

plt.plot(df_grouped.index, df_grouped['transaction_amount'])

And rerun:

python cc_eda.py

A Guide to Python Virtual Environments (6)

We can also add logic that includes Seaborn methods and operations:

data = df[['transaction_amount', 'log_transaction_amnt']]sns.pairplot(data)plt.show()

And rerun:

python cc_eda.py

A Guide to Python Virtual Environments (7)

Finally, we can generate a geomap of customer counts in the U.S. using Plotly express. The logic below uses a state name abbreviation mapping dictionary that is omitted for clarity:

df['merchant_state_abbr'] = df['merchant_state'].map(state_abbreviations)state_counts = df.groupby('merchant_state_abbr')['cardholder_name'].nunique().reset_index()state_counts.columns = ['State', 'Customer_Count']fig = px.choropleth(state_counts, locations='State', locationmode='USA-states', color='Customer_Count', scope='usa', color_continuous_scale='Blues', title='Number of Customers by State')fig.show()

A Guide to Python Virtual Environments (8)

It’s worth noting that we’re only able to use the Python packages we’ve specified in the requirements.txt file. For example, if we were to try to import and use a method in Scikit-learn, we would get a package not found error:

from sklearn.linear_models import LinearRegression

And when we run:

 from sklearn.linear_models import LinearRegressionModuleNotFoundError: No module named 'sklearn'

In order for this script to run successfully, we need to update our environment.yml file with the package name, sklearn, and the version:

name: cc_data_eda_envchannels: - defaultsdependencies: - python=3.11.4 - pandas=1.5.3 - numpy=1.25.2 - matplotlib=3.7.1 - seaborn=0.12.2 - plotly=5.9.0 - sklearn=1.30

We then need to deactivate our environment and create a new environment with the new environment.yml (you can choose a new name or keep the old name as long as the old environment has been removed).

4. Deactivate a Python Virtual Environment

Deactivating a Conda environment is straightforward. You simply enter the following command in the terminal:

conda deactivate

Upon running this command, you should no longer see the environment name in parentheses preceding the username and present working directory:

user:user_directory user.name$

Add Data Science SkillsBig O vs. Big Theta vs. Big Omega Notation Differences Explained

Use Python Virtual Enivornments

Python virtual environments are important tools for anyone developing code in Python. They make defining and managing code that depends on multiple software packages easier, as well as debugging code, preventing software conflicts, performing tests, and collaborating with other developers. Oftentimes, software bugs are due to deprecated software libraries, which can easily be detected in a managed virtual environment.

Further, some Python libraries conflict with each other, and knowing the exact versions of each library in use makes identifying software conflicts or incompatibilities easier. Further, once software code is developed, it often has to go through rigorous testing. This process requires that developers know the variables, such as library versions and that they stay constant. After all, if you’re testing to figure out what causes the software to break, you want to make sure that the case being tested is not due to library mismanagement. All of these factors make software development collaboration much easier, which ultimately leads to faster code development overall. Understanding Python virtual environments is an invaluable skill for anyone who uses Python programming in their day-to-day work.

The code used in this post is available on GitHub.A free sample of the data used in this article is available here. The full data set can be found here.

A Guide to Python Virtual Environments (2024)

FAQs

Which virtual environment is best for Python? ›

TLDR: There are three main options for creating and managing virtual environments in Python: pipenv , conda and venv . pipenv is superior to venv and conda . Learn to use pipenv first and you will not have to waste time learning the other two.

Tell Me More ›

What is the best way to create a virtual environment in Python? ›

Getting Started

Create a virtual environment in your current directory for a project with the command: virtualenv my_project. "my_project" is whatever name you would like to give this environment.
To create a virtual environment with a specific version of python use the command: virtualenv -p /usr/bin/python2.7 my_project.

Know More ›

How do I organize my Python virtual environment? ›

Setting up python Virtual environment for your project

Step 1: Create a new directory for your project. ...
Step 2: Navigate to the project directory. ...
Step 3: Create a virtual environment using venv. ...
Step 4: Activate the virtual environment. ...
Step 5: Install project dependencies. ...
Step 6: Deactivate the virtual environment.

Apr 29, 2023

Get More Info ›

What are Python virtual environments? ›

A Python virtual environment (venv) is simply a directory with a particular file structure. It has a bin subdirectory that includes links to a Python interpreter as well as subdirectories that hold packages installed in the specific venv.

Tell Me More ›

Should I use conda or venv? ›

Choosing the right environment management tool depends on your needs. If you need a simple, easy-to-use tool, venv might be the best choice. If you're dealing with complex dependencies, Conda env is the way to go. If you need to switch between different Python versions, consider pyenv or virtualenv.

Know More ›

Is virtualenv deprecated? ›

Multiple Python projects, each with their own dependencies can be effectively managed using separate virtual environments. Pipenv, Venv, and Pyenv are popular dependency management tools. The ActiveState Platform is a modern solution to dependency management. Virtualenv has been deprecated in Python 3.8.

Get More Info Here ›

Should I create a virtual environment for every Python project? ›

If you're a developer working on multiple Python projects in parallel, you should use distinct virtual environments for each project.

Get More Info ›

What is the difference between venv and virtualenv? ›

venv is a built-in module in Python 3.3 and later versions that allows you to create isolated Python environments. It is similar to virtualenv , but it is installed by default with Python. pyvenv is a script that comes with Python 3.3 and later versions that allows you to create virtual environments.

Read On ›

Should I create a virtual environment for each Python project? ›

Any time you're working on a Python project that uses external dependencies that you're installing with pip , it's best to first create a virtual environment: Windows. Linux. macOS.

Show Me More ›

Where should I store Python virtual environment? ›

There are several conventions as to where to place environments, but one common approach is to put it alongside the source folder for a program. For example, a project might be checkout out to ~/myproject/project-src. If following the convention, the environment would be placed at ~/myproject/env.

Keep Reading ›

What is the point of virtual environments? ›

A virtual environment is a tool that helps to keep dependencies required by different projects separate by creating isolated Python virtual environments for them. This is one of the most important tools that most Python developers use.

Get More Info ›

Where should Python virtual environment be located? ›

The virtual environment tool creates a folder inside the project directory. By default, the folder is called venv , but you can give it a custom name too. It keeps Python and pip executable files inside the virtual environment folder.

What is an example of a virtual environment? ›

Email, chat, and web-based document sharing applications are all examples of virtual environments. Simply put, it is a networked common operating space.

Show Me More ›

Which Python module helps you create virtual environments? ›

The module used to create and manage virtual environments is called venv . venv will usually install the most recent version of Python that you have available. If you have multiple versions of Python on your system, you can select a specific Python version by running python3 or whichever version you want.

Is Anaconda a Python virtual environment? ›

Python is a versatile language widely used in data science, machine learning, and web development. To manage Python packages and dependencies, virtual environments are essential. Two popular tools for creating virtual environments are venv and Anaconda .

Keep Reading ›

Which is better, venv or virtualenv? ›

These are almost completely interchangeable, the difference being that virtualenv supports older python versions and has a few more minor unique features, while venv is in the standard library.

Read On ›

Why is poetry better than conda? ›

Poetry emerges as a modern and organized solution for Python dependency management, offering improved organization, version control, and flexibility compared to traditional tools like Pip and Conda.

Should I create a virtual environment for a Python project? ›

Python virtual environments allow developers to control software dependencies in Python code. They're useful ways of ensuring that the correct package/library versions are consistently used every time the software runs. Virtual environments also help ensure that the results from running code are reproducible.

Read The Full Story ›