ML Project Environment Setup in Julia, a Comprehensive Step-by-step Guide

When you start with machine learning, if you opt to run your ML project code locally on your machine rather than using online services such as Google Colab, one of the very first technical things to do is to set up a project-specific environment.

Today, we will go through the ML environment setup guide in Julia step-by-step. I share the same guide for Python on a separate page.

When you write your code, do you already organize it by setting up separate coding environments for different projects? Or you may have heard about it but never took the time to learn what it is, why you need it, and how to set it up and adjust your workflow accordingly.

Up until recently, I had never bothered with setting up project environments. I worked in the global environment (also called a shared environment, shared by all projects) all the time—until it backfired and I got some errors, which I could not figure out how to fix.

I put off learning about proper code organization for way too long. I thought it would take me a considerable amount of time, and time is always the biggest and most important variable in my life as a busy graduate physics student, mom, and content creator.

But once I had no other choice and faced this issue head-on, it only took me a day to learn about it, understand what I was supposed to do, and set it up properly for the project I am currently working on (not ML project related but for my research).

Cleaning up the global environment and setting up a separate project environment fixed those stubborn errors in my code!

Global vs Local ML Project Environment Setup

When you install Julia for the first time, a global environment is created. It is activated by default every time you write or run your code unless you switch to another environment (provided you have created one).

Any packages that you install or uninstall affect the global environment and all programs that you run within it.

In both Julia and Python, it is best practice to create workspace-specific environments, for example, by using a local environment.

The main distinction is that in the global (or shared) environment, you typically install your main version of the programming language along with some general utility libraries, which are useful for most projects. In local environments, you typically install only project-specific libraries. In some ecosystems (Python with conda, for example), a local environment can also pin a specific version of the programming language itself.

Benefits of Using Environments

Here are some of the obvious benefits of using environments to give you an idea of why we do this.

  • Better code organization.
  • Fewer dependency issues and fewer error messages (for example, you may need different versions of the same package for different projects, or a library update may break code that depends on a specific package version).
  • Code shareability with coworkers or on GitHub (it is possible to recreate the exact environment in which your code was written).
  • More efficient use of the system’s resources.

But before we go ahead and set up the environment, we need to answer a couple of questions first.

  1. Will you be using CPUs or GPUs to perform calculations for your project?
  2. Which GPU architecture do you have access to, if you need GPUs?

Hardware Requirements for ML

I am bringing this up before discussing the ML environment setup because the answers to the two questions above determine which packages you will need to install.

Let’s briefly discuss hardware requirements first. Depending on your ML project and the size of your training data sets, you might need extensive calculations, which usually involve matrix multiplications.

There are two ways you can do these multiplications: one after the other, or all at the same time, which is faster, of course.

You might get away with using just CPUs for smaller projects with smaller data sets. That way, you can do your calculations sequentially (though Julia also offers some level of parallel computation on CPUs via multithreading to speed things up a bit).
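As a rough illustration of the CPU multithreading mentioned above, here is a minimal sketch using Julia's built-in Threads module. The function name threaded_row_sums is made up for this example, and any speedup depends on starting Julia with more than one thread (for example, julia --threads 4).

```julia
using Base.Threads: @threads, nthreads

# Hypothetical helper: sum each row of a matrix, one row per loop iteration.
# The iterations are split across the threads Julia was started with.
function threaded_row_sums(A::AbstractMatrix)
    sums = zeros(eltype(A), size(A, 1))
    @threads for i in axes(A, 1)
        # Each iteration writes only to its own slot, so no locking is needed.
        sums[i] = sum(@view A[i, :])
    end
    return sums
end

A = [1.0 2.0; 3.0 4.0]
println("threads available: ", nthreads())
println(threaded_row_sums(A))
```

With a single thread the loop simply runs sequentially, so the same code works on any machine.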

For larger projects with a ton of training data, you will need GPUs or even clusters of GPUs to do your calculations and accelerate the training of your machine learning models. That is why it is beneficial to have access to a GPU, even if you are just starting to learn ML so that you can get familiar with how to use it from the start.

GPU Type

There are four popular types:

  • NVIDIA GPUs
  • AMD GPUs
  • Apple M-series GPUs
  • Intel GPUs

But why do we care about the type?

For example, to use TensorFlow (TF) on a GPU, you need to install the CUDA toolkit. However, CUDA was developed by NVIDIA and is specifically designed to work with NVIDIA GPUs. In other words, the standard GPU setup for TF assumes an NVIDIA GPU.

However, there are also AMD GPUs and Apple M-series GPUs. I have the latter. What do we do now? Can we still use TensorFlow on those architectures?

If you have one of those, you will need to take one extra step to set up your environments for ML.

(If you do not need GPUs for your computations, you can skip this step).
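On the Julia side, the extra step boils down to adding the GPU package that matches your hardware. The lookup below is only a sketch: the package names are the ones maintained by the JuliaGPU organization, while the Dict and the gpu_package helper are names invented for this illustration.

```julia
# Julia GPU packages by vendor (names as used with `pkg> add ...`).
const GPU_PACKAGES = Dict(
    :nvidia => "CUDA",
    :amd    => "AMDGPU",
    :apple  => "Metal",
    :intel  => "oneAPI",
)

# Hypothetical helper: which package to add for a given GPU vendor.
# Returns `nothing` for a vendor that is not in the table.
gpu_package(vendor::Symbol) = get(GPU_PACKAGES, vendor, nothing)

println(gpu_package(:apple))
```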

Setup on Apple M-series GPUs

I am personally using a MacBook Pro with the following specs:

  • 16GB of RAM
  • 10-core CPU with 8 performance cores and 2 efficiency cores
  • 16-core GPU
  • 16-core Neural Engine (a group of special cores specifically for ML tasks and AI operations)
  • macOS 14 (Sonoma)

Now let’s learn how to set up an environment in Julia on an Apple Silicon M1 computer.

We use language-specific package managers to create and manage environments.

I already have Julia installed on my MacBook along with its built-in package manager, Pkg. But if you need to install it, you can go to https://julialang.org/downloads/ and https://julialang.org/downloads/platform/.

Open the terminal and then:

  • mkdir TestProjects to make a directory for your Julia projects
  • cd TestProjects to go to that newly created directory
  • julia to enter the Julia REPL, and then ] to access its Pkg package manager mode
  • (@v1.9) pkg> generate NewJuliaProject to create a new project environment (this creates a Project.toml file and a src/ subfolder that holds your main code: modules and other files)
  • ; to switch to the shell mode (press Backspace first if you need to return to the julia> prompt)
  • shell> mkdir -p NewJuliaProject/docs NewJuliaProject/test to create the remaining useful folders (each folder is spelled out here because Julia's shell mode may not expand braces such as {docs,test}). For example, we can generate docs files automatically from the docstrings in our code.

Now you will have a folder structure similar to this:

  NewJuliaProject/
      Project.toml
      docs/
      src/
          NewJuliaProject.jl
      test/

At this point, we have successfully created an environment and the minimum required folders to organize our code, and we can start adding project-specific libraries to it.
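The same project skeleton can also be created from a script instead of the REPL modes. This is a minimal sketch that assumes a temporary directory as a stand-in for TestProjects; Pkg.generate is the function behind the generate command, and the variable names are only illustrative.

```julia
using Pkg

root = mktempdir()                            # throwaway stand-in for TestProjects
project = joinpath(root, "NewJuliaProject")

Pkg.generate(project)                         # same as `pkg> generate NewJuliaProject`
mkpath(joinpath(project, "docs"))             # the extra folders from the last step
mkpath(joinpath(project, "test"))

# Print every folder and file that now exists, relative to the root folder.
for (dir, _, files) in walkdir(project)
    println(relpath(dir, root), "/")
    for file in files
        println("    ", file)
    end
end
```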

Step 3: Activate Environment

We need to activate the new environment before we can add any packages to it.

  1. Make sure you are in your new project directory. If not, cd into it.
  2. Switch to pkg mode with ] and type activate . (note the dot, which stands for the current directory). This will take you from the global environment, (@v1.9) pkg> (1.9 being the Julia version you’ve installed), to the local environment, (NewJuliaProject) pkg>. Notice there is now no @ prefix before the environment name. This means this is no longer a shared environment. But we are still in the pkg> mode.

For now, this is an empty project with no libraries. You can check which dependencies are installed in the project by typing status or st in the pkg> mode.
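Activation and the status check also have function forms in the Pkg module, which is handy inside scripts and notebooks. A minimal sketch, assuming a temporary folder as a stand-in for your project directory:

```julia
using Pkg

demo_dir = mktempdir()            # stand-in for your project folder

Pkg.activate(demo_dir)            # same as `pkg> activate .` run from inside that folder
println(Base.active_project())    # path of the Project.toml that is now in use
Pkg.status()                      # same as `pkg> status`; reports an empty project here
```

In a real project you would pass the project path (or ".") instead of the temporary folder.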

Step 4: Install Needed Packages

Now that the environment is active, we can add needed packages.

For my ML project environment setup, I will for sure need the following packages:

Julia               Python equivalent
DataFrames.jl       Pandas
LinearAlgebra.jl    Numpy
Plots.jl            Matplotlib
ScikitLearn.jl      SciKit-Learn
Flux.jl             Pytorch, TensorFlow
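The packages listed above can be added in one go. The sketch below wraps the call in a helper so that nothing is downloaded until you call it; ML_PACKAGES and install_ml_packages are names invented for this example. Note that LinearAlgebra is a standard library that ships with Julia, so adding it only records it as a dependency.

```julia
using Pkg

# The packages listed above (the ".jl" suffix is dropped when adding).
const ML_PACKAGES = ["DataFrames", "LinearAlgebra", "Plots", "ScikitLearn", "Flux"]

# Hypothetical helper: add the packages to whichever environment is active.
# Needs an internet connection, and the first run can take a while.
function install_ml_packages(packages = ML_PACKAGES)
    Pkg.add(packages)    # same as `pkg> add DataFrames LinearAlgebra Plots ...`
    Pkg.status()         # list what the environment now contains
end
```

After activating the project environment, calling install_ml_packages() should leave all five entries in the project's status output.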

As soon as you add any libraries or packages to your project, a Manifest.toml file will be created, which automatically records the specific details of each dependency you installed, its version, etc. This file is crucial if someone wants to run your code and be able to reproduce the exact environment in which you wrote your code. It contains much more detailed information about the specific packages installed compared to the Project.toml file.
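For the person on the receiving end, recreating the environment from those two files could look like the sketch below. The helper name reproduce_environment is hypothetical; Pkg.activate and Pkg.instantiate are the standard Pkg functions for this, and the folder is assumed to contain both Project.toml and Manifest.toml.

```julia
using Pkg

# Hypothetical helper: rebuild a project's environment from its
# Project.toml and Manifest.toml files.
function reproduce_environment(project_dir::AbstractString)
    Pkg.activate(project_dir)    # switch to the shared project
    Pkg.instantiate()            # install the versions recorded in Manifest.toml
end
```

A collaborator would call reproduce_environment("path/to/NewJuliaProject") once after cloning or copying the project.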

THE extra step I was talking about for GPU support

To enable GPU access, we need to install the Metal.jl package (add Metal in the pkg> mode).

With Metal.jl it’s possible to program GPUs on macOS using the Metal programming framework from Apple.
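To give a feel for what GPU code looks like, here is a minimal sketch of the usual pattern: copy data to the device, compute there, copy the result back. The helper double_on_device is hypothetical and takes the array type as an argument, so the block itself runs on the CPU with a plain Array and does not require Metal.jl.

```julia
# Hypothetical helper: copy data into a device array type, compute there,
# and copy the result back to ordinary CPU memory.
function double_on_device(ArrayType, x::Vector{Float32})
    device_x = ArrayType(x)       # with MtlArray this uploads the data to the GPU
    device_y = device_x .* 2f0    # broadcasting runs wherever the array lives
    return Array(device_y)        # bring the result back to the CPU
end

# CPU run, which works without any GPU package installed:
println(double_on_device(Array, Float32[1, 2, 3]))
```

On an Apple Silicon Mac with Metal.jl installed, the GPU variant would be using Metal followed by double_on_device(MtlArray, Float32[1, 2, 3]). Float32 is used on purpose, since Apple GPUs do not support Float64.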

Step 5: Using The Environment Inside IDE

After you have completed the ML environment setup, activated it, and installed the needed packages, we should ideally have GPU access as well. We can then start writing our code. However, you usually need an IDE for that. I am using the VS Code editor.

To use VS Code for Julia code development, we need to:

  1. install the Julia language support extension for VS Code
  2. install the Jupyter Notebook support extension for VS Code
  3. install the IJulia package to use Jupyter Notebook with a Julia kernel in VS Code
  4. activate the project-specific environment inside VS Code

Install VS Code Extensions

  1. Open VS Code (if it is not installed yet, you can get it at https://code.visualstudio.com/).
  2. Go to the Extensions tab, search for Julia language support and then for Jupyter Notebook support, and install both extensions.

Install IJulia Package

In my opinion, using Jupyter Notebook is the best way to develop your code for ML projects and beyond ML too.

The IJulia package makes it possible to use Jupyter Notebook with a Julia kernel instead of a Python kernel.

Open the terminal and start the Julia REPL.

Install IJulia from the julia> mode in the global environment (it will be useful for most of our project environments, so it makes sense to make it available everywhere):

julia> using Pkg
julia> Pkg.add("IJulia")

or from the pkg> mode

julia> ]
(@v1.9) pkg> add IJulia

and then exit the Julia REPL

julia> exit()

Activating Environments Inside VS Code Editor

When you open VS Code and the active directory inside of it contains a Project.toml file, you will be prompted to switch from the global environment to this project environment.

To activate your ML project environment (or any other), click on the Julia env: item in the blue status bar at the bottom of the window and select the desired environment.

Deactivate Environment

Inside VS Code, you can switch back to the global environment or close VS Code when you are done working.

Inside the Julia REPL, to deactivate or leave a specific environment, type activate with no arguments in the pkg> mode, and it will take you back to the global/default environment.
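The round trip can also be written with Pkg functions. A minimal sketch, again assuming a temporary folder in place of a real project:

```julia
using Pkg

project_dir = mktempdir()         # stand-in for a project folder

Pkg.activate(project_dir)         # enter the project environment
println(Base.active_project())    # points into project_dir

Pkg.activate()                    # no argument: back to the default environment
println(Base.active_project())    # no longer points into project_dir
```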

You need to activate the project-specific environment every time you work on a project and deactivate it when you are done.

