Setting up Your Computer for Scientific Computing

This guide should be updated...

This guide is an attempt to gather some information for setting up your computer for scientific computing. This guide aims to be general rather than comprehensive. For a course that teaches you how to use these tools effectively, please see The Missing Semester of Your CS Education.

Note that general information is interspersed with advice specific to individual operating systems. This was a conscious choice to show how much of this content is (or is not) applicable across systems. Some of you may find that you need to use several operating systems (especially if using computing clusters), in which case it can be helpful to compare more directly.

Please note that the advice given here is the result of the authors' collective workflows and knowledge over the years; this means that it is not always the most up-to-date method. If you are aware of a better approach, we would of course be interested in learning and improving this document. Don't forget: it is likely that you know your own machine and software better than we do.

Basic Tools

Terminal Emulator

"Terminal emulators" (terminals) are applications that give you a text-based interface with your computer. This text-based interface is called the "command line," where you can execute commands within a "shell" (the name for the program running in the terminal).

While not all scientific computing happens in a terminal, a significant amount does, and it can be a great tool once you're familiar with it. In fact, even if you don't primarily use a terminal, it's almost guaranteed that you will use it to some extent; therefore, it's worth at least becoming comfortable with the terminal.

In principle, any terminal is going to be more or less the same; we will walk through how to get started and what options are available below.
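
As a first orientation, here is a minimal sketch of a few commands you might type at a shell prompt. It assumes a POSIX-style shell (such as bash or zsh on macOS, Linux, WSL, or Chrome OS), and the variable names are only illustrative.

```shell
# Show which shell program is set as your login shell (may be unset in some environments)
echo "Login shell: ${SHELL:-unknown}"

# Print the directory you are currently working in
current_dir="$(pwd)"
echo "Current directory: ${current_dir}"

# List the files in that directory, including hidden ones
ls -a "${current_dir}"

# Count the entries by piping the output of one command into another
# (tr strips the padding that some versions of wc add)
entry_count="$(ls -a "${current_dir}" | wc -l | tr -d ' ')"
echo "Entries (including . and ..): ${entry_count}"
```

Chaining small commands together with a pipe (the | character) is a large part of what makes the command line useful.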

macOS

macOS comes with a preinstalled terminal, appropriately named Terminal.app. Terminal.app will more than suffice for your computing needs with many modern features (e.g., tabs) and is easily customizable. If you want to explore alternative terminals on macOS, some to consider are iTerm2, Hyper, Kitty, and Alacritty.

Note that on more recent versions of macOS, you may have to grant "Full Disk Access" to your terminal. This can be done by going into System Preferences → Security & Privacy → Privacy → Full Disk Access and checking the box next to your terminal application. You may also need to grant the terminal access to the directories listed in System Preferences → Security & Privacy → Privacy → Files and Folders.

Windows 10

If you anticipate using computational tools heavily, you may want to strongly consider using the Windows Subsystem for Linux (WSL). This provides a Linux shell accessible within your user space on Windows and will allow you to run software made for Linux (which includes a large amount of scientific software). A number of Linux distributions are available for WSL; if you have little or no prior experience with Linux, Ubuntu is typically a good place to start.

Windows comes with a preinstalled terminal, cmd.exe. While you can make it work, cmd.exe is a bit rudimentary in terms of features and customizability. Fortunately, Microsoft is aware of this and has recently released Windows Terminal, which includes many modern features and is easily customizable; it is also a very good option for integrating with PowerShell and WSL. If you want to explore alternative terminals on Windows, some to consider are cmder, Hyper, and Alacritty.

Linux

Any distribution of Linux will provide some access to a terminal, whether it's via a TTY or with an application included in the desktop environment. One of the most popular Linux desktop environments is GNOME (the default desktop environment for Ubuntu, Fedora, etc.), which includes GNOME Terminal. Whether you use GNOME Terminal or a different terminal provided by your distribution, you will likely be more than fine with the default terminal. If you want to explore alternative terminals on Linux, a fairly comprehensive list can be found on the ArchWiki.

Chrome OS

To get a terminal on Chrome OS, you will first need to set up something called a "Linux Container." To set up Linux on a Chromebook, follow the Chrome OS developer instructions (the guide here may be more accessible). Note that this Linux container is based on a distribution of Linux called Debian, which also serves as the basis for other popular Linux distributions including Ubuntu. The Chrome OS Linux container includes a Terminal, which will give you everything you need going forward.

Text Editor

Some of the most useful programs are what are known as "text editors." These text editors are unlike Microsoft Word, Pages, or Google Docs (which are referred to as "What You See Is What You Get", or WYSIWYG, editors) in that they directly edit the contents of a file (that is, they work with "raw" text data). This is important because software is typically written as raw text (with file extensions simply denoting the language of the program; e.g., .py for Python scripts), and data from experiments are typically stored as raw text (unless you are working with large amounts of data, in which case it is typically compressed and looks like gibberish when viewed with a text editor).

Text editors typically come in three varieties: command line text editors, graphical text editors, and integrated development environments. Command line text editors run directly inside your terminal; they are idiomatic in a terminal-based workflow, but can be difficult to use effectively at first (that said, the payoff is great if you do become familiar with them). Graphical text editors are more "modern" in that they run as a graphical program similar to most applications; these will likely feel more natural for those who were born after 2000. Integrated development environments (IDEs) are graphical programs that include a text editor among many other tools for software development; they can be overwhelming at first, and many of their features go beyond what scientists typically need for their programs. Note that most IDEs are tailored to specific programming languages (e.g., Python).

The easiest option for any of the operating systems below is to use a graphical text editor, as these do not have the steep learning curve typically associated with command line text editors and are not as overwhelming as IDEs; indeed, they work excellently for most people who do scientific computing.

macOS

macOS comes with several command line text editors by default, including nano, vim, and emacs. nano is the easiest to use but the least "powerful" (it is fairly limited in its functionality). vim and emacs are both difficult to learn but can be extremely useful once you are reasonably familiar with either of them; their users are in a friendly rivalry. A more modern terminal-based text editor is micro, which combines the ease-of-use of nano with some of the power and flexibility of vim and emacs.

While the preinstalled application TextEdit.app is capable of editing files, it lacks any features that are useful for writing code. There are many great graphical text editors available for macOS, including Atom, VS Code, and Sublime Text.

One of the most popular Python IDEs is PyCharm.

Windows 10

While Windows does not come with any command line text editors preinstalled, many can be downloaded for use with Windows Terminal or are included as part of WSL. Some of the more popular command line text editors are vim and emacs, which can be difficult to learn but extremely useful once you are reasonably familiar with either of them. A more modern terminal-based text editor is micro, which combines the ease-of-use of a rudimentary text editor, nano, with some of the power and flexibility of vim and emacs.

While the preinstalled application Notepad is capable of editing files, it lacks many of the features that are useful for writing code. There are many great graphical text editors available for Windows, including Atom, VS Code, and Sublime Text.

One of the most popular Python IDEs is PyCharm.

Linux

Most Linux distributions come with several command line text editors by default, often including nano, vi/vim, and emacs. nano is the easiest to use but the least "powerful" (it is fairly limited in its functionality). vim and emacs are both difficult to learn but can be extremely useful once you are reasonably familiar with either of them; their users are in a friendly rivalry. A more modern terminal-based text editor is micro, which combines the ease-of-use of nano with some of the power and flexibility of vim and emacs.

If you are using the GNOME desktop environment, the graphical text editor Gedit should be installed by default; several distributions that use other desktop environments also install Gedit by default, and it is easily available otherwise.

A fairly comprehensive list of text editors available on Linux can be found on the ArchWiki.

Chrome OS

Because of the nature of using the Linux container, the specifics for Chrome OS are the same as those for Linux.

The default text editor for Debian is nano, but vim should also be preinstalled. nano is among the easiest to use but the least "powerful" (it is fairly limited in its functionality). vim is difficult to learn but can be extremely useful once you are reasonably familiar with it. A more modern terminal-based text editor is micro, which combines the ease-of-use of nano with some of the power and flexibility of vim and emacs.

A common graphical text editor for Linux is Gedit, which should be easy to install.

A fairly comprehensive list of text editors available on Linux can be found on the ArchWiki.

Tips and Advice
  • If the number of options here is overwhelming, VS Code is a good place to start—it works well on all main operating systems, it has a robust plugin ecosystem, it is relatively easy to customize, and it is free and open source.
  • Terminal-based text editors, including nano, vim, and emacs, are typically installed by default on computers running Linux, so learning at least one of them will give you a familiar tool on nearly any computer you connect to for work.

Remote Access

Because so much of scientific computing involves running complex algorithms on large datasets, these processes are typically done on computing clusters hosted by universities or national labs. Therefore, if you anticipate needing to run code or work with data on anything other than your local machine, you will need a way of connecting to these remote computing clusters.

The remote connection protocol used in most computing is SSH (Secure SHell). The typical syntax for connecting to a computer with SSH is

ssh {user}@{host}.{domain}

macOS

macOS comes with an implementation of SSH.

Windows 10

Windows comes with an implementation of SSH. If you are using WSL, the Linux distribution should also come with SSH; if it does not, the OpenSSH implementation should be readily available through the distribution's package manager.

Linux

Most Linux distributions come with an implementation of SSH. If your system does not have it by default, it should be readily available through your distribution's package manager; the most common implementation on Linux is OpenSSH.

Chrome OS

The Debian Linux container should come with an implementation of SSH. If, for whatever reason, it is not installed by default, you can install OpenSSH with the following command:

sudo apt install openssh-client

Tips and Advice
  • Note that, while you can view graphics and graphical programs over remote connections using the X Window System (see below), this can be quite sluggish when your internet connection is slow or unstable. In that case, you may need to be comfortable with working from within a terminal (e.g., using a command line text editor).
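
To make the connection syntax from this section concrete, here is a minimal sketch that assembles the SSH target from its parts. The user, host, and domain values are hypothetical placeholders, and the actual connection line is left commented out so that nothing is contacted until you substitute real values.

```shell
# Hypothetical account details; replace with those provided by your cluster
user="jdoe"
host="login"
domain="cluster.example.edu"

# Assemble the target in the form {user}@{host}.{domain}
target="${user}@${host}.${domain}"
echo "SSH target: ${target}"

# Check that an SSH client is available on this machine
if command -v ssh >/dev/null 2>&1; then
    echo "ssh client found at: $(command -v ssh)"
else
    echo "no ssh client found; install OpenSSH first"
fi

# Uncomment to open the connection once the values above are real:
# ssh "${target}"
```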

X Window System

The X Window System (X) is a ubiquitous framework for drawing graphics (including images and applications) and a standard protocol for interfacing with graphical output from remote computers.

While the general community is slowly moving towards Wayland (a modern replacement for X), scientific computing tends to adopt new technologies more slowly given the dependence on legacy software. As such, it is generally useful to have an installation of X (or some compatibility) on your local machine, especially if you are running older codes or connecting to remote machines.
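
As a rough illustration of how X is used with remote machines, OpenSSH clients accept the -X option to request that graphical windows be forwarded to your local display. The sketch below only reports whether a local display is configured and prints the command you would run; the target jdoe@login.cluster.example.edu and the helper name x_display_status are hypothetical.

```shell
# Hypothetical remote target; replace with your own account and host
target="jdoe@login.cluster.example.edu"

# Report whether a local X display is configured
# (the DISPLAY variable tells programs which X server to draw on)
x_display_status() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X display available: ${DISPLAY}"
    else
        echo "DISPLAY is not set; an X server may not be running or configured"
    fi
}
x_display_status

# Build the command that requests X11 forwarding (-X) over SSH
forward_cmd="ssh -X ${target}"
echo "To forward graphical windows, run: ${forward_cmd}"
```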

macOS

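The standard macOS implementation of X is XQuartz, which is distributed separately by the XQuartz project rather than bundled with macOS. It is also generally recommended that you install the Xcode Command Line Tools, a minimal installation of Xcode that provides other useful developer software: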

xcode-select --install

Windows 10

One of the most popular implementations of X for Windows is VcXsrv. Note that the VcXsrv application must be running in order to use the window forwarding features (i.e., so that you can view graphics from remote connections).

Linux

Most Linux distributions will use either an implementation of X (typically Xorg) or Wayland (with XWayland for compatibility) for graphics. As such, there is typically nothing more you need to do.

Chrome OS

As detailed in the Runtime features, the Linux container includes support for X programs by default.

Python

At this point, most daily scientific computing is done with Python. We will cover common solutions for environments and package management for Python which you are likely to see while doing scientific computing.

Note that, while your system may come with Python preinstalled (particularly if you are running macOS or Linux), we strongly advise that you use Conda and its Python environments rather than your system Python. This makes problems with conflicting packages (and possibly breaking aspects of your operating system) significantly less likely in the future.
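
One quick way to see which Python your shell will actually run is to look up its location: a path inside a Conda installation suggests that a Conda environment is active, while a path such as /usr/bin points to the system Python. This is a minimal sketch, and the variable name is illustrative.

```shell
# Find the first python3 (or python) on the PATH; empty if none is installed
python_path="$(command -v python3 || command -v python || true)"

if [ -n "${python_path}" ]; then
    echo "Python found at: ${python_path}"
else
    echo "No Python interpreter found on PATH"
fi
```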

Conda

Conda has become the de facto standard for managing environments as well as installing packages in Python for scientific computing. There are several reasons for this; the two most salient are (1) Conda can include non-Python software, which is frequently used in scientific Python packages (importantly, using Python's built-in venv and pip will not always be able to install every necessary dependency for a given scientific library), and (2) Conda environments can be easily shared with others (this is especially useful when trying to make sure that collaborators are using compatible versions of software).

There are several common distributions of Conda available, the two most common being Anaconda and Miniconda; the primary practical difference is that Anaconda comes with many common packages by default in its base environment, whereas Miniconda is a minimal installation. Note that you will want only one of Anaconda or Miniconda. Advanced users may be interested in using mamba, which can be readily installed through Conda.

Most software can simply be installed with the following syntax:

conda install {package}

Some software is unavailable through the default Conda channels. In this case, it may be available through conda-forge:

conda install -c conda-forge {package}

There is still some software that is unavailable through any Conda channel. Fortunately, pip is compatible with Conda; in fact, each Conda environment has its own "proper" pip to reduce the possibility of package conflicts. To install a package with pip, the syntax is

pip install {package}

macOS

The macOS Miniconda installers are available here. We recommend that you use the most recent Python version.

The Anaconda download is available here.

Windows 10

The Windows Miniconda installers are available here. If you are using WSL, you will want the Linux Miniconda installers here. We recommend that you use the most recent Python version.

The Anaconda download is available here. Note that, if you are using WSL, you may have to click "Get Additional Installers" for the Linux installer.

Linux

The Linux Miniconda installers are available here. We recommend that you use the most recent Python version.

The Anaconda download is available here.

Chrome OS

The Linux Miniconda installers are available here. We recommend that you use the most recent Python version.

The Anaconda download is available here.

Tips and Advice
  • If you enjoy maintaining the software on your computer (starting from scratch, updating, etc.) or you anticipate needing to use fairly different software across a number of projects, then Miniconda may be a better option for you. Otherwise, Anaconda is at the very least a good first place to start for many people.
  • It is usually recommended that you make a new Conda environment for each project you're working on. Using environments helps reduce the possibility of package conflicts (e.g., where one package you're using requires a different version of some package than another package requires). Conda starts with a default (base) environment; it is generally recommended that you don't install new packages in the base environment but rather in specific project environments. Follow the documentation here for managing Conda environments.
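
The per-project environment workflow recommended above can be sketched as follows. The environment name myproject, the package numpy, and the file name environment.yml are illustrative choices. The run helper is a hypothetical convenience that prints each command instead of executing it unless DRY_RUN=0 is set, so the sketch can be tried without changing your installation.

```shell
# Hypothetical helper: print commands by default, execute them only when DRY_RUN=0
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

env_name="myproject"   # illustrative project environment name

# Create a dedicated environment with its own Python
run conda create --yes --name "${env_name}" python

# Install packages into that environment rather than into base
run conda install --yes --name "${env_name}" numpy

# Record the environment so that collaborators can recreate it
run conda env export --name "${env_name}" --file environment.yml

# A collaborator would recreate the environment from that file
run conda env create --file environment.yml
```

In an interactive shell, conda activate myproject switches to the environment and conda deactivate leaves it.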

Jupyter Notebooks

Jupyter notebooks are an increasingly popular method of writing and running code, especially within the scientific community. While they have their detractors, they appear to be here to stay and so it is important to at least be familiar with them and able to run them.

Anaconda comes with jupyter installed by default; if you are using Miniconda, then you will need to install jupyter:

conda install jupyter

A Jupyter notebook server that runs in your browser may be started with

jupyter notebook

Tips and Advice
  • Using jupyter-widgets can add helpful functionality to your notebooks. For instance, one particularly useful widget is interact.
  • The "next-generation" version of Jupyter notebooks is JupyterLab.
  • If you want to work on Jupyter notebooks remotely, you can use SSH with port forwarding by following the guide here.
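
One common pattern for remote notebooks, which may differ in detail from the linked guide, is to start the server on the remote machine without a browser and then forward a local port to it over SSH. The sketch below only assembles and prints the two commands; the target and the port numbers are hypothetical placeholders.

```shell
# Hypothetical values; replace with your own account, host, and preferred ports
target="jdoe@login.cluster.example.edu"
remote_port=8889   # port the notebook server listens on (remote machine)
local_port=8888    # port you will open in your local browser

# Step 1 (run on the remote machine): start the server without opening a browser
remote_cmd="jupyter notebook --no-browser --port=${remote_port}"

# Step 2 (run on your local machine): forward the local port to the remote port
#   -N  do not run a remote command, only forward ports
#   -L  local_port:host:remote_port, where host is resolved on the remote side
tunnel_cmd="ssh -N -L ${local_port}:localhost:${remote_port} ${target}"

echo "On the remote machine: ${remote_cmd}"
echo "On your local machine: ${tunnel_cmd}"
echo "Then browse to: http://localhost:${local_port}"
```

Using different local and remote ports is optional, but it makes it easier to tell which side of the tunnel a given number refers to.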

Git

Git is among the most ubiquitous tools for version control in computing in general. Version control allows changes in projects to be tracked in a robust way and enables collaborators to develop code independently and merge their changes when they are done. Many codes are publicly hosted on GitHub, GitLab, and/or Bitbucket, among others.

Alongside the Git-SCM documentation, GitHub provides a number of helpful tutorials for beginners; a few to take a look at first are git-handbook, flow, hello-world, and forking.

While the command line interface git is the most common way that people work with Git (aside, perhaps, from GitHub), a number of graphical interfaces also exist. These can be particularly helpful if you prefer visual representations of problems.

macOS

macOS comes with git preinstalled.

There are a number of graphical interfaces for Git available for macOS, including GitHub Desktop, Sublime Merge, and GitKraken. Note that graphical text editors often have plugins available for interfacing with Git and/or GitHub; in particular, Atom (Git, GitHub) and VS Code (Git, GitHub).

Windows 10

You can download Git for Windows here.

There are a number of graphical interfaces for Git available for Windows, including GitHub Desktop, Sublime Merge, and GitKraken. Note that graphical text editors often have plugins available for interfacing with Git and/or GitHub; in particular, Atom (Git, GitHub) and VS Code (Git, GitHub).

Linux

Most distributions of Linux come with git preinstalled (Git and the Linux kernel were both authored by Linus Torvalds). If your system does not have git, it should be readily available through your distribution's package manager.

There are a number of graphical interfaces for Git available for Linux, including Sublime Merge and GitKraken; see the ArchWiki for a fairly comprehensive list. Note that graphical text editors often have plugins available for interfacing with Git and/or GitHub; in particular, Atom (Git, GitHub) and VS Code (Git, GitHub).

Chrome OS

If your Debian Linux container does not have git preinstalled, you can install it with

sudo apt install git

There are a number of graphical interfaces for Git available for Linux, including Sublime Merge and GitKraken; see the ArchWiki for a fairly comprehensive list. In principle, most of these should be compatible with the Debian Linux container. Note that graphical text editors often have plugins available for interfacing with Git and/or GitHub; in particular, Atom (Git, GitHub) and VS Code (Git, GitHub).

Tips and Advice
  • Don't forget to commit and pull (resolving any merge conflicts) before pushing!
  • It is often recommended to make a separate branch when developing code, while leaving the main (previously master) branch in a stable state.
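
The branch-based workflow in the tips above can be tried safely in a throwaway repository. The sketch below creates a temporary repository, commits on the stable branch, develops on a separate branch, and merges the result back. The branch name feature/plots, the file analysis.py, and the identity values are hypothetical. Pulling and pushing require a remote, so they appear only as comments.

```shell
# Work in a temporary directory so that no existing project is touched
repo="$(mktemp -d)"
cd "${repo}"
git init --quiet

# Hypothetical identity, stored only in this repository's configuration
git config user.name "Example User"
git config user.email "user@example.com"

# First commit on the stable branch (named main or master, depending on your Git setup)
echo "print('hello')" > analysis.py
git add analysis.py
git commit --quiet -m "Add analysis script"
stable_branch="$(git rev-parse --abbrev-ref HEAD)"

# Develop on a separate branch, leaving the stable branch untouched
git checkout --quiet -b feature/plots
echo "print('plot')" >> analysis.py
git commit --quiet -am "Add plotting step"

# Return to the stable branch and merge the finished work
git checkout --quiet "${stable_branch}"
git merge --quiet feature/plots

commit_count="$(git rev-list --count HEAD)"
echo "Commits on ${stable_branch}: ${commit_count}"

# With a remote configured, you would now run:
# git pull    # fetch and merge collaborators' changes, resolving any conflicts
# git push    # publish your commits
```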

GitHub

GitHub is a popular choice for hosting Git repositories and managing software. In addition to facilitating software development, GitHub provides a number of other services that have a lot of value—especially for students and early-career scientists.

GitHub Student Developer Pack

The GitHub Student Developer Pack provides a large number of benefits (i.e., free or discounted software and services). Perhaps most useful, the Student Developer Pack gives you free GitHub Pro while you're a student; the associated benefits tend to change over time, but it's almost always worth claiming this while you can.

GitHub Pages

GitHub Pages is a service for hosting websites from a GitHub repository. Many people choose to use their personal GitHub Pages website as a personal webpage; for example, you can see the source for this website here.

GitHub Actions

GitHub Actions is a service for automating various actions (e.g., building, testing, linting, etc.) on your code. Since its introduction, many people have started using GitHub Actions to perform continuous integration for their code.

Tips and Advice

If you create a repository with the same name as your GitHub username, then the README.md file in that repository will appear on your profile page (e.g., see here).