RSE Skills

(in Python)

Jack Atkinson

ICCS RSE Team
University of Cambridge

Marion Weinzierl

ICCS RSE Team
University of Cambridge

2024-07-11

Precursors

Slides and Materials

To access links or follow on your own device these slides can be found at:
jatkinson1000.github.io/rse-skills-workshop


All materials are available at:

Licensing

Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

Vectors and icons by SVG Repo used under CC0(1.0)

Precursors

  • Be nice (Python code of conduct)
  • Ask questions whenever they arise.
    • Someone else is probably wondering the same thing.
  • We will make mistakes.
    • Not all of them will be intentional.

Course structure

Today:

  • Tooling for better research software
    • venv for virtual environments
    • formatting and linting
    • structuring code

Tomorrow:

  • Writing better research software
    • comments
    • docstrings
    • repository documentation - licenses, README, Contributing.md
    • Other (naming, magic numbers,…)

What is Research Software?

Major Computational Programs

 

 

 

Data processing

 

 

Experiment support

 

 

 

Bathymetry by NOAA under public domain CTD Bottles by WHOI under public domain
Keeling Curve by Scripps under public domain
Dawn HPC by Joe Bishop with permission
Climate simulation by NSF under public domain

Why does this matter?

Why does this matter?

More widely than publishing papers, code is used in control and decision making:


  • Weather forecasting
  • Climate policy
  • Disease modelling (e.g. Covid)
  • Satellites and spacecraft
  • Medical Equipment


Your code (or its derivatives) may well move from research to operational one day.

Margaret Hamilton and the Apollo XI by NASA under public domain

Why does this matter?

def calc_p(n,t):
    return n*1.380649e-23*t
data = np.genfromtxt("mydata.csv")
p = calc_p(data[0,:],data[1,:]+273.15)
print(np.sum(p)/len(p))

What does this code do?

import numpy as np

# Boltzmann constant [J K-1] and 0 degrees Celsius expressed in Kelvin
Kb = 1.380649e-23
T0 = 273.15

def calc_pres(n, t):
    """
    Calculate pressure using ideal gas law p = nkT

    Parameters:
        n : array of number densities of molecules [N m-3]
        t : array of temperatures in [K]
    Returns:
         array of pressures [Pa]
    """
    return n * Kb * t


# Read in data from file and convert T from [oC] to [K]
data = np.genfromtxt("mydata.csv")
n = data[0, :]
temp = data[1, :] + T0

# Calculate pressure, average, and print
pres = calc_pres(n, temp)
pres_av = np.sum(pres) / len(pres)
print(pres_av)

Virtual Environments

Virtual Environments


What?

  • A self-contained Python environment
  • Packages installed in a local folder
  • Advised to use on a per-project basis

Why?

  • Avoid system pollution through isolation
  • Allow different versions for different projects
  • Reproducibility - set versions

Virtual Environments - venv

Python has inbuilt support for creating isolated virtual environments through venv.


$ python3 -m venv rse-venv
$ source rse-venv/bin/activate
(rse-venv) $ pip install <packagename>
(rse-venv) $ deactivate
$
PS> python -m venv rse-venv
PS> rse-venv\Scripts\Activate.ps1
(rse-venv) PS> pip install <packagename>
(rse-venv) PS> deactivate
PS>
C:\> python -m venv rse-venv
C:\> rse-venv\Scripts\activate.bat
(rse-venv) C:\> pip install <packagename>
(rse-venv) C:\> deactivate
C:\>


You will see that a directory rse-venv/ has been created.
Pip will install dependencies into this directory.
To remove the venv we delete this directory with rm -r rse-venv on Unix, or rmdir /s rse-venv on Windows.
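
If it is unclear whether a script is running inside the virtual environment, one way to check from Python itself is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch; the function name `in_virtual_environment` is an assumption made for illustration.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Running inside a virtual environment: {in_virtual_environment()}")
```

Run with the environment activated, the interpreter path should sit inside the rse-venv/ directory.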

Other Languages

There are various other tools available to manage dependencies:

  • Python and more - conda
  • C, C++, Fortran:
    • Module environments
    • Spack
  • Rust - cargo
  • Julia - Pkg environments
  • R - renv

Exercise 1

Scenario: you have just finished some simulations with a climate model that should improve precipitation modelling and have the output data as a netCDF file.

You know that your colleague has produced relevant figures and analysis before, so ask them for a copy of their code (yay, reuse 👍).

Go to exercise 1 and:

  • Examine the code in precipitation_climatology.py
  • Create and load a virtual environment
  • Install the necessary dependencies
  • Run the code - does it do what you thought?
  • Deactivate the environment

Basic packaging concepts

Code/software will often have several dependencies required to run.

Provide a record of these to users to save time and errors when installing.


Recorded in a requirements.txt file:

  • list required packages to be installed by pip
  • version constraints
  • typically specify only top-level dependencies

requirements.txt

netcdf4
xarray
scipy==1.13.1
numpy<2.0
cartopy


(rse-venv) $ pip install -r requirements.txt
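
After installing, it can be useful to confirm which versions actually ended up in the environment. The sketch below uses the standard-library `importlib.metadata` module; the helper name `installed_version` is hypothetical, and the package list simply mirrors the example requirements.txt above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Top-level packages from the example requirements.txt
for name in ["netcdf4", "xarray", "scipy", "numpy", "cartopy"]:
    print(f"{name}: {installed_version(name) or 'not installed'}")
```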

Exercise 1 revisited

Scenario: you have just finished some simulations with a climate model that should improve precipitation modelling and have the output data as a netCDF file.

You know that your colleague has produced relevant figures and analysis before, so ask them for a copy of their code (yay, reuse 👍).

Go to exercise 1 and:

  • Examine the code in precipitation_climatology.py
  • Create and load a virtual environment
  • Install the requirements from the supplied requirements.txt
  • Run the code - does it do what you thought?
  • Deactivate the environment

PEP8 and Formatting

Python PEPs

Python Enhancement Proposals

  • Technical documentation for the Python community
  • Guidelines, standards, and best-practice

Relevant to us today are:

PEP8 & Formatting

“Readability counts”
    - Tim Peters in the Zen of Python

By ensuring code aligns with PEP8 we:

  • standardise style,
  • conform to best-practices, and
  • improve code readability to
  • make code easier to share, and
  • reduce misinterpretation.

“But I don’t have time to read and memorise all of this…”

PEP8 & Formatting - Ruff

Ruff (Astral 2025) - docs.astral.sh/ruff

  • a PEP 8 compliant formatter
    • Strict subset of PEP8
    • “Opinionated so you don’t have to be.”
  • For full details see style guide
  • Try online
(myvenv) $ pip install ruff
(myvenv) $ ruff format myfile.py
(myvenv) $ ruff format mydirectory/
(myvenv) PS> pip install ruff
(myvenv) PS> ruff format myfile.py
(myvenv) PS> ruff format mydirectory/

PEP8 & Formatting - Example

def long_func(x, param_one, param_two=[], param_three=24, param_four=None,
        param_five="Empty Report", param_six=123456):


    val = 12*16 +(24) -10*param_one +  param_six

    if x > 5:
        
        print("x is greater than 5")


    else:
        print("x is less than or equal to 5")


    if param_four:
        print(param_five)



    print('You have called long_func.')
    print("This function has several params.")

    param_2.append(x*val)
    return param_2
def long_func(
    x,
    param_one,
    param_two=[],
    param_three=24,
    param_four=None,
    param_five="Empty Report",
    param_six=123456,
):
    val = 12 * 16 + (24) - 10 * param_one + param_six

    if x > 5:
        print("x is greater than 5")

    else:
        print("x is less than or equal to 5")

    if param_four:
        print(param_five)

    print("You have called long_func.")
    print("This function has several params.")

    param_2.append(x * val)
    return param_2

PEP8 & Formatting - Ruff

  • Also runs on jupyter notebooks.

  • Highly configurable via configuration files.

  • I suggest incorporating into your projects now

    • Widely-used standard
    • Plugins/lsp available for many editors.
    • Well suited to incorporation into continuous integration through workflows and git hooks.

Other languages

Similar formatting tools exist for other languages:

Exercise 2

Go to exercise 2 and:

  • install ruff
  • run ruff format on precipitation_climatology.py
  • examine the output
    • Is it more readable?
    • Is there any aspect of the formatting style you find unintuitive?
  • See exercises/02_formatting/README.md for more detailed instructions.

Naming For Clarity

It may seem inconsequential, but carefully naming variables and methods can improve the readability of code massively and can help to make code self-documenting.

A few naming tips and conventions:

  • The name should show the intention; think about how someone else might read it (this could be future you)
  • Use pronounceable names e.g.
      ms    --> mass
      chclt --> chocolate
      stm   --> stem
  • Avoid abbreviations and single-letter variable names where possible
  • Use names that can be searched
  • One word per concept e.g. choose one of put, insert, add in the same code base

Naming For Clarity

  • Plurals to indicate groups, e.g. a list of dog objects would be dogs, not dog_list
  • Describe content rather than storage type e.g.
      array       --> dogs
      age_int     --> age
      country_set --> countries
  • When naming booleans, use prefixes like is, has or can and avoid negations like not_green e.g.
      purple    --> is_purple
      not_plant --> is_plant
      sidekick  --> has_sidekick
  • Keep it simple and use technical terms where appropriate
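
To see several of these tips applied together, here is a small before-and-after sketch. Both functions, the `dogs` data, and the adult age threshold of 2 years are invented for illustration.

```python
def flt(l, a):
    """Hard to follow: what are l and a, and what does flt mean?"""
    r = []
    for d in l:
        if d[1] >= a:
            r.append(d[0])
    return r


def names_of_adult_dogs(dogs, adult_age_years=2):
    """Return the names of all dogs that are at least adult_age_years old.

    dogs is an iterable of (name, age_in_years) pairs.
    """
    adult_names = []
    for name, age_in_years in dogs:
        is_adult = age_in_years >= adult_age_years
        if is_adult:
            adult_names.append(name)
    return adult_names


dogs = [("Rex", 5), ("Pip", 1), ("Bella", 3)]
print(names_of_adult_dogs(dogs))  # ['Rex', 'Bella']
```

The two functions behave identically, but the second uses a plural for the group, a searchable name, and an is_ prefix for the boolean.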

Explaining Variables

Without explaining variable:


def calculate_fare(age):
    if (age < 14):
        return 3
    ...

With explaining variable:


def calculate_fare(age):
    is_child = age < 14
    if (is_child):
        return 3
    ...

Explaining Variables

Without an explaining variable, it is hard to see what this code is doing:

import re

re.search("^\\+?[1-9][0-9]{7,14}$", "Sophie: CV56 9PQ, +12223334444")

With explaining variables:

It is easier to see the intention. The code is more self-documenting.

import re

phone_number_regex = "^\\+?[1-9][0-9]{7,14}$"
re.search(phone_number_regex, "Sophie: CV56 9PQ, +12223334444")
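
A named pattern also makes it easier to reason about what will match. In the sketch below, `is_phone_number` is a hypothetical helper; because the pattern is anchored with ^ and $, it only matches when the whole string is a phone number, so the longer example string does not match.

```python
import re

# Same pattern as above; the ^ and $ anchors mean the whole string must match.
phone_number_regex = "^\\+?[1-9][0-9]{7,14}$"


def is_phone_number(text):
    """Return True if text consists solely of a phone number."""
    return re.search(phone_number_regex, text) is not None


print(is_phone_number("+12223334444"))  # True
print(is_phone_number("Sophie: CV56 9PQ, +12223334444"))  # False
```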

Exercise 3

Look through the code for any names of methods or variables that could be improved or clarified and update them. Note that if you are using an IDE like IntelliJ or VS Code, you can use automatic renaming. Can you find an example from each of the suggestions listed below? Does this make the code easier to follow?

Consider the following:

  • The name should show the intention; think about how someone else might read it (this could be future you)
  • Use pronounceable names e.g. mass not ms, stem not stm
  • Avoid abbreviations and single-letter variable names where possible
  • One word per concept e.g. choose one of put, insert, add in the same code base
  • Use names that can be searched
  • Describe content rather than storage type
  • When naming booleans, use prefixes like is, has or can and avoid negations like not_green
  • Plurals to indicate groups, e.g. a list of dog objects would be dogs, not dog_list
  • Keep it simple and use technical terms where appropriate
  • Use explaining variables

PEP8 & Beyond

Static Analysis

  • Check the code without running it
  • Catch issues before you run any code
  • Improve code quality

There are various tools available:

  • ruff
  • Pylint
  • flake8
  • pycodestyle


We will be using ruff check for static analysis, which we already installed as part of ruff in a previous exercise.

Code Quality - ruff check

def long_func(
    x,
    param_one,
    param_two=[],
    param_three=24,
    param_four=None,
    param_five="Empty Report",
    param_six=123456,
):
    val = 12 * 16 + (24) - 10 * param_one + param_six

    if x > 5:
        print("x is greater than 5")

    else:
        print("x is less than or equal to 5")

    if param_four:
        print(param_five)

    print("You have called long_func.")
    print("This function has several params.")

    param_2.append(x * val)
    return param_2

(myvenv) $ ruff check long_func.py
long_func.py:1:1: D100 Missing docstring in public module
long_func.py:1:5: PLR0913 Too many arguments in function definition (7 > 5)
long_func.py:1:5: D103 Missing docstring in public function
long_func.py:4:5: ARG001 Unused function argument: `param_two`
long_func.py:4:15: B006 Do not use mutable data structures for argument defaults
long_func.py:5:5: ARG001 Unused function argument: `param_three`
long_func.py:12:12: PLR2004 Magic value used in comparison, consider replacing `5` with a constant variable
long_func.py:24:5: F821 Undefined name `param_2`
long_func.py:25:12: F821 Undefined name `param_2`
Found 9 errors.

(myvenv) $

Note: use the --output-format=concise flag for this shortened output.

Code Quality - ruff check

def long_func(
    x,
    param_one,
    param_two=[],
    param_four=None,
    param_five="Empty Report",
    param_six=123456,
):
    val = 12 * 16 + (24) - 10 * param_one + param_six

    if x > 5:
        print("x is greater than 5")

    else:
        print("x is less than or equal to 5")

    if param_four:
        print(param_five)

    print("You have called long_func.")
    print("This function has several params.")

    param_two.append(x * val)
    return param_two
(myvenv) $ ruff check long_func.py
long_func.py:1:1: D100 Missing docstring in public module
long_func.py:1:5: PLR0913 Too many arguments in function definition (6 > 5)
long_func.py:1:5: D103 Missing docstring in public function
long_func.py:4:15: B006 Do not use mutable data structures for argument defaults
long_func.py:11:12: PLR2004 Magic value used in comparison, consider replacing `5` with a constant variable
Found 5 errors.

(myvenv) $


Use ruff rule to understand different rules:

ruff rule B006

will display the docs for B006
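
To illustrate why B006 matters: a mutable default is created once, when the function is defined, and is then shared by every call that relies on it. The function names below are hypothetical, and using None as a sentinel is one common fix rather than the only one.

```python
def append_square_shared(x, squares=[]):
    """Flagged by B006: the default list is created once and shared between calls."""
    squares.append(x * x)
    return squares


def append_square(x, squares=None):
    """Use None as a sentinel and create a fresh list inside the function."""
    if squares is None:
        squares = []
    squares.append(x * x)
    return squares


print(append_square_shared(2))  # [4]
print(append_square_shared(3))  # [4, 9] - the value from the first call is still there
print(append_square(2))  # [4]
print(append_square(3))  # [9]
```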

IDE Integration

  • Catch issues before running ruff
  • Gradually coerces you to become a better programmer
  • See ruff editor integration docs for instructions on setup for:
    • Vim
    • PyCharm
    • Sublime
    • VS Code
    • Emacs

Other languages

Similar tools for linting and static analysis exist for other languages:

More generally see this list of static analysis tools.

Configuration

The default set of linting rules for ruff is quite simple.

The ruleset to be applied can be configured in a ruff.toml file, as we do in this project, or pyproject.toml for packaged code. For full details see the ruff configuration documentation.

Details of the different rules and rulesets that can be selected can be found in the ruff rules documentation.

Exercise 4

Go to exercise 4 and:

  • run ruff check on precipitation_climatology.py
  • examine the report and try and address some of the issues.
    • Try and deal with: F401 unused imports, I001 unsorted imports, B006 dangerous default, and D202 Blank lines
    • If you feel like it you could try and fix: B904 try exceptions
    • Ignore D100/D103 missing docstrings and PLR2004 magic values for now - we’ll come to them later.
    • Unless you are really keen don’t worry about: PLR0913 Too many arguments

Extensions:

  • try and add linting to your preferred text editor or IDE
  • explore the configuration options for ruff
  • explore the option to suppress ruff warnings
  • explore autofixes using the --fix flag
  • explore rules in development using the --preview flag

That’s it for today!



See you all tomorrow afternoon!

Welcome back!



Let’s dive straight into comments and docstrings!

Structuring your code

Functions

  • Avoid code duplication
    • Bad style
    • You will forget to update all the copies at some point when you make changes!
  • Readability
    • A clearly named function replaces a chunk of code.
  • Functions as building blocks
    • Create a generic interface with several exchangeable options (e.g. “poisson_solver”, “gauss_seidel_solver”, etc.)
    • Separation of concerns/single responsibility principles

Functions for readability and maintainability

"""Module implementing pendulum equations."""
import numpy as np

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * 9.81 *  l * np.cos(theta))

def energy(m, l, theta):
    """..."""
    return m * 9.81 * l * np.cos(theta)

def check_small_angle(theta):
    """..."""
    if theta <= np.pi / 1800.0:
        return True
    return False

def bpm(l):
    """..."""
    return 60.0 / (2.0 * np.pi * np.sqrt(l / 9.81))


"""Module implementing pendulum equations."""
import numpy as np

GRAV = 9.81

def get_period(l):
    """..."""
    return 2.0 * np.pi * np.sqrt(l / GRAV)

def max_height(l, theta):
    """..."""
    return l * np.cos(theta)

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * GRAV * max_height(l, theta))

def energy(m, l, theta):
    """..."""
    return m * GRAV * max_height(l, theta)

def check_small_angle(theta, small_ang=np.pi/1800.0):
    """..."""
    if theta <= small_ang:
        return True
    return False

def bpm(l):
    """..."""
    # Divide 60 seconds by period [s] for beats per minute
    return 60.0 / get_period(l)

Further structuring

  • Breaking code into modules and files instead of having everything in one file
  • Improves readability and manageability (-> scalability, extendability, coupling, testing)

Exercise XX

  • Go to Exercise XX and try to think of how you can structure the code in the plotting.py file to avoid code duplication, and improve readability and reusability.
  • What does the improved code enable you to do?
  • Now look at plotting_solution.py . Can you further adapt the code to allow, for example, for customised axis labels?

Writing better (Python) code

f-strings

A better way to format strings, available since Python 3.6.
Uptake has been slow because many people teach themselves from older code.

Strings are prefixed with an f, allowing variables to be used in place:

name = "electron"
mass = 9.1093837015E-31

# modulo
print("The mass of an %s is %.3e kg." % (name, mass))

# format
print("The mass of an {} is {:.3e} kg.".format(name, mass))

# f-string
print(f"The mass of an {name} is {mass:.3e} kg.")

f-strings can take expressions:

print(f"a={a} and b={b}. Their product is {a * b}, sum is {a + b}, and a/b is {a / b}.")

See Real Python for more information. Note: pylint W1203 recommends against using f-strings in logging calls.
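
As a runnable version of the expression example, with a note on logging, the sketch below assigns arbitrary illustrative values to a and b. The logging call passes its values as arguments, the lazy %-style form that the pylint message asks for.

```python
import logging

a = 6
b = 3

# Any valid expression can appear between the braces.
message = f"a={a} and b={b}. Their product is {a * b}, sum is {a + b}, and a/b is {a / b}."
print(message)  # a=6 and b=3. Their product is 18, sum is 9, and a/b is 2.0.

# For logging, pass the values as arguments instead of using an f-string,
# so the message is only formatted if the record is actually emitted.
logger = logging.getLogger(__name__)
logger.info("a=%s and b=%s", a, b)
```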

Remove Magic Numbers

Numbers in code that are not immediately obvious.

  • Hard to read
  • Hard to maintain
  • Hard to adapt

Instead:

  • Name a variable conveying meaning
  • Set to a constant
  • Use a comment to explain

numberwang by Mitchell and Webb under fair use

Remove Magic Numbers

"""Module implementing pendulum equations."""
import numpy as np

def get_period(l):
    """..."""
    return 2.0 * np.pi * np.sqrt(l / 9.81)

def max_height(l, theta):
    """..."""
    return l * np.cos(theta)

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * 9.81 * max_height(l, theta))

def energy(m, l, theta):
    """..."""
    return m * 9.81 * max_height(l, theta)

def check_small_angle(theta):
    """..."""
    if theta <= np.pi / 1800.0:
        return True
    return False

def bpm(l):
    """..."""
    return 60.0 / get_period(l)


"""Module implementing pendulum equations."""
import numpy as np

GRAV = 9.81

def get_period(l):
    """..."""
    return 2.0 * np.pi * np.sqrt(l / GRAV)

def max_height(l, theta):
    """..."""
    return l * np.cos(theta)

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * GRAV * max_height(l, theta))

def energy(m, l, theta):
    """..."""
    return m * GRAV * max_height(l, theta)

def check_small_angle(theta, small_ang=np.pi/1800.0):
    """..."""
    if theta <= small_ang:
        return True
    return False

def bpm(l):
    """..."""
    # Divide 60 seconds by period [s] for beats per minute
    return 60.0 / get_period(l)

Put config in a config file

  • Ideally we shouldn’t have to hop in and out of the code (and recompile, in compiled languages) every time we change a runtime setting
  • No easy record of runs

Instead:

  • It’s easy to read a JSON file into Python as a dictionary. Handle it as you wish: create a class, read to variables, etc.
  • Could even make config filename a command line argument
{
  "config_name": "June 2022 m01 n19 run",
  "start_date": "2022-05-28 00:00:00",
  "end_date": "2022-06-12 23:59:59",
  "satellites": ["m01", "n19"],
  "noise_floor": [3.0, 3.0, 3.0],
  "check_SNR": true,
  "L_lim": [1.5, 8.0],
  "telescopes": [90],
  "n_bins": 27
}
import json


with open('config.json') as json_file:
    config = json.load(json_file)

print(config)
{'config_name': 'June 2022 m01 n19 run', 'start_date': '2022-05-28 00:00:00', 'end_date': '2022-06-12 23:59:59', 'satellites': ['m01', 'n19'], 'noise_floor': [3.0, 3.0, 3.0], 'check_SNR': True, 'L_lim': [1.5, 8.0], 'telescopes': [90], 'n_bins': 27}
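
Making the config filename a command line argument could look something like the sketch below, which uses the standard-library `argparse` module. The function names and the default of config.json are assumptions for illustration; the keys read in `main` come from the example file above.

```python
import argparse
import json


def load_config(config_path):
    """Read a JSON configuration file and return its contents as a dictionary."""
    with open(config_path, encoding="utf-8") as json_file:
        return json.load(json_file)


def parse_args(argv=None):
    """Parse the command line; the config filename is the only argument."""
    parser = argparse.ArgumentParser(description="Run an analysis from a config file.")
    parser.add_argument(
        "config_file",
        nargs="?",
        default="config.json",
        help="path to a JSON configuration file (default: config.json)",
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Load the requested configuration and report which run it describes."""
    args = parse_args(argv)
    config = load_config(args.config_file)
    print(f"Running '{config['config_name']}' with {config['n_bins']} bins")
    return config
```

Calling `main()` at the bottom of a script would then allow something like `python my_script.py june_run.json`, where both filenames are placeholders. Keeping each run's config file alongside its output also gives a simple record of what was run.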

Exercise 6

Magic Numbers

  • Look through the code and identify any magic numbers.
  • Implement what you feel is the best approach in each case

f-strings

  • Look for any string handling (currently using the .format() approach) and update it to use f-strings.
    • Is the intent clearer?
    • Is the layout of the data written to file easier to understand?

Configuration settings

  • There is helpfully a list of configurable inputs at the end of the file under "__main__".
    We can improve on this, however, by placing them in a configuration file.
  • Create an appropriate json file to be read in as a dictionary and passed to the main function.

READMEs, Licenses, and other files

READMEs

  • First point of contact for a new user/contributor with the code repository
  • Should give essential information on
    • what this software is
    • what it’s for
    • how to get started
  • In the best case it also tells you
    • who built/is building this
    • how to contribute
    • how to reuse
  • Usually written in Markdown as README.md
  • See makeareadme.com and readme.so for detailed information, examples, and tools

License

All public code should have a license attached!

  • LICENSE file in the main directory
  • Protect ownership
  • Limit liability
  • Clarify what can be done with the code

The right selection may depend on your organisation and/or funder.

See choosealicense.com for more information.

GitHub and GitLab contain helpers to create popular licenses.

Types of Licenses

  • Public Domain, Permissive, Copyleft

License guide by TechTarget under fair use

How to choose a license

  • https://choosealicense.com/licenses/
  • Permissive licenses:
    • Apache License 2.0
    • MIT License
  • Copyleft:
    • Means that copies and adaptations must use the same license
    • GNU General Public License v3.0

GPL3 image is in the Public Domain
MIT logo is in the Public Domain
Apache License image by Apache Software Foundation under Apache License 2.0

Example: MIT License

Copyright <YEAR> <COPYRIGHT HOLDER>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Add a license in GitHub

Add a license in GitLab

Other potential files in your repository

  • CONTRIBUTING.md
  • CITATION.cff
  • CODE_OF_CONDUCT.md
  • CHANGES.md

Other things

Beyond the scope of today are a few other honourable mentions:

  • Functions and modules
  • Packaging
    • Breaking projects into modules and __init__.py
    • Distributing projects with pyproject.toml
  • Documentation
    • Auto-generation from docstrings with sphinx or mkdocs
  • Type hinting
    • Adding type hinting to python code - how and why?
    • Type checking with mypy

These lessons are beyond the scope of today.

Closing

Where can I get help?

The ICCS RSE team are always keen to support researchers with developing and applying the principles discussed today.

If you would like to discuss applying this to your own codebase consider signing up for an ICCS Climate Code Clinic:

  • 1hr slot
  • RSEs will review code in advance and provide feedback and guidance.
  • Online booking form

Where can I learn more?


Get in touch:

References

The code in this workshop is based on a script from (Irving 2019).

Astral. 2025. “Ruff: An extremely fast Python linter and code formatter, written in Rust.” https://github.com/astral-sh/ruff, https://docs.astral.sh/ruff/.
Cannon, B, D Ingram, P Ganssle, P Gedam, S Eustace, T Kluyver, and T Chung. 2020. “PEP 621 – Storing project metadata in pyproject.toml.” https://peps.python.org/pep-0621/.
Goodger, D, and G van Rossum. 2001. “PEP 257 – Docstring Conventions.” https://peps.python.org/pep-0257/.
Irving, Damien. 2019. “Python for Atmosphere and Ocean Scientists.” Journal of Open Source Education 2 (16): 37. https://doi.org/10.21105/jose.00037.
Murphy, N. 2023. “Writing Clean Scientific Software.” In. Presented at the HPC Best Practices Webinar Series. https://www.youtube.com/watch?v=Q6Ksu_uX3bc.
Rossum, G van, B Warsaw, and A Coghlan. 2001, 2013. “PEP 8 – Style Guide for Python Code.” https://peps.python.org/pep-0008/.