(in Python)
2025-07-01
To access links or follow on your own device these slides can be found at:
jatkinson1000.github.io/rse-skills-workshop
All materials are available at:
Except where otherwise noted, these presentation materials are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.
Today:
Tomorrow:
Major Computational Programs
Data processing
Experiment support
Bathymetry by NOAA under public domain CTD Bottles by WHOI under public domain
Keeling Curve by Scripps under public domain
Dawn HPC by Joe Bishop with permission
Climate simulation by NSF under public domain
More widely than publishing papers, code is used in control and decision making:
Your code (or its derivatives) may well move from research to operational one day.
Margaret Hamilton and the Apollo XI by NASA under public domain
def calc_p(n,t):
    return n*1.380649e-23*t
data = np.genfromtxt("mydata.csv")
p = calc_p(data[0,:],data[1,:]+273.15)
print(np.sum(p)/len(p))
What does this code do?
import numpy as np

# Boltzmann constant [J K-1] and 0 degrees Celsius expressed in kelvin
Kb = 1.380649e-23
T0 = 273.15

def calc_pres(n, t):
    """
    Calculate pressure using ideal gas law p = nkT

    Parameters:
        n : array of number densities of molecules [N m-3]
        t : array of temperatures in [K]
    Returns:
        array of pressures [Pa]
    """
    return n * Kb * t

# Read in data from file and convert T from [oC] to [K]
data = np.genfromtxt("mydata.csv")
n = data[0, :]
temp = data[1, :] + T0

# Calculate pressure, average, and print
pres = calc_pres(n, temp)
pres_av = np.sum(pres) / len(pres)
print(pres_av)
venv
Python has inbuilt support for creating isolated virtual environments through venv.
You will see that a directory rse-venv/ has been created.
Pip will install dependencies into this directory.
To remove the venv we delete this directory with rm -r rse-venv on Unix, or rmdir /s rse-venv on Windows.
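The usual route is the command line (python3 -m venv rse-venv), but the same standard-library module can be driven from Python. The following is a minimal sketch of that, assuming a throwaway scratch directory; the helper name create_environment is hypothetical and pip is deliberately left out so the example runs quickly and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(parent_dir, name="rse-venv"):
    """Create a virtual environment called `name` inside `parent_dir`."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps this sketch fast and offline; on the command line
    # `python3 -m venv` installs pip into the environment by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build the environment in a temporary directory that is removed afterwards,
# which mirrors deleting the rse-venv/ directory by hand.
with tempfile.TemporaryDirectory() as scratch:
    env_dir = create_environment(scratch)
    # Every venv contains a pyvenv.cfg file describing the environment.
    created = (env_dir / "pyvenv.cfg").is_file()
    print(f"Created {env_dir.name}: {created}")
```

Because the environment is just a directory, removing that directory is all that is needed to discard it.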
There are various other tools available to manage dependencies:
Scenario: you have just finished some simulations with a climate model that should improve precipitation modelling and have the output data as a netCDF file.
You know that your colleague has produced relevant figures and analysis before, so ask them for a copy of their code (yay, reuse :+1:).
Go to exercise 1 and:
precipitation_climatology.py
Code/software will often have several dependencies required to run.
Provide a record of these to users to save time and errors when installing.
Recorded in a requirements.txt file:
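To make the idea of a dependency record concrete, the sketch below lists the packages visible to the current interpreter as pinned name==version lines, which is the general shape of a requirements.txt entry. It is only an approximation of what pip freeze reports, and the function name installed_requirements is hypothetical.

```python
from importlib import metadata


def installed_requirements():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is incomplete
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


requirements = installed_requirements()
print("\n".join(requirements))
```

In practice a hand-written file that lists only the direct dependencies is often easier to maintain than a full dump of the environment.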
Go to exercise 1 and:
precipitation_climatology.py
requirements.txt
Relevant to us today are:
“Readability counts”
- Tim Peters in the Zen of Python
By ensuring code aligns with PEP8 we:
“But I don’t have time to read and memorise all of this…”
Ruff (Astral 2025) - docs.astral.sh/ruff
def long_func(x, param_one, param_two=[], param_three=24, param_four=None,
        param_five="Empty Report", param_six=123456):
    val = 12*16 +(24) -10*param_one + param_six
    if x > 5:
        print("x is greater than 5")
    else:
        print("x is less than or equal to 5")
    if param_four:
        print(param_five)
    print('You have called long_func.')
    print("This function has several params.")
    param_2.append(x*val)
    return param_2
def long_func(
    x,
    param_one,
    param_two=[],
    param_three=24,
    param_four=None,
    param_five="Empty Report",
    param_six=123456,
):
    val = 12 * 16 + (24) - 10 * param_one + param_six
    if x > 5:
        print("x is greater than 5")
    else:
        print("x is less than or equal to 5")
    if param_four:
        print(param_five)
    print("You have called long_func.")
    print("This function has several params.")
    param_2.append(x * val)
    return param_2
Also runs on jupyter notebooks.
Highly configurable via configuration files.
I suggest incorporating into your projects now
Similar formatting tools exist for other languages:
Go to exercise 2 and:
Run ruff format on precipitation_climatology.py
See exercises/02_formatting/README.md for more detailed instructions.
Beyond PEP8:
Save time and resource:
There are various tools available:
We will be using ruff check for static analysis, which we already installed as part of ruff in a previous exercise.
ruff check
def long_func(
    x,
    param_one,
    param_two=[],
    param_three=24,
    param_four=None,
    param_five="Empty Report",
    param_six=123456,
):
    val = 12 * 16 + (24) - 10 * param_one + param_six
    if x > 5:
        print("x is greater than 5")
    else:
        print("x is less than or equal to 5")
    if param_four:
        print(param_five)
    print("You have called long_func.")
    print("This function has several params.")
    param_2.append(x * val)
    return param_2
(rse-venv) $ ruff check long_func.py
long_func.py:4:5: ARG001 Unused function argument: `param_two`
long_func.py:4:15: B006 Do not use mutable data structures for argument defaults
long_func.py:5:5: ARG001 Unused function argument: `param_three`
long_func.py:24:5: F821 Undefined name `param_2`
long_func.py:25:12: F821 Undefined name `param_2`
Found 5 errors.
(rse-venv) $
Note: use the --output-format=concise flag for this shortened output.
ruff check
def long_func(
    x,
    param_one,
    param_two=[],
    param_four=None,
    param_five="Empty Report",
    param_six=123456,
):
    val = 12 * 16 + (24) - 10 * param_one + param_six
    if x > 5:
        print("x is greater than 5")
    else:
        print("x is less than or equal to 5")
    if param_four:
        print(param_five)
    print("You have called long_func.")
    print("This function has several params.")
    param_two.append(x * val)
    return param_two
(rse-venv) $ ruff check long_func.py
long_func.py:4:15: B006 Do not use mutable data structures for argument defaults
Found 1 errors.
(rse-venv) $
Use ruff rule to understand different rules:
ruff rule B006 will display the docs for B006
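To see why B006 matters, the sketch below contrasts a mutable default argument with the common None-sentinel idiom. The function names append_shared and append_fresh are hypothetical and chosen only for this illustration.

```python
def append_shared(value, items=[]):  # flagged by B006: one list shared by all calls
    """Append value to items, reusing a single default list across calls."""
    items.append(value)
    return items


def append_fresh(value, items=None):
    """Append value to items, creating a new list when none is given."""
    if items is None:
        items = []
    items.append(value)
    return items


shared_first = append_shared(1)
shared_second = append_shared(2)  # the same list object as shared_first
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)  # a new list on every call

print(shared_second)  # [1, 2]
print(fresh_second)  # [2]
```

The default list is created once, when the function is defined, so every call that relies on it mutates the same object.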
Similar tools for linting and static analysis exist for other languages:
More generally see this list of static analysis tools.
Go to exercise 3 and:
Run ruff check on precipitation_climatology.py
F401 unused imports, I001 unsorted imports, B006 dangerous default, and D202 Blank lines
B904 try exceptions
Extensions:
--fix flag
--preview flag
The default set of linting rules for ruff is quite simple.
The ruleset to be applied can be configured in a ruff.toml file, as we do in this project, or pyproject.toml for packaged code.
For full details see the ruff configuration documentation.
Details of the different rules and rulesets that can be selected can be found in the ruff rules documentation.
Let’s take a look and make some changes in preparation for the next sections:
"D" (pydocstyle) ruleset.
"PLR2004" (magic number comparisons) and "E501" (line length).
"""Module implementing pendulum equations."""
import numpy as np

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * 9.81 * l * np.cos(theta))

def energy(m, l, theta):
    """..."""
    return m * 9.81 * l * np.cos(theta)

def check_small_angle(theta):
    """..."""
    if theta <= np.pi / 1800.0:
        return True
    return False

def bpm(l):
    """..."""
    return 60.0 / (2.0 * np.pi * np.sqrt(l / 9.81))
"""Module implementing pendulum equations."""
import numpy as np

GRAV = 9.81

def get_period(l):
    """..."""
    return 2.0 * np.pi * np.sqrt(l / GRAV)

def max_height(l, theta):
    """..."""
    return l * np.cos(theta)

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * GRAV * max_height(l, theta))

def energy(m, l, theta):
    """..."""
    return m * GRAV * max_height(l, theta)

def check_small_angle(theta, small_ang=np.pi/1800.0):
    """..."""
    if theta <= small_ang:
        return True
    return False

def bpm(l):
    """..."""
    # Divide 60 seconds by period [s] for beats per minute
    return 60.0 / get_period(l)
See you all tomorrow afternoon!
Let’s dive straight into naming, comments, and docstrings!
It may seem inconsequential, but carefully naming variables and methods can greatly improve the readability of code.
Since “code is read more than it is run” this is important for future you, but also for anyone you collaborate with or who might use your code in future.
It helps to make code self-documenting, reducing future bugs due to misunderstandings.
Here we cover some key considerations when writing code.
Show the intention – how will someone else (future you) read it?
Use readable, pronounceable, memorable, and searchable names:
ms --> mass
chclt --> chocolate
stm --> stem
avoid abbreviations and single letters unless commonly used
Employ concept consistency
e.g. only one of get_, retrieve_, fetch_ in the code base
Describe content rather than storage type
Use plurals to indicate groups
Name booleans using prefixes like is_, has_, can_ and avoid negations like not_:
array --> dogs float_or_int --> returns_int
age_int --> age not_plant --> is_plant
country_set --> countries sidekick --> has_sidekick
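As an illustration of the naming guidelines above, the sketch below writes the same filter twice. All names here (Dog, flt, get_older_dogs_with_sidekicks) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Dog:
    name: str
    age: int
    has_sidekick: bool  # boolean named with a has_ prefix


# Harder to read: names describe storage or are abbreviated.
def flt(array, n):
    return [d for d in array if d.age > n and d.has_sidekick]


# Easier to read: a plural for the group and the intention in the name.
def get_older_dogs_with_sidekicks(dogs, minimum_age):
    return [dog for dog in dogs if dog.age > minimum_age and dog.has_sidekick]


dogs = [Dog("Gromit", 7, True), Dog("Snowy", 3, True), Dog("Laika", 9, False)]
older_dogs = get_older_dogs_with_sidekicks(dogs, minimum_age=5)
print([dog.name for dog in older_dogs])
```

Both functions behave identically; only the second can be understood without reading its body.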
Without explaining variable:
With explaining variable:
Without an explaining variable, it is hard to see what this code is doing:
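The snippets for this comparison are not reproduced in this text. As a stand-in, here is a hypothetical leap-year check written both ways; the function names are assumptions made for this sketch.

```python
def is_leap_year_without(year):
    # The whole rule is packed into a single expression.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_year_with(year):
    # Each intermediate result gets a name that says what it means.
    divisible_by_4 = year % 4 == 0
    is_century = year % 100 == 0
    divisible_by_400 = year % 400 == 0
    return divisible_by_4 and (not is_century or divisible_by_400)


for year in (1900, 2000, 2023, 2024):
    print(year, is_leap_year_with(year))
```

The second version is longer, but each line can be checked against the rule on its own.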
Numbers in code that are not immediately obvious.
Instead:
numberwang by Mitchell and Webb under fair use
"""Module implementing pendulum equations."""
import numpy as np

def get_period(l):
    """..."""
    return 2.0 * np.pi * np.sqrt(l / 9.81)

def max_height(l, theta):
    """..."""
    return l * np.cos(theta)

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * 9.81 * max_height(l, theta))

def energy(m, l, theta):
    """..."""
    return m * 9.81 * max_height(l, theta)

def check_small_angle(theta):
    """..."""
    if theta <= np.pi / 1800.0:
        return True
    return False

def beats_per_minute(l):
    """..."""
    return 60.0 / get_period(l)
"""Module implementing pendulum equations."""
import numpy as np

GRAV = 9.81

def get_period(l):
    """..."""
    return 2.0 * np.pi * np.sqrt(l / GRAV)

def max_height(l, theta):
    """..."""
    return l * np.cos(theta)

def max_speed(l, theta):
    """..."""
    return np.sqrt(2.0 * GRAV * max_height(l, theta))

def energy(m, l, theta):
    """..."""
    return m * GRAV * max_height(l, theta)

def check_small_angle(theta, small_ang=np.pi/1800.0):
    """..."""
    if theta <= small_ang:
        return True
    return False

def beats_per_minute(l):
    """..."""
    # Divide 60 seconds by period [s] for beats per minute
    return 60.0 / get_period(l)
Look through the code for method or variable names that could be improved or clarified and update them.
Look through the code and identify any magic numbers.
Implement what you feel to be the best approach in each case.
Does this make the code easier to follow?
Consider the following, can you find an example of each:
Comments are tricky, and very much to taste.
Some thoughts:
“Programs must be written for people to read and […] machines to execute.”
- Hal Abelson
“A bad comment is worse than no comment at all.”
“A comment is a lie waiting to happen.”
=> Comments have to be maintained, just like the code, and there is no way to check them!
Cat code comment image by 35_equal_W
Dead code e.g.
Variable definitions e.g.
Redundant comments e.g. i += 1 # Increment i
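A short sketch of the difference, using a hypothetical averaging function: the first version carries dead code and redundant comments, while the second keeps a single comment that explains why the code is written the way it is.

```python
import math


def mean_cluttered(values):
    # total = 0
    # for v in values:
    #     total = total + v
    n = len(values)  # Set n to the length of values
    return sum(values) / n  # Return the result


def mean_commented(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # fsum tracks partial sums, so round-off does not build up over many
    # additions in the way it can with the built-in sum.
    return math.fsum(values) / len(values)


print(mean_commented([1.0, 2.0, 3.0]))
```

The commented-out loop adds nothing that version control does not already keep, and the two trailing comments only repeat the code.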
These are what make your code reusable (by you and others).
"""...""".
Various formatting options exist: numpy, Google, reST, etc.
We will follow numpydoc as it is readable and widely used in scientific code.
Full guidance for numpydoc is available.
Key components:
Parameters
Returns
Consider also:
def calculate_gyroradius(mass, v_perp, charge, B, gamma=None):
    """
    Calculates the gyroradius of a charged particle in a magnetic field

    Parameters
    ----------
    mass : float
        The mass of the particle [kg]
    v_perp : float
        velocity perpendicular to magnetic field [m/s]
    charge : float
        particle charge [coulombs]
    B : float
        Magnetic field strength [teslas]
    gamma : float, optional
        Lorentz factor for relativistic case. default=None for non-relativistic case.

    Returns
    -------
    r_g : float
        Gyroradius of particle [m]

    Notes
    -----
    .. [1] Walt, M, "Introduction to Geomagnetically Trapped Radiation,"
       Cambridge Atmospheric and Space Science Series, equation (2.4), 2005.
    """

    r_g = mass * v_perp / (abs(charge) * B)
    if gamma:
        r_g = r_g * gamma
    return r_g
The "D": pydocstyle ruleset in ruff provides us with a tool for checking the quality of our docstrings.
We enabled this in our ruff configuration at the end of exercise 3, and now we can investigate the warnings further.
(rse-venv) $ ruff check gyroradius.py
gyroradius.py:3:5:
D417 Missing argument description in the docstring for `calculate_gyroradius`: `B`
gyroradius.py:4:5 in public function `calculate_gyroradius`:
D202: No blank lines allowed after function docstring
gyroradius.py:4:5 in public function `calculate_gyroradius`:
D400: First line should end with a period
gyroradius.py:4:5 in public function `calculate_gyroradius`:
D401: First line should be in imperative mood
(rse-venv) $
Note: with "D417" enabled we can also catch missing variables in numpy docstrings!
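To give a feel for what a check like D417 looks for, here is a toy version that compares a function's signature with the names listed in a numpydoc Parameters section. It is a simplified sketch and not how ruff implements the rule; the helper undocumented_parameters and the shortened gyroradius function are hypothetical.

```python
import inspect
import re

# Text between the "Parameters" underline and the next section underline
# (or the end of the docstring).
PARAMETERS_SECTION = re.compile(
    r"Parameters\n-+\n(.*?)(?:\n\w[\w ]*\n-+\n|\Z)", re.DOTALL
)


def undocumented_parameters(func):
    """Return parameter names missing from a numpydoc Parameters section."""
    doc = inspect.getdoc(func) or ""
    match = PARAMETERS_SECTION.search(doc)
    section = match.group(1) if match else ""
    # numpydoc parameter entries look like "name : type" with no indentation.
    documented = set(re.findall(r"^(\w+) :", section, re.MULTILINE))
    signature = inspect.signature(func)
    return [name for name in signature.parameters if name not in documented]


def gyroradius(mass, v_perp, charge, B):
    """
    Calculate the gyroradius of a charged particle.

    Parameters
    ----------
    mass : float
        Particle mass [kg]
    v_perp : float
        Velocity perpendicular to the field [m/s]
    charge : float
        Particle charge [C]

    Returns
    -------
    float
        Gyroradius [m]
    """
    return mass * v_perp / (abs(charge) * B)


print(undocumented_parameters(gyroradius))  # ['B']
```

A real linter handles many more docstring layouts, which is one reason to rely on the tool rather than a hand-rolled check.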
def calculate_gyroradius(mass, v_perp, charge, B, gamma=None):
    """
    Calculates the gyroradius of a charged particle in a magnetic field

    Parameters
    ----------
    mass : float
        The mass of the particle [kg]
    v_perp : float
        velocity perpendicular to magnetic field [m/s]
    charge : float
        particle charge [coulombs]
    gamma : float, optional
        Lorentz factor for relativistic case. default=None for non-relativistic case.

    Returns
    -------
    r_g : float
        Gyroradius of particle [m]

    Notes
    -----
    .. [1] Walt, M, "Introduction to Geomagnetically Trapped Radiation,"
       Cambridge Atmospheric and Space Science Series, equation (2.4), 2005.
    """

    r_g = mass * v_perp / (abs(charge) * B)
    if gamma:
        r_g = r_g * gamma
    return r_g
subroutine add(a, b, sum)
    !! Add two integers.
    integer, intent(in) :: a      !! First number, a
    integer, intent(in) :: b      !! Second number, b
    integer, intent(out) :: sum   !! Sum of a and b
    sum = a + b
end subroutine add
Julia - triple quoted docstrings
Go to exercise 6 and examine the comments:
Now turn your attention to the docstrings:
README.md
All public codes should have a license attached!
LICENSE file in the main directory
The right selection may depend on your organisation and/or funder.
See choosealicense.com for more information.
GitHub and GitLab contain helpers to create popular licenses.
License guide by TechTarget under fair use
GPL3 image is in the Public Domain
MIT logo is in the Public Domain
Apache License image by Apache Software Foundation under Apache License 2.0
Copyright <YEAR> <COPYRIGHT HOLDER>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Beyond the scope of today are a few other honourable mentions:
__init__.py
pyproject.toml
These lessons are beyond the scope of today.
The ICCS RSE team are always keen to support researchers with developing and applying the principles discussed today.
If you would like to discuss applying this to your own codebase consider signing up for an ICCS Climate Code Clinic:
Get in touch:
The code in this workshop is based on a script from (Irving 2019).