Stand Back!: Building a scientific computing lab on public clouds with Python

A presentation at KCDC 2022 in Kansas City, MO, USA, by Laura Santamaria

As a citizen scientist, or just someone curious about science, you might want to do some scientific computing. Maybe you want to explore weather prediction, and the datasets are more than your local machine can handle. Maintaining a lab for scientific computing projects can be challenging. Not only do you typically need a significant amount of processing power, which is fairly expensive, but you also need your infrastructure to be just as replicable and reproducible as your datasets so that reviewers (or your fellow explorers) can replicate your results. If nobody before you has set up a lab you can use, where do you start? You can start with a free trial account on any cloud provider, and you can write your infrastructure as code so that the next person down the line can use it too.

In this workshop, we’ll use Python to define a portable, personal scientific computing lab on a public cloud, infrastructure that you can build up and tear down in a reliable, repeatable fashion. We’ll use that infrastructure to run some analysis with SciPy and pandas to try to understand a large-scale climatological pattern from public data, and we’ll learn a bit more about cloud computing along the way.
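As a rough illustration of the kind of SciPy and pandas analysis described above, here is a minimal sketch that turns a monthly time series into a smoothed anomaly index: subtract each calendar month's long-term mean (the climatology), remove a linear trend, then apply a centered running mean. The data are synthetic and made up for illustration, and the helper names `monthly_anomalies` and `smoothed_index` are hypothetical; this is not the workshop's actual code or dataset.

```python
import numpy as np
import pandas as pd
from scipy import signal


def monthly_anomalies(series: pd.Series) -> pd.Series:
    """Subtract each calendar month's long-term mean (the climatology)."""
    climatology = series.groupby(series.index.month).transform("mean")
    return series - climatology


def smoothed_index(series: pd.Series, window: int = 3) -> pd.Series:
    """Detrend the monthly anomalies, then apply a centered running mean."""
    anomalies = monthly_anomalies(series).dropna()
    # scipy.signal.detrend removes a least-squares linear fit by default.
    detrended = pd.Series(signal.detrend(anomalies.to_numpy()), index=anomalies.index)
    return detrended.rolling(window, center=True).mean()


# Hypothetical synthetic monthly "sea surface temperature" series:
# a seasonal cycle, a slow warming trend, and a ~4-year oscillation.
dates = pd.date_range("1990-01-01", periods=360, freq="MS")
t = np.arange(len(dates))
sst = (
    26.0
    + 1.5 * np.sin(2 * np.pi * t / 12)  # seasonal cycle
    + 0.002 * t                         # long-term trend
    + 0.8 * np.sin(2 * np.pi * t / 48)  # multi-year oscillation
)
series = pd.Series(sst, index=dates, name="sst")

oscillation_index = smoothed_index(series)
print(oscillation_index.dropna().round(2).head())
```

With real public data, you would swap the synthetic series for one loaded with something like `pd.read_csv(..., parse_dates=...)`; the climatology-then-smooth approach is a common pattern for climate indices, but the exact window, base period, and detrending choice depend on the index you are studying.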

You’ll need Python installed on your machine, free trial accounts on GCP and AWS, a free Pulumi account, and a willingness to learn a bit of science along the way.

Code

The following code examples from the presentation can be tried out live.