Research tools

This section lists the tools used at RSES for handling and processing data. We also advise taking a tour of the iEarth website for further details on the tools developed at RSES.

Data confinement

Several strategies are used for handling and storing data over the long term, a critical concern now that the amount of data is growing exponentially. Text files are often a good place to start, but several other formats exist that can help to handle the data and to store metadata, which is critical for long-term storage and for ensuring the usefulness of the data in the future.
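As a minimal sketch of keeping metadata alongside the data in a plain text file, the example below writes a JSON metadata line (prefixed with `# `) followed by CSV rows, then reads both back. The layout, the helper names `dump` and `load`, and the sample values are assumptions made for illustration, not an RSES convention.

```python
import csv
import io
import json

# Hypothetical measurements and metadata, invented for illustration only.
metadata = {
    "instrument": "example-spectrometer",
    "units": {"temperature": "K", "viscosity": "Pa s"},
    "created": "2024-01-01",
}
header = ["temperature", "viscosity"]
rows = [(1200.0, 15.2), (1300.0, 8.7), (1400.0, 5.1)]


def dump(metadata, header, rows):
    """Serialise the metadata as one '# '-prefixed JSON line, then the CSV data."""
    buffer = io.StringIO()
    buffer.write("# " + json.dumps(metadata, sort_keys=True) + "\n")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def load(text):
    """Parse text produced by dump() back into (metadata, header, rows)."""
    lines = text.splitlines()
    metadata = json.loads(lines[0][2:])  # strip the leading '# '
    reader = csv.reader(lines[1:])
    header = next(reader)
    rows = [tuple(float(value) for value in row) for row in reader]
    return metadata, header, rows


text = dump(metadata, header, rows)
loaded_metadata, loaded_header, loaded_rows = load(text)
print(text)
```

The same string could be written to disk with `open(path, "w")`; keeping the metadata inside the file means the units and provenance travel with the numbers.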


Spreadsheet software allows you to analyse, organise and store data in tabular form. The first spreadsheet program that comes to mind for many people is Excel. A great free and open source alternative is LibreOffice. Numerically focused programming languages can interact with spreadsheets via built-in or additional libraries.


SQL is a domain-specific language for storing and managing data in a relational database. You can run queries to add, delete, update and look up specific data. It is very powerful for data organised as spreadsheets where queries (missing in Excel and LibreOffice) are needed. For a free version, SQLite is a must (particularly its Firefox manager), and it can easily interact with the free languages R, Python or Julia.
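The four kinds of query mentioned above (add, update, delete, look up) can be sketched with Python's built-in `sqlite3` module. The table name `samples`, its columns and the values are hypothetical, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# In-memory SQLite database; pass a file path instead to persist the data.
conn = sqlite3.connect(":memory:")

# Hypothetical table of rock samples, invented for illustration only.
conn.execute(
    "CREATE TABLE samples (name TEXT PRIMARY KEY, sio2 REAL, temperature REAL)"
)

# Add rows (parameterised queries avoid building SQL strings by hand).
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [("A1", 49.5, 1473.0), ("B2", 72.1, 1073.0), ("C3", 60.3, 1273.0)],
)

# Update one row, then delete another.
conn.execute("UPDATE samples SET temperature = ? WHERE name = ?", (1100.0, "B2"))
conn.execute("DELETE FROM samples WHERE name = ?", ("C3",))
conn.commit()

# Look up specific data: samples with more than 60 wt% SiO2.
silica_rich = conn.execute(
    "SELECT name, temperature FROM samples WHERE sio2 > ? ORDER BY name",
    (60.0,),
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
conn.close()

print(silica_rich, remaining)
```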


The Hierarchical Data Format (HDF5) is a format designed to store and organise large amounts of data. The usual numerically focused programming languages offer plenty of options for saving and loading HDF5 data.

Programming languages

Four high-level programming languages focused on numerical analysis are currently in use at RSES: Matlab, Python, R, and Julia.


Matlab is available to any member of the ANU; please consult the page for information. Matlab is the go-to language in many domains, but it is also very expensive, and its data analytics and machine learning libraries may be a bit outdated compared to the rapidly evolving open source libraries in other languages, which are directly supported by data scientists. Free alternatives are Scilab and Octave.


Python is a general-purpose high-level programming language offering excellent numerical capabilities through its scientific libraries, the big three being SciPy, NumPy and Matplotlib. We recommend installing it with the Anaconda installer, which makes installation and maintenance very easy. Python is sometimes considered slower than other languages (this is not always true), but its syntax is very elegant and well suited to first-time programmers.


Julia is a relatively new but growing language. It is a high-level language aimed at numerical computation, as simple as Matlab, as fast as C or Fortran, as expressive as Python, and with a central package repository like R. Julia is quite new, so not everything is available in its libraries, but it is a fast-growing language that is very pleasant to use. An advantage of Julia is that Python libraries can be called directly from Julia code, as can C or Fortran functions (a no-wrapper policy is in place). Julia thus addresses the famous "two-language" problem.


R is an open-source implementation of the S language, which was developed at Bell Labs in the 1970s and is aimed at statistical computing. Many different libraries are available through the CRAN repositories. Like Python, R may not be the fastest language for heavy numerical computations, but it benefits from many data analytics and statistics tools.

Low level programming languages

Fortran and C are also used at RSES for specific applications, notably in Geophysics. These languages can offer very high computational speed for specific applications. However, code prototyping and routine data analysis are not easy with such languages. For those tasks, high-level languages with a numerical analysis focus (as listed above) are used and recommended.

Virtual environments

Local virtual machines

Virtual machines emulate a complete computer system, making it possible, for instance, to run Linux on a Windows or Mac system. This allows you, for example, to use Python or Julia under Linux and benefit from libraries that are not "Windows ready". Several options are available, both commercial and free open-source. To get started, we recommend the free VirtualBox from Oracle.


Containers are lightweight virtual environments designed to make it easy to distribute and run a piece of software on any machine, regardless of its architecture. Such an approach can be particularly successful at providing ready-to-go environments for new users. More information can be found on the website of the well-known container provider Docker.


At RSES we use, and also develop, specific libraries aimed at processing geoscience data in various domains, as listed below.

General libraries


Python library providing many different algorithms for data analysis and machine learning. It is used in Spectra.jl (see below), for instance.


Orange is an open source visual programming software for data mining and visualization.


Shogun is an open source C++ library for machine learning.

Dlib C++

Dlib C++ is an open source C++ library for machine learning.


mlpack is an open source C++ library for machine learning.


TensorFlow is an open source software library developed by Google for machine learning.


Theano is an open source Python library developed by the Lisa Lab in Montreal for machine learning, and particularly aimed at deep learning.


Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first.

Optimisation - Traditional

Scipy Optimise

The Scipy Python library contains various optimisation algorithms for solving linear and non-linear problems.

JuliaOpt libraries

The JuliaOpt GitHub organization is home to a number of optimization-related packages written in Julia.

Matlab Optimisation Toolbox (non free)

The Matlab optimisation toolbox contains various algorithms for performing optimisation and model fitting.

Optimisation - Probabilistic


Reversible-Jump Markov Chain Monte Carlo library developed at the Research School of Earth Sciences. See the Jupyter Notebook example here.

emcee - The MCMC Hammer

emcee is an MIT licensed pure-Python implementation of Goodman & Weare’s Affine Invariant Markov chain Monte Carlo (MCMC) Ensemble sampler.
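To illustrate the idea behind an affine-invariant ensemble sampler, here is a minimal pure-Python sketch of the Goodman & Weare "stretch move", applied to a two-dimensional standard normal target. It does not use emcee and is not emcee's API or code: the function `stretch_move_sampler`, its parameters, and the serial walker-by-walker update are assumptions made for illustration.

```python
import math
import random


def log_prob(x):
    """Log-density (up to a constant) of a standard normal in len(x) dimensions."""
    return -0.5 * sum(v * v for v in x)


def stretch_move_sampler(log_prob, walkers, n_steps, a=2.0, rng=None):
    """Serial stretch-move ensemble sampler (illustrative sketch only).

    Returns the list of all visited positions and the acceptance fraction.
    """
    if rng is None:
        rng = random.Random()
    walkers = [list(w) for w in walkers]
    n_walkers = len(walkers)
    n_dim = len(walkers[0])
    log_p = [log_prob(w) for w in walkers]
    chain = []
    n_accepted = 0
    for _ in range(n_steps):
        for k in range(n_walkers):
            # Pick a complementary walker j different from k.
            j = rng.randrange(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            # Stretch walker k along the line joining it to walker j.
            proposal = [xj + z * (xk - xj) for xk, xj in zip(walkers[k], walkers[j])]
            log_p_new = log_prob(proposal)
            # Acceptance probability: min(1, z**(n_dim - 1) * p(new) / p(old)).
            log_accept = (n_dim - 1) * math.log(z) + log_p_new - log_p[k]
            if math.log(1.0 - rng.random()) < log_accept:
                walkers[k] = proposal
                log_p[k] = log_p_new
                n_accepted += 1
            chain.append(list(walkers[k]))
    return chain, n_accepted / (n_steps * n_walkers)


rng = random.Random(0)
start = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(20)]
chain, acceptance = stretch_move_sampler(log_prob, start, n_steps=2000, rng=rng)

# Discard the first 200 sweeps over the 20 walkers as burn-in.
samples = chain[200 * 20:]
mean_x = sum(s[0] for s in samples) / len(samples)
var_x = sum((s[0] - mean_x) ** 2 for s in samples) / len(samples)
print(f"acceptance={acceptance:.2f} mean={mean_x:.3f} variance={var_x:.3f}")
```

For real analyses the emcee library itself is the better choice, since it handles parallel updates, convergence diagnostics and storage of the chains.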


PyMC3 is a Python library for performing MCMC calculations.


Mamba is an open platform for the implementation and application of MCMC methods to perform Bayesian analysis in Julia.


Stan® is a state-of-the-art platform for statistical modeling and high-performance statistical computation.

Deep Learning


Keras is a deep-learning library for Theano and TensorFlow, two open source libraries for numerical computations. Keras is focused on easy implementation of deep neural networks.


Lasagne is a lightweight library to build and train neural networks in Theano.


Mocha is a Deep Learning framework for Julia, inspired by the C++ framework Caffe.


Caffe is a Deep Learning framework in C++.



Spectra.jl is a library written in Julia that helps with the treatment of spectroscopic data (Raman, Infrared, Nuclear Magnetic Resonance, XAS...).


Rampy is a Python library that helps with the treatment of spectroscopic data (Raman, Infrared, Nuclear Magnetic Resonance, XAS...).


gcvspline is a small Python package wrapping the gcvspl.f FORTRAN library.



ViscoAG is software written in Julia that calculates the viscosity of silicate melts from knowledge of their structure.


Python tools for processing Laser Ablation mass spectrometry data.


Data science research at RSES

Under Construction