2025/26 Taught Postgraduate Module Catalogue

CHEM5910M Data Science for Digital Chemistry

15 credits

Class size: 50

Module manager: Dr Stuart Warriner
Email: s.l.warriner@leeds.ac.uk

Taught: Semester 1 (Sep to Jan)

Year running 2025/26

Pre-requisite qualifications

A background in chemistry equivalent to Year 2 undergraduate level, or selection of the optional Foundations of Chemistry module in Semester 1.

Mutually Exclusive

CHEM3212 Big Data, Big Science

This module is not approved as an Elective

Module summary

The explosion of information and data requires people to handle large datasets quickly and efficiently. In science, new insights often come from bringing large amounts of data together in a way that illuminates a problem, or from using data interactively to solve problems. In this course, students will develop the core skills needed to handle large datasets efficiently. Using examples from across chemistry, students will see how to extract data efficiently with simple Python programs and reach meaningful conclusions. Online tools will help students acquire key skills, while regular computer workshops will let them explore real examples and apply these skills to answer scientific questions.

Objectives

To enable students to explore how to handle large datasets in order to extract and process key scientific information, and to display and interpret the results of their analysis.

Learning outcomes

On successful completion of the module students will have demonstrated the following learning outcomes relevant to the subject:

1) Be able to give examples of where data science can be employed in scientific research
2) Understand ethical considerations in data handling and the mechanisms for controlling the associated risks
3) Set up and use different environments to execute Python code
4) Read data files into a variety of data structures, including lists, arrays and data frames
5) Process, merge, slice and aggregate data to gain new insights and perform mathematical manipulations
6) Apply basic cheminformatics and dimensionality-reduction techniques
7) Present data in a variety of graphical forms to show trends, data distributions and statistics
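As an illustration of outcomes 4 and 5, the sketch below reads two small tables, holds one column as a plain list and another as a NumPy array, then merges, slices and aggregates the data with pandas. It is a minimal sketch under stated assumptions, not course material: it assumes pandas and NumPy are installed, the compound tables and their column names (`name`, `family`, `mw`, `bp_c`) are hypothetical, the values are approximate literature figures chosen for illustration only, and `io.StringIO` stands in for real files on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical input files (approximate literature values, illustration only).
# In practice these would be read from disk, e.g. pd.read_csv("compounds.csv").
compounds_csv = io.StringIO(
    "name,family,mw\n"
    "methanol,alcohol,32.04\n"
    "ethanol,alcohol,46.07\n"
    "acetone,ketone,58.08\n"
    "butanone,ketone,72.11\n"
)
boiling_csv = io.StringIO(
    "name,bp_c\n"
    "methanol,64.7\n"
    "ethanol,78.4\n"
    "acetone,56.1\n"
    "butanone,79.6\n"
)

# Read each file into a DataFrame (a labelled 2D table).
compounds = pd.read_csv(compounds_csv)
boiling = pd.read_csv(boiling_csv)

# The same data can also be held in simpler structures.
names_list = compounds["name"].tolist()   # plain Python list
mw_array = compounds["mw"].to_numpy()     # NumPy array

# Merge: join the two tables on their shared "name" column.
data = compounds.merge(boiling, on="name")

# Slice: keep only rows with a molecular weight above 40 g/mol.
heavier = data[data["mw"] > 40]

# Aggregate: mean boiling point (in degrees C) for each compound family.
mean_bp = data.groupby("family")["bp_c"].mean()

# Mathematical manipulation on a whole column at once: Celsius to kelvin.
data["bp_k"] = data["bp_c"] + 273.15

print(mean_bp)
```

Working on whole columns (`data["bp_c"] + 273.15`) rather than looping over rows is the idiomatic pandas style, and it is usually much faster on large datasets.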

Skills Learning Outcomes

On successful completion of the module students will have demonstrated the following skills:

A) Use computational methods for efficient data analysis
B) Write computer code that is portable, well commented and maintainable in a professional environment
C) Explain the results of a data analysis exercise in a visually appealing manner that is accessible to a non-expert

Syllabus

Examples of responsible data use, and of how data can be misused or produce undesired or distorted results. Managing data risks through risk assessment.

- IPython notebooks as a tool for interactive data analysis, used in notebook environments such as Jupyter and VS Code.
- Creating bespoke Python environments with conda and pip to provide a configurable and controlled set of packages.
- Core Python data structures.
- 2D data storage in pandas DataFrames.
- Plotting data with matplotlib to produce scatter, line and other key types of graph; statistical plots with seaborn and interactive plots with plotly.
- Molecular structure representation using different encoding formats.
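Several syllabus items can be sketched together: a list of dictionaries (a core Python data structure) becomes a pandas DataFrame, NumPy fits a straight line by least squares, and matplotlib draws a scatter plot with the fitted line. This is a minimal sketch under stated assumptions, not course material: the absorbance readings and column names are hypothetical, the non-interactive `Agg` backend and the in-memory PNG buffer are chosen only so the sketch runs without a display, and seaborn and plotly are not shown.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Core Python structures: a list of dicts, one dict per measurement.
# Hypothetical absorbance readings, for illustration only.
readings = [
    {"conc_mM": 0.0, "absorbance": 0.00},
    {"conc_mM": 0.5, "absorbance": 0.11},
    {"conc_mM": 1.0, "absorbance": 0.21},
    {"conc_mM": 1.5, "absorbance": 0.32},
    {"conc_mM": 2.0, "absorbance": 0.41},
]

# 2D storage: each dict becomes a row and each key becomes a column.
df = pd.DataFrame(readings)

# Least-squares straight line through the points (degree-1 polynomial).
slope, intercept = np.polyfit(df["conc_mM"], df["absorbance"], 1)

# Scatter plot of the data with the fitted line drawn on top.
fig, ax = plt.subplots()
ax.scatter(df["conc_mM"], df["absorbance"], label="measured")
ax.plot(df["conc_mM"], slope * df["conc_mM"] + intercept, label="linear fit")
ax.set_xlabel("Concentration / mM")
ax.set_ylabel("Absorbance")
ax.legend()

# Save to an in-memory PNG; in a notebook you would normally just display it.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```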

Methods of assessment

The assessment details for this module will be provided at the start of the academic year.

Teaching Methods

Delivery type                          Number   Length (hours)   Student hours
Computer class                         10       2                20
Lecture                                1        1                1
Seminar                                10       1                10
Independent online learning hours                                27
Private study hours                                              92
Total contact hours                                              31
Total hours (100 hr per 10 credits)                              150

Opportunities for Formative Feedback

In the workshop sessions, students work through guided solutions to the project with a member of staff, who can give feedback on the approach being taken and on any technical issues.

The online learning materials include self-help exercises that let students monitor their own progress.

Reading List

There is no reading list for this module

Last updated: 30/04/2025
