2026/27 Undergraduate Module Catalogue

MATH3604 Data Curation and Governance

20 Credits Class Size: 60

Module manager: TBC
Email: TBC

Taught: Semester 2 (Jan to Jun)

Year running 2026/27

Pre-requisite qualifications

Data Science at the level of MATH1603, MATH1604, MATH2603, MATH2604

Pre-requisites

MATH2603 Graphs, Networks and Systems
MATH2604 Machine Learning and Object-Oriented Programming

Module replaces

N/A

This module is not approved as a discovery module

Module summary

This module considers the practical aspects of working with real-world big data in a professional setting. It examines the need for data curation, data quality and data storage, and explores appropriate tools such as databases and cloud technology. Learners will practise the important real-world tasks of data wrangling and curation (cleaning, processing, transformation, etc.) and will consider organisational data modelling. Professional settings also require an understanding of the legal and ethical obligations surrounding privacy, data regulations and directives, and compliance.

Objectives

The key objectives of this module are to:
- prepare learners for the professional requirements of real-world data science in an organisational context;
- develop an awareness of the professional regulatory landscape around data governance, including ethics, legal compliance and required professional competencies;
- prepare learners for the reality of the workplace, where much time is spent on data wrangling and curation before the machine learning engineering stages;
- enable learners to make appropriate use of data storage and data modelling;
- build on earlier computational modules, scaling up their methods to meet the big data requirements of the workplace;
- prepare learners to demonstrate key competencies in job applications and in the workplace.

Learning outcomes

On successful completion of the module students will be able to:
a) Clearly communicate trade-offs (e.g. the costs, capabilities and limitations, sustainability and environmental footprint of technical data systems) to audiences such as organisational decision-makers.
b) Critically reflect on the role of data modelling, data projects and data scientists in organisational contexts.
c) Articulate the key professional competencies of a data scientist in an organisational context.

Syllabus

1. Data modelling and data storage: databases (e.g. SQL and NoSQL; on-premises and distributed); cloud technologies such as cloud storage and cloud computing (e.g. data lakes, Databricks, Hadoop, Microsoft Azure); vector databases; high-performance computing; edge computing; networking; applications such as the Internet of Things, smart cities and sustainability.
2. Data governance: privacy and ethics; ethical and legal obligations such as regulations, directives and compliance; data provenance and quality control; data ownership, security and sensitivity.
3. Data curation: data wrangling, cleaning, pre-processing, transformation, feature extraction/engineering, etc.
4. Business context: practical considerations such as costs, scalability, resource constraints, sustainability, environmental footprint, commercial products and services, knowledge transfer, and communication with stakeholders.

Teaching Methods

Delivery type                              Number   Length (hours)   Student hours
Practicals                                 22       2                44
Independent online learning hours                                    26
Private study hours                                                  130
Total contact hours                                                  44
Total hours (100 hours per 10 credits)                               200

Opportunities for Formative Feedback

Formative feedback is available during the studio-style classes, along with formative opportunities to practise the skills that will be summatively assessed.

Reading List

Check the module area in Minerva for your reading list

Last updated: 12/05/2026

Errors, omissions, failed links, etc. should be reported to the Catalogue Team.