Module manager: Serge Sharoff
Email: s.sharoff@leeds.ac.uk
Taught: Semester 2 (Jan to Jun)
Year running 2026/27
MODL5007M
This module is not approved as an Elective
This module studies how language is used from the perspective of data science. The basis for the study is provided by corpora, i.e. large databanks of texts in natural languages. Corpora are also commonly used to train Generative AI and Machine Translation systems. The aim of the module is to familiarise students with statistical methods for corpus exploration and to equip them with AI literacy skills.
Please note that this is an optional module and runs subject to enrolments. If only a small number of students choose this module, it may not run and you may be asked to choose another module.
The overall goal of the module is to introduce data science for language with a specific focus on multilinguality. The more specific aims are:
- to introduce the basic concepts and methods of data science and how they can be applied to language and translation;
- to provide practical skills and tools for querying corpora and for statistical interpretation of the results;
- to equip students with the background required to interact with Generative AI and Machine Translation tools.
On successful completion of the module students will be able to:
LO1 Describe basic types of corpora
LO2 Appraise and apply principles of corpus querying
LO3 Apply relevant statistical methods
LO4 Design Python scripts to collect and process their own specialised corpora
On successful completion of the module students will have demonstrated the following skills learning outcomes:
SO1. Develop reflection and critical thinking to understand the nature of language use from a statistical point of view.
SO2. Develop skills in identifying and solving potential challenges by searching for information.
| Delivery type | Number | Length (hours) | Student hours |
|---|---|---|---|
| Supervision | 4 | 1 | 4 |
| Lecture | 8 | 1 | 8 |
| Seminar | 8 | 1 | 8 |
| Private study hours | | | 130 |
| Total contact hours | | | 20 |
| Total hours (100hr per 10 credits) | | | 150 |
Students receive regular weekly feedback on their progress with the case study.
| Assessment type | Notes | % of formal assessment |
|---|---|---|
| Coursework | Case Study | 100 |
| Total percentage (Assessment Coursework) | | 100 |
Resits will normally be assessed by the same methodology as the first attempt, unless otherwise stated.
Check the module area in Minerva for your reading list.
Last updated: 30/04/2026