2025/26 Undergraduate Module Catalogue

LING2065 Data Science for Linguists

20 Credits
Class Size: 18

Module manager: Cecile De Cat
Email: c.decat@leeds.ac.uk

Taught: Semester 1 (Sep to Jan)

Year running 2025/26

Pre-requisite qualifications

MODL1060 or equivalent introductory linguistics module

Pre-requisites

MODL1060 Language: Structure and Sound

This module is approved as a discovery module

Module summary

This module aims to equip you with the knowledge and skills required to address the following questions: What kinds of data inform linguistic research? How are these generated? How can we transform, visualise, and analyse these data to identify interesting patterns, and thereby gain a deeper understanding of the information contained in the data? Through demonstrations and hands-on activities using linguistic data, you will learn the basics of data manipulation and analysis.

Please note this is an optional module and runs subject to enrolments. If a low number of students choose this module, then the module may not run and you may be asked to choose another module.

Objectives

In this module students will learn:

- to navigate the IT environment required to work with electronic data. This will include organising data, setting up folder structures, and using R, a versatile tool that is freely downloadable, widely used, and well supported by a large community of data scientists across the world.
- to understand the different types of data one can encounter in linguistic research. This will include understanding the different types of variables that are used in statistics.
- to prepare data to make it suitable for analysis. This includes data cleaning, reorganisation, and transformation.
- to describe a dataset so it is properly documented.
- to visualise data and understand how it is distributed.
- to understand how variables can relate to each other; to be able to visualise and interpret different types of relationship between variables.
- to estimate whether a difference observed visually is statistically meaningful, on the basis of statistical model summaries.

In classes, you will be guided through demonstrations and practical exercises based on linguistic datasets (e.g., experimental data, corpus data, questionnaire data). The demonstrations and exercises will be done in R.

Learning outcomes

On successful completion of the module students will be able to:

LO1. describe linguistic datasets and visualise the distribution of variables;
LO2. identify how the available data can be used to answer a specific research question, and provide visualisations to inform a preliminary analysis;
LO3. understand the basics of regression analysis. This includes being able to explain in simple terms how it works and when it is useful, and to interpret the results of a given model in relation to a research question in linguistics.

Skills Learning Outcomes

On successful completion of the module students will be able to:

SO1. use R to prepare and visualise data.
SO2. interpret the relationship between variables in a dataset.
SO3. interpret the results of a simple regression analysis.

Syllabus

Details of the syllabus will be provided on the Minerva organisation (or equivalent) for the module.

Teaching Methods

Delivery type                         Number   Length (hours)   Student hours
Lecture                               10       1                10
Practical                             10       1                10
Independent online learning hours                               20
Private study hours                                             160
Total contact hours                                             20
Total hours (100 hr per 10 credits)                             200

Opportunities for Formative Feedback

As this module aims to develop practical skills, it will be heavily based on demonstrations and hands-on activities. Students will learn by doing, and will be expected to be proactive in monitoring their own understanding. Independent self-study will be essential, as knowledge and skills will be acquired incrementally. The lectures and practicals will be interactive, and will aim to provide ample opportunities for feedback and support.

- Interactive tools will be used during lectures to monitor student understanding.
- Exercises will be set weekly in preparation for the practicals. Feedback and advice will be provided during the practicals.
- There will be two pieces of formative assessment (one before each summative assessment).

Methods of Assessment

Coursework
Assessment type   Notes                                      % of formal assessment
Coursework        Presentation                               50
Coursework        Online Time Limited Assessment, 2 hours    50
Total percentage (Assessment Coursework)                     100

Resits will normally be assessed by the same methodology as the first attempt, unless otherwise stated.

Reading List

Check the module area in Minerva for your reading list.

Last updated: 02/05/2025
