Hands-on Distributional Semantics (ESSLLI 2021 / 2022)

Hands-on Distributional Semantics – From first steps to interdisciplinary applications
Foundational course at ESSLLI 2021, online, August 9–13, 2021

Hands-on Distributional Semantics for Linguistics using R
Foundational course at ESSLLI 2022, Galway, Ireland, August 8–12, 2022

  • All materials have been updated for the 2022 edition of the course.
  • Thanks for attending our course! It's a pleasure working with you.

Course description

Distributional semantic models (DSMs) – also known as “word space”, “distributional similarity”, or more recently “word embeddings” – are based on the assumption that the meaning of a word can (at least to a certain extent) be inferred from its usage, i.e. its distribution in text. These models therefore build semantic representations dynamically, in the form of high-dimensional vector spaces, through a statistical analysis of the contexts in which words occur. DSMs are a promising technique for overcoming the lexical acquisition bottleneck through unsupervised learning, and their distributed representation provides a cognitively plausible, robust and flexible architecture for the organisation and processing of semantic information.

In this introductory course, we will highlight the interdisciplinary potential of DSMs beyond standard semantic similarity tasks, with applications in cognitive modeling and theoretical linguistics. The course aims to equip participants with the background knowledge and skills needed to build different kinds of DSM representations – from traditional “count” models to neural word embeddings – and apply them to a wide range of tasks. The hands-on sessions will be conducted in R with the user-friendly wordspace package and various pre-built models.

Lecturers: Stephanie Evert (FAU Erlangen-Nürnberg) & Gabriella Lapesa (IMS, U Stuttgart)

Organizational information

Please make sure you have up-to-date versions of R and RStudio to participate in the hands-on exercises. Follow the detailed set-up instructions and download (some of) the data sets and precompiled DSMs. Additional instructions will be given in the first session on Monday. In particular, you will be asked to download and install the wordspaceEval package using a password provided in the course.

Schedule & handouts

Day 1: Introduction

presentation slides (PDF, 2.1 MB) – handout (PDF, 1.7 MB) – R code: hands_on_day1.R

  • motivation and geometric intuition
  • distributional vs. semantic similarity
  • outline of the course
  • practice: software setup, first steps with the wordspace package

Day 2: Building a DSM

presentation slides (PDF, 1.6 MB) – handout (PDF, 1.2 MB) – R code: hands_on_day2.R – bonus material: hands_on_day2_input_formats.R

  • formal definition of a DSM, taxonomy of parameters
  • collecting co-occurrence data: what counts as a context?
  • mathematical operations on DSM vectors
  • computing distances/similarities
  • practice: building DSMs and exploring different parameter settings
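
As a rough illustration of the Day 2 topics (collecting co-occurrence data, then computing similarities between the resulting vectors), here is a minimal sketch in base R. It is not taken from the course scripts and does not use the wordspace package. The toy corpus, the choice of targets and features, the use of the whole sentence as context, and the helper function cosine() are all assumptions made for this example.

```r
# Toy corpus, invented for illustration: each sentence is treated as one context.
corpus <- list(
  c("the", "dog", "chased", "the", "cat"),
  c("the", "cat", "chased", "the", "mouse"),
  c("the", "dog", "ate", "the", "bone"),
  c("the", "cat", "ate", "the", "fish")
)

targets  <- c("dog", "cat", "mouse")          # rows of the DSM
features <- c("chased", "ate", "bone", "fish") # columns of the DSM

# Co-occurrence matrix: count the sentences in which a target and a feature
# occur together (each target-feature pair is counted once per sentence).
M <- matrix(0, nrow = length(targets), ncol = length(features),
            dimnames = list(targets, features))
for (sentence in corpus) {
  for (w in intersect(targets, sentence)) {
    for (f in intersect(features, sentence)) {
      M[w, f] <- M[w, f] + 1
    }
  }
}

# Hypothetical helper: cosine similarity between two row vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

sim_dog_cat   <- cosine(M["dog", ], M["cat", ])
sim_dog_mouse <- cosine(M["dog", ], M["mouse", ])
print(M)
print(c(dog_cat = sim_dog_cat, dog_mouse = sim_dog_mouse))
```

In this toy setting "dog" comes out closer to "cat" than to "mouse" because the first two share more contexts. Changing what counts as a context (sentence, window, syntactic relation) or weighting the raw counts changes the matrix, which is the kind of parameter exploration the practice session refers to.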

Day 3: Which aspects of meaning does a DSM capture?

presentation slides (PDF, 3.2 MB) – handout (PDF, 2.9 MB) – R code: hands_on_day3_exercise_1.R, hands_on_day3_exercise_2.R

  • evaluation: conceptual coordinates
  • standard evaluation tasks (multiple choice, correlation, clustering)
  • narrowing down similarity: classifying semantic relations
  • practice: evaluation of selected tasks
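
The correlation task mentioned above can be sketched in base R as follows: similarity scores from a model are compared with human similarity ratings for the same word pairs. The word pairs and all numbers below are invented for this illustration and are not taken from any published data set or from the course materials; the course's own evaluation code is not reproduced here.

```r
# Hypothetical evaluation data: human ratings and model similarities
# for five word pairs (all values invented for illustration).
eval_data <- data.frame(
  word1 = c("car", "gem", "coast", "noon", "cord"),
  word2 = c("automobile", "jewel", "shore", "string", "smile"),
  human = c(3.9, 3.8, 3.6, 0.1, 0.0),
  model = c(0.82, 0.61, 0.70, 0.15, 0.05)
)

# Spearman's rho compares only the rank order of the two score lists;
# Pearson's r additionally assumes a linear relationship between them.
rho <- cor(eval_data$human, eval_data$model, method = "spearman")
r   <- cor(eval_data$human, eval_data$model, method = "pearson")
print(c(spearman = rho, pearson = r))
```

A rank correlation is often a reasonable choice here because model similarities and human ratings live on different scales, so only their ordering is directly comparable.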

Day 4: DS beyond NLP – Linguistic theory

presentation slides (PDF, 3.6 MB) – handout (PDF, 3.5 MB) – R code: hands_on_day4.R – bonus material: schuetze1998.R

  • linguistic exploitation of DSM representations
  • a textbook challenge for DSMs: polysemy
  • success stories: semantic compositionality, morphological transparency, argument structure
  • issues: not all words have a distributional meaning
  • practice: different exercises with linguistic data sets

Day 5: DS beyond NLP – Cognitive modelling

presentation slides (PDF, 1.6 MB) – handout (PDF, 1.4 MB) – R code: hands_on_day5.R – bonus task: CogALex4.rda (0.2 MB) – bonus material: hands_on_day5_matrix_factorization.R

  • DSMs for cognitive modelling
  • free association norms as a window into the mental lexicon
  • predicting free associations with DSMs
  • practice: combining DSMs with first-order co-occurrence for the FAST task