Courses and Tutorials on DSM

ESSLLI '09 | NAACL-HLT 2010 | Downloads & Links | Bibliography

Online access (Web interfaces)

Off-the-shelf packages for DSM


Data sets

  • Verb + object noun co-occurrences (tokens) extracted from the British National Corpus: bnc_vobj_filtered.txt.gz (15 MB)
  • A 5-million-word corpus of Harry Potter fan fiction in lemma_pos format (pre-cleaned): potter_tokens.txt.gz (8.9 MB)
  • NEW: DSM for 34,150 English nouns from the 2-billion-word ukWaC corpus: ukwac_vobj_S_svd.rda (158 MB)
    • built from verb-object co-occurrences; features are 3,371 frequent verbs, scored with a log-scaled t-score and reduced to 300 SVD dimensions
    • nearest-neighbour demo with visualisation: neighbour_demo.R
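
The processing steps named for the ukWaC model (co-occurrence counts, t-score, log scaling, SVD, nearest neighbours) can be illustrated on a toy example. The sketch below is not the code in neighbour_demo.R and does not read ukwac_vobj_S_svd.rda. The counts are invented, the function names are hypothetical, and it assumes that "log-scaled t-score" means a t-score (O - E) / sqrt(O) with negative values set to zero, followed by log(1 + x). It keeps 2 SVD dimensions where the real model keeps 300.

```python
import numpy as np


def sparse_t_score(counts):
    """t-score (O - E) / sqrt(O); negative scores and empty cells become 0.

    Assumption: this is one common definition, not necessarily the one
    used to build the downloadable model.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    expected = row * col / total
    # Avoid dividing by zero in cells with no observations.
    safe_sqrt = np.sqrt(np.where(counts > 0, counts, 1.0))
    t = (counts - expected) / safe_sqrt
    return np.where(counts > 0, np.maximum(t, 0.0), 0.0)


def svd_reduce(matrix, k):
    """Project the rows onto the first k latent SVD dimensions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def nearest_neighbours(vectors, labels, target, n=3):
    """Rank the other rows by cosine similarity to the target row."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms > 0, norms, 1.0)
    sims = unit @ unit[labels.index(target)]
    order = np.argsort(-sims)
    ranked = [(labels[i], float(sims[i])) for i in order if labels[i] != target]
    return ranked[:n]


# Invented toy data: rows are object nouns, columns are verbs.
nouns = ["book", "novel", "beer", "wine"]
verbs = ["read", "write", "drink", "pour"]
counts = [
    [40, 25, 0, 1],
    [30, 20, 0, 0],
    [0, 0, 35, 10],
    [1, 0, 30, 15],
]

scores = np.log1p(sparse_t_score(counts))  # log scaling (assumed: log(1 + x))
reduced = svd_reduce(scores, k=2)          # the real model keeps 300 dimensions

for noun in nouns:
    print(noun, nearest_neighbours(reduced, nouns, noun, n=1))
```

Cutting negative scores to zero keeps the score matrix sparse, which matters at the scale of the real data (34,150 nouns by 3,371 verbs) even though it makes no practical difference here.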