This shows you the differences between two versions of the page.

data:start [2008/01/20 15:46]
data:start [2010/02/08 00:50]
Line 1: Line 1:
 ====== Data sets for the evaluation of word space models ====== ====== Data sets for the evaluation of word space models ======
-This page contains a developing list of tasks, sub-tasks and corresponding (sub-)data-sets. 
-Other tasks or sub-tasks might be added in the near future. +===== Ordered by events =====
 +  * [[:data:esslli2008:start|ESSLLI 2008 Shared Tasks]] (for the [[:workshop:esslli:start|Workshop on Distributional Lexical Semantics]])
Line 9: Line 10:
 +==== Semantic classification ====
 +  * Noun & verb categorization (ESSLLI 2008)
 +    * [[:data:esslli2008:Concrete Noun Categorization|semantic categories of concrete nouns]]
 +    * [[:data:esslli2008:Abstract/Concrete Nouns Discrimination|abstract vs. concrete nouns]]
 +    * [[:data:esslli2008:Verb Categorization|Levin-style verb classes]]
 +==== Free association ====
-==== Task 1: Categorization ==== +  * [[:data:esslli2008:Correlation with Free Association Norms|Correlation with free association norms]] (ESSLLI 2008)
- +    * discrimination: strong association vs. non-associated
-Categorization tasks play a prominent role in cognitive research on concepts. In this type of task, subjects +    * correlation: regression modelling of association strength
-are typically asked to assign experimental items - objects, images, words - +    * prediction of most common responses (strongest associations)
-to a given category or to group together items belonging to the same category. +
-Since categorization presupposes an understanding of the relationship between the items in a category, it is regarded as a key source of evidence on the organization and structure of the human conceptual system. +
- +
-In the present task, computational models will be tested on their ability to properly group +
-words into semantic categories. The task is organized into three sub-tasks, focussing on different areas +
-of the lexicon and/or semantic dimensions: +
- +
-  * [[Concrete Noun Categorization]] +
-  * [[Abstract/Concrete Noun Discrimination]] +
-  * [[Verb Categorization]] +
- +
- +
- +
- +
-==== Task 2: Free Association ==== +
- +
-It is tempting to make a connection between the **statistical association** patterns of words -- first-order (//collocations//) as well as higher-order (//word space//) -- and **human free associations** -- the first words that come to mind when native speakers are presented with a stimulus word. In this task, we will explore to what extent such free associations can be explained and predicted by statistically salient patterns in the linguistic experience of speakers, possibly offering a simple and straightforward interpretation of distributional similarity (i.e. higher-order association), in contrast to the symbolic aspects of meaning targeted by the other tasks. However, this is not merely a "baseline" task: it also touches on intriguing research problems such as the interaction of first-order and higher-order information in human associative memory. +
- +
- +
-  * [[Correlation with Free Association Norms]] +
- +
-==== Task 3: Property Generation ==== +
- +
-The ability to describe a concept in terms of its salient properties is an important feature of human conceptual cognition. In this task, we compare human-generated //norms// collected by psychologists to the properties generated by computational models. +
- +
-  * [[Comparison with Speaker-Generated Features]] (**preliminary gold standard and evaluation script available!**) +
- +
 +==== Property generation ====
 +  * [[:data:esslli2008:Comparison with Speaker-Generated Features|Prediction of speaker-generated semantic features]]
-===== Source corpus ===== 
-You can train your word space on your favorite corpus. However, we also invite you, if this is suitable, to experiment with the [[http://wacky.sslmit.unibo.it|ukWaC]] corpus, so that we will be able to compare different word spaces trained on the same corpus (for information on how to obtain the corpus, write to [[wacky@sslmit.unibo.it|this address]]).