Differences

This shows you the differences between two versions of the page.

data:start [2008/01/26 09:03] marco
data:start [2008/06/18 15:46] schtepf
Line 7: Line 7:
  
 ===== Ordered by task categories =====
- 
- 
  
 ==== Task 1: Free Association ====
Line 16: Line 14:
  
   * [[Correlation with Free Association Norms]]
- 
- 
- 
  
  
Line 35: Line 30:
   * [[Abstract/Concrete Nouns Discrimination|Abstract/Concrete Noun Discrimination]]
   * [[Verb Categorization]]
 +
  
  
Line 46: Line 42:
  
   * [[Comparison with Speaker-Generated Features]]
- 
- 
- 
- 
  
  
Line 55: Line 47:
  
 You can train your word space on your favorite corpus. However, we also invite you, if this is suitable, to experiment with the [[http://wacky.sslmit.unibo.it|ukWaC]] corpus, so that we will be able to compare different word spaces trained on the same corpus (for information on how to obtain the corpus, write to [[wacky@sslmit.unibo.it|this address]]). ukWaC is a very large (about 2 billion tokens) Web-derived corpus. It is split into sub-sections containing randomly chosen documents. Thus, if your algorithm has problems scaling up to 2 billion tokens, you can train it on one or more sub-sections, which will constitute a document-based random sub-sample of ukWaC.
-
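
The sub-sampling suggestion in the corpus paragraph can be illustrated with a minimal sketch. It assumes the sub-sections are available as separate files; the file names and the helper ''sample_subsections'' are hypothetical and not part of the ukWaC distribution. Because each sub-section already contains randomly chosen documents, any choice of sub-sections gives a document-based sub-sample; drawing them with a fixed seed only makes the choice reproducible.

```python
import random


def sample_subsections(subsections, k, seed=0):
    """Pick k sub-section names at random, without replacement."""
    subsections = list(subsections)
    if not 0 < k <= len(subsections):
        raise ValueError("k must be between 1 and the number of sub-sections")
    rng = random.Random(seed)  # fixed seed so the same subset can be re-drawn
    return sorted(rng.sample(subsections, k))


# Hypothetical file names; the real corpus parts may be named differently.
parts = [f"ukwac_part{i}.txt" for i in range(1, 11)]
chosen = sample_subsections(parts, k=3, seed=42)
print(chosen)
```

Reporting the seed and the list of chosen sub-sections alongside results would let others train on exactly the same sub-sample.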