===== Ordered by task categories =====
==== Task 1: Free Association ====
  
  * [[Correlation with Free Association Norms]]
  
  
  * [[Comparison with Speaker-Generated Features]]
  
**NB: ON MARCH 7, WE MADE A SMALL CORRECTION TO THE PROPERTY EXPANSION FILE USED FOR THIS TASK; IF YOU DOWNLOADED THE RELEVANT ARCHIVE BEFORE THIS DATE, PLEASE DOWNLOAD IT AGAIN**
  
===== Source corpus =====
  
You can train your word space on your favorite corpus. However, we also invite you, where suitable, to experiment with the [[http://wacky.sslmit.unibo.it|ukWaC]] corpus, so that we will be able to compare different word spaces trained on the same corpus (for information on how to obtain the corpus, write to [[wacky@sslmit.unibo.it|this address]]). ukWaC is a very large (about 2 billion tokens) Web-derived corpus. It is split into sub-sections containing randomly chosen documents. Thus, if your algorithm has problems scaling up to 2 billion tokens, you can train it on one or more sub-sections, which together constitute a document-based random sub-sample of ukWaC.
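
As a purely illustrative sketch (not part of the official task materials), the snippet below shows one simple way to build a window-based co-occurrence word space from plain-text input, such as a single ukWaC sub-section. The file name ''ukwac_section_01.txt'', the tokenisation, and the window size are hypothetical placeholders for whatever sub-section and preprocessing you actually use.

<code python>
import re
from collections import defaultdict, deque

WINDOW = 5                          # tokens of context counted on each side
FILES = ["ukwac_section_01.txt"]    # hypothetical file name for one sub-section

def tokens(path):
    """Crude tokenisation: lowercased alphabetic strings; replace with your own."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        for line in fh:
            yield from re.findall(r"[a-z]+", line.lower())

# target word -> context word -> co-occurrence count
space = defaultdict(lambda: defaultdict(int))

for path in FILES:
    window = deque(maxlen=WINDOW)   # the WINDOW tokens preceding the current one
    for tok in tokens(path):
        for ctx in window:          # count symmetric co-occurrences within the window
            space[tok][ctx] += 1
            space[ctx][tok] += 1
        window.append(tok)

# The raw counts would normally be re-weighted (e.g. with PMI) and the rows
# compared by cosine similarity for the individual task categories.
</code>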