Differences

This shows you the differences between two versions of the page.

data:start [2008/01/26 09:03]
marco
data:start [2008/03/30 13:05]
schtepf
Line 7: Line 7:
  
 ===== Ordered by task categories =====
- 
- 
  
 ==== Task 1: Free Association ====
Line 18: Line 16:
  
  
- +**NB: On March 29th, we fixed a small (but serious) bug in the script ''eval_task3.perl''. If you obtained a copy before that date, please download the most recent version of the package and use it for your evaluation.**
  
 ==== Task 2: Categorization ====
Line 35: Line 32:
   * [[Abstract/Concrete Nouns Discrimination|Abstract/Concrete Noun Discrimination]]
   * [[Verb Categorization]]
 +
  
  
Line 47: Line 45:
   * [[Comparison with Speaker-Generated Features]]
  
- +**NB: On March 7, we made a small correction to the property expansion file used for this task. If you downloaded the relevant archive before this date, please download it again.**
- +
- +
  
 ===== Source corpus =====
  
 You can train your word space on your favorite corpus. However, we also invite you, if this is suitable, to experiment with the [[http://wacky.sslmit.unibo.it|ukWaC]] corpus, so that we will be able to compare different word spaces trained on the same corpus (for information on how to obtain the corpus, write to [[wacky@sslmit.unibo.it|this address]]). ukWaC is a very large (about 2 billion tokens) Web-derived corpus. It is split into sub-sections containing randomly chosen documents. Thus, if your algorithm has problems scaling up to 2 billion tokens, you can train it on one or more sub-sections, which will constitute a document-based random sub-sample of ukWaC.
-