Differences

This shows you the differences between two versions of the page.

data:start [2008/03/07 10:07] marco
data:start [2008/03/30 13:05] schtepf
Line 7: Line 7:
  
 ===== Ordered by task categories =====
- 
- 
  
 ==== Task 1: Free Association ====
Line 18: Line 16:
  
  
- +**NB: On March 29th, we fixed a small (but serious) bug in the script ''eval_task3.perl''. If you obtained a copy before that date, please download the most recent version of the package and use it for your evaluation.**
  
 ==== Task 2: Categorization ====
Line 53: Line 50:
  
 You can train your word space on your favorite corpus. However, we also invite you, if suitable, to experiment with the [[http://wacky.sslmit.unibo.it|ukWaC]] corpus, so that we can compare different word spaces trained on the same corpus (for information on how to obtain the corpus, write to [[wacky@sslmit.unibo.it|this address]]). ukWaC is a very large (about 2 billion tokens) Web-derived corpus. It is split into sub-sections containing randomly chosen documents. Thus, if your algorithm has problems scaling up to 2 billion tokens, you can train it on one or more sub-sections, which will constitute a document-based random sub-sample of ukWaC.
-