Differences

This shows you the differences between two versions of the page.

--- data:start [2008/01/26 09:03]
marco
+++ data:start [2008/06/18 15:46]
schtepf
@@ Line 7: / Line 7: @@
 ===== Ordered by task categories =====
 ==== Task 1: Free Association ====
@@ Line 16: / Line 14: @@
   * [[Correlation with Free Association Norms]]
@@ Line 35: / Line 30: @@
   * [[Abstract/Concrete Nouns Discrimination|Abstract/Concrete Noun Discrimination]]
   * [[Verb Categorization]]
@@ Line 46: / Line 42: @@
   * [[Comparison with Speaker-Generated Features]]
@@ Line 55: / Line 47: @@
 You can train your word space on your favorite corpus. However, we also invite you, if this is suitable, to experiment with the [[http://wacky.sslmit.unibo.it|ukWaC]] corpus, so that we will be able to compare different word spaces trained on the same corpus (for information on how to obtain the corpus, write to [[wacky@sslmit.unibo.it|this address]]). ukWaC is a very large (about 2 billion tokens) Web-derived corpus. It is split into sub-sections containing randomly chosen documents. Thus, if your algorithm has problems scaling up to 2 billion tokens, you can train it on one or more sub-sections, which together constitute a document-based random sub-sample of ukWaC.
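
Picking sub-sections for a smaller training run could be sketched as follows. This is a minimal illustration, not part of the ukWaC distribution: the function name ''pick_subsections'', the directory layout, and the ''*.gz'' file pattern are all assumptions, so adjust them to however your copy of the corpus is actually stored.

```python
import random
from pathlib import Path
from typing import List, Optional


def pick_subsections(corpus_dir: str, k: int, seed: Optional[int] = None,
                     pattern: str = "*.gz") -> List[Path]:
    """Return k sub-section files chosen at random from corpus_dir.

    Assumes (hypothetically) that each sub-section is one file matching
    `pattern` directly inside `corpus_dir`.
    """
    # Sort first so that a given seed always yields the same selection.
    files = sorted(Path(corpus_dir).glob(pattern))
    if k < 0 or k > len(files):
        raise ValueError(f"asked for {k} sub-sections, found {len(files)}")
    rng = random.Random(seed)
    # Sample without replacement, then sort for a stable processing order.
    return sorted(rng.sample(files, k))


# Example call (the path is a placeholder):
# subsections = pick_subsections("/path/to/ukwac", k=2, seed=42)
```

Fixing the seed makes it possible to report exactly which sub-sections a word space was trained on, which helps when comparing results across systems.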
