Differences

This shows you the differences between two versions of the page.

data:esslli2008:correlation_with_free_association_norms [2008/06/23 00:00]
127.0.0.1 external edit
data:esslli2008:correlation_with_free_association_norms [2010/11/01 14:07] (current)
Line 1: Line 1:
-====== Correlation of the statistical distribution of words with human free associations ======
+====== Task 1: Correlation with free association norms ======
  
  
Line 43: Line 43:
 ===== Data sets & tasks =====
  
-ZIP-archive with data sets for all subtasks: {{data:free_association_tasks.zip}}
+ZIP-archive with data sets for all subtasks: {{free_association_tasks.zip}}
  
 All files are TAB-delimited tables in ASCII text format with a single header row, so they can easily be loaded into [[http://www.r-project.org/|R]] (with ''read.delim()'') and most spreadsheet programs.  Standard columns are ''cue'' (stimulus headword) and ''target'' (response headword); the other columns are specific for each task and are described below.
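For readers who do not use R, the file layout described above (TAB-delimited, one header row, ''cue'' and ''target'' columns) can also be read with Python's standard ''csv'' module. The following is a minimal sketch, not part of the task materials: the sample rows and the ''count'' column are invented for illustration, and ''load_table'' is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample in the described layout: TAB-delimited, one header row.
# Only "cue" and "target" come from the task description; "count" is an
# invented task-specific column used purely for illustration.
SAMPLE = "cue\ttarget\tcount\nbread\tbutter\t12\nsalt\tpepper\t9\n"


def load_table(handle):
    """Read a TAB-delimited table with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


rows = load_table(io.StringIO(SAMPLE))

# Every row is a dict keyed by the header names; values are strings.
pairs = [(row["cue"], row["target"]) for row in rows]
print(pairs)
```

To read one of the real files, the same helper can be given a handle from ''open(path, newline="", encoding="ascii")''. All values arrive as strings, so numeric task-specific columns need an explicit conversion.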
Line 108: Line 108:
 ===== Ancillary data: First-order associations =====
  
-Database of first-order statistical associations: {{data:lexsem08_first_order_associations.ds.gz}} (ZIP archive, 5.8 MB)
+Database of first-order statistical associations: {{lexsem08_first_order_associations.ds.gz}} (ZIP archive, 5.8 MB)
  
 This database contains lemmatised surface collocates of all cue words used in the free associations task, extracted from the British National Corpus with a span size of 5 words (left & right) and limited by sentence boundaries.  Collocates were only included if they cooccur at least //f=5// times with the cue word and show significant evidence for a positive statistical association (//p < .001//, one-sided log-likelihood test).  First-order association is quantified by four well-known association measures with distinct mathematical properties, viz. //log-likelihood//, //t-score//, //MI// and //Dice//. See [[http://purl.org/stefan.evert/PUB/Evert2007HSK_extended_manuscript.pdf|Evert (2008)]] for terminology and further information.
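To make the four measures concrete, here is a minimal sketch that computes them from a 2x2 contingency table using the common textbook definitions. It is an assumption that these match the database exactly: the actual scores were derived from span-based surface cooccurrence, which may adjust the expected frequencies. The function name, parameter names and example frequencies are all invented for illustration.

```python
import math


def association_scores(o11, r1, c1, n):
    """Textbook association scores from a 2x2 contingency table.

    o11: observed cooccurrence frequency of cue and collocate
    r1, c1: marginal frequencies of the cue and the collocate
    n: sample size
    All names are illustrative, not taken from the database.
    """
    r2, c2 = n - r1, n - c1
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    e11 = expected[0]
    # Log-likelihood ratio: G2 = 2 * sum(O * ln(O / E)); empty cells add 0.
    g2 = 2 * sum(o * math.log(o / e)
                 for o, e in zip(observed, expected) if o > 0)
    return {
        "log_likelihood": g2,
        "t_score": (o11 - e11) / math.sqrt(o11),
        "MI": math.log2(o11 / e11),
        "Dice": 2 * o11 / (r1 + c1),
    }


# Invented frequencies, chosen only to exercise the formulas.
scores = association_scores(o11=30, r1=1000, c1=500, n=1_000_000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The log-likelihood statistic is two-sided by itself; a one-sided reading such as the one described above would additionally require the observed frequency to exceed the expected one, which is equivalent to a positive MI in this sketch.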
Line 133: Line 133:
 **NB: bug in script eval_task3.perl fixed as of March 29: if you downloaded earlier, please re-download**
  
-Evaluation package: {{data:eval_package_free_association.zip}}
+Evaluation package: {{eval_package_free_association.zip}}
  
   * sample output generated by FOO model ((**F**irst-**O**rder associations **O**nly))
   * sample evaluation scripts written in [[http://www.r-project.org/|R]] and [[http://www.perl.org/|Perl]]
   * includes complete implementation of FOO model