Differences

This shows you the differences between two versions of the page.

course:material [2018/08/24 16:59]
schtepf [Neural word embeddings]
course:material [2022/08/07 18:46] (current)
schtepf [Software for the course]
Line 1: Line 1:
 ====== Courses and Tutorials on DSM  ======
 
-[[course:esslli2009:start|ESSLLI '09]] –
+[[course:esslli2009:start|ESSLLI 2009]] –
 [[course:acl2010:start|NAACL-HLT 2010]] –
 [[course:esslli2018:start|ESSLLI '16 & '18]] –
 +[[course:esslli2021:start|ESSLLI 2021]] –
 **Software & data sets** –
 [[course:bibliography|Bibliography]]
Line 12: Line 13:
 Practical examples and exercises for these courses and tutorials are based on the user-friendly software package [[http://wordspace.r-forge.r-project.org/|wordspace]] for the interactive statistical computing environment [[http://www.r-project.org/|R]].  If you want to follow along, please bring your own laptop and set up the required software as follows:
 
-  - Install up-to-date versions of [[https://cran.r-project.org/banner.shtml|R]] and the [[https://www.rstudio.com/products/rstudio/download/#download|RStudio]] GUI
+  - Install up-to-date versions of [[https://cran.r-project.org/banner.shtml|R]] (4.0 or newer) and the [[https://www.rstudio.com/products/rstudio/download/#download|RStudio]] GUI
   - Use the installer built into RStudio (or the standard R GUI) to install the following packages from the CRAN archive:
-    * ''sparsesvd''
+    * ''sparsesvd'' (v0.2)
-    * ''wordspace''
+    * ''wordspace'' (v0.2-6)
-    * optional: ''tm'', ''quanteda'', ''Rtsne'', ''shiny''
+    * recommended: ''e1071'', ''rsparse'', ''Rtsne'', ''uwot''
 +    * optional: ''tm'', ''quanteda'', ''data.table'', ''wordcloud'', ''shiny'', ''spacyr'', ''udpipe'', ''coreNLP'' (don't worry if some of these fail to install)
 +    * optional: ''NMF'' (also install ''BiocManager'', then run the command ''BiocManager::install("Biobase")'')
   - During the course, you will be asked to install a further package with additional evaluation tasks (''wordspaceEval'') from a password-protected Web page:
-    * ''wordspaceEval'' v0.1: [[http://www.collocations.de/data/protected/wordspaceEval_0.1.tar.gz|Source/Linux]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.1.tgz|MacOS]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.1.zip|Windows]] (login required)
+    * ''wordspaceEval'' v0.2: [[http://www.collocations.de/data/protected/wordspaceEval_0.2.tar.gz|Source/Linux]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.2.tgz|MacOS]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.2.zip|Windows]] (login required)
 +    * if you are stuck with R v3.x, please use the older package version 0.1: [[http://www.collocations.de/data/protected/wordspaceEval_0.1.tar.gz|Source/Linux]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.1.tgz|MacOS]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.1.zip|Windows]] (login required)
     * download a suitable version and select “Install from: Package Archive File” in RStudio
   - Download the sample data files listed below
   - Download one or more of the pre-compiled DSMs listed below
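
The package installation steps in the newer revision can also be carried out from the R console instead of the RStudio installer. The following is a minimal sketch, not part of the original page: the variable names are illustrative, the package lists are copied from the bullets above, and the local file name in the last comment assumes that the ''wordspaceEval'' source archive has already been downloaded into the working directory.

```r
# Package names as listed on the page (newer revision)
required    <- c("sparsesvd", "wordspace")
recommended <- c("e1071", "rsparse", "Rtsne", "uwot")

# Determine which of these packages are not installed yet
wanted     <- c(required, recommended)
to_install <- setdiff(wanted, rownames(installed.packages()))
print(to_install)

# Only download from CRAN in an interactive session, so that running this
# script in batch mode never triggers network access by accident
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}

# Assumption: the archive was downloaded manually from the password-protected
# page and saved under this name in the current working directory
# install.packages("wordspaceEval_0.2.tar.gz", repos = NULL, type = "source")
```

Installing from a local archive with ''repos = NULL'' should correspond to the “Install from: Package Archive File” option in RStudio.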
 +
 +===== Scaling R to large data sets =====
 +
 +Most of our hands-on examples work reasonably well in a standard R installation, even on a moderately powerful laptop computer.
 +However, if you intend to work on real-life tasks and process large DSMs, it is important to enable multi-threaded computation
 +in R. Since DSMs build on matrix operations, a multi-threaded linear algebra library (“BLAS”) is key.
 +
 +  - In Linux, it should be sufficient to install the OpenBLAS package, e.g. in Ubuntu: ''sudo apt install libopenblas-dev''
 +  - In MacOS, follow [[https://groups.google.com/g/r-sig-mac/c/YN6uNYCIZK0|these instructions]] to enable the VecLib BLAS built into MacOS.  You may also want to [[https://mac.r-project.org/openmp/|enable OpenMP]] for an additional speed boost on expensive distance metrics (but this is less important).
 +  - In Windows, you can try installing [[https://mran.microsoft.com/open|Microsoft R Open]] or do a Web search for alternative solutions.
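
To check whether a multi-threaded BLAS is actually in use, it can help to look at the libraries R is linked against and to time a dense matrix operation. The snippet below is a rough sketch under the assumption of R 3.4 or newer (where ''extSoftVersion()'' reports the BLAS and ''La_library()'' is available); the matrix size is an arbitrary choice for illustration.

```r
# Which BLAS / LAPACK libraries does this R session use?
# (the reported paths may be empty on some platforms)
print(extSoftVersion()["BLAS"])
print(La_library())

# Crude benchmark: cross-product of a dense random matrix
set.seed(42)
n <- 1000
A <- matrix(rnorm(n * n), nrow = n, ncol = n)
timing <- system.time(B <- crossprod(A))  # computes t(A) %*% A
print(timing)
```

With a multi-threaded BLAS, the ''elapsed'' time is typically noticeably smaller than the ''user'' time on a multi-core machine; with the reference BLAS the two are usually about the same.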
 +
 +
 +<!-- doesn't apply at the moment -- 
  
 ==== Getting the latest & greatest ====
Line 36: Line 53:
 
 You can also check the [[http://wordspace.r-forge.r-project.org/|wordspace homepage]] for new releases and installation instructions.
 +
 +-->
  
 ===== Example data sets =====
Line 44: Line 63:
   * ''[[http://www.collocations.de/data/potter_l2r2.txt.gz|potter_l2r2.txt.gz]]'' (51.3 MB)
   * ''[[http://www.collocations.de/data/potter_lemmas.txt.gz|potter_lemmas.txt.gz]]'' (1.1 MB)
 +  * ''[[http://www.collocations.de/data/VSS.txt|VSS.txt]]'' (37 kB)
  
 ===== Pre-compiled DSMs =====