Differences

This shows you the differences between two versions of the page.

course:material [2021/08/07 18:00]
schtepf [Software for the course]
course:material [2021/08/11 16:12]
schtepf [Example data sets]
Line 17: Line 17:
     * ''sparsesvd'' (v0.2)
     * ''wordspace'' (v0.2-6)
-    * recommended: ''e1071'', ''text2vec'', ''Rtsne'', ''uwot''
+    * recommended: ''e1071'', ''rsparse'', ''Rtsne'', ''uwot''
     * optional: ''tm'', ''quanteda'', ''data.table'', ''wordcloud'', ''shiny'', ''spacyr'', ''udpipe'', ''coreNLP'' (don't worry if some of these fail to install)
 +    * optional: ''NMF'' (also install ''BiocManager'', then run the command ''BiocManager::install("Biobase")'')
   - During the course, you will be asked to install a further package with additional evaluation tasks (''wordspaceEval'') from a password-protected Web page:
     * ''wordspaceEval'' v0.2: [[http://www.collocations.de/data/protected/wordspaceEval_0.2.tar.gz|Source/Linux]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.2.tgz|MacOS]] – [[http://www.collocations.de/data/protected/wordspaceEval_0.2.zip|Windows]] (login required)
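
The package list in this hunk can be turned into a short installation script. The following is a minimal sketch, assuming the packages are installed from CRAN with base R's ''install.packages()''; the grouping into vectors and the helper name ''missing_pkgs'' are illustrative and not part of the course materials. Only the packages visible in this hunk are included.

```r
# Package names are copied from the list above; the vector names and
# the helper `missing_pkgs` are hypothetical, chosen for illustration.
required    <- c("sparsesvd", "wordspace")
recommended <- c("e1071", "rsparse", "Rtsne", "uwot")

# Return those packages from `pkgs` that are not installed yet
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_pkgs(c(required, recommended))
print(to_install)

# Uncomment to install from CRAN (requires network access):
# install.packages(to_install)

# wordspaceEval is distributed via the password-protected page; after
# downloading the source archive it can be installed from the local file:
# install.packages("wordspaceEval_0.2.tar.gz", repos = NULL, type = "source")
```

Installing only the missing packages keeps the script safe to re-run; optional packages that fail to install can simply be skipped, as noted in the list.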
Line 26: Line 27:
   - Download one or more of the pre-compiled DSMs listed below
  
-/-- doesn't apply at the moment --
+===== Scaling R to large data sets =====
 + 
 +Most of our hands-on examples work reasonably well in a standard R installation, even on a moderately powerful laptop computer. 
 +However, if you intend to work on real-life tasks and process large DSMs, it is important to enable multi-threaded computation 
 +in R. Since DSMs build on matrix operations, a multi-threaded linear algebra library (“BLAS”) is key. 
 + 
 +  - In Linux, it should be sufficient to install the OpenBLAS package, e.g. in Ubuntu: ''sudo apt install libopenblas-dev'' 
 +  - In MacOS, follow [[https://groups.google.com/g/r-sig-mac/c/YN6uNYCIZK0|these instructions]] to enable the VecLib BLAS built into MacOS.  You may also want to [[https://mac.r-project.org/openmp/|enable OpenMP]] for an additional speed boost on expensive distance metrics (but this is less important). 
 +  - In Windows, you can try installing [[https://mran.microsoft.com/open|Microsoft R Open]] or do a Web search for alternative solutions. 
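
To see whether a multi-threaded BLAS is actually in use, it can help to ask R which libraries it is linked against and to time a large matrix product. The following is a rough sketch using base R only (''extSoftVersion()'' and ''La_library()'' report the BLAS entry in R 3.4.0 and later); the matrix size and the choice of ''crossprod()'' as a benchmark are assumptions for illustration, not a recommendation from the course.

```r
# Report which BLAS / LAPACK libraries this R session is linked against.
# The BLAS entry may be an empty string if R cannot determine the path.
blas_path   <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Rough benchmark: a dense matrix cross-product is handled by the BLAS,
# so its run time reflects the BLAS implementation in use.
set.seed(42)
n <- 1000                      # illustrative size, adjust to your machine
M <- matrix(rnorm(n * n), n, n)
elapsed <- system.time(P <- crossprod(M))[["elapsed"]]
cat(sprintf("crossprod of a %d x %d matrix: %.2f s\n", n, n, elapsed))
```

Running the benchmark before and after switching the BLAS gives a quick impression of the speed-up on your own hardware.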
 + 
 + 
 +<!-- doesn't apply at the moment --  
 ==== Getting the latest & greatest ====
  
Line 41: Line 54:
 You can also check the [[http://wordspace.r-forge.r-project.org/|wordspace homepage]] for new releases and installation instructions.
  
-*/
+-->
  
 ===== Example data sets =====
Line 50: Line 63:
   * ''[[http://www.collocations.de/data/potter_l2r2.txt.gz|potter_l2r2.txt.gz]]'' (51.3 MB)
   * ''[[http://www.collocations.de/data/potter_lemmas.txt.gz|potter_lemmas.txt.gz]]'' (1.1 MB)
 +  * ''[[http://www.collocations.de/data/VSS.txt|VSS.txt]]'' (37 kB)
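
The example files can also be fetched from within R. This is a minimal sketch, assuming base R's ''download.file()''; the base URL and file names are copied from the list above, while ''data_url'' and ''fetch_data'' are hypothetical helper names. The internal format of the files is not described here, so the sketch stops at downloading.

```r
# Base URL and file names come from the list above;
# `data_url` and `fetch_data` are hypothetical helpers.
base_url <- "http://www.collocations.de/data/"
data_url <- function(name) paste0(base_url, name)

# Download `name` into `dir` unless it is already there; return the path
fetch_data <- function(name, dir = ".") {
  dest <- file.path(dir, name)
  if (!file.exists(dest)) {
    # mode = "wb" keeps the gzip-compressed files intact on Windows
    download.file(data_url(name), destfile = dest, mode = "wb")
  }
  dest
}

# Example (requires network access):
# path <- fetch_data("VSS.txt")
# head(readLines(path))
```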
  
 ===== Pre-compiled DSMs =====