Differences

This shows you the differences between two versions of the page.

software:start [2010/12/03 16:00]
schtepf
software:start [2018/07/26 12:00]
schtepf
Line 7:
 \\ \\
 \\ \\
 +
 +
 +===== Off-the-shelf software packages for DSM =====
 +
 +**Python**
 +  * [[https://radimrehurek.com/gensim/|Gensim]] – high-performance topic modelling
 +  * [[http://vecto.space/|Vecto]] – a new framework for count & predict models
 +  * [[http://clic.cimec.unitn.it/composes/toolkit/|DISSECT]] – easy-to-use package developed by the COMPOSES project
 +  * [[https://pypi.org/project/Divisi/|Divisi]] – semantic networks, tensors & SVD ([[rewDivisi2|review]])
 +
 +**R**
 +  * [[http://wordspace.r-forge.r-project.org|wordspace]] – user-friendly DSM exploration
 +
 +**Java**
 +  * [[https://github.com/semanticvectors/semanticvectors/wiki|Semantic Vectors]] – scalable implementation based on random indexing ([[rewSemVector|review]])
 +  * [[https://github.com/fozziethebeat/S-Space|S-Space]] package ([[rewSSpacePackage|review]])
 +
 +**C/C++**
 +  * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] – classical LSA-style DSM ([[rewInfoMap|review]])
 +  * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer ([[hiDex|review]])
 +  * [[https://github.com/facebookresearch/fastText|FastText]] – state-of-the-art neural word embeddings
 +
 +**Other**
 +  * [[http://senseclusters.sourceforge.net/|SenseClusters]] – distributional clustering in Perl
 +  * [[http://scgroup20.ceid.upatras.gr:8000/tmg/|Text to Matrix Generator]] (TMG) – text mining with NMF in Matlab
 +
 +
 +//If you know other useful off-the-shelf packages missing from this list, please [[stefan.evert@fau.de|drop me a line]].//
 +
 +===== Precompiled DSMs =====
 +
 +FIXME
 +
 +
 +===== Evaluation tasks =====
 +
 +FIXME
 +
  
 ===== Useful corpora =====
  
-  * The Westbury Lab at Alberta has a [[http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html|preprocessed (cleaned) Wikipedia Corpus]] from an April 2010 dump.  The WaCky initiative offers a [[http://wacky.sslmit.unibo.it/doku.php?id=corpora|WaCkypedia, a dependency-parsed Wikipedia Corpus]] from a 2009 dump.  Both corpora only cover the //English Wikipedia//.
 +FIXME
  
-===== Off-the-shelf packages for DSM =====
 +  * The Westbury Lab at Alberta has a [[http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html|preprocessed (cleaned) Wikipedia Corpus]] from an April 2010 dump.  The WaCky initiative offers [[http://wacky.sslmit.unibo.it/doku.php?id=corpora|WaCkypedia, a dependency-parsed Wikipedia Corpus]] from a 2009 dump.  Both corpora only cover the //English Wikipedia//.
  
-  * [[GenSim]]: incremental SVD & LSA in python, easily deployable to clusters. 
-  * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] 
-    * [[rewInfoMap|Review]] 
-  * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer 
-    * [[hiDex|Review]] 
-  * [[http://code.google.com/p/semanticvectors|SemanticVectors]] 
-    * [[rewSemVector|Review]] 
-  * [[http://senseclusters.sourceforge.net/|SenseClusters]] 
-    * [[rewSenseClusters|Review]] 
-  * [[http://code.google.com/p/airhead-research/|S-Space Package]] (work in progress) 
-    *[[rewSSpacePackage|Review]] 
-  * [[http://code.google.com/p/wordspaces/|Wordspaces]] (interactive exploration) 
-    * [[rewWordSpaces|Review]] 
-  * [[http://csc.media.mit.edu/docs/divisi2|Divisi]] (semantic networks, tensors & SVD in Python) 
-    * [[rewDivisi2|Review]] 
-  * [[miscellaneous|Miscellaneous]]