Differences

This shows you the differences between two versions of the page.

--- software:start [2010/11/01 14:30] schtepf [DSM Software and Data Sets]
+++ software:start [2018/07/26 12:00] schtepf
Line 8: Line 8:
 \\ \\
  
-===== Off-the-shelf packages for DSM ===== 
  
-  * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]]
-    * [[rewInfoMap|Review]]
-  * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer
-    * [[hiDex|Review]]
-  * [[http://code.google.com/p/semanticvectors|SemanticVectors]]
-    [[rewSemVector|Review]]
-  * [[http://senseclusters.sourceforge.net/|SenseClusters]]
-  * [[http://code.google.com/p/airhead-research/|S-Space Package]] (work in progress)
-  * [[http://code.google.com/p/wordspaces/|Wordspaces]] (interactive exploration)
-  [[http://divisi.media.mit.edu/|Divisi]] (semantic networks, tensors & SVD in Python)
+===== Off-the-shelf software packages for DSM =====
+
+**Python**
+  * [[https://radimrehurek.com/gensim/|Gensim]] – high-performance topic modelling
+  * [[http://vecto.space/|Vecto]] – a new framework for count & predict models
+  * [[http://clic.cimec.unitn.it/composes/toolkit/|DISSECT]] – easy-to-use package developed by the COMPOSES project
+  * [[https://pypi.org/project/Divisi/|Divisi]] – semantic networks, tensors & SVD ([[rewDivisi2|review]])
+
+**R**
+  * [[http://wordspace.r-forge.r-project.org|wordspace]] – user-friendly DSM exploration
+
+**Java**
+  * [[https://github.com/semanticvectors/semanticvectors/wiki|Semantic Vectors]] – scalable implementation based on random indexing ([[rewSemVector|review]])
+  * [[https://github.com/fozziethebeat/S-Space|S-Space]] package ([[rewSSpacePackage|review]])
+
+**C/C++**
+  * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] – classical LSA-style DSM ([[rewInfoMap|review]])
+  * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer ([[hiDex|review]])
+  * [[https://github.com/facebookresearch/fastText|FastText]] – state-of-the-art neural word embeddings
+
+**Other**
+  * [[http://senseclusters.sourceforge.net/|SenseClusters]] – distributional clustering in Perl
+  * [[http://scgroup20.ceid.upatras.gr:8000/tmg/|Text to Matrix Generator]] (TMG) – text mining with NMF in Matlab
+
+
+//If you know other useful off-the-shelf packages missing from this list, please [[stefan.evert@fau.de|drop me a line]].//
+
+===== Precompiled DSMs =====
+
+FIXME
+
+
+===== Evaluation tasks =====
+
+FIXME
+
+
+===== Useful corpora =====
+
+FIXME
+
+  * The Westbury Lab at the University of Alberta has a [[http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html|preprocessed (cleaned) Wikipedia Corpus]] from an April 2010 dump. The WaCky initiative offers [[http://wacky.sslmit.unibo.it/doku.php?id=corpora|WaCkypedia, a dependency-parsed Wikipedia Corpus]] from a 2009 dump. Both corpora cover only the //English Wikipedia//.