//Page ''software:start'': current revision 2018/08/06 12:06 by schtepf; previous revision 2010/11/30 15:46 by maebert.//
====== DSM Software and Data Sets ======
  
{{:under_construction.png?48 |Under Construction}}

\\
**This page is under construction.**
\\
\\


===== Off-the-shelf software packages for DSM =====

**Python**
  * [[https://radimrehurek.com/gensim/|Gensim]] – high-performance topic modelling
  * [[http://vecto.space/|Vecto]] – a new framework for count & predict models
  * [[http://clic.cimec.unitn.it/composes/toolkit/|DISSECT]] – easy-to-use package developed by the COMPOSES project
  * [[https://pypi.org/project/Divisi/|Divisi]] – semantic networks, tensors & SVD ([[rewDivisi2|review]])

**R**
  * [[http://wordspace.r-forge.r-project.org|wordspace]] – user-friendly DSM exploration

**Java**
  * [[https://github.com/semanticvectors/semanticvectors/wiki|Semantic Vectors]] – scalable implementation based on random indexing ([[rewSemVector|review]])
  * [[https://github.com/fozziethebeat/S-Space|S-Space]] package ([[rewSSpacePackage|review]])
  * [[http://maggie.lt.informatik.tu-darmstadt.de/jobimtext|JoBimText]] – distributional semantics framework with support for distributed processing

**C/C++**
  * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] – classical LSA-style DSM ([[rewInfoMap|review]])
  * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]] – the High-Dimensional Explorer ([[hiDex|review]])
  * [[https://github.com/facebookresearch/fastText|FastText]] – state-of-the-art neural word embeddings

**Other**
  * [[http://senseclusters.sourceforge.net/|SenseClusters]] – distributional clustering in Perl
  * [[http://scgroup20.ceid.upatras.gr:8000/tmg/|Text to Matrix Generator]] (TMG) – text mining with NMF in Matlab


//If you know other useful off-the-shelf packages missing from this list, please [[stefan.evert@fau.de|drop me a line]].//
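
For readers new to DSMs, here is a minimal, self-contained Python sketch of the kind of count-based model that packages such as wordspace or DISSECT are designed to handle at scale. It collects word co-occurrence counts within a symmetric context window, reweights them with positive pointwise mutual information (PPMI), and compares words by the cosine of their vectors. The sketch is not taken from any of the listed packages; the function names (''cooccurrence_counts'', ''ppmi'', ''cosine''), the window size and the toy corpus are illustrative assumptions. Real toolkits add frequency thresholds, sparse matrix storage and dimensionality reduction on top of this.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each target word occurs with the words at most
    `window` tokens to its left or right (symmetric context window)."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def ppmi(counts):
    """Reweight raw counts with positive pointwise mutual information:
    PPMI(w, c) = max(0, log(f(w, c) * N / (f(w) * f(c))))."""
    total = sum(sum(row.values()) for row in counts.values())
    row_sums = {w: sum(row.values()) for w, row in counts.items()}
    col_sums = Counter()
    for row in counts.values():
        col_sums.update(row)
    vectors = {}
    for w, row in counts.items():
        vectors[w] = {}
        for c, n in row.items():
            pmi = math.log(n * total / (row_sums[w] * col_sums[c]))
            # Keep only positive associations; zero cells stay implicit (sparse).
            if pmi > 0:
                vectors[w][c] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: one pre-tokenised sentence per list.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]
vectors = ppmi(cooccurrence_counts(corpus, window=2))
print(f"cos(cat, dog) = {cosine(vectors['cat'], vectors['dog']):.3f}")
```

The window size is the main design knob in such a model: a larger window tends to capture broader, more topical similarity, while a small window tends to favour words that can substitute for each other.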

===== Precompiled DSMs =====

FIXME


===== Evaluation tasks =====

FIXME


===== Useful corpora =====

FIXME

  * The Westbury Lab at the University of Alberta has a [[http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html|preprocessed (cleaned) Wikipedia corpus]] from an April 2010 dump. The WaCky initiative offers [[http://wacky.sslmit.unibo.it/doku.php?id=corpora|WaCkypedia, a dependency-parsed Wikipedia corpus]] from a 2009 dump. Both corpora cover only the //English Wikipedia//.
  