Differences

This shows you the differences between two versions of the page.

software:start [2010/10/26 14:30]
schtepf
software:start [2018/08/06 12:06] (current)
schtepf [Off-the-shelf software packages for DSM]
Line 1: Line 1:
 ====== DSM Software and Data Sets ======
  
-{{:icon_warn.png?32 |Under Construction}}+{{:under_construction.png?48 |Under Construction}}
  
 +\\
 **This page is under construction.**
 \\
 \\
  
-===== Off-the-shelf packages for DSM ===== 
  
-  * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] +===== Off-the-shelf software packages for DSM ===== 
-  * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer + 
-  * [[http://code.google.com/p/semanticvectors|Semantic Vectors]] +**Python** 
-  * [[http://senseclusters.sourceforge.net/|SenseClusters]] +  * [[https://radimrehurek.com/gensim/|Gensim]] – high-performance topic modelling 
-  * [[http://code.google.com/p/airhead-research/|S-Space Package]] (work in progress) +  * [[http://vecto.space/|Vecto]] – a new framework for count & predict models
-  * [[http://code.google.com/p/wordspaces/|Wordspaces]] (interactive exploration) +  * [[http://clic.cimec.unitn.it/composes/toolkit/|DISSECT]] – easy-to-use package developed by the COMPOSES project 
-  * [[http://divisi.media.mit.edu/|Divisi]] (semantic networs, tensors & SVD in Python) +  * [[https://pypi.org/project/Divisi/|Divisi]] – semantic networks, tensors & SVD ([[rewDivisi2|review]])
 + 
 +**R** 
 +  * [[http://wordspace.r-forge.r-project.org|wordspace]] – user-friendly DSM exploration 
 + 
 +**Java** 
 +  * [[https://github.com/semanticvectors/semanticvectors/wiki|Semantic Vectors]] – scalable implementation based on random indexing ([[rewSemVector|review]]) 
 +  * [[https://github.com/fozziethebeat/S-Space|S-Space]] package ([[rewSSpacePackage|review]]) 
 +  * [[http://maggie.lt.informatik.tu-darmstadt.de/jobimtext|JoBimText]] – with support for distributed processing 
 + 
 +**C/C++** 
 +  * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] – classical LSA-style DSM ([[rewInfoMap|review]]) 
 +  * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer ([[hiDex|review]]) 
 +  * [[https://github.com/facebookresearch/fastText|FastText]] – state-of-the-art neural word embeddings 
 + 
 +**Other** 
 +  * [[http://senseclusters.sourceforge.net/|SenseClusters]] – distributional clustering in Perl 
 +  * [[http://scgroup20.ceid.upatras.gr:8000/tmg/|Text to Matrix Generator]] (TMG) – text mining with NMF in Matlab
 + 
 + 
 +//If you know other useful off-the-shelf packages missing from this list, please [[stefan.evert@fau.de|drop me a line]].// 
 + 
 +===== Precompiled DSMs ===== 
 + 
 +FIXME 
 + 
 + 
 +===== Evaluation tasks ===== 
 + 
 +FIXME 
 + 
 + 
 +===== Useful corpora ===== 
 + 
 +FIXME 
 + 
 +  * The Westbury Lab at Alberta has a [[http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html|preprocessed (cleaned) Wikipedia Corpus]] from an April 2010 dump.  The WaCky initiative offers [[http://wacky.sslmit.unibo.it/doku.php?id=corpora|WaCkypedia, a dependency-parsed Wikipedia Corpus]] from a 2009 dump.  Both corpora only cover the //English Wikipedia//.