software:start [2010/12/03 16:00] schtepf [Useful corpora] |
software:start [2018/07/26 12:00] schtepf |
\\ | \\ |
\\ | \\ |
| |
| |
| ===== Off-the-shelf software packages for DSM ===== |
| |
| **Python** |
| * [[https://radimrehurek.com/gensim/|Gensim]] – high-performance topic modelling |
| * [[http://vecto.space/|Vecto]] – a new framework for count & predict models |
| * [[http://clic.cimec.unitn.it/composes/toolkit/|DISSECT]] – easy-to-use package developed by the COMPOSES project |
| * [[https://pypi.org/project/Divisi/|Divisi]] – semantic networks, tensors & SVD ([[rewDivisi2|review]]) |
| |
| **R** |
| * [[http://wordspace.r-forge.r-project.org|wordspace]] – user-friendly DSM exploration |
| |
| **Java** |
| * [[https://github.com/semanticvectors/semanticvectors/wiki|Semantic Vectors]] – scalable implementation based on random indexing ([[rewSemVector|review]]) |
| * [[https://github.com/fozziethebeat/S-Space|S-Space]] package ([[rewSSpacePackage|review]]) |
| |
| **C/C++** |
| * [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] – classical LSA-style DSM ([[rewInfoMap|review]]) |
| * [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer ([[hiDex|review]]) |
| * [[https://github.com/facebookresearch/fastText|FastText]] – state-of-the-art neural word embeddings |
| |
| **Other** |
| * [[http://senseclusters.sourceforge.net/|SenseClusters]] – distributional clustering in Perl |
| * [[http://scgroup20.ceid.upatras.gr:8000/tmg/|Text to Matrix Generator]] (TMG) – text mining with NMF in Matlab |
| |
| |
| //If you know other useful off-the-shelf packages missing from this list, please [[stefan.evert@fau.de|drop me a line]].// |
| |
| ===== Precompiled DSMs ===== |
| |
| FIXME |
| |
| |
| ===== Evaluation tasks ===== |
| |
| FIXME |
| |
| |
===== Useful corpora ===== | ===== Useful corpora ===== |
| |
* The Westbury Lab at Alberta has a [[http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html|preprocessed (cleaned) Wikipedia Corpus]] from an April 2010 dump. The WaCky initiative offers [[http://wacky.sslmit.unibo.it/doku.php?id=corpora|WaCkypedia, a dependency-parsed Wikipedia Corpus]] from a 2009 dump. Both corpora only cover the //English Wikipedia//. | FIXME |
| |
===== Off-the-shelf packages for DSM ===== | * The Westbury Lab at Alberta has a [[http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html|preprocessed (cleaned) Wikipedia Corpus]] from an April 2010 dump. The WaCky initiative offers [[http://wacky.sslmit.unibo.it/doku.php?id=corpora|WaCkypedia, a dependency-parsed Wikipedia Corpus]] from a 2009 dump. Both corpora only cover the //English Wikipedia//. |
| |
* [[GenSim]]: incremental SVD & LSA in Python, easily deployable to clusters. | |
* [[http://infomap-nlp.sourceforge.net/|Infomap NLP]] | |
* [[rewInfoMap|Review]] | |
* [[http://www.psych.ualberta.ca/~westburylab/downloads/HiDEx.download.html|HiDEx]], the High-Dimensional Explorer | |
* [[hiDex|Review]] | |
* [[http://code.google.com/p/semanticvectors|SemanticVectors]] | |
* [[rewSemVector|Review]] | |
* [[http://senseclusters.sourceforge.net/|SenseClusters]] | |
* [[rewSenseClusters|Review]] | |
* [[http://code.google.com/p/airhead-research/|S-Space Package]] (work in progress) | |
* [[rewSSpacePackage|Review]] | |
* [[http://code.google.com/p/wordspaces/|Wordspaces]] (interactive exploration) | |
* [[rewWordSpaces|Review]] | |
* [[http://csc.media.mit.edu/docs/divisi2|Divisi]] (semantic networks, tensors & SVD in Python) | |
* [[rewDivisi2|Review]] | |
* [[miscellaneous|Miscellaneous]] | |