STATE
It is working on my computer. I will write a script to run it on the whole American National Corpus and check the results, the memory usage, and the type of vectors produced. I talked with the admins, and the package should be running on the server within the next few days.
General
- Developed at UCLA,
- Set of Java libraries,
- It is not finished; it is not dead code, though.
- There is rich documentation on the algorithms and the implementation.
- Since it is a collection of algorithms, we have to decide which ones we actually need.
- "The focus of this framework is to ease the development of new algorithms and the comparison against existing models." (Jurgens, Stevens).
- "Each word space algorithms is designed to run as a stand alone program and also to be used as a library class." (Jurgens, Stevens).
- The library supports word-document vectors.
- The authors state that it can collect more than one context vector for a single word, depending on its meaning (e.g. German "Bank" as a financial institution and "Bank" as a bench, i.e. a place to sit).
- "Libraries provide support for converting between multiple matrix formats, enabling interaction with external matrix-based program".
- It provides SVD and randomized projections.
- Judging from the figures, most of the algorithms appear to scale linearly.
- The package consists of four types of tools:
- A library (implementation) of commonly used algorithms in semantic spaces.
- Tools for building semantic models
- Evaluation tools (e.g. TOEFL test for synonyms).
- Interaction tools (e.g. queries, etc.).
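To make the notion of word-document vectors mentioned above more concrete, here is a minimal sketch in plain Java. It does not use the S-Space API: the class `WordDocumentSketch` and its methods are hypothetical names chosen for illustration, and the whitespace tokenization is an assumption, not what the package does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT part of the S-Space package.
final class WordDocumentSketch {

    // Builds one count vector per word: entry d holds how often
    // the word occurs in document d.
    static Map<String, int[]> buildVectors(List<String> documents) {
        Map<String, int[]> vectors = new LinkedHashMap<>();
        for (int d = 0; d < documents.size(); d++) {
            // Assumption: lower-casing and whitespace splitting stand in for real pre-processing.
            for (String token : documents.get(d).toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                vectors.computeIfAbsent(token, k -> new int[documents.size()])[d]++;
            }
        }
        return vectors;
    }

    // Cosine similarity between two count vectors of equal length;
    // returns 0.0 if either vector is all zeros.
    static double cosine(int[] a, int[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "the bank approved the loan",
            "the bank raised the interest rate",
            "we sat on the river shore");
        Map<String, int[]> vectors = buildVectors(docs);
        // "bank" and "loan" share a document, "bank" and "river" do not.
        System.out.println(cosine(vectors.get("bank"), vectors.get("loan")));
        System.out.println(cosine(vectors.get("bank"), vectors.get("river")));
    }
}
```

In this toy corpus "bank" and "loan" co-occur in one document, so their similarity is positive, while "bank" and "river" never share a document and get a similarity of zero. A real semantic space would add weighting and dimensionality reduction on top of such counts.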
Installation
- Required Software
- svn (Subversion). It can be installed with an apt-get command:
sudo apt-get install subversion
- To install the package, go to a target directory. The authors recommend using the following command:
svn checkout http://airhead-research.googlecode.com/svn/trunk/sspace sspace-read-only
- A new directory should have been created. Go to that directory and run the command
ant
Ant is an Apache project used to build Java libraries. It automatically detects the file build.xml and builds the package from it. I explained here how to install ant.
Technical Issues
Testing
- The S-Space package supports reading and writing several matrix file formats. Among those supported are
- SVDLIBC text, sparse text, binary and sparse binary
- Matlab and Octave dense text and sparse text formats
- CLUTO sparse text
- The package provides a user interface, i.e., a class for using the S-Space package from the terminal.
- The package provides utilities to process 'raw text', which means that these utilities presuppose a pre-processed corpus. The user can select how the text files are structured, e.g., as a single data file, as separate files, etc. Check here for a short tutorial.
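As an illustration of one of the matrix formats listed above, the following sketch writes a small matrix in the SVDLIBC sparse text layout as I understand it from the SVDLIBC documentation: a header line with the number of rows, columns and non-zero values, then, for each column, the number of non-zero entries followed by one "row value" line per entry. The class `SvdlibcSparseTextSketch` is a hypothetical name and is not taken from the S-Space package, which ships its own matrix I/O.

```java
// Hypothetical illustration only; this is NOT the S-Space matrix I/O code.
final class SvdlibcSparseTextSketch {

    // Serializes a dense row-major matrix into column-oriented sparse text.
    // Assumption: all rows have the same length.
    static String toSparseText(double[][] m) {
        int rows = m.length;
        int cols = rows == 0 ? 0 : m[0].length;

        // Count all non-zero values for the header line.
        int nonZeros = 0;
        for (double[] row : m) {
            for (double value : row) {
                if (value != 0.0) {
                    nonZeros++;
                }
            }
        }

        StringBuilder sb = new StringBuilder();
        sb.append(rows).append(' ').append(cols).append(' ').append(nonZeros).append('\n');

        for (int c = 0; c < cols; c++) {
            // Each column starts with its own non-zero count ...
            int inColumn = 0;
            for (int r = 0; r < rows; r++) {
                if (m[r][c] != 0.0) {
                    inColumn++;
                }
            }
            sb.append(inColumn).append('\n');
            // ... followed by "rowIndex value" pairs (zero-based row index).
            for (int r = 0; r < rows; r++) {
                if (m[r][c] != 0.0) {
                    sb.append(r).append(' ').append(m[r][c]).append('\n');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] matrix = {
            {1, 0},
            {0, 2},
            {3, 0}
        };
        System.out.print(toSparseText(matrix));
    }
}
```

For the 3x2 example matrix, the header line is `3 2 3`; the first column then contributes two entries and the second column one. The exact number formatting accepted by SVDLIBC should be checked against its documentation before relying on this sketch.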
— Eduardo Aponte 2010/11/16 10:38