Differences
This shows you the differences between two versions of the page.
software:rewsspacepackage [2010/11/21 13:04] eapontep [Testing] |
software:rewsspacepackage [2010/12/07 12:53] (current) eapontep |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== STATE ==== | ||
- | It is working on my computer. I will write a script, run it on the whole American National Corpus, and test the results, the memory use and the type of vector produced. I talked with the Adms. and in the next few days the package should be running on the server. | ||
- | |||
- | |||
==== General ==== | ==== General ==== | ||
Line 29: | Line 25: | ||
* A new directory should have been created. Go to the directory and use the command< | * A new directory should have been created. Go to the directory and use the command< | ||
* If you want to make direct use of the .jar, you would also like to use the command | * If you want to make direct use of the .jar, you would also like to use the command | ||
- | |||
- | ==== Technical Issues ==== | ||
- | |||
==== Testing ==== | ==== Testing ==== | ||
Line 55: | Line 48: | ||
Nov 21, 2010 12:44:10 PM edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary | Nov 21, 2010 12:44:10 PM edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary | ||
INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision</ | INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision</ | ||
- | | + | |
+ | |||
+ | === Trials with LSA === | ||
+ | |||
+ | I performed a number of trials with LSA. These trials were intended to test the time and memory consumed by the different algorithms compatible with the LSA implementation. The following are available from the command line: | ||
+ | |||
+ | | ||
+ | * Matlab | ||
+ | * GNU Octave | ||
+ | * JAMA | ||
+ | * COLT | ||
+ | |||
+ | I did not review the implementations in depth; instead I started by testing every algorithm. As previous results showed, using the default algorithm (SVDLIBC) generated strange results: the angular distance between vectors was extremely low among close neighbors. Two reasons were identified | ||
+ | |||
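The density symptom described above (very small angles between close neighbors) can be checked numerically. This is a minimal Python sketch, assuming the word vectors have already been exported from the .sspace file as plain lists of floats. The function name `angular_distance` and the sample vectors are illustrative only and are not part of the S-Space package.

```python
import math

def angular_distance(u, v):
    """Return the angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

# Hypothetical vectors: orthogonal ones are far apart, parallel ones coincide.
far = angular_distance([1.0, 0.0], [0.0, 1.0])
near = angular_distance([1.0, 2.0], [2.0, 4.0])
```

If most neighbor pairs in a space give values close to zero, the vectors are crowded into a narrow cone, which matches the behavior reported for the default algorithm.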
+ | I performed a test with 30,000 documents and 200 dimensions using SVDLIBC, MATLAB, OCTAVE and COLT. The results were partly disappointing: every algorithm except MATLAB ran out of memory (specifically, the pipeline between LSA and the SVD algorithm ran out of heap memory). | ||
+ | |||
+ | * SVDLIBC: ran out of memory at 6459 seconds | ||
+ | * Matlab: returned a 450 MB .sspace file after 5083 seconds | ||
+ | * GNU Octave: ran out of memory at 7624 seconds | ||
+ | |||
+ | |||
+ | {{: | ||
+ | |||
+ | Visual inspection suggests that the problems with the density of the vector space are solved by using MATLAB as the default algorithm. | ||
+ | |||
+ | {{: | ||
+ | |||
+ | Finally, I compared the scalability of Random Indexing and LSA (using SVDLIBC with 100 dimensions): | ||
+ | |||
+ | {{: | ||
+ | |||
+ | It is clear that LSA can hardly handle large corpora. Although the results are different | ||
+ | |||
+ | I wrote a simple script that automatically documents the results of every experiment. It can be found under the key name " |
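The script itself is not reproduced on this page. As an illustration only, the following Python sketch shows one way such a logger could record the run time and outcome of each experiment. The names `run_and_log`, the CSV columns and the toy workload are all assumptions. Note that `tracemalloc` only tracks memory allocated by Python itself, so measuring a Java run of the package would need a different mechanism.

```python
import csv
import io
import time
import tracemalloc

def run_and_log(name, experiment, writer):
    """Run experiment(), then write one row: name, seconds, peak bytes, status."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        experiment()
        status = "ok"
    except MemoryError:
        # Record the failure instead of aborting the whole batch of experiments.
        status = "out of memory"
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    writer.writerow([name, f"{elapsed:.1f}", peak, status])
    return status

# Usage with a stand-in workload (hypothetical; replace with a real experiment).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["experiment", "seconds", "peak_bytes", "status"])
run_and_log("toy", lambda: sum(range(1000)), writer)
```

Writing one row per experiment keeps failed runs, such as the out-of-memory cases listed above, in the same table as successful ones.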