Differences

This shows you the differences between two versions of the page.

software:rewsspacepackage [2010/12/07 12:33]
eapontep [Testing]
software:rewsspacepackage [2010/12/07 12:53] (current)
eapontep
Line 1: Line 1:
-==== STATE ==== 
-It is working on my computer. I will write a script, run it on the whole American National Corpus, and test the results, the memory usage and the type of vector produced. I talked with the admins, and the package should be running on the server within the next few days. 
- 
- 
 ==== General ==== 
  
Line 29: Line 25:
   * A new directory should have been created. Go to that directory and run the command<file bash>ant</file>. Ant is an Apache project used to build Java software. It automatically detects the file build.xml and builds from it. I explained [[rewSemVector|here]] how to install Ant.
   * If you want to use the .jar directly, you will also want to use the command
- 
-==== Technical Issues ==== 
- 
  
 ==== Testing ====
Line 76: Line 69:
  
  
-{{:software:stats.png|Statistics 1.}}
 +{{:software:stats.png|Statistics 1}}
 + 
 +Visual inspection suggests that the problems with the density of the vector space are solved by using MATLAB as the default algorithm. 
 + 
 +{{:software:statmatlab.png|}} 
 + 
 +Finally, I compared the scalability of Random Indexing and LSA (using SVDLIBC with 100 dimensions): 
 + 
 +{{:software:stats3.png|}}
  
 +LSA clearly struggles to handle large corpora. The results for Random Indexing differ, but they point to a similar conclusion.
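To make the scalability comparison concrete, here is a minimal textbook Random Indexing sketch in Python. It is not the package's implementation: the dimension (100), the number of non-zero entries per index vector (4), the window size and the function names are illustrative assumptions, and the corpus is assumed to be a list of already tokenised sentences. Each word receives a fixed sparse random index vector, and a word's context vector is the running sum of the index vectors of its neighbours.

```python
"""Minimal textbook Random Indexing sketch (hypothetical, not this package's code).

Assumptions: dim=100 and nonzero=4 are illustrative defaults; the corpus is a
list of pre-tokenised sentences.
"""
import random
from collections import defaultdict


def index_vector(dim: int, nonzero: int, rng: random.Random) -> dict[int, int]:
    """Sparse ternary vector: `nonzero` distinct random positions set to +1 or -1."""
    positions = rng.sample(range(dim), nonzero)
    return {p: rng.choice((1, -1)) for p in positions}


def random_indexing(sentences, dim=100, nonzero=4, window=2, seed=0):
    """Return dense context vectors: each word sums its neighbours' index vectors."""
    rng = random.Random(seed)
    index = {}                                # word -> sparse random index vector
    context = defaultdict(lambda: [0] * dim)  # word -> accumulated context vector
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = tokens[j]
                if neighbour not in index:
                    index[neighbour] = index_vector(dim, nonzero, rng)
                vec = context[word]
                for pos, sign in index[neighbour].items():
                    vec[pos] += sign
    return dict(context)
```

Because the corpus is streamed once and every vector has the fixed size `dim`, memory in this sketch grows with the vocabulary rather than with the number of documents. LSA, by contrast, builds a term-document matrix and computes a truncated SVD of it (here with SVDLIBC), which is one plausible reason the two methods scale differently.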
  
 +I wrote a simple script, "myScript.sh", that automatically documents the results of every experiment; it is located in the corpora directory. The results are stored in the directory statistics. A Python script then automatically generates a Graphviz representation in the directory vizImages. Since these files are meant to be rendered with twopi, they use the .twopi extension.
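The visualisation step can be sketched as follows. This is a hypothetical reconstruction, not the actual Python script: it assumes each experiment yields a target word with a list of (neighbour, similarity) pairs, and the function names `to_twopi` and `write_twopi` are invented for illustration. The `root` graph attribute tells twopi which node to place at the centre of its radial layout.

```python
"""Hypothetical sketch: write a Graphviz file for the twopi radial layout.

Assumption: each experiment yields a target word and its nearest neighbours
with similarity scores; the real script's input format is not documented here.
"""
from pathlib import Path


def quote(label: str) -> str:
    """Return a DOT string literal, escaping backslashes and double quotes."""
    return '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_twopi(target: str, neighbours: list[tuple[str, float]]) -> str:
    """Return DOT source with `target` as the centre (root) of a radial layout."""
    lines = ["graph neighbours {", f"  root={quote(target)};"]
    for word, sim in neighbours:
        # The edge label shows the similarity rounded to three decimals.
        lines.append(f'  {quote(target)} -- {quote(word)} [label="{sim:.3f}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"


def write_twopi(target: str, neighbours, out_dir: str = "vizImages") -> Path:
    """Write <out_dir>/<target>.twopi and return its path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{target}.twopi"
    path.write_text(to_twopi(target, neighbours), encoding="utf-8")
    return path


if __name__ == "__main__":
    demo = [("dog", 0.82), ("kitten", 0.79), ("animal", 0.64)]
    print(to_twopi("cat", demo))
```

A generated file can then be rendered with, for example, `twopi -Tpng vizImages/cat.twopi -o cat.png`.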