### General

SemanticVectors (SV) is still under active development and maintenance. It looks like a promising option. However, there are several compatibility problems between Lucene and SV.

• Written in Java
• Uses a Random Projection Algorithm
• Doesn't use SVD!!
• You NEED Lucene 3.x (a Java library for text search)

### Installation Instructions

Please read the very last point before starting this tutorial.

This is an installation tutorial for dolts like me; that is, I wrote it after fighting for hours with my own lack of intelligence. I hope it helps you install SemanticVectors quickly. You will need several things before installing SV, in particular a Java Development Kit, a build tool (Ant), and Lucene, a library for searching text. Everything is documented in this tutorial.

##### Install Lucene:
1. You will need the JDK and Ant:
    1. JDK: the development kit for Java.
    2. Ant: a build tool for Java, similar to make.
2. You can install both from the package repository:
sudo apt-get install openjdk-6-jdk
sudo apt-get install ant
sudo apt-get install ant-doc
3. You can test your installation with the following command in your terminal:
ant -version

You should get something like

Apache Ant version 1.7.1 compiled on September 8 2010
4. Untar the Lucene .gz file in a location of your choice.
5. Go to the directory that was just extracted.
6. If everything is all right (that is, Ant is working properly), run ant in the current directory; it will automatically detect build.xml.
ant
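
Before running the build, the prerequisites from the steps above can be checked in one go. The following is only a minimal sketch: `check_tools` is a helper name I made up for illustration, and it assumes that the JDK and Ant put `java`, `javac` and `ant` on your PATH.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report which of the given commands
# are missing from PATH. Prints one line per missing command and
# returns 1 if at least one is missing, 0 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (commented out so that sourcing this file has no side effects):
# check_tools java javac ant && echo "all build tools found"
```

If a tool is reported as missing, go back to the apt-get step before running ant.
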
1. You will need to set the appropriate CLASSPATH. CLASSPATH specifies the location of Java libraries. To check whether the variable is already set on your system:
 echo $CLASSPATH
2. Although there are (supposedly) several methods to set CLASSPATH, the only one that worked for me was the following: you will have to edit .bashrc in your home directory
gedit ~/.bashrc
3. At the end of the document add:
export CLASSPATH="{location of lucene}/lucene-3.0.2/lucene-demos-3.0.2.jar:{location of lucene}/lucene-3.0.2/lucene-core-3.0.2.jar"

in my case:

 export CLASSPATH="/home/eduardo/programas/lucene/lucene-3.0.2/lucene-demos-3.0.2.jar:/home/eduardo/programas/lucene/lucene-3.0.2/lucene-core-3.0.2.jar"
4. Reboot your computer (opening a new terminal, or running source ~/.bashrc in the current one, also picks up the change). Now Lucene should be working.
5. To check that everything is all right, do the following:
    1. Untar the corpus in a location of your choice. Normally the directory will be: bible_chapters. Go to that directory and run the following command:
java org.apache.lucene.demo.IndexFiles {complete bible_chapters path}
    2. In my case:
java org.apache.lucene.demo.IndexFiles /home/eduardo/programas/SemanticVectors/bible_chapters
    3. If everything runs OK, you are done. An index directory will be created in bible_chapters. You should be able to perform some simple tests using the demo library included in Lucene.
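
If the java command above complains that it cannot find the demo class, a typo in CLASSPATH is a likely culprit. The following is a minimal sketch for spotting entries that do not exist on disk. `check_classpath` is a hypothetical helper name, and it assumes plain colon-separated paths as in the export line above (Java wildcard entries such as `lib/*` are not handled).

```shell
#!/bin/sh
# check_classpath (hypothetical helper): print every entry of a
# colon-separated CLASSPATH string that does not exist on disk.
# The body runs in a subshell so that changing IFS does not leak
# into the calling shell. Exit status is 0 when all entries exist.
check_classpath() (
    status=0
    IFS=:
    for entry in $1; do
        # Skip empty fields produced by "::" in the string.
        if [ -n "$entry" ] && [ ! -e "$entry" ]; then
            echo "not found: $entry"
            status=1
        fi
    done
    exit "$status"
)

# Example call (commented out so that sourcing this file has no side effects):
# check_classpath "$CLASSPATH" && echo "all CLASSPATH entries exist"
```

Note that an empty CLASSPATH passes this check trivially, so look at the output of echo $CLASSPATH first.
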
##### Install SemanticVectors
• This is the binary installation method. You could also build SemanticVectors yourself by downloading the appropriate file and using ant again. I don't recommend it.
1. Move the downloaded jar file (semanticvectors-1.8.jar in my case) to the desired location.
2. Open .bashrc again and add the file to CLASSPATH. In my case it looks like:
export CLASSPATH="/home/eduardo/programas/lucene/lucene-3.0.2/lucene-demos-3.0.2.jar:/home/eduardo/programas/lucene/lucene-3.0.2/lucene-core-3.0.2.jar:/home/eduardo/programas/SemanticVectors/semanticvectors-1.8.jar"
3. Reboot your computer.
4. By now everything should be working. Go to the directory where you ran Lucene. Run the following command:
java pitt.search.semanticvectors.BuildIndex {location in your computer}/bible_chapters/index/
5. Now you are ready to use SemanticVectors. At this point I realized that there is a (probably) very serious bug. Since Lucene is constantly updated and SV depends on Lucene, there are several compatibility issues between the two. In particular, a class from Lucene has been deprecated in the latest version. I checked the official community and there are no answers to this issue, although others have reported the same problem. Maybe I will get some feedback regarding this problem. Otherwise an older version of the software will be necessary.
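
Since SV needs Lucene 3.x, it can help to confirm which Lucene jar your CLASSPATH actually points to before blaming SV. This is only a rough sketch: `lucene_core_version` and `is_lucene_3x` are names I made up, and they assume the jar follows the `lucene-core-<version>.jar` naming used in this tutorial. They read the version from the file name only and do not inspect the jar itself.

```shell
#!/bin/sh
# lucene_core_version (hypothetical helper): print the version of the
# first lucene-core jar named in a colon-separated CLASSPATH string.
# Exit status is 1 when no lucene-core jar is listed.
lucene_core_version() (
    IFS=:
    for entry in $1; do
        name=${entry##*/}              # strip the directory part
        case $name in
            lucene-core-*.jar)
                version=${name#lucene-core-}
                echo "${version%.jar}"
                exit 0
                ;;
        esac
    done
    exit 1
)

# is_lucene_3x (hypothetical helper): succeed only when that version
# starts with "3.".
is_lucene_3x() {
    case $(lucene_core_version "$1") in
        3.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call (commented out so that sourcing this file has no side effects):
# is_lucene_3x "$CLASSPATH" || echo "CLASSPATH does not point to a Lucene 3.x core jar"
```
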

Eduardo Aponte 2010/10/31 13:49