This is the Semano Manual! It is currently a work in progress.
To start using Semano now, you can read the Quickstart guide to Semano at the following link:
http://www.minds.may.ie/~provost/semano/quickstart.html
You can download Semano as a zip file or as a tar.gz file using one of the links below:
zip file (for Windows): http://www.minds.may.ie/~provost/semano/downloads/Semano.zip
tar.gz file (for Linux or Mac): http://www.minds.may.ie/~provost/semano/downloads/Semano.tar.gz
Once you have downloaded Semano, extract the compressed file. The root directory should have the following in it:
With some versions of Java, you can just double-click Semano.jar to start the program.
If not, open a command prompt (or shell) and cd to the extracted Semano directory.
From there, you should run the following command: "java -jar Semano.jar"
You can add any XML file. It will be broken down into a list of individual words, and metadata will then be generated for it. The XML files tested so far were mainly created from blog RSS feeds saved with the .xml extension.
To add an XML file, click the "add Document" button. Browse to the folder containing the file that you wish to add, then select the file and click "open".
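The word-list step described above can be sketched in a few lines of Python. This is only an illustration and not Semano's own code: the function name words_from_xml and the sample feed are made up for the example, and the rule used to split text into words is an assumption.

```python
import re
import xml.etree.ElementTree as ET

def words_from_xml(xml_text):
    """Return the lowercase words found in the text of an XML document."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text inside every element and skips the tags.
    text = " ".join(root.itertext())
    # Assumption: a word is a run of letters, optionally with one apostrophe.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

# Hypothetical RSS-like sample, similar in spirit to a saved blog feed.
sample = """<rss><channel><item>
<title>Funny jokes</title>
<description>The jokes they tell</description>
</item></channel></rss>"""

print(words_from_xml(sample))
```

A real feed may also contain HTML inside its text, which this sketch does not strip.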
An Ontology is a file which describes concepts so that computers can understand them. Semano uses
OWL Ontologies. You can read a guide to OWL at the following link:
http://www.w3.org/TR/owl-guide/
You can add any OWL Ontology. XML files will be compared to the Ontologies to see if there are any
links between the terms in the XML files and the concepts in the Ontologies.
To add an Ontology file, click the "add Ontology" button. Browse to the folder containing the file that you wish to add, then select the file and click "open".
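As a rough illustration of comparing terms to Ontology concepts, the Python sketch below reads the class names from an OWL file written in RDF/XML and reports which terms match one of them exactly. This is not Semano's matching method, which this manual does not describe in detail. The function names, the sample Ontology and the exact-name rule are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
OWL = "{http://www.w3.org/2002/07/owl#}"

def class_names(owl_text):
    """Collect the lowercase local names of owl:Class elements."""
    names = set()
    for cls in ET.fromstring(owl_text).iter(OWL + "Class"):
        ident = cls.get(RDF + "ID") or cls.get(RDF + "about")
        if ident:
            # Keep only the part after '#' or after the last '/'.
            local = ident.rsplit("#", 1)[-1].rsplit("/", 1)[-1]
            if local:
                names.add(local.lower())
    return names

def matching_terms(terms, owl_text):
    """Return the terms that equal a class name (assumed matching rule)."""
    return sorted(set(terms) & class_names(owl_text))

# Hypothetical two-class Ontology used only for this example.
sample_owl = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Wine"/>
  <owl:Class rdf:about="http://example.org/food#Pasta"/>
</rdf:RDF>"""

print(matching_terms(["wine", "blog", "pasta"], sample_owl))
```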
The Options menu has four submenus: Wordnet, Information Retrieval, Ontologies and Thresholds.
There are two Wordnet options.
One - You can select whether or not to use Wordnet ("Use Wordnet" option)
Two - You can select the JWNL Config file you wish to use. (Currently this option
does not work).
There are two Information Retrieval options.
One - The "Remove stopwords" option means that common terms such as "the" and "they" will be ignored
while the program makes a list of the terms in the source XML file(s).
Two - The "Stem the word list" option means that the program will attempt to reduce the terms found in
the XML file to their base form by removing endings such as "ing" and "s" (for example, "jokes" becomes "joke").
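The two Information Retrieval options can be pictured with the Python sketch below. It is a deliberately naive illustration: the stopword list is a tiny made-up sample, and the stemmer only strips the two endings mentioned above. Semano's actual stopword list and stemming algorithm are not documented here and are likely more thorough.

```python
# Tiny illustrative stopword list (an assumption, not Semano's list).
STOPWORDS = {"the", "they", "a", "an", "and", "of", "to"}
# Only the two endings named in this manual.
SUFFIXES = ("ing", "s")

def remove_stopwords(words):
    """Drop common terms that carry little meaning."""
    return [w for w in words if w not in STOPWORDS]

def naive_stem(word):
    """Strip one known ending, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["the", "jokes", "they", "tell", "walking"]
print([naive_stem(w) for w in remove_stopwords(words)])
```

A stemmer this simple will mangle some words (for example, "glass" would lose its final "s"), which is why real stemmers use more careful rules.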
There is one option in the Ontologies submenu. "Ignore Commonly Imported Ontologies" means that the
program will not check the XML file against commonly referenced Ontologies such as the RDF or OWL Ontologies.
Thresholds help to narrow down the number of terms in the XML files which you think should be
checked against the concepts of the Ontologies. You can specify how many terms to ignore, either as a number or as a percentage.
The default action of Semano is to ignore the top 5 terms and to take only the next 20. Semano will try to
adapt if there are fewer than 25 terms in the document.
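The default threshold can be sketched as follows in Python, assuming that "top" means the most frequently occurring terms. That reading, and the function name default_selection, are assumptions. How Semano handles documents with fewer than 25 terms is not specified here; the sketch simply returns whatever terms remain after the first five.

```python
from collections import Counter

def default_selection(words, skip=5, take=20):
    """Skip the `skip` most frequent terms, then keep the next `take`."""
    # most_common() orders terms from most to least frequent.
    ranked = [term for term, _ in Counter(words).most_common()]
    return ranked[skip:skip + take]

# Hypothetical document: term t0 is the most frequent, t29 the least.
words = [f"t{i}" for i in range(30) for _ in range(30 - i)]
selected = default_selection(words)
print(selected[0], selected[-1], len(selected))
```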
The Thresholds submenu has five options.
One - Use Default (Top Twenty Words) - This is the default action of Semano: ignore the top 5 terms and take only the next 20.
Two - Use Amount of Words Threshold - This action tells Semano to ignore a user-specified number of terms at the start and also a
user-specified number of terms at the end of the XML source files.
Three - Use Percent of Words Threshold - This action tells Semano to ignore a user-specified percentage of terms at the start and also a
user-specified percentage of terms at the end of the XML source files.
Four - Set Thresholds for Amount of Words - This option sets the two Thresholds for the number of words from the XML files to be ignored.
Five - Set Thresholds for Percent of Words - This option sets the two Thresholds for the percentage of words from the XML files to be ignored.
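The amount and percent thresholds can be pictured with the Python sketch below, assuming the terms are held in an ordered list and that some are dropped from each end. The function names and the rounding rule for percentages (rounding down) are assumptions made for the example, not documented Semano behaviour.

```python
def trim_by_amount(terms, skip_start, skip_end):
    """Ignore `skip_start` terms at the start and `skip_end` at the end."""
    end = len(terms) - skip_end
    # max() keeps the slice empty instead of wrapping around
    # when the two thresholds overlap.
    return terms[skip_start:max(end, skip_start)]

def trim_by_percent(terms, pct_start, pct_end):
    """Same idea, with whole-number percentages of the list length."""
    n = len(terms)
    return trim_by_amount(terms, n * pct_start // 100, n * pct_end // 100)

terms = list("abcdefghij")  # ten hypothetical terms
print(trim_by_amount(terms, 2, 3))
print(trim_by_percent(terms, 10, 20))
```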
Once you have added your XML files and the Ontologies you want to use, and have chosen
the options which you feel will optimise the operation of Semano, you can
click the "start" button to get Semano to produce results for you.
Semano lists results in three columns. Each column has its own heading to enable easier navigation
through the results. The three columns are as follows:
Source Files
This column lists the source XML files which have been checked. They are listed in order of completion by Semano.
The file with the gold border is currently selected. The Ontologies which it has been checked against will be listed in
the "Ontologies [Categories]" column.
Ontologies [Categories]
This column lists the Ontology files that were compared to the currently selected Source File. They are listed in order
of completion by Semano. The Ontology with the gold border is currently selected. The keywords which have been identified as
a match by Semano will be listed in the "Keywords" column. The number of matching keywords is also displayed for each Ontology.
You can mark or unmark all the Ontology's keywords as matches using the "Use All Keywords" tickbox.
Keywords
This column lists the Keywords identified as a match between the currently selected Source File and the currently selected Ontology.
Semano lists matches at the top in green and non-matches at the bottom in red. You can mark or unmark a keyword as a
match using the "Use Keyword" tickbox.
Once you are happy with the results, you can save the information as metadata. Two files will be saved. The first is an OWL file (FILENAME.FILE-EXTENSION.owl) which relates the source XML file to the Ontologies which it has been checked against. The second is a custom XML file (FILENAME_OWLDesc.xml).
To save the information, click on the "save" button above the results.
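As a small illustration of the naming patterns above, the Python sketch below derives the two file names from a source file name. It assumes that FILENAME means the name without its extension and FILE-EXTENSION means the original extension. That reading, the function name and the sample path are assumptions. The folder Semano saves to is not stated here, so only the names are shown.

```python
from pathlib import Path

def metadata_file_names(source):
    """Return the assumed (OWL name, custom XML name) for a source file."""
    path = Path(source)
    owl_name = path.name + ".owl"            # FILENAME.FILE-EXTENSION.owl
    desc_name = path.stem + "_OWLDesc.xml"   # FILENAME_OWLDesc.xml
    return owl_name, desc_name

# Hypothetical source file.
print(metadata_file_names("feeds/blog.xml"))
```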
This is the End of the Semano Manual. Hopefully it was helpful!
You can return to the Semano website homepage by clicking the following link: http://www.minds.may.ie/~provost/semano/
Copyright © 2007 Paul Mara