The Similarity Library is written in Java and is built upon three main layers:
- Data Layer
- Wrapper Layer
- Similarity Layer
The Data Layer handles the data used for similarity estimation. Three ontologies are currently supported: WordNet, the Gene Ontology, and MeSH. This layer also includes data obtained by querying search engine indexes.
The Wrapper Layer is responsible for wrapping data in a format that can be accessed directly when computing similarity. For ontologies, data are indexed with the Lucene search engine library to speed up lookup. In particular, each ontology concept has a corresponding Lucene document that stores information about the concept itself and the kinds of relations it has with its neighbors.
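The one-document-per-concept idea can be illustrated with a minimal plain-Java sketch. It is not the library's code and does not use Lucene: the names `ConceptDocument` and `ConceptIndex`, the fields, and the relation kinds are assumptions made for illustration, and a `HashMap` stands in for the Lucene index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the document stored for one ontology concept.
final class ConceptDocument {
    final String conceptId; // identifier of the concept in its ontology
    final String label;     // human-readable name of the concept
    // Relation kind (for example "is_a") -> identifiers of neighboring concepts.
    private final Map<String, List<String>> relations = new HashMap<>();

    ConceptDocument(String conceptId, String label) {
        this.conceptId = conceptId;
        this.label = label;
    }

    void addRelation(String kind, String neighborId) {
        relations.computeIfAbsent(kind, k -> new ArrayList<>()).add(neighborId);
    }

    // Returns the neighbors reached through the given relation kind,
    // or an empty list when the concept has no such relation.
    List<String> neighbors(String kind) {
        return relations.getOrDefault(kind, Collections.emptyList());
    }
}

// In-memory substitute for the index (assumption: lookup by concept identifier).
final class ConceptIndex {
    private final Map<String, ConceptDocument> byId = new HashMap<>();

    void add(ConceptDocument doc) {
        byId.put(doc.conceptId, doc);
    }

    // Returns null when no document exists for the identifier.
    ConceptDocument lookup(String conceptId) {
        return byId.get(conceptId);
    }
}
```

With such a structure, a similarity measure that needs the parents of a concept would call something like `index.lookup(id).neighbors("is_a")` instead of traversing the raw ontology files.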
For search engine data, a wrapper collects the hit counts returned when a query is posed to a search engine.
Finally, the Similarity Layer returns the similarity score for two words or sentences according to a chosen similarity measure.
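As a rough sketch of how a hit-count wrapper and the Similarity Layer might fit together, the Java below defines two hypothetical interfaces and one example measure. None of these names come from the library. The measure shown is a Jaccard coefficient over hit counts, chosen only as an example, and the `"a AND b"` conjunction syntax and the in-memory stub that replaces a real search engine are assumptions as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper interface: returns the number of hits for a query.
interface HitCountSource {
    long hits(String query);
}

// Hypothetical Similarity Layer interface: scores a pair of words.
interface SimilarityMeasure {
    double similarity(String first, String second);
}

// Stub that replaces a real search engine with fixed counts.
final class InMemoryHitCounts implements HitCountSource {
    private final Map<String, Long> counts = new HashMap<>();

    InMemoryHitCounts put(String query, long hitCount) {
        counts.put(query, hitCount);
        return this;
    }

    @Override
    public long hits(String query) {
        return counts.getOrDefault(query, 0L); // unknown queries have no hits
    }
}

// Example measure: Jaccard coefficient computed from hit counts.
final class HitCountJaccard implements SimilarityMeasure {
    private final HitCountSource source;

    HitCountJaccard(HitCountSource source) {
        this.source = source;
    }

    @Override
    public double similarity(String first, String second) {
        long a = source.hits(first);
        long b = source.hits(second);
        // Assumed query syntax for "pages containing both terms".
        long both = source.hits(first + " AND " + second);
        long union = a + b - both;
        return union <= 0 ? 0.0 : (double) both / union;
    }
}
```

Because the measure depends only on `HitCountSource`, a different wrapper, or a different measure behind `SimilarityMeasure`, could be swapped in without touching the rest of the code, which is the kind of separation the three layers suggest.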