Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine
Abstract: Using byte code as an information source is a novel approach that enables a Java archive search engine to be built without relying on any resource other than the Java archive itself. Unfortunately, its effectiveness is not particularly high, since some relevant documents may not be retrieved because of vocabulary mismatch. In this
research, a vector space model (VSM) is extended with semantic relatedness to overcome the vocabulary mismatch issue in a Java archive search engine. Aiming for the most effective retrieval model, several retrieval-model equations are also proposed and evaluated, such as summing up all related terms, substituting a non-existing term with its most related term, logarithmic normalization, context-specific relatedness, and low-rank query-related retrieved documents. In general, semantic relatedness improves recall at the cost of reduced precision.
A scheme that takes advantage of relatedness without being affected by its disadvantage (VSM + considering non-retrieved documents as low-rank retrieved documents using semantic relatedness) is also proposed in this research. This scheme ensures that relatedness scores are ranked lower than standard exact-match scores. This scheme yields 1.754% higher effectiveness than the standard VSM used in previous research.
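
The ranking idea behind the proposed scheme can be illustrated with a minimal sketch. It is not the paper's actual formulation: the raw term-frequency weighting, the averaged best-match fallback in `relatedness_score`, the `RELATED` lookup table, and the sample archive names are all assumptions made for illustration. The only property taken from the abstract is that documents missed by exact matching are appended below every exact-match result.

```python
import math
from collections import Counter


def cosine(q, d):
    """Cosine similarity between two term-frequency Counters (plain VSM)."""
    dot = sum(q[t] * d[t] for t in q if t in d)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def relatedness_score(q, d, related):
    """Hypothetical fallback score: for each query term, take its best
    relatedness to any document term, then average over the query terms."""
    if not q or not d:
        return 0.0
    total = 0.0
    for t in q:
        total += max((related.get(frozenset((t, u)), 0.0) for u in d), default=0.0)
    return total / len(q)


def rank(query, docs, related):
    """Rank exact-match documents by cosine score, then append documents that
    exact matching missed, ordered by relatedness, strictly below them."""
    q = Counter(query.split())
    exact, fallback = [], []
    for name, text in docs.items():
        d = Counter(text.split())
        score = cosine(q, d)
        if score > 0:
            exact.append((name, score))
        else:
            r = relatedness_score(q, d, related)
            if r > 0:  # documents with no relatedness at all stay non-retrieved
                fallback.append((name, r))
    exact.sort(key=lambda pair: pair[1], reverse=True)
    fallback.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in exact] + [name for name, _ in fallback]


# Toy data (assumed): terms that might be extracted from each archive.
DOCS = {
    "net.jar": "socket connect",
    "pic.jar": "picture scale",
    "img.jar": "image socket connect stream buffer",
}
# Toy symmetric relatedness table (assumed values).
RELATED = {
    frozenset(("image", "picture")): 0.8,
    frozenset(("resize", "scale")): 0.7,
}

print(rank("image resize", DOCS, RELATED))
```

In this toy run, `img.jar` shares only one term with the query yet is still placed above `pic.jar`, whose terms are merely related to the query terms, and `net.jar` is not retrieved at all. Keeping the two lists separate, rather than mixing the scores, is one simple way to let relatedness recover missed documents without disturbing the exact-match ordering.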
Author: Oscar Karnalim
Journal Code: jptinformatikagg150005