Using LSA matrix comparison to improve the relevancy of search engine answers.
Nkukwana, S. & Weideman, M.
Proceedings of the 8th annual Conference on WWW Applications. 5-8 September. Bloemfontein, South Africa.
Nkukwana, S. & Weideman, M. 2006. Using LSA matrix comparison to improve the relevancy of search engine answers. Proceedings of the 8th annual Conference on WWW Applications. 5-8 September. Bloemfontein, South Africa. Online: http://web-visibility.co.za/website-visibility-digital-library-seo/
The principal objective of this research project is to analyse and apply Latent Semantic Analysis (LSA) as a support mechanism for Internet searching. The aim is to improve the quality of search engine results for searches on accommodation in South Africa, using the Ananzi search engine. This paper contains a detailed literature survey and a proposed methodology to achieve this aim. LSA is both a theory and a method for extracting and representing the contextual meaning of words by statistical computations applied to a large body of text. It analyses word-word, word-passage, and passage-passage relationships, which makes it feasible to compare words with paragraphs, paragraphs with paragraphs, and paragraphs with documents when judging relevancy. Most existing search engines base their information retrieval purely on keyword matching. Results are therefore retrieved because they contain the keywords, regardless of the meaning of those keywords or the sense they make in the retrieved documents. The strength of the proposed design is that it retains the keyword technique while also concentrating on the meaning of words and the sense they make in the webpage document.
To test, analyse, and apply this technique, a search tool based on LSA will be implemented.
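The contrast between keyword matching and LSA described in the abstract can be illustrated with a minimal sketch. It is not the search tool proposed in the paper. It assumes NumPy is available, uses five made-up passages, and keeps k = 2 latent dimensions; the function names (term_passage_matrix, keyword_scores, lsa_scores) are hypothetical. The query is projected into the latent space with the standard fold-in step and compared with each passage by cosine similarity.

```python
import numpy as np

# Hypothetical toy passages for illustration; not data from the paper.
passages = [
    "hotel room booking",
    "guesthouse room booking",
    "hotel guesthouse room",
    "rugby match score",
    "rugby match",
]


def term_passage_matrix(texts):
    """Return (sorted vocabulary, term-by-passage count matrix)."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            counts[index[word], j] += 1.0
    return vocab, counts


def keyword_scores(query, texts):
    """Plain keyword matching: how many query words occur in each passage."""
    words = query.split()
    return [sum(word in text.split() for word in words) for text in texts]


def lsa_scores(query, texts, k=2):
    """Cosine similarity of the query to each passage in a k-dimensional latent space."""
    vocab, counts = term_passage_matrix(texts)
    # Singular value decomposition: counts = u @ diag(s) @ vt, s in descending order.
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    u_k, s_k, passages_k = u[:, :k], s[:k], vt[:k, :].T
    # Fold the query into the latent space: q_k = q^T U_k S_k^{-1}.
    q = np.array([float(query.split().count(word)) for word in vocab])
    q_k = (q @ u_k) / s_k
    norms = np.linalg.norm(passages_k, axis=1) * np.linalg.norm(q_k)
    # The small floor avoids division by zero for an all-zero vector.
    return (passages_k @ q_k) / np.maximum(norms, 1e-12)


print(keyword_scores("hotel", passages))
print(np.round(lsa_scores("hotel", passages), 3))
```

For the query "hotel", keyword matching gives the second passage ("guesthouse room booking") a score of zero because the word itself is absent. In the latent space, that passage scores close to 1 because it shares context words with the hotel passages, while the two rugby passages score close to 0. A real system would use a much larger corpus, term weighting, and a larger k.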