Weideman, M. 2010. Search engine query generation for effective retrieval of known academic publications. Proceedings of the 12th World Wide Web conference (ZAW3-10). 21-23 September. Durban, South Africa. Online:

Academics often need access to other academic publications for reference. Sometimes, however, some details of a specific reference are known, but a copy of the full document is required. The objective of this research project was to compare different methods of query generation for successfully retrieving academic documents, assuming that the user has the basic document details. In a series of empirical experiments, 20 universities with digital libraries (not requiring logins) were identified. Five academic documents stored in each one's digital library were then located and inspected. Searches were done using three types of query for each of these documents: keywords from the document title, a text sequence from the body, and author surnames. The rankings on search engine result pages were recorded, and the current visibility of these documents was calculated. Each document's URL was then manually submitted to Google, a waiting period was allowed for crawler visitation, and the searches and calculations were repeated. The resultant data was used to measure the success of the three types of query over 300 searches, both before and after submission. Results indicate that using keywords from the document title produced the most efficient query, with much improvement after submission. Using a text sequence from the body produced the second-most efficient query, but with a small reduction in visibility. Finally, using author surnames produced a much less efficient query, although with slightly increased visibility. It was concluded that academic searchers should concentrate on using a concatenation of weight-carrying keywords from the title of a known academic document as the search query for the most efficient document retrieval.
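
The three query types described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study. The stopword list, the function names, the eight-word length of the body sequence and the use of quotation marks around it are all assumptions made for illustration. "Weight-carrying" keywords are approximated here as title words that are not stopwords.

```python
import re

# Assumption: a small illustrative stopword list; the study does not specify one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_keyword_query(title):
    """Concatenate the non-stopword words of the title, lowercased."""
    words = re.findall(r"[A-Za-z0-9]+", title.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


def body_sequence_query(body, n_words=8):
    """Take the first n_words of a body sentence as a quoted phrase.

    Assumption: the sequence length and the phrase quoting are illustrative.
    """
    return '"' + " ".join(body.split()[:n_words]) + '"'


def surname_query(surnames):
    """Join the author surnames with spaces."""
    return " ".join(surnames)


def build_queries(doc):
    """Return one query of each of the three types for a document record."""
    return {
        "title_keywords": title_keyword_query(doc["title"]),
        "body_sequence": body_sequence_query(doc["body"]),
        "author_surnames": surname_query(doc["surnames"]),
    }


# Hypothetical document record, built from this paper's own details.
doc = {
    "title": "Search engine query generation for effective retrieval "
             "of known academic publications",
    "body": "It was concluded that academic searchers should concentrate "
            "on using a concatenation of weight-carrying keywords.",
    "surnames": ["Weideman"],
}

queries = build_queries(doc)

# 20 universities x 5 documents x 3 query types per round of searching.
total_searches = 20 * 5 * len(queries)

for kind, query in queries.items():
    print(f"{kind}: {query}")
print("searches per round:", total_searches)
```

With 20 universities, five documents each and three query types, the sketch counts 300 searches per round, which matches the figure quoted above.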
