Performance Evaluation of Keyword and Semantic based Search Engines-An Empirical Study

Tracking #: 1586-2798

Junaid Rashid
Syed Muhammad Adnan
Muhammad Wasif Nisar

Responsible editor: 
Guest Editors Benchmarking Linked Data 2017

Submission type: 
Survey Article
The Keyword search engines do not know the meanings of words, expressions with terms using in different web pages. Therefore they don't provide the relevant search results for the user queries. In this paper, we discuss the semantic searching, technologies used for semantic searching and compare some semantic search engines. After that, some traditional search engines (keyword based search engines) and semantic-based search engine performance are compared. Firstly, the three keyword search engines (Google, Yahoo, and AOL) and four semantic web based search engines (Bing, Duckduckgo, Sensebot, and Exalead) are taken. These keyword and semantic search engines are compared to check the performance of their searching, by the precision ratio, Mean, and the Geometric Mean. Forty queries on different topics were selected and execute on every search engine. The first twenty links of documents were retrieved and categorized as relevant and irrelevant. The precision ratio, mean and geometric was calculated for the first twenty documents to find out search engines performance. In this study, it was found that the relevant document retrieved by Bing is more (570 out of 800) than any of the other search engine. Sensebot whole performance regarding precision ratio is lowest (23.87 %). After Bing, the DuckDuckGo retrieves (515 documents out of the 800). The precision ratio of Bing is 71.25% which are greater than others search engines. After Bing, the DuckDuckGo precision ratio is 64.37%.The Google precision ratio is 59% which is a keyword based search engine.

Solicited Reviews:
Review #1
By David Corsar submitted on 03/Aug/2017
Review Comment:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

As an introduction to the state of the art in research using semantic web technologies to support searching on the web, this paper offers little to those already familiar with semantic web technologies. The general readability and clarity of the text need improving, as the paper is currently difficult to read and, in places, it is difficult to understand the information being presented. Section 8 discusses the “technologies” used for semantic search engines, although these are limited to the standard semantic technologies (RDF, XML, URI, Ontology) that individuals working in the semantic web area should be familiar with, and there are other, dedicated introductions to these technologies elsewhere. The paper also does not provide a suitable introduction to the reasoning, NLP, and other algorithms utilised when using semantic web technologies to support web search. Section 5 discusses four approaches used by semantic search engines, but the descriptions are short, unclear, and very high level, and so contribute little to helping the reader learn about the topic. Section 6 discusses seven semantic search engines, at least three of which (Cognition, Hakia, Kosmix) have not been available for a number of years; again, the discussion is at a high level and does not really help the reader understand how semantics are being used, although this list may provide search engines for the reader to look up themselves for further information. As an introduction to evaluating the performance of search engines (semantic or not), again the study is discussed at a high level, without sufficient discussion of experimental design decisions and methodology.

(2) How comprehensive and how balanced is the presentation and coverage.

The authors fail to sufficiently justify the relevance of this paper as a survey paper and why it is relevant to this special issue topic of benchmarking linked data. Survey papers should “survey the state of the art of the topic”, which in the case of this paper would be semantic search, yet the main content of this paper focuses on comparing the results of three keyword-based search engines with four semantic-based search engines. Section 2 briefly discusses previous works evaluating search engine performance; section 6 briefly discusses seven semantic search engines, but at a high level with no real details of how semantics are used, and only covers two of the four semantic search engines that are evaluated later in the paper. Some of the search engines discussed in section 6 are no longer available (e.g. Hakia hasn’t been available since 2014, Kosmix since 2011, Cognition is no longer available), and Swoogle supports search for semantic web resources, not the general web. In terms of suitability for this special issue, I consider the most relevant topic in the call for papers to be “search, browsing, and query answering”; however, it is unclear how this paper contributes to benchmarking search of linked data or search using linked data. Some semantic search engines are not included, for example Watson, Falcons, Kngine semantic search engine, Cluuz.

Considering that the title of this paper describes it as “an empirical study”, few details are provided in section 10 about the study; there are insufficient details for another researcher to repeat the experiment. Several decisions were made in the design of the experiment that are not explained: why were only 20 results from search engines considered; how did the authors compensate for personalisation of results performed by search engines based on the person performing the search, the country from which the request originates, etc.; what criteria were used to decide if a retrieved document was indeed relevant; how many people evaluated each document; did they do this independently or not; what were the actual queries sent to the search engines; what was the desired/intended meaning of the search terms?

For example, query 33 “Apple”: if a search engine returned results about the Apple company and the fruit, were these all considered to be relevant to the term? Were news articles or stock prices for the Apple company relevant?

The paper adopts the stance that semantic search engines are better than keyword-based search engines, and does not discuss related work done by “traditional” search engines on incorporating structured data into their search algorithms: Google (via their knowledge graph), Yahoo (who have been publishing on searching RDF since ISWC in 2011), and AOL, which is now powered by Bing (a search engine the authors include in the evaluation as a semantic search engine), having previously been powered by Google. I mention these three as they are the three traditional (or “keyword”) search engines used in the comparative evaluation. In this regard, the paper does not clearly explain how the authors differentiate keyword and semantic-based search engines. This issue weakens the study presented in section 10, as Table 1 (and the preceding text) describes seven search engines the authors consider as semantic search engines, yet this list does not include Exalead and Bing, which are used in the evaluation as semantic search engines.

(3) Readability and clarity of the presentation.

Overall, the paper is poorly written and requires significant rewriting to ensure the arguments and information are presented clearly, unambiguously and without informal language which is used throughout. For example, the first paragraph should be re-written from

“The web search engines are computer program that permits the user to search on web and retrieve web documents with different queries for their needs of information. The huge amount of information is available on the web which may be in the form of unstructured, semi structure and structure. So, it is very difficult to find out the relevant contents of documents on the web”

To something along the lines of

“Web search engines are computer programs that permit users to search the web and retrieve documents from the web using queries based on their information needs. With estimates of over 4.75 billion web pages (according to, searching the available unstructured, semi-structured, and structured information available on the web is a difficult task.”

Similar reworking is required for the majority of the paper.

Other issues of note here:

The label of Table 3 is inconsistent with the reference to the table in the text (which states that the table shows how many of the first 20 documents returned by each search engine were rated relevant to the search). This information is also poorly depicted in Figures 5, 6, and 7.

Figs 5 – 8 are illegible and lack titles and labels on each axis.

Figs 8 and 9 are illegible; it is unclear what information this graph is attempting to present.

Fig 10 is titled “average percentage”, yet the label states it is the “precision ratio”.

Figs 11 and 12 both show mean values using visualisation methods that are not suitable to this type of data – inclusion of the data in Table 3 is sufficient.

Fig. 13 provides a radar representation of the geometric mean values; again, this is not a suitable visualisation, and the same information is more suitably presented in the Fig 14 bar chart.

Fig 15 does not present a linear regression as labelled; rather, it simply plots data points. In any case, given the data that is being presented, it is unclear why the authors consider a linear regression line a suitable visualisation.

The final paragraph of the results discussion mentions “Hakia topped regarding NL query processing”; however Hakia was not included in the evaluation so it is unclear where the justification for this statement comes from.

The authors also make several statements throughout the paper without any supporting evidence. For example, page 1, paragraph 5: “takes a long time to find the exact data from different links” (how long is a “long time”, and which study has found this?); also in the same paragraph, “new generations prefer semantic search engines” (new generations of what: people, search engine developers? Where is the survey that found this?). Page 2, paragraph 4: “the world progress increasing and the need of information is also increasing. The demand for such need increases as the computer or IT world makes the development” (again, which study illustrates this?). Page 2, paragraph 6: “the issues which are arising during the development of the web were:” (these are standard issues search engines deal with, so they should be justified with references). Page 2, paragraph 13: “Google popularity is accepted” (what does this mean, and who is accepting the popularity of Google?). Page 3, paragraph 7: “keyword based search engines are very helpful for finding the information from the web and smarter with time but they do not know the meaning of words” (please define what “smarter” means in this context and how their “smartness” has been shown to improve over time). Page 3, paragraph 9 is a quote without a reference; paragraph 10 mentions “Guha et al” without a reference; “answer it quite exactly” (again, what does this mean?); “apple. The outcome reveals degree basic word apple can represent fruit, bark, Microsoft Corporation” (the relationship between Apple and Microsoft Corporation is unclear, in particular why the latter would have “equivalent changes” of being returned as details of fruit or bark do).

References are inconsistently and often incorrectly formatted, with numerous incorrect uses of case (e.g. in [4] “dieter Fensel”; [10] “Mika, p.: ontologies are us”; in [19] the title is all uppercase and in brackets); a mixture of full author names, initials, and “et al”; [7] repeats half of the book title; [23], [26-28], [32], [39], [43] do not include a publication venue; [59] is an illegible web link.

(4) Importance of the covered material to the broader Semantic Web community.

The timeliness of this work is unclear: the authors do not state when the study was conducted; Fig 4 screenshot of DuckDuckGo is of the previous design which was replaced approximately three years ago (based on when it was redesigned according to The semantic search engines discussed in the paper aren’t state of the art, with some having not been available for years, and are disconnected from the presented study. The description of the study is insufficient for replication, and it is currently unclear how the “correctness” of the pages returned by search engines was determined. I would have expected multiple researchers assessing each result independently, then discussions to resolve any disagreements; as it stands, the methodology is insufficiently described for readers to be certain of the results. This paper does not offer significant new knowledge and relevant contributions to those already working in the area of utilising semantic web technologies to support web search, and, more generally, offers little to the broader Semantic Web community.
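
To illustrate the kind of independent assessment described above, agreement between two assessors can be quantified with Cohen's kappa. The following is only a minimal sketch and is not taken from the paper: the function name `cohens_kappa` and the sample labels are hypothetical, and it assumes two assessors who each label the same list of retrieved documents as relevant (1) or irrelevant (0).

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two assessors labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each assessor's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both assessors must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal rates.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        # Both assessors used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical relevance labels for four retrieved documents.
assessor_1 = [1, 1, 0, 0]
assessor_2 = [1, 0, 0, 0]
print(cohens_kappa(assessor_1, assessor_2))  # 0.5
```

Reporting such a figure alongside the relevance judgments would let readers see how far the "relevant" and "irrelevant" labels depend on who assigned them.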

Review #2
Anonymous submitted on 13/Aug/2017
Review Comment:

The paper compares different semantic search engines based on a number of criteria and then compares different keyword-based search engines against a selection of semantic search engines based on experimental results. The paper can therefore be seen as a mixture of a survey paper and an experimental study. Unfortunately, the paper fails to be of sufficient quality in either of the two categories.

The key argument of the paper is that semantic search engines perform better than keyword-based search engines. The conducted experiment has many open issues; in particular, the experimental design is rather simplistic. The rationale for the selection of queries (see table 2) is not explained well, nor does the approach take into account the fact that, due to personalization and localization features, queries from different users at different times will result in different result sets.

The paper is classified as a survey paper, therefore the remainder of the review is written by taking into account the dimensions for survey papers. The paper is not suitable as an introductory text for PhD students etc., as the comparison of the search engines is rather shallow and the results of both comparisons (table 1 and 2) remain on the surface. The paper is neither written well nor is it clearly presented. It contains many spelling errors and grammatical errors. The quality of the figures is rather poor and the contents of the figures are hardly explained in the text. The importance of the covered material for the broader Semantic Web community can be seen as rather low.

Review #3
Anonymous submitted on 08/Oct/2017
Review Comment:

The paper aims to be a survey and comparison of semantic search engines and keyword-based search engines. By virtue of its intent, it has the potential of being suitable for the special issue. The authors provide their working hypothesis in the abstract by stating (if I understand them correctly, see typos and minor comments of the review) that classical search engines do not answer user queries in a satisfactory manner as they do not represent the meaning of the user queries. This statement must be restated. Previous versions of the Google search engine had no semantic backend and could still answer a large number of queries to the satisfaction of their users. The introduction aims to give a motivation for the need for semantic-web-search engines but is also rather over the top. Statements such as "It is impossible for the user to get relevant results from these lots of links" simply do not reflect the reality of using keyword-based search engines. On the other hand, statements such as "Semantic search engines cannot be used for navigational searches" are also incorrect. The related work is rather incomplete, a fatal flaw for a survey paper. The authors do not expand upon the methodology they used for paper selection and disregard all modern development of semantic-web-driven keyword search engines. Works such as (Shekarpour et al., 2013) and survey papers such as (Höffner et al., 2016) are not taken into consideration. The results of the QALD series on benchmarking, among others, keyword search engines are also not considered. The authors gather a list of engines (see Figure 1) without really explaining why they were chosen (systematic search? based on performance?). The technologies used for semantic search are incorrect and incomplete (see Höffner et al., 2016). Table 1 aims to be a comparison of the features of search engines, but there is again no motivation of how these features were chosen.
The performance assessment in Section 10.2 is based on a self-created benchmark. While having new benchmarks is obviously appreciated, the way the benchmark was created is rather unclear. The first_20_precision should be called p@20. The relevance assessment is also unclear (what was the annotator agreement? who were the annotators? was it a controlled experiment? etc.). The relevance of the many results and views presented by the authors cannot be assessed as the evaluation methodology is rather unclear. It is also unclear why the authors chose to assess the tools they selected with a non-standard benchmark. Overall, the goal of the paper is very interesting and relevant. The paper is very poorly written and needs a complete rewrite. The methodology is not clearly explained and should be written down clearly. The authors should also think about using a standard benchmark and explain how they came about theirs. The evaluation setting must be made explicit. I'd suggest that the authors submit the paper to a relevant venue once these issues have been fixed. However, the paper cannot be accepted in its current state.
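
For readers unfamiliar with the measures named here, p@20 and the means reported in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names and the three sample queries are hypothetical, relevance is assumed to be a binary judgment per result, and results missing from a list shorter than k are counted as irrelevant.

```python
from math import exp, log


def precision_at_k(judgments, k=20):
    """Return p@k: the share of the top-k results judged relevant.

    `judgments` is a ranked list of 1 (relevant) / 0 (irrelevant) labels.
    If fewer than k results were returned, the missing ones count as irrelevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(judgments[:k]) / k


def arithmetic_mean(values):
    """Plain average of per-query precision values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric mean; it is zero as soon as any single value is zero."""
    if any(v == 0 for v in values):
        return 0.0
    return exp(sum(log(v) for v in values) / len(values))


def pooled_precision(relevant_total, num_queries, k=20):
    """Relevant documents over all documents inspected (num_queries * k)."""
    return relevant_total / (num_queries * k)


# Hypothetical judgments for three queries (15, 10 and 20 relevant out of 20).
per_query = [
    precision_at_k([1] * 15 + [0] * 5),
    precision_at_k([1] * 10 + [0] * 10),
    precision_at_k([1] * 20),
]
print(per_query)                   # [0.75, 0.5, 1.0]
print(arithmetic_mean(per_query))  # 0.75
print(geometric_mean(per_query))   # cube root of 0.375

# Figure from the abstract: 570 relevant documents over 40 queries x 20 results.
print(pooled_precision(570, 40))
```

With the abstract's figure of 570 relevant documents out of 40 × 20 = 800, the pooled value is 0.7125, which matches the 71.25% reported for Bing. When every query contributes exactly 20 judged results, this pooled value equals the arithmetic mean of the per-query p@20 values; the geometric mean is lower whenever the per-query values differ.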

Typos and minor comments


* The Keyword search => Keyword search
* the meanings of words => the meaning of words
* words, expression with terms => unclear
* using in different pages => used in different pages
* Therefore they don't => Therefore, they do not
* the semantic searching => semantic search
* keyword based search engines => keyword-based search engines
* engine performance => engines' performance
* three keyword based => three keyword-based
* semantic web based => semantic-web-based
* their searching => their search
* Mean => mean
* Geometric Mean => geometric mean
* geometric was calculated => geometric mean were calculated

etc. I stopped correcting for typos here as there are too many. ...