Integral Search Quality
‘Update’ refers to the process of search results renewal. When the results are updated, some sites may make it to the top 10, some other sites may "sink". Every search engine has its own update style which becomes clear in this analyzer. Every day the search engine update analyzer monitors the top ten responses to 140 queries in order to assess the number of sites that changed their positions, and how much the positions have changed.
Let Di be the change in position for the page that appeared i-th in top 10 search results on day 1. For example, if the fifth page from the first day top 10 appeared third or seventh on the second day, D5=2. If the second day top 10 did not contain a certain page which was present on the first day, then we will assume that Di=10 for that page.
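The update indicator is calculated using the formula:
update indicator = (D1 + D2 + ... + D10) / 100
The divisor 100 is the largest possible sum: ten pages, each with Di=10.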
Consider a couple of examples:
On Day 1, a certain query has the following Top 10:
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10.
On Day 2, the same query has this Top 10:
Cn, C1, C2, C3, C4, C5, C6, C7, C8, C9.
In this case the update indicator is calculated as follows:
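((2-1)+(3-2)+(4-3)+...+(10-9)+10)/100 = 19/100 = 0.19 (19%)
Pages C1 to C9 each moved down one position, and C10 dropped out of the top 10, which adds 10.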
In a second example, on Day 1, a certain query has the following Top 10:
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10.
On Day 2, the same query has this Top 10:
Cn1, Cn2, Cn3, Cn4, Cn5, Cn6, Cn7, Cn8, Cn9, Cn10.
In this case the update indicator equals:
10*10/100 = 1.00 (100%)
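The two examples above can be reproduced with a short sketch. The function name and the list representation of a top 10 are assumptions made for illustration; only the rule itself (position change per page, 10 for a page that dropped out, sum divided by 100) comes from the text.

```python
def update_indicator(day1, day2):
    """Update indicator for one query, as a fraction between 0 and 1.

    day1, day2: the top 10 pages on two consecutive days, best first.
    """
    total = 0
    for i, page in enumerate(day1, start=1):
        if page in day2:
            # Di is the absolute change in position between the two days.
            total += abs(day2.index(page) + 1 - i)
        else:
            # The page left the top 10, so Di = 10.
            total += 10
    return total / 100


day1 = [f"C{i}" for i in range(1, 11)]
shifted = ["Cn"] + day1[:9]                   # first example
replaced = [f"Cn{i}" for i in range(1, 11)]   # second example

print(update_indicator(day1, shifted))
print(update_indicator(day1, replaced))
```

The first call corresponds to the 19% case and the second to the 100% case.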
The analyzer also calculates some additional parameters: the number of sites that have disappeared from the search results and the number of sites that have changed their positions.
This analyzer does not rate the search engines as better or worse. The results can be interpreted in two ways: a search engine that has frequent large updates could be considered more up-to-date; a search engine with rare updates can be considered more stable and predictable. The informer of this analyzer sorts the search engines in ascending order of update level.
Navigational Search Analyzers
The analyzers in this group estimate how well a search engine handles navigational search. Different kinds of queries are used to check whether or not the site or page in question is found on the first result page.
Navigational queries are those looking for a specific site, file or page. Such queries will usually consist of the name of some organization or business (e.g., "Punjab and Sind Bank" or "Moores Glassworks"), of some print source or web site (e.g., "Cooking Light" or "bash.org"), or they will just name the page needed (like "Bofinger Rue Sherbrooke Ouest Montréal"). Likewise, eminent bloggers or official site owners often become a target of navigational queries (think of Art Garfunkel or Jessica Gottlieb).
Evidently, a navigational query can have more than one meaning: a user searching for "alabama state university" or "avril lavigne" might be looking for independent information about the organization or the person in question. Still, the official site must be present in the SERP, and its position must be high enough. Furthermore, the analyzers of this group allow switching from stricter examination criteria (the official site takes the first or close to the first position) to looser ones (it is enough that the official site is among the top ten results).
Analyzer of navigational search
A search query looking for a specific website is called a navigational query. For instance, such queries as "american express", "vogue", "fox news", "amazon", etc. are termed navigational queries.
The best result for a navigational query is the required site in the first position of search results.
To evaluate navigational search, the search engines were tested with 200 queries randomly selected from the array of navigational queries. Each query is assigned one or more sites/markers. The top 10 search results are checked for the site/marker entries. When several sites/markers are assigned to a query, each of them listed in one of the top positions is considered a correct answer. Then we calculate the percentage of queries with correct answers found on the SERP. This number is the aggregate indicator of the quality of navigational search.
The best search engine is the one with highest aggregate indicator for this analyzer. In the informer, the search engines are sorted by the aggregate indicator.
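The aggregate indicator described above can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: matching a marker by host name, the `top_k` parameter used to switch between the stricter and looser criteria, and all function names are assumptions.

```python
from urllib.parse import urlparse


def host(url):
    """Lower-cased host name without a leading 'www.' (URL must include a scheme)."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def has_correct_answer(results, markers, top_k=10):
    """True if any of the first top_k results is on one of the marker sites."""
    wanted = {m.lower() for m in markers}
    return any(host(url) in wanted for url in results[:top_k])


def navigational_indicator(serps, markers_by_query, top_k=10):
    """Percentage of queries with a correct answer within the first top_k results."""
    hits = sum(
        has_correct_answer(serps.get(query, []), markers, top_k)
        for query, markers in markers_by_query.items()
    )
    return 100.0 * hits / len(markers_by_query)


# Hypothetical data: two queries, each with one marker site.
markers_by_query = {"amazon": ["amazon.com"], "vogue": ["vogue.com"]}
serps = {
    "amazon": ["https://www.amazon.com/", "https://en.wikipedia.org/wiki/Amazon"],
    "vogue": ["https://en.wikipedia.org/wiki/Vogue", "https://www.vogue.com/"],
}

print(navigational_indicator(serps, markers_by_query))            # looser: top 10
print(navigational_indicator(serps, markers_by_query, top_k=1))   # stricter: first place
```

With the looser criterion both queries count as answered; with `top_k=1` only the query whose marker site is in first place does.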
Information Search Analyzers
The largest and the least defined group of queries are those aiming at finding information, in a broad sense of the word. Although an exhaustive survey of all such queries would seem impossible, some aspects of informational search come under close scrutiny in this group of analyzers.
Our analyzers cover Quote Search (Quotations and Originals) and Answer Search. It is very important that the search engine is able to (and is willing to) distinguish the original information source from its copies or imitations. This is the issue of the Originals' analyzer.
Our plans include broadening the scope of the search aspects under investigation.
Analyzer of the Location Search Quality
One of the most frequent and most obvious uses of search engines is for mere geographical navigation. We often want to find out whether a certain organization, business, service etc. is located where it should be, from our point of view. Although the overall search quality for this type of queries is not bad at all, mistakes sometimes occur. E.g., the search engine promptly finds the entity in question in some other city district or even some other city. Or, conversely, it supplies us with similar addresses of other organizations, presuming it doesn't make any difference what we shall do, provided we do it in the right place. This analyzer was made to compare the respective merits of search engines in dealing with such queries.
As in real life, our input queries consist of the organization's name plus the approximate locality, like a city district, a street or a nearby underground station. To make the results of the evaluation more precise, we only use queries where the object of search is a single entity. The ideal output in such a case is a helper with a full list of contacts. But at this stage we judged that the presence of the correct address in the upper snippet is enough for the search engine to get the maximum grade. On the other hand, results containing some other useful information, but not the address, get no points.
The results are calculated on the same principle as in the Navigational Analyzer: the higher the snippet with the correct address appears, the more points the search engine gets (the first snippet gets 1.0 point, the 10th gets 0.1 point). In addition, we calculate the ratio of correct answers found outside the snippets.
The quotation search analyzer deals with queries containing short popular citations. Often coming from some literary source, these citations are widely used in everyday speech, and this obviously complicates your quest if you want to know where a specific catch phrase comes from. Whether the original is a vastly popular work of fiction or some forgotten philosophical treatise, the chances are high that you will only stumble upon multiple examples of the citation used as such, without any reference to its source. For example, 'what's in a rose' is likely to yield all possible results, from the site of Sweet Briar College to Ethnobotanical Society, before Shakespeare comes into view.
The analyzer examines 100 queries that consist of a popular citation, the source of which is known. For each search engine we calculate the percentage of the search results containing at least one of the following: a. the given fragment (or one of several fragments) of the text where the citation comes from or b. the name of the author and the title of the text. The positions of the pages among the search results are not taken into consideration.
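The check for a single query can be sketched like this. It is a simplified illustration under stated assumptions: each result is represented by its plain text, matching is a case-insensitive substring test, and the function names are hypothetical.

```python
def identifies_source(text, fragments, author, title):
    """True if the text contains a source fragment, or both the author and the title."""
    lowered = text.lower()
    if any(fragment.lower() in lowered for fragment in fragments):
        return True
    return author.lower() in lowered and title.lower() in lowered


def quote_search_score(result_texts, fragments, author, title):
    """Percentage of results that identify the source; positions are ignored."""
    if not result_texts:
        return 0.0
    hits = sum(identifies_source(t, fragments, author, title) for t in result_texts)
    return 100.0 * hits / len(result_texts)


# Hypothetical result texts for one citation query.
results = [
    "What's in a name? That which we call a rose by any other name would smell as sweet.",
    "Romeo and Juliet by William Shakespeare, full text",
    "Rose gardening tips for beginners",
    "A short biography of Shakespeare",
]
score = quote_search_score(
    results,
    fragments=["that which we call a rose"],
    author="Shakespeare",
    title="Romeo and Juliet",
)
print(score)
```

Here the first result matches by fragment and the second by author plus title; the last one names the author only, so it does not count.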
Analyzer of the ranking of original texts
Unfortunately, copyrighted content is illegally copied all too often on the web. Any author faces the fact that their texts are subject to theft: the text of a newly created article can be copied to some web page within a few days or even a few hours after the article has been published. The websites stealing the content usually claim that it was "taken from open sources" or "uploaded by one of the users". Stolen content can be used to attract search engine users to a web page, and the resulting traffic can be converted to money. This is the main reason for such 'borrowing'. The ability to identify the original texts and rank the corresponding web pages higher than the pages containing copied materials is a crucial property of any search engine.
The analyzer of the ranking of original texts uses exact quotation queries to monitor the position of 100 marker articles every day. For these articles, the web sites of copyright holders are known. The analyzer can thus calculate the percentage of queries for which the original text is ranked higher than the copied material.
The queries in this analyzer are fragments from the original article. By default the queries are submitted to the search engines in quotation marks. This way we expect that the only responses found will be the original article and its copies. However, real users rarely rely on quotation marks, so an additional tab estimates the results based on queries submitted without quotation marks.
In the informer of this analyzer, the search engines are sorted by their ability to rank original texts higher than the copies.
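A minimal sketch of this percentage, for the default mode with quoted queries, is given below. It assumes that every result which is not on the copyright holder's site is a copy, so the original outranks the copies exactly when it comes first; the function names and the example hosts are hypothetical.

```python
from urllib.parse import urlparse


def is_on_site(url, site):
    """True if the URL's host is the site itself or one of its subdomains."""
    h = urlparse(url).netloc.lower()
    return h == site or h.endswith("." + site)


def original_wins(results, original_site):
    """True if the first result is on the copyright holder's site.

    Assumption: all other results of a quoted query are copies.
    """
    return bool(results) and is_on_site(results[0], original_site)


def originals_indicator(serps, original_site_by_query):
    """Percentage of queries where the original is ranked above the copies."""
    wins = sum(
        original_wins(serps.get(query, []), site)
        for query, site in original_site_by_query.items()
    )
    return 100.0 * wins / len(original_site_by_query)


# Hypothetical data: two marker articles from the same copyright holder.
original_site_by_query = {"fragment one": "author.example", "fragment two": "author.example"}
serps = {
    "fragment one": ["https://www.author.example/a1", "https://copycat.example/a1"],
    "fragment two": ["https://copycat.example/a2", "https://author.example/a2"],
}
print(originals_indicator(serps, original_site_by_query))
```

For queries submitted without quotation marks this assumption no longer holds, because unrelated pages can appear among the results; the copies would then have to be identified explicitly.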
Analyzer of Question Answering
This analyzer shows to what extent a search engine is able to answer queries asked in question format. These may include questions proper containing a question word ([what order is the aardvark in] [where is Lhasa located]) as well as other queries that imply a short and mostly uniform answer ([currency of Brasil] [nitric acid formula]).
The user's intent in such cases is to find such an answer. The higher its position in the search results, the better for the user. Ideally, it would appear in the first snippet or even higher, in the special result box.
This analyzer has four tabs. In each of them the total score of a search engine is the sum of its scores for each query.
1. Answer position in top 10 snippets
Here a search engine gets a score from 0 to 1 for each query, reflecting the highest position of a snippet containing the answer in the top 10 results. For example, if the answer was found in the first snippet, the search engine gets 1 for the query. If it was found in the second snippet, the score is 0.9, and so on. If the answer was not found in the snippets of the top 10 results, the score is 0.
2. Answer presence in snippets
Here a search engine gets 1 for a query if at least one of the top 10 snippets contained an answer to the question. 0 is assigned otherwise.
3. Position of answer web pages in top 10
A search engine gets a score from 0 to 1 for each query, reflecting the highest position of a web page containing the answer in the top 10 results. The score is 1 if the first page found contained an answer, 0.9 if the answer first appeared on the second page, and so on; it is 0 if none of the web pages found contained the answer.
4. Answer presence on web pages found
Here a search engine gets 1 for a query if at least one of the top 10 web pages contained an answer to the question. 0 is assigned otherwise.
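The four tabs can be summarized in a short sketch. The input format is an assumption: for each query, two lists of booleans record whether the snippet or the web page at each of the top 10 positions contained the answer. The function and key names are hypothetical.

```python
def position_score(flags):
    """1.0 for an answer at position 1, 0.9 at position 2, ..., 0.1 at position 10, else 0."""
    for pos, found in enumerate(flags[:10], start=1):
        if found:
            return (11 - pos) / 10
    return 0.0


def presence_score(flags):
    """1 if the answer appears anywhere in the top 10, else 0."""
    return 1 if any(flags[:10]) else 0


def tab_scores(queries):
    """Total score per tab; queries is a list of (snippet_flags, page_flags) pairs."""
    return {
        "snippet_position": sum(position_score(s) for s, _ in queries),
        "snippet_presence": sum(presence_score(s) for s, _ in queries),
        "page_position": sum(position_score(p) for _, p in queries),
        "page_presence": sum(presence_score(p) for _, p in queries),
    }


# Hypothetical judgments for three queries.
no_answer = [False] * 10
first = [True] + [False] * 9
second = [False, True] + [False] * 8
totals = tab_scores([(first, first), (second, first), (no_answer, no_answer)])
print(totals)
```

Because each total is a plain sum over queries, totals are only comparable between search engines evaluated on the same query set.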
For some queries the correct answer may have a few variants. For example, the query 'Olympic motto' is adequately answered whether the motto comes in Latin, in English or in the native language of the user. As long as these are variants of the same entity, we count them as correct. But we avoided the queries with multiple possible answers (like 'who was the wife of Henry the Eighth') as improper for this analyzer.
However well a search engine works, small annoyances can still dampen the user's mood and seriously undermine loyalty to a particular search engine. Examples include the danger of contracting a computer virus and the presence of irritating, obtrusive ads.
Whereas the amount of advertisements or dangerous scripts on websites is not for the search engines to control, the concentration of objectionable content in their output is entirely in their power. So, if the websites in the output abound with annoying factors, the search engine would only benefit from ranking such sites far below high-quality, safe ones.
Here we use specific techniques developed by "Ashmanov & Partners" for detecting various types of spam, such as doorways, hacked sites or parasite hosting shops. To make the results clearer, we selected markers for which the probability of undesirable results was higher than usual.
Analyzer of search spam level
At "Ashmanov and Partners" we study the phenomenon of search spam – the methods and technologies reducing the quality of search results and interfering with the operation of search engines.
Search spam is any text, URL, technology, program code or other web element created by a webmaster for the sole purpose of promoting the site in search engine results, and not to support fast and reliable search based on complete and authentic information.
The experts check Top 10 results of search queries on a regular basis, marking the sites which, in their opinion, contain elements of search spam. The collected data is entered into the informer. It shows the percentage of sites marked as spam in the overall number of sites that appeared in Top 10 of analyzed queries.
The source of information on the spam status of a given URL is the data of the anti-spam lab of the company "Ashmanov and Partners". The following categories of search spam are used:
* doorway – definite spam: doorways, leading the user to other pages,
* spamcatalog – definite spam: spammer catalogues,
* spamcontent – definite spam: spammers' stolen content,
* pseudosite – definite spam: site disguised as corporate (pseudo-company),
* catalog – catalogues,
* board – bulletin boards,
* domainsale – domains for sale,
* secondary – secondary, stolen content,
* partner – any partner programs,
* linksite – link support site,
* spamforum – forum containing spam,
* techspam – technical spam,
* searchres – search results,
* cj – circular jerk.
An aggregate indicator is the share of spam sites in the search results. The best search engine has the lowest indicator. This determines the order of search engines in the informer of the analyzer.
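The aggregate indicator can be sketched as a simple share. The representation is an assumption: one label per site in the top 10 of the analyzed queries, with `None` for sites the experts did not mark. Whether every category in the list counts toward the indicator is not stated in the text, so the hypothetical `counted` parameter defaults to all of them and can be narrowed.

```python
SPAM_CATEGORIES = frozenset({
    "doorway", "spamcatalog", "spamcontent", "pseudosite", "catalog",
    "board", "domainsale", "secondary", "partner", "linksite",
    "spamforum", "techspam", "searchres", "cj",
})


def spam_share(labels, counted=SPAM_CATEGORIES):
    """Percentage of sites whose label is in `counted`.

    labels: one entry per site in the top 10 of all analyzed queries;
            a category code, or None if the site was not marked.
    """
    if not labels:
        return 0.0
    unknown = {label for label in labels if label is not None} - SPAM_CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    marked = sum(1 for label in labels if label in counted)
    return 100.0 * marked / len(labels)


# Hypothetical expert labels for the top 10 of one query.
labels = ["doorway", None, None, "spamcontent", None, None, None, None, None, None]
print(spam_share(labels))
print(spam_share(labels, counted={"doorway"}))
```

A lower value is better, which matches the ordering used in the informer.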
Additional Search Features
The object of the user's immediate interest is the set of answers he receives to his query. Nevertheless, his overall impression may as well be affected by smaller technical factors, not immediately discernible on the SERP.
These analyzers are meant to compare some additional parameters of how the search engines function that may be of considerable interest to the user. It's worth noting that comparative values matter more here than the absolute numbers.
Response time analyzer
The analyzer evaluates the average time it takes for the search engine result page to be loaded by our server. As this time depends on the size of the pages, the tab "Size" provides the page size for each search engine and each query.
The set of queries for this analyzer is the same as that used in the Navigational analyzer. The load time is calculated for each query, and the average is shown as the result. Load time distribution graphs are also provided on each day's results page.
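The measurement can be sketched as follows. The `fetch` callable is a hypothetical stand-in for whatever downloads a result page and returns its body as bytes; the example uses a fake one so that the sketch runs without network access.

```python
import time
from statistics import mean


def measure(fetch, queries):
    """Return (query, load time in seconds, page size in bytes) for each query."""
    rows = []
    for query in queries:
        start = time.perf_counter()
        body = fetch(query)
        elapsed = time.perf_counter() - start
        rows.append((query, elapsed, len(body)))
    return rows


def summarize(rows):
    """Average load time and average page size over all queries."""
    return mean(r[1] for r in rows), mean(r[2] for r in rows)


def fake_fetch(query):
    """Stand-in for a real download: the page size grows with the query length."""
    return b"x" * (1000 * len(query))


rows = measure(fake_fetch, ["amazon", "vogue"])
avg_time, avg_size = summarize(rows)
print(f"average load time: {avg_time:.6f} s, average size: {avg_size:.0f} bytes")
```

Reporting the size next to the time makes it possible to tell a slow engine from one that simply returns larger pages.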