Context Clusters and Query Suggestions at Google

A new patent application from Google tells us how the search engine may use context to find query suggestions before a searcher has finished typing a full query. After seeing this patent, I've been thinking about previous patents I've seen from Google that have similarities.
It's not the first time I've written about a Google patent involving query suggestions. I've written about a couple of other patents in the past that were very informative:
In both of those, the inclusion of entities in a query impacted the suggestions that were returned. This patent takes a slightly different approach, by also looking at context.

Context Clusters in Query Suggestions

We've been seeing the word "context" spring up in Google patents recently. Context terms from knowledge bases appear on pages that focus on the same query term with different meanings, and we have also seen pages about specific people handled with a disambiguation approach. While those were recent, I did blog about a paper back in 2007 that talks about query context, with an author from Yahoo. The paper was Using Query Contexts in Information Retrieval. The abstract from the paper provides a good glimpse into what it covers:
The Google patent doesn't take a user-based approach either, but it does look at some user contexts and interests. It sounds like searchers might be offered a chance to select a context cluster before query suggestions are shown:
I often look up the inventors of patents to get a sense of what else they may have written and worked upon. I looked up Jakob D. Uszkoreit on LinkedIn, and his profile doesn't surprise me. He tells us there of his experience at Google:
This passage reminded me of the search results being shown to me by the Google Assistant, which are based upon interests that I have shared with Google over time, and that Google allows me to update from time to time. If the inventor of this patent worked on Google Assistant, that doesn't surprise me. I haven't been offered context clusters yet (and wouldn't know what those might look like if Google did offer them; I suspect that if Google does start offering them, I will recognize them when they appear). Like many patents do, this one tells us what is "innovative" about it. It looks at:
It also tells us that it will calculate probabilities that certain context clusters might be requested by a searcher. So how does Google know what to suggest as context clusters?
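The patent doesn't publish its clustering method, but the general idea of grouping a searcher's related queries into clusters can be sketched with a simple token-overlap grouping. Everything here (the sample queries, the Jaccard similarity measure, the 0.2 threshold) is an illustrative assumption, not the patent's actual algorithm.

```python
# Hypothetical sketch: grouping past queries into "context clusters" by
# shared terms. The threshold and queries are invented for illustration.

def jaccard(a, b):
    """Similarity between two queries as token-set overlap."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def cluster_queries(queries, threshold=0.2):
    """Greedily assign each query to the first cluster it overlaps with."""
    clusters = []
    for q in queries:
        for cluster in clusters:
            if any(jaccard(q, member) >= threshold for member in cluster):
                cluster.append(q)
                break
        else:
            clusters.append([q])  # no match: start a new cluster
    return clusters

history = [
    "movie showtimes",
    "new movie reviews",
    "pizza delivery",
    "pizza restaurants downtown",
]
print(cluster_queries(history))
```

With this toy history, the movie queries land in one cluster and the pizza queries in another; a real system would use far richer signals than token overlap.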
The patent application is: (US20190050450) Query Composition System
What are Context Clusters as Query Suggestions?

The patent tells us that context clusters might be triggered when someone is starting a query in a web browser. I tried it out, starting a search for "movies", and got a number of suggestions that were combinations of queries, or what seem to be context clusters. The patent says that context clusters would appear before someone began typing, based upon topics and user information such as location. So, if I were at a shopping mall that had a movie theatre, I might see search suggestions for movies like the ones shown here. One of those clusters involved "Movies about Business", which I selected, and it showed me a carousel, along with buttons for subcategories to choose from. This seems to be a context cluster. It seems to be a pretty new idea, and may be something that Google would announce as an available option when (and if) it becomes available, much like they did with the Google Assistant. I usually check through the news from my Google Assistant at least once a day. If it starts offering search suggestions based upon things like my location, that could be very interesting.

User Query Histories

The patent tells us that the context clusters selected to be shown to a searcher might be based upon previous queries from that searcher, and provides the following example:
It's not easy to tell whether the examples I provided about movies above are related to this patent, or whether it is tied more closely to the search results that appear in Google Assistant results. It's worth reading through, and thinking about experimental searches that might influence the results you see. It is interesting that Google may attempt to anticipate what it suggests to show us as query suggestions, after showing us search results based upon what it believes are our interests, drawn from searches we have performed or interests we have identified for Google Assistant. The context cluster may be related to the location and time at which someone accesses the search engine. The patent provides an example of what the searcher might see, like this:
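The patent's idea of choosing which clusters to surface from signals such as location and time of day might be sketched roughly like this. The cluster names, context signals, and weights below are all invented for illustration; the patent does not publish a scoring formula.

```python
# Illustrative sketch of ranking candidate context clusters for a user's
# current context (e.g., at a mall in the evening) before they type anything.
# All names and weights are hypothetical.

CLUSTER_PRIORS = {
    "Movies Playing Nearby": {"mall": 0.6, "evening": 0.7},
    "Places to Eat": {"mall": 0.8, "noon": 0.9, "evening": 0.6},
    "Store Hours": {"mall": 0.7, "morning": 0.5},
}

def rank_context_clusters(context_signals, top_n=2):
    """Score each cluster by summing weights for matching context signals,
    then return the top_n cluster names."""
    scores = {
        name: sum(weights.get(signal, 0.0) for signal in context_signals)
        for name, weights in CLUSTER_PRIORS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(rank_context_clusters({"mall", "evening"}))
```

In this toy setup, a searcher at a mall in the evening would be offered "Places to Eat" and "Movies Playing Nearby" as clusters before typing a query.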
I could see running such a search at a shopping mall, to learn more about the location I was at and what I could find there, from dining places to movies being shown. That sounds like it could be the start of an interesting adventure. Copyright © 2019 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately. Plugin by Taragana. The post Context Clusters in Search Query Suggestions appeared first on SEO by the Sea ⚓. from http://www.seobythesea.com/2019/02/context-clusters-search-query-suggestions/
~ Vernor Vinge, A Deepness in the Sky

In a science fiction novel set far in the future, Vernor Vinge writes about how people might engage in software archaeology. I understand the desire to do that, looking at patents that give us hints about how technology is changing, and about the processes behind search engines. Google has just been granted a continuation patent for universal search. This post looks at how the patents covering universal search at Google have changed. It is not intended as a lesson on how patents work, but knowing something about how continuation patents work can provide insights into the processes that people at Google are trying to protect when they update the universal search patent. Nor is this post intended as an analysis of patents; rather, it is a look at how search works, and how it has changed in the last dozen years or so. A patent is pursued by a company to protect the process described within it. It isn't unusual for the process protected by a patent to change in some way as it is implemented and put into use. When that takes place, the company that was originally assigned the patent might file another one, referred to as a continuation patent, which takes the grant date of the first version of the patent as the start time for protection. Continuation patents are usually very similar to the earlier versions, with the description sections often close to identical. The parts that change are the claims sections, which are what patent examiners review when deciding whether the claimed invention is new, non-obvious, and useful, and should be granted. So, in looking at updated patents covering a specific process, it makes sense to look at how the claims have changed over time.
The Original Universal Search Patent Application

Before the patent was granted, I wrote about it in the post How Google Universal Search and Blended Results May Work, which covered the universal search patent application published in 2008. That patent was granted, and the claims were updated from the original application when it was granted in 2011 (sometimes the processes in original applications have to be amended for a patent to be granted, and the claims may change to match).

The First Universal Search Patent

In the 2011 granted version of Interleaving Search Results, the first six claims give us a flavor for what the patent covers:
The Second Universal Search Patent

We know that Google introduced universal search results at a Searchology presentation in 2007 (a few months before the patent was originally filed), and the patent has been updated since then, with a continuation patent titled Interleaving Search Results granted in 2015. Its new claims insert the concept of historic click data. Here are the first five claims from that version of the patent:
The Updated Universal Search Patent

The newest version of Interleaving Search Results is still a pending patent application at this point, published on January 2, 2019. Publication Number: 3422216. Abstract: (EN) A method comprising receiving a plurality of first search results that satisfy a search query directed to a first search engine, each of the plurality of first search results having a respective first score; receiving a second search result from a second search engine, the second search result having a second score, wherein the search query is not directed to the second search engine, wherein at least one of the first and second scores is based on characteristics of queries or results of queries learned from user click data; and determining from the second score whether to present the second search result, and if so, presenting the first search results in an order according to their respective scores, and presenting the second search result at a position relative to the order, the position being determined using the first scores and the second score.
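The interleaving logic the abstract describes can be sketched in a few lines: primary web results are ordered by score, a result from a second engine (news, images, etc.) is shown only if its score clears some threshold, and its position is determined by comparing scores. The scores and the 0.5 threshold below are made-up values for illustration, not anything from the patent.

```python
# Toy sketch of interleaving: insert a secondary-engine result into primary
# results at a score-determined position. Scores/threshold are hypothetical.

def interleave(first_results, second_result, threshold=0.5):
    """first_results: list of (title, score) from the engine the query was
    directed to. second_result: (title, score) from a second engine.
    Present the second result only if its score clears the threshold."""
    ordered = sorted(first_results, key=lambda r: r[1], reverse=True)
    _, score = second_result
    if score < threshold:
        return [title for title, _ in ordered]
    # Insert ahead of the first primary result it outscores.
    position = next(
        (i for i, (_, s) in enumerate(ordered) if score > s), len(ordered)
    )
    ordered.insert(position, second_result)
    return [title for title, _ in ordered]

web = [("result-a", 0.9), ("result-b", 0.7), ("result-c", 0.4)]
print(interleave(web, ("news-item", 0.8)))
```

Here the news item outscores the second web result, so it is blended in at the second position; with a score below the threshold it would not appear at all.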
Changes to Universal Search

If you look at them, you will see David Bailey's name on those patents. He wrote a guest post at Search Engine Land about universal search that provides a lot of insight into how it works, and the title of the post reflects that: An Insider's View Of Google Universal Search. It's worth reading through his analysis of universal search carefully before trying to compare the claims from one version of the patent to another. The second version of the claims refers to historic click data, and the newest version changes that to "user click data", but doesn't provide any insight into why that change was made. We've heard spokespeople from Google tell us that they don't utilize user click data to rank content, so this gets a little confusing if they are taken at their word. Another difference in the latest claims is where they refer to multiple distinct scoring features, and how each type of search blended into the results has some unique scoring feature that sets it apart from the results already inserted onto the search results page. We do know that different types of search are ranked based upon different signals, such as freshness for news results, and links often for web results. So results shown in universal search may all be relevant for a query, but have some element that considers unique features, adding diversity to what we see in the SERPs. The post Universal Search Updated at Google appeared first on SEO by the Sea ⚓.
from http://www.seobythesea.com/2019/01/universal-search-updated-at-google/

How A Knowledge Graph Updates Itself

Those of us who are used to doing search engine optimization have been looking at URLs filled with content, links between that content, and how algorithms such as PageRank (based upon links pointing between pages) and information retrieval scores based upon the relevance of that content have determined how well pages rank in search results in response to queries entered into search boxes by searchers. Web pages connected by links have been seen as information points connected by nodes. This was the first generation of SEO. Search has been going through a transformation. Back in 2012, Google introduced something it refers to as the knowledge graph, telling us that it would begin focusing upon indexing things instead of strings. By "strings," they were referring to words that appear in queries and in documents on the Web. By "things," they were referring to named entities, or real and specific people, places, and things. When people searched at Google, the search engine would show search engine results pages (SERPs) filled with URLs to pages that contained the strings of letters we were searching for. Google still does that, and is slowly changing to showing search results that are about people, places, and things.
Google started showing us in patents how they were introducing entity recognition to search, as I described in this post: They now show us knowledge panels in search results that tell us about the people, places, and things they recognize in the queries we perform. In addition to crawling webpages and indexing the words on those pages, Google is collecting facts about the people, places, and things it finds on those pages. A Google patent that was just granted in the past week tells us how Google's knowledge graph updates itself when it collects information about entities, their properties and attributes, and the relationships involving them. This is part of the evolution of SEO taking place today: learning how search is changing from being based upon strings to being based upon knowledge. What does the patent tell us about knowledge? This is one of the sections that details what a knowledge graph is like, which Google might collect information about when it indexes pages these days:
Note that SEO is no longer just about how often certain words appear on pages of the Web, what words appear in links to those pages, in page titles, headings, and alt text for images, and how often certain words are repeated or related words are used. Google is looking at the facts that are mentioned about entities, such as entity types like "person," and properties, such as "date of birth" or "gender." Note that the quote also mentions the word "schema," as in "These relationships define in part a schema associated with the entity type [Person]." As part of the transformation of SEO from strings to things, the major search engines joined forces to offer us information on how to use Schema structured data on the Web to provide a machine-readable way of sharing information with search engines about the entities we write about, their properties, and their relationships. I'm writing about this patent because I am participating in a webinar about knowledge graphs and how they are being used and updated. The webinar is tomorrow at: I'm also writing about this Google patent because it starts out with the following line, which it titles "Background": This disclosure generally relates to updating information in a database. Data has previously been updated by, for example, user input. This line points to the fact that this approach no longer needs to be updated by users; instead, it involves how Google's knowledge graphs update themselves.

Updating Knowledge Graphs

I attended a Semantic Technology and Business conference a couple of years ago, where the head of Yahoo's knowledge base presented, and he was asked a number of questions in a question-and-answer session after he spoke. Someone asked him what happens when information in a knowledge graph changes and needs to be updated. His answer was that a knowledge graph would have to be updated manually to have new information placed within it.
That wasn't a satisfactory answer, because it would have been good to hear that the information from such a source could be easily updated. I've been waiting for Google to answer a question like this, which made seeing a line like this one from the patent a good experience:
This would be a knowledge graph update, so the patent provides details using language that reflects exactly that:
How does the search engine do this? The patent provides more information that fills in such details. The approaches to achieve this would be to:
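However the patent enumerates those approaches, the general pattern of a knowledge graph filling its own gaps can be sketched as: find a missing property for an entity, pose it as a question, answer it from a document source, and write the new fact back into the graph. The entities, schema, and crude pattern-based extraction below are hypothetical stand-ins, not the patent's method.

```python
# Toy sketch of a knowledge graph that "updates itself" by answering its
# own questions from a document corpus. All data here is invented.

import re

graph = {("Jane Doe", "occupation"): "author"}  # (entity, property) -> value

documents = [
    "Jane Doe was born on March 4, 1970 in Springfield.",
]

def missing_properties(entity, schema=("occupation", "date of birth")):
    """Properties the schema expects but the graph lacks for this entity."""
    return [p for p in schema if (entity, p) not in graph]

def answer_from_documents(entity, prop):
    """Very crude pattern-based 'question answering' over the corpus."""
    if prop == "date of birth":
        for doc in documents:
            if entity in doc:
                match = re.search(r"born on ([A-Z][a-z]+ \d+, \d+)", doc)
                if match:
                    return match.group(1)
    return None

def self_update(entity):
    """Fill any gaps the graph can answer; this is the self-update step."""
    for prop in missing_properties(entity):
        value = answer_from_documents(entity, prop)
        if value is not None:
            graph[(entity, prop)] = value

self_update("Jane Doe")
print(graph[("Jane Doe", "date of birth")])
```

The point of the sketch is the loop, not the extraction: the graph itself identifies what it doesn't know and writes back what it learns, with no manual editing step.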
The post How Google's Knowledge Graph Updates Itself by Answering Questions appeared first on SEO by the Sea ⚓. from http://feedproxy.google.com/~r/seobythesea/Tesr/~3/6XWRhgceypo/

I came across this statement on the Web earlier this week, wondered about it, and decided to investigate further: If there are multiple instances of the same document on the web, the highest authority URL becomes the canonical version. The rest are considered duplicates. ~ Link inversion, the least known major ranking factor. I read that article from Dejan SEO and thought it was worth exploring. As I was looking through Google patents that included the word "authority," I found this patent, which doesn't quite say the same thing Dejan does, but is interesting in that it finds ways to distinguish between duplicate pages on different domains based upon priority rules, which is relevant to determining which duplicate page might be the highest-authority URL for a document. The patent is:
Identifying a primary version of a document Abstract
Since the claims of a patent are what patent examiners at the USPTO look at when they are prosecuting a patent and deciding whether or not it should be granted, I thought it would be worth looking at the claims contained within this patent to see if they help encapsulate what it covers. The first one captures some aspects worth thinking about while discussing different versions of particular documents, and how the metadata associated with a document might be used to determine which is the primary version:
This doesn't advance the claim that the primary version of a document is considered the canonical version of that document, with all links pointing to the document redirected to the primary version. There is another patent sharing an inventor with this one that refers to one of the duplicate content URLs being chosen as a representative page, though it doesn't use the word "canonical." From that patent: Duplicate documents, sharing the same content, are identified by a web crawler system. Upon receiving a newly crawled document, a set of previously crawled documents, if any, sharing the same content as the newly crawled document is identified. Information identifying the newly crawled document and the selected set of documents is merged into information identifying a new set of documents. Duplicate documents are included and excluded from the new set of documents based on a query-independent metric for each such document. A single representative document for the new set of documents is identified in accordance with a set of predefined conditions. In some embodiments, a method for selecting a representative document from a set of duplicate documents includes: selecting a first document in a plurality of documents on the basis that the first document is associated with a query independent score, where each respective document in the plurality of documents has a fingerprint that identifies the content of the respective document, the fingerprint of each respective document in the plurality of documents indicating that each respective document in the plurality of documents has substantially identical content to every other document in the plurality of documents, and a first document in the plurality of documents is associated with the query-independent score.
The method further includes indexing, in accordance with the query independent score, the first document thereby producing an indexed first document; and with respect to the plurality of documents, including only the indexed first document in a document index. This other patent is: Representative document selection for a set of duplicate documents Abstract
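The representative-document idea in the passage above (same-content documents grouped by a fingerprint, with only the highest query-independent-scoring one kept in the index) can be sketched simply. The SHA-256 hash and the scores below are simplified placeholders; the patent's fingerprinting and scoring are more robust than exact-content hashing.

```python
# Sketch: keep one representative document per content fingerprint, chosen
# by a query-independent score. Hashing and scores are illustrative only.

import hashlib

def fingerprint(content):
    """Stand-in for a content fingerprint (exact-match hash here)."""
    return hashlib.sha256(content.strip().lower().encode()).hexdigest()

def select_representatives(docs):
    """docs: list of (url, content, query_independent_score).
    Returns {fingerprint: representative_url}, one per distinct content."""
    best = {}
    for url, content, score in docs:
        fp = fingerprint(content)
        if fp not in best or score > best[fp][1]:
            best[fp] = (url, score)
    return {fp: url for fp, (url, score) in best.items()}

docs = [
    ("http://mirror.example/page", "Same article text.", 0.3),
    ("http://original.example/page", "Same article text.", 0.9),
    ("http://other.example/unique", "Different text.", 0.5),
]
reps = select_representatives(docs)
print(sorted(reps.values()))
```

Only the higher-scoring copy of the duplicated article survives into the index; the unique page is untouched.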
Regardless of whether the primary version of a set of duplicate documents is treated as the representative document, as suggested in this second patent (whatever that may mean exactly), I think it's important to get a better understanding of what a primary version of a document might be. The primary version patent provides some reasons why one of them might be considered a primary version: (1) Including different versions of the same document does not provide additional useful information, and it does not benefit users. Reasons such as this are why this duplicate document patent says it is ideal to identify a primary version from the different versions of a document that appear on the Web. The search engine also wants to furnish "the most appropriate and reliable search result."

How does it work?

The patent tells us that one method of identifying a primary version is as follows. The different versions of a document are identified from a number of different sources, such as online databases, websites, and library data systems. For each document version, a priority of authority is selected based on: (1) the metadata information associated with the document version, such as
(2) As a second step, the document versions are then checked for length qualification, using a length measure. The version with a high priority of authority and a qualified length is deemed the primary version of the document. If none of the document versions has both a high priority and a qualified length, then the primary version is selected based on the totality of information associated with each version. The patent tells us that scholarly works tend to follow the process in this patent:
Metadata that might be looked at during this process could include such things as:
The patent goes into more depth about the methodology behind determining the primary version of a document:
The patent includes a table illustrating the source-priority list, and it describes some alternative approaches as well. It tells us that "the priority measure for determining whether a document version has a qualified priority can be based on a qualified priority value."
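The two-step selection described above (a source-based priority of authority, then a length qualification) might be sketched like this. The source-priority values, the 500-character length threshold, and the fallback behavior are all invented numbers for illustration, not values from the patent's table.

```python
# Rough sketch of primary-version selection among duplicate document
# versions: source priority + length qualification. All values hypothetical.

SOURCE_PRIORITY = {"publisher": 3, "library": 2, "aggregator": 1}

def primary_version(versions, min_length=500):
    """versions: list of dicts with 'url', 'source', and 'length' keys.
    Returns the URL of the version deemed primary."""
    qualified = [v for v in versions if v["length"] >= min_length]
    # If no version passes the length check, fall back to all versions
    # (a stand-in for the patent's "totality of information" fallback).
    pool = qualified or versions
    return max(pool, key=lambda v: SOURCE_PRIORITY.get(v["source"], 0))["url"]

versions = [
    {"url": "http://agg.example/doc", "source": "aggregator", "length": 5000},
    {"url": "http://pub.example/doc", "source": "publisher", "length": 4800},
    {"url": "http://pub.example/abstract", "source": "publisher", "length": 200},
]
print(primary_version(versions))
```

The publisher's full-length copy wins: the aggregator copy is long enough but lower-authority, and the publisher's abstract is high-authority but fails the length check.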
Takeaways

I was in a Google Hangout on Air within the last couple of years where a number of SEOs (Ammon Johns, Eric Enge, Jennifer Slegg, and I) asked questions of John Mueller and Andrey Lipattsev, including some about duplicate content. It seems to be something that still raises questions among SEOs. The patent goes into more detail about determining which duplicate document might be the primary document. We can't tell whether that primary document might be treated as the canonical URL for all of the duplicate documents, as suggested in the Dejan SEO article I linked to at the start of this post, but it is interesting to see that Google has a way of deciding which version of a document might be the primary version. I didn't go into much depth about qualified lengths being used to help identify the primary document, but the patent does spend some time going over that. Is this a little-known ranking factor? The Google patent on identifying a primary version of duplicate documents does seem to find some importance in identifying what it believes to be the most important version among many duplicates. I'm not sure there is anything here that most site owners can use to help their pages rank higher in search results, but it's good to see that Google has explored this topic in depth. The post How Google might Identify Primary Versions of Duplicate Pages appeared first on SEO by the Sea ⚓. from http://feedproxy.google.com/~r/seobythesea/Tesr/~3/6vBWs5EtsmQ/
Running an enterprise-grade website on WordPress is not easy. It requires a high level of reliability, security, and flexibility. This is where dedicated WordPress hosting comes to the rescue. WordPress hosting services give you the option to choose from dedicated servers, shared, or VPS hosting plans. At present, a large number of organizations and...
from https://dailyseoblog.com/reasons-why-you-should-choose-a-dedicated-wordpress-host/#utm_source=rss&utm_medium=rss
Blogging is the best way to build credibility, increase traffic to your website, boost search engine rankings, and more. Consistent blogging helps in developing and maintaining relationships with current as well as potential customers. The benefits of blogging can be understood from the fact that 55% of marketers accept that blog content creation is their top...
from https://dailyseoblog.com/reasons-why-you-shouldnt-start-blogging-on-free-platforms/#utm_source=rss&utm_medium=rss

Augmentation Queries
This past March, Google was granted a patent that involves giving quality scores to queries (the quote above is from that patent). The patent refers to high-scoring queries as augmentation queries. It is interesting to see that searcher selection is one way that might be used to determine the quality of queries. So, when someone searches, Google may compare the SERPs they receive from the original query against augmentation query results, based upon previous searches using the same query terms or synthetic queries. This evaluation against augmentation queries is based upon which search results have received more clicks in the past. Google may decide to add results from an augmentation query to the results for the query searched for, to improve the overall search results. How does Google find augmentation queries? One place to look is in query logs and click logs. As the patent tells us:
This doesn't mean that Google is using clicks to directly determine rankings. But it is deciding which augmentation queries might be worth using to provide SERPs that people may be satisfied with. There are other things Google may look at to decide which augmentation queries to use in a set of search results. The patent points out some other factors that may be helpful:
I've seen white papers from Google before mentioning synthetic queries, which are queries performed by the search engine instead of by human searchers. It makes sense for Google to explore query spaces in this manner, to see what results are like, and to use information such as structured data as a source of those synthetic queries. I've written about synthetic queries at least a couple of times before, including in the post Does Google Search Google? How Google May Create and Use Synthetic Queries.

Implicit Signals of Query Quality

It is an interesting patent in that it talks about things such as long clicks and short clicks, and ranking web pages on the basis of such things. The patent refers to these as "implicit signals of query quality." More about that in the patent here:
The reasons for the process behind the patent are explained in the description section of the patent where we are told:
A quality signal for a query term can be defined in this way:
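However the patent formalizes it, one simple illustrative version of an implicit quality signal built from long and short clicks is the fraction of a query's clicks that were "long" (the searcher stayed on the result). The 60-second cutoff and the log records below are my assumptions for the sketch, not values from the patent.

```python
# Illustrative implicit query-quality signal: share of long clicks among
# all clicks recorded for a query. Cutoff and data are hypothetical.

LONG_CLICK_SECONDS = 60

def query_quality(click_log, query):
    """click_log: list of (query, dwell_seconds) records.
    Returns the fraction of long clicks for the given query."""
    dwells = [d for q, d in click_log if q == query]
    if not dwells:
        return 0.0
    long_clicks = sum(1 for d in dwells if d >= LONG_CLICK_SECONDS)
    return long_clicks / len(dwells)

log = [
    ("best hiking boots", 120),  # long click
    ("best hiking boots", 95),   # long click
    ("best hiking boots", 5),    # short click (quick bounce back to SERP)
    ("bset hikng boots", 3),     # a misspelled query would score poorly
]
print(query_quality(log, "best hiking boots"))
```

A query whose results mostly produce long clicks would score well as a candidate augmentation query; one that mostly produces quick bounces would not.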
The patent can be found at: Query augmentation Abstract
References Cited about Augmentation Queries

These are a number of references cited by the applicants of the patent which looked interesting, so I looked them up to read them and share them here.
This is a Second Look at Augmentation Queries

This is a continuation patent, which means that the patent was granted before with the same description, and it now has new claims. When that happens, it can be worth looking at the old claims and the new claims to see how they have changed. I like that the new version seems to focus more strongly upon structured data. It tells us that Google might use structured data from sites that appear for queries as synthetic queries, and if those meet the performance threshold, their results may be added to the search results for the original queries. The claims do seem to focus a little more on structured data as synthetic queries, but they haven't changed enough to publish side by side and compare.

What Google Has Said about Structured Data and Rankings

Google spokespeople had been telling us that structured data doesn't impact rankings directly, but what they have been saying seems to have changed somewhat recently. In the Search Engine Roundtable post Google: Structured Data Doesn't Give You A Ranking Boost But Can Help Rankings, we are told that just having structured data on a site doesn't automatically boost the rankings of a page; but if the structured data for a page is used as a synthetic query, and it meets the performance threshold as an augmentation query, it might be shown in rankings, thus helping rankings (as this patent tells us). Note that this isn't new; the continuation patent's claims don't appear to have changed much, so structured data is still being used as synthetic queries and checked to see if it works as augmentation queries. This seems to be a really good reason to make sure you are using appropriate structured data on your pages.
The post Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries appeared first on SEO by the Sea ⚓. from http://feedproxy.google.com/~r/seobythesea/Tesr/~3/OSrYzTDXupk/

My last post was Five Years of Google Ranking Signals, and I started that post by saying that there are other posts about ranking signals that have some issues. But I don't want to turn people away from one recent post that did contain a lot of useful information. Cyrus Shepard recently published a post about Google Success Factors on Zyppy.com, which I would recommend you check out. Cyrus did a video with Ross Hudgins on Siege Media where he talked about those ranking signals, called Google Ranking Factors with Cyrus Shepard. I'm keeping this post short on purpose, to make the discussion about ranking the focus and the star. There is some really good information in the video and in the post from Cyrus. Cyrus takes a different approach to writing about ranking signals from what I wrote, but it's worth the time visiting, listening, and watching.
And have fun learning to rank. The post Learning to Rank appeared first on SEO by the Sea ⚓. from http://feedproxy.google.com/~r/seobythesea/Tesr/~3/HV2IJZkJH8s/

Organic Search
1. Domain Age and Rate of Linking
Semantic Search
31. Searches using Structured Data

Local Search
36. Travel Time for Local Results

Voice Search

News Search

Google Ranking Signals

There are some other pages about Google ranking signals that don't consider up-to-date information, or that sometimes use questionable critical thinking to argue that some of the signals they include are actually something Google considers. I've been blogging about patents from Google, Yahoo, Microsoft, and Apple since 2005, and have been exploring what those might say about ranking signals for over a decade. Representatives from Google have stated that "Just because we have a patent on something doesn't mean we are using it." The first time I heard them say that was after Go Daddy started advertising domain registrations of up to 10 years, because one Google patent (Information Retrieval Based on Historical Data) said that Google might look at length of domain registration as a ranking signal, based on the thought that a "spammer would likely only register a domain for a period of one year." (In reality, many people register domains for one year and keep their registrations on auto-renewal, so a one-year registration is not evidence that the registrant is a spammer.) I've included some ranking signals that are a little older, but most of the things I've listed are from the past five years, often with blog posts I've written about them and the patents that go with them. This list is a compilation of blog posts that I have been working on for years, taking many hours of regular searching through patent filings, reading blog posts from within the search and SEO industries, and reading through many patents, both ones I didn't write about and many that I have. If you have questions about any of the signals I've listed, please ask about them in the comments. Some of the patents I have blogged about have not been implemented by Google yet, but could be.
A company such as Google files patents to protect the intellectual property behind its ideas, the work that its search engineers and testing teams put into those ideas. These patents are worth reading and understanding because they provide insights into ideas Google may have explored when developing ranking signals, and they may give you ideas of things to explore and questions to keep in mind when you are working on optimizing a site. Patents are made public to inspire people to innovate, invent, and understand new ideas and inventions.

Organic Search

1. Domain Age and Rate of Linking

Google has a patent called Document scoring based on document inception date, in which it tells us that it will often use the date it first crawls a site, or the first time it sees the site referenced elsewhere, as the age of that site. The patent also tells us that Google may look at the links pointing to a site, calculate the average rate at which links are added, and use that information in ranking the site.

2. Use of Keywords

Matt Cutts wrote a newsletter for librarians in which he explained how Google crawls the web, building an inverted index of terms found in documents across the Web that it matches against query terms when people perform searches. It shows the importance of keywords in queries, and how finding documents that contain those keywords is a central part of performing searches. A copy of that newsletter can be found here: https://www.analistaseo.es/wp-content/uploads/2014/09/How-Google-Index-Rank.pdf

3. Related Phrases

Google recently updated its first phrase-based indexing patent, which tells us in its claims that pages with more related phrases on them rank higher than pages with fewer related phrases on them. That patent is Phrase-based searching in an information retrieval system.
Related phrases are complete phrases that may predict the topic of the page they appear on. Google might look at the queries a page is optimized for, identify the highest-ranking pages for those query terms, and see which meaningful complete phrases frequently occur (or co-occur) on those high-ranking pages. I wrote about the updating of this patent in the post Google Phrase-Based Indexing Updated. Google tells us how it indexes related phrases in an inverted index (like the term-based inverted index from #2) in the patent Index server architecture using tiered and sharded phrase posting lists.

4. Keywords in Main Headings, Lists, and Titles

I wrote the post Google Defines Semantic Closeness as a Ranking Signal after reading the patent Document ranking based on semantic distance between terms in a document. The abstract of this patent tells us that:
If a list on a page has a heading, the items in that list are all considered to be an equal distance away from that heading. The words contained under a main heading on a page are all considered to be an equal distance away from that main heading. All of the words on a page are considered to be an equal distance away from the page's title. So a page titled "Ford" which has the word "motors" on it is considered to be relevant for the phrase "Ford Motors." Here is an example of how that semantic closeness works with a heading and a list:

5. Page Speed

Google has announced repeatedly that it considers page speed to be a ranking signal, including in the Google blog post Using site speed in web search ranking, and also in a patent that I wrote about in the post Google's Patent on Site Speed as a Ranking Signal. The patent assigned to Google about page speed is Using resource load times in ranking search results. The patent tells us that this load-time signal may be based upon measures of how long it takes a page to load on a range of devices:
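The semantic-closeness idea from #4 can be sketched in a few lines of Python. This is a toy illustration of the principle rather than the patent's actual method; the page structure and words are invented for the example.

```python
def term_pairs_at_equal_distance(title, headings_to_words):
    """Toy sketch of 'semantic closeness': every word on the page is
    treated as equally close to the page title, and every word under a
    heading as equally close to that heading, so each such pair forms
    a candidate phrase the page could be relevant for."""
    pairs = set()
    all_words = [w for words in headings_to_words.values() for w in words]
    # Title terms pair with every word on the page.
    for title_term in title.lower().split():
        for word in all_words:
            pairs.add((title_term, word.lower()))
    # Heading terms pair with the words in their own section.
    for heading, words in headings_to_words.items():
        for h_term in heading.lower().split():
            for word in words:
                pairs.add((h_term, word.lower()))
    return pairs

# A page titled "Ford" with "motors" in its body is treated as
# relevant for the phrase "Ford Motors":
pairs = term_pairs_at_equal_distance("Ford", {"Our Vehicles": ["motors", "trucks"]})
print(("ford", "motors") in pairs)  # True
```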
6. Watch Times for a Page

While it may appear to be about videos, there is a Google patent that tells us pages may rank higher if they are watched for longer periods of time than other pages. The post I wrote about this patent is Google Watch Times Algorithm For Rankings?, and the patent it covers is Watch time based ranking. A page may contain video, images, or audio, and watch time for those may make a difference too. Here's a screenshot from the patent showing some examples:

7. Context Terms on a Page

I wrote the post Google Patents Context Vectors to Improve Search about the patent User-context-based search engine. The patent tells us that Google may look at words that have more than one meaning in knowledge bases (such as "bank," which could mean a building where money is stored, the ground on one side of a river, or what a plane does when it turns in the air). The search engine may take terms from the knowledge base that show which meaning was intended, collect them as "context terms," and look for those context terms when indexing pages those words appear on, so that it indexes the correct meaning.

8. Language Models Using Ngrams

Google may give pages quality scores based upon language models created from the ngrams on the pages of a site, similar to the Google Books Ngram Viewer. I wrote about this in the post Using Ngram Phrase Models to Generate Site Quality Scores, based upon the patent Predicting site quality. The closer the quality score for a page is to a high-quality page from a training set, the higher the page may rank.

9. Gibberish Content

This may sound a little like #8 above. Google may use ngrams to tell whether the words on a page are gibberish, and reduce the ranking of the page if so. I wrote about this in a post titled Google Scoring Gibberish Content to Demote Pages in Rankings?, about the patent Identifying gibberish content in resources.
Here is an ngram analysis using a well-known phrase with five words in it: "The quick brown fox jumps." Ngrams from a complete page might be collected like that, and from a collection of good pages and bad pages, to build language models (Google has done this with a very large collection of books, as we see from the Google Ngram Viewer). From such a set of language models it would be possible to tell which pages are gibberish. The gibberish content patent also mentions a keyword-stuffing score that it would try to identify.

10. Authoritative Results

Google may look at whether queries appear to seek authoritative results; if they do, the authoritative results may be merged into the original results. The way the patent describes authoritative results:
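A rough sketch of how the ngram language models behind #8 and #9 can separate fluent text from word salad: train bigram counts on known-good text, then compare average smoothed log-probabilities. The training text, add-one smoothing, and thresholds here are my own choices for illustration, not details from the patents.

```python
import math
from collections import Counter

def bigrams(words):
    """All adjacent word pairs in a sequence."""
    return list(zip(words, words[1:]))

def train(corpus_words):
    """Count bigrams and unigrams in a known-good corpus."""
    return Counter(bigrams(corpus_words)), Counter(corpus_words)

def avg_logprob(words, bg_counts, ug_counts, vocab_size):
    """Average add-one-smoothed bigram log-probability; lower scores
    suggest word salad relative to the training text."""
    total = 0.0
    pairs = bigrams(words)
    for w1, w2 in pairs:
        num = bg_counts[(w1, w2)] + 1
        den = ug_counts[w1] + vocab_size
        total += math.log(num / den)
    return total / len(pairs)

corpus = "the quick brown fox jumps over the lazy dog".split()
bg, ug = train(corpus)
v = len(set(corpus))
fluent = avg_logprob("the quick brown fox".split(), bg, ug, v)
gibberish = avg_logprob("fox the brown quick".split(), bg, ug, v)
print(fluent > gibberish)  # True: scrambled word order scores lower
```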
11. How Well Database Answers Match Queries

This patent doesn't seem to have been implemented yet, but it might be, and it is worth thinking about. I wrote the post How Google May Rank Websites Based Upon Their Databases Answering Queries, based upon the patent Resource identification from organic and structured content. It tells us that Google might look at searches on a site, and how the site answers them, to see whether they are similar to the queries that Google receives from searchers. If they are, Google might rank results from that site higher. The patent also shows that Google might include the database results from such sites within Google search results. If you start seeing that happen, you will know that Google decided to implement this patent. Here is a screenshot from the patent:

12. Suspicious Activity to Increase Rankings

Another time Google publicly stated that "just because we have a patent doesn't mean we use it" came shortly after I wrote about a patent in a post I called The Google Rank-Modifying Spammers Patent, based upon the patent Ranking documents. It tells us about a transition rank that Google may assign to a site where it sees activity that might be suspicious, such as keyword stuffing. Instead of improving the ranks of pages, Google might decrease them, or rerank them randomly. The motivation appears to be to provoke the people making those changes into doing more drastic things. The patent tells us:
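The transition-rank idea from #12 might look something like this sketch, in which a suspicious rank improvement is delayed, reversed, or randomized during a transition period. The phase names and rank arithmetic are my own illustration, assuming lower numbers mean better positions.

```python
import random

def transition_rank(old_rank, new_rank, phase):
    """During a transition period after suspected rank-modifying
    activity, respond with something other than the expected gain."""
    if phase == "delayed":
        return old_rank                          # no visible change yet
    if phase == "reversed":
        return old_rank + (old_rank - new_rank)  # demote instead of promote
    if phase == "random":
        return random.randint(1, 100)            # unpredictable rerank
    return new_rank                              # transition period over

# Keyword stuffing "should" move a page from position 20 to 5, but
# during the reversed phase the page is demoted to position 35 instead:
print(transition_rank(20, 5, "reversed"))  # 35
```

The unpredictability is the point: a spammer who cannot tell whether a change helped is pushed toward making more drastic, more detectable changes.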
13. Popularity Scores for Events

Some patents provide a list of the "advantages" of following the process they describe, as this one does. Among the advantages the patent lists: 1) events in a given location can be ranked so that popular or interesting events can be easily identified.

14. The Amount of Weight from a Link May Be Based upon the Probability That Someone Will Click on It

I came across an update to the reasonable surfer patent, which focused more upon the anchor text used in links than the earlier version did, and which told us that the amount of weight (PageRank) that might pass through a link is based upon the likelihood that someone might click upon that link. The post is Google's Reasonable Surfer Patent Updated, based upon the patent Ranking documents based on user behavior and/or feature data. Since this is a continuation patent, it is worth looking at its claims to see what they say it is about. They do mention how ranking is affected, including the impact of anchor text and the words before and after a link.
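The reasonable surfer model from #14 can be illustrated by distributing a page's passable link weight in proportion to estimated click likelihood rather than splitting it evenly. The link names and probabilities below are invented for the example; the patent derives click likelihood from many features (anchor text, position, and so on) rather than taking it as a given.

```python
def reasonable_surfer_weights(page_rank, link_click_likelihood):
    """Split a page's passable PageRank among its outlinks in
    proportion to how likely each link is to be clicked, instead of
    the original PageRank model's even split."""
    total = sum(link_click_likelihood.values())
    return {url: page_rank * p / total
            for url, p in link_click_likelihood.items()}

weights = reasonable_surfer_weights(1.0, {
    "main-content-anchor": 0.6,  # prominent, descriptive anchor text
    "sidebar-link": 0.3,
    "footer-boilerplate": 0.1,   # rarely clicked
})
print(weights["main-content-anchor"] > weights["footer-boilerplate"])  # True
```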
15. Biometric Parameters While Viewing Results

This patent was one I wondered whether Google would implement, and I suspect many people would be upset if it did. I wrote about it in Satisfaction a Future Ranking Signal in Google Search Results?, based upon the patent Ranking Query Results Using Biometric Parameters. Google may watch, through a smartphone's front-facing camera, the reaction of someone looking at search results in response to a query; if they appear unsatisfied with the results, those results may be demoted in future search results.

16. Click-Throughs

We've been told by Google spokespeople that click-throughs are too noisy to use as a ranking signal, and yet a patent came out that describes how they might be used in exactly that way, with some thresholds, such as clicks not counting until after the first 100, or until a certain amount of time passes. The post I wrote about it is Google Patents Click-Through Feedback on Search Results to Improve Rankings, based upon the patent Modifying search result ranking based on a temporal element of user feedback. Rand Fishkin sent me a message saying that his experience has been that clicks were counting as ranking signals, but that he was seeing thresholds of around 500 clicks before clicks would make a difference. It's difficult to tell with some signals, especially when Google makes statements about them not being in use. Rand also responded to what I said in the post about thresholds.

17. Site Quality Scores

If you search for "seobythesea named entities," it is a signal that you expect to find information about named entities on the site seobythesea.com. If you do a site-operator search such as "site:http://www.seobythesea.com named entities," you are again showing that you expect to find information about a particular topic on this site. These are considered queries that refer to a particular site.
Referring queries are counted against queries that are considered merely associated with a particular site. If there are more referring queries than associated queries, the quality score for the site is higher; if there are fewer referring queries than associated queries, the quality score is lower. The post I wrote about this was How Google May Calculate Site Quality Scores (from Navneet Panda), based upon the patent Site quality score. A lower site quality score can mean a lower rank, as the patent tells us:
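A minimal sketch of the referring-versus-associated query ratio described in #17. The query counts and the bare ratio are illustrative; the patent describes the scoring in more detail than this.

```python
def site_quality_score(referring_queries, associated_queries):
    """Toy ratio: queries that name or target the site (referring)
    over queries merely associated with it. More referring than
    associated queries yields a score above 1.0."""
    if associated_queries == 0:
        return float(referring_queries)
    return referring_queries / associated_queries

strong = site_quality_score(referring_queries=120, associated_queries=80)
weak = site_quality_score(referring_queries=40, associated_queries=80)
print(strong > 1.0 > weak)  # True
```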
18. Disambiguating People

Like the patent about handling terms with more than one meaning by including context terms (#7), when you write about people who may share a name with someone else, especially people who have disambiguated entries on sites such as Wikipedia, make sure you include context terms on your page that make it easier to tell which person you are writing about. The post I covered this in was Google Shows Us Context is King When Indexing People, based upon the patent Name disambiguation using context terms.

19. Effectiveness and Affinity

If you search on a phone for something such as a song, and you have a music app on that phone that includes the song, Google may tell you what the song is and that you can access it in the app loaded on your phone. Social-network affinities seem to be related to this: if you ask a question that might involve someone you are connected to on a social network, that person might be pointed out to you. See Effectiveness and Affinity as Search Ranking Signals (Better Search Experiences), about the patent Ranking search results.

20. Quotes

Google seems to know who said what, and has a patent on it. See Google Searching Quotes of Entities, on the patent Systems and methods for searching quotes of entities using a database.

21. Category Duration Visits

Could visits to specific categories of a site have a positive effect on the rankings of those sites? We know that people from Google have said that user-behavior signals like this tend to be noisy; but what are you to think when the patent I was writing about describes ways to reduce noise from such signals? The post is A Panda Patent on Website and Category Visit Durations, and it is about a patent co-authored by Navneet Panda titled Website duration performance based on category durations.
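The context-term approach from #7 and #18 can be sketched as a simple overlap count: given context terms for each candidate entity, pick the one whose terms best match the page. The names and context terms below are invented for illustration; a real system would draw them from a knowledge base, as the patents describe.

```python
# Hypothetical context terms for two people who share a name:
CONTEXT_TERMS = {
    "Michael Jordan (basketball)": {"bulls", "nba", "dunk", "chicago"},
    "Michael Jordan (professor)": {"berkeley", "machine", "learning", "statistics"},
}

def disambiguate(page_words, context_terms=CONTEXT_TERMS):
    """Return the candidate whose context terms overlap the page most."""
    words = {w.lower() for w in page_words}
    return max(context_terms,
               key=lambda name: len(context_terms[name] & words))

page = "a berkeley professor of statistics and machine learning".split()
print(disambiguate(page))  # Michael Jordan (professor)
```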
22. Repeat Clicks and Visit Durations

I want to believe Google spokespeople when they say Google doesn't use click data to rank pages, but I keep seeing patents from Navneet Panda, whom Google's Panda update was named after, that describe user behavior that may have an impact. The post is Click a Panda: High Quality Search Results based on Repeat Clicks and Visit Duration, and the patent it is about is one called Ranking search results.

23. Environmental Information

Google can listen to a television playing and respond to a question such as "Who is starring in this movie I am watching?" I wrote about this in Google to Use Environmental Information in Queries.

24. Traffic-Producing Links

Google might attempt to estimate how much traffic links to a site bring to that site. If it believes the links aren't bringing much traffic, it may discount their value. I wrote about this in the post Did the Groundhog Update Just Take Place at Google?

25. Freshness

26. Media Consumption History

Google Media Consumption History Patent Filed

27. Geographic Coordinates

A patent called Determining geographic locations for place names in a fact repository was updated in a continuation patent, which I wrote about in Google Changes How they Understand Place Names in a Knowledge Graph. The claims were updated to include many mentions of "geographic coordinates," which suggests that including latitude and longitude information in Schema markup for a site might not be a bad idea. It's impossible to say, based upon the patent, whether they use those signals on ordinary websites that aren't knowledge-base sites like Wikipedia, IMDB, or Yahoo Finance.
But it seems very reasonable to believe that if Google hopes to see information in that form in those places, it wouldn't hurt on websites that are concerned about their locations as well (especially since knowledge bases seem to be the source of facts for many results in places such as knowledge panels).

28. Low Quality

How Google May Classify Sites as Low Quality Sites

29. Television Watching

Google Granted Patent on Using What You Watch on TV as a Ranking Signal

30. Quality Rankings

Semantic Search

31. Searches Using Structured Data

Google recently published a patent showing how structured data in the form of JSON-LD might be used on a page, and might lead Google to search for the values of attributes of entities described in that structured data, such as what book was published by a certain author during a specific time period. The patent explains how Google could search through the structured data to find answers to a query like that. My post is Google Patent on Structured Data Focuses upon JSON-LD.

32. Related Entities

An entity with a property or attribute that may not be its most noteworthy, but that is still known, may be findable in search results. In a post about this, I used the example query "Where was George Washington a surveyor?", since he is most well known for having been President. The post is Related Entity Scores in Knowledge Based Searches, based on the patent Providing search results based on sorted properties.

33. Nearby Locations

How Google May Interpret Queries Based on Locations and Entities (Tested)

34. Attributes of Entities

35. Natural Language Search Results

Local Search

36. Travel Time for Local Results

How far someone may be willing to travel to a place may be a reason for Google to increase the ranking of a business in local search results.
I wrote about this in the post Ranking Local Businesses Based Upon Quality Measures including Travel Time, based upon the patent Determining the quality of locations based on travel time investment. Would you drive an hour away for a slice of pizza? If so, it must be pretty good pizza. The abstract from the patent tells us this:
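The travel-time-investment idea from #36 can be sketched as a simple ratio: if visitors routinely travel much farther than a typical baseline to reach a business of that kind, the extra investment suggests quality. The baseline and the minutes below are invented for illustration, not figures from the patent.

```python
def travel_time_quality(visit_travel_minutes, baseline_minutes=10):
    """Toy 'travel time investment' measure: average travel time of
    visitors relative to the typical time to reach a business of this
    category. Scores above 1.0 mean visitors invest extra travel."""
    if not visit_travel_minutes:
        return 0.0
    avg = sum(visit_travel_minutes) / len(visit_travel_minutes)
    return avg / baseline_minutes

# Visitors drive close to an hour for this pizza; it is probably good pizza.
destination_pizza = travel_time_quality([60, 45, 50])
corner_pizza = travel_time_quality([5, 10, 8])
print(destination_pizza > corner_pizza)  # True
```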
37. Reverse Engineering of Spam Detection in Local Results

In the post How Google May Respond to Reverse Engineering of Spam Detection, I wrote about the patent Reverse engineering circumvention of spam detection algorithms. I remembered how Google responded when people brought up the Google Rank-Modifying Spammers Patent, which I wrote about in #12, telling people that just because they have a patent doesn't mean they necessarily use it. This patent differs from the rank-modifying spammers patent in that it applies only to local search, and it may keep a spamming site from appearing at all, or from appearing if continued activity keeps setting off flags. As the patent abstract tells us:
38. Surprisingness in Business Names in Local Search

Another patent about spam in local search is one I wrote about in the post Google Fights Keyword Stuffed Business Names Using a Surprisingness Value, about the patent Systems and methods of detecting keyword-stuffed business titles. This patent targets keyword-stuffed business names that include prominent business names to try to confuse the search engine, with examples such as "Locksmith restaurant" and "Courtyard 422 Y st Marriott."

39. Local Expert Reviews

I've been hearing people suggest that reviews can help a local search result rank higher, and I have seen reviews considered equivalent to a mention in the Google patent on location prominence. But I've now also seen a Google patent which tells us that a review from a local expert might also increase the rankings of a local entity in local results. My post was At Google Local Expert Reviews May Boost Local Search Results, on the patent Identifying local experts for local search.

40. Similar Local Entities

When you search for a local coffeehouse, Google may decide to show you similar local businesses, and may include other coffeehouses or similar results in what you see. I wrote a post on this called How Google May Determine Similar Local Entities, from the patent Detection of related local entities.

41. Distance from Mobile Location History

Google to Use Distance from Mobile Location History for Ranking in Local Search

42. What People Search for at Locations Searched

Search for a place you might visit, and the query refinements you see may be based upon what people at that location have searched for. This doesn't affect the rankings of the results you see, but rather the query refinements you are shown. See Local Query Suggestions Based Upon Where People Search, based on the patent Local query suggestions.
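The "surprisingness" value from #38 resembles surprisal from information theory: a word that almost never appears in real business names of a category gets a high -log probability, flagging names like "Locksmith restaurant." The probabilities and the floor value below are invented for illustration; the patent describes its own scoring in more detail.

```python
import math

# Invented co-occurrence probabilities: how often a word appears in
# the names of businesses of a given category.
P_WORD_GIVEN_CATEGORY = {
    ("locksmith", "locksmith"): 0.30,
    ("restaurant", "locksmith"): 0.001,
}

def surprisingness(word, category, probs=P_WORD_GIVEN_CATEGORY, floor=1e-4):
    """Surprisal (-log p) of seeing this word in a business name of
    the given category; high values flag keyword-stuffed names."""
    p = probs.get((word, category), floor)
    return -math.log(p)

# "restaurant" in a locksmith's name is far more surprising than
# "locksmith" in a locksmith's name:
print(surprisingness("restaurant", "locksmith")
      > surprisingness("locksmith", "locksmith"))  # True
```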
43. Semantic Geotokens

Better Organic Search Results at Google Involving Geographic Location Queries

Voice Search

44. Stressed Words in Spoken Queries

This may not be something you can optimize a page for, but it does show that Google is paying attention to voice search and where it might take us. In the post Google and Spoken Queries: Understanding Stressed Pronouns, based upon the patent Resolving pronoun ambiguity in voice queries, we see that Google may be listening for our voices to emphasize certain words when we ask for something. Here is an example from the patent: a voice query asks, "Who was Alexander Graham Bell's father?"

News Search

45. Originality in News Search

Originality Replaces Geography as Ranking Signal in Google News

Copyright © 2018 SEO by the Sea ⚓. This Feed is for personal non-commercial use only. If you are not reading this material in your news aggregator, the site you are looking at may be guilty of copyright infringement. Please contact SEO by the Sea, so we can take appropriate action immediately. The post Five Years of Google Ranking Signals appeared first on SEO by the Sea ⚓.