  • We all know that Google does not impose a penalty for duplicate pages, but it does decide which of the duplicate pages is treated as the primary version. To understand how, look at Google's "Identifying a primary version of a document" patent, dated 03.10.2017.

    The method is described as a computer system identifying a number of different versions of a particular document. To do this, Google uses many types of metadata, together with priority values generated for each document version. The system selects an authority priority for each document version based on a priority rule and the information associated with that version, and then determines the primary version based on the authority priority and the associated metadata. Duplicate documents that share the same content are identified by a web crawler system. After a newly crawled document is processed, the group of documents it shares its content with (pages with the same content) is identified.

    The information that identifies the newly crawled document and the selected set of documents is merged into information defining a new set of documents. Duplicate documents are included in the new set based on a metric that is independent of the query. A single document belonging to this new set is then identified as its representative according to predefined conditions. In some embodiments, the representative document is chosen from a set of duplicates as follows. Each document (page) in the set has fingerprints that identify its own content, and these fingerprints indicate that the documents have essentially the same content. A first document is selected from the set on the basis that it is associated with a score that is independent of the search query. The method also includes indexing that document in accordance with the query-independent score.
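    The grouping and selection described above can be sketched roughly as follows. This is only an illustration, not the patent's implementation: the `CrawledDocument` class, the SHA-256 hash over normalized text used as a fingerprint, and the `independent_score` field are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class CrawledDocument:
    url: str
    content: str
    # Hypothetical query-independent score (for example, a link-based rank).
    independent_score: float


def fingerprint(content: str) -> str:
    """Hash lowercased, whitespace-normalized text.

    A stand-in for a real content fingerprint; it only matches texts that
    are identical after this simple normalization.
    """
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_duplicates(documents):
    """Group documents whose fingerprints match (essentially the same content)."""
    clusters = defaultdict(list)
    for doc in documents:
        clusters[fingerprint(doc.content)].append(doc)
    return list(clusters.values())


def pick_representative(cluster):
    """Choose the document with the highest query-independent score."""
    return max(cluster, key=lambda doc: doc.independent_score)
```

    Calling `cluster_duplicates` on a crawl batch and then `pick_representative` on each cluster gives one page per group of duplicates, which is the page that would be indexed in this simplified picture.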

    Why does Google look for the primary versions of pages? Why should an SEO expert look for the primary versions of pages? How does the system work?

    Google's main goal is to provide the most relevant and reliable search results. The main reasons for designating one of the different versions of a document that appear in search results as the primary version are the following. Including different versions of the same document in the results provides no additional useful information and does not help users.

    Search results that include different versions of the same document can crowd out other content that should be included. When different versions of a document appear in the search results, the user may not know which version is the most authoritative, complete, or best, and may spend additional time comparing the results.

    How does the system work?

    Different versions of a document are identified from several different sources, such as databases, websites, and other data systems. For each document version, an authority priority is selected based on the following:

    Metadata information associated with the document version, for example: source, exclusive right to publish, license right, citation information, keywords, and page rank.

    Second step

    Whether each document version has a qualified length is determined using a length measure. The version with a high authority priority and a qualified length is considered the primary version of the document. If none of the document versions has both a high priority and a qualified length, the primary version is selected based on the sum of the information associated with each document version.
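    The selection logic of this second step can be sketched as below. It is a minimal illustration under stated assumptions: "high priority" is modelled as meeting a hypothetical `qualified_priority` threshold, the length measure is taken to be a simple count such as words, and the "sum of the information" fallback is approximated by counting non-empty metadata fields. None of these choices comes from the patent itself.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentVersion:
    url: str
    authority_priority: int  # value assigned by a priority rule
    length: int              # length measure, e.g. a word count (assumption)
    metadata: dict = field(default_factory=dict)


def information_total(version: DocumentVersion) -> int:
    """Hypothetical stand-in for the 'sum of information': populated metadata fields."""
    return sum(1 for value in version.metadata.values() if value)


def select_primary(versions, qualified_priority: int, qualified_length: int):
    """Prefer a version with both a high priority and a qualified length."""
    qualified = [
        v for v in versions
        if v.authority_priority >= qualified_priority
        and v.length >= qualified_length
    ]
    if qualified:
        # Among qualified versions, take the highest priority, then the longest.
        return max(qualified, key=lambda v: (v.authority_priority, v.length))
    # No version passes both checks: fall back to the overall information.
    return max(versions, key=information_total)
```

    For example, a full-length publisher copy with a high priority would win over a short abstract page from the same publisher, while a set in which no version passes both checks is decided by the fallback.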

    Since scientific literature is subject to strict format requirements, journal articles, conference articles, academic papers, and citation records of journal articles carry metadata describing the content and source of the documents. As a result, scientific literature is a good candidate for the identification subsystem. The metadata that can be examined during this process may include the following: author names; title; publisher; publication date; place of publication; keywords; page rank; citation information; identifiers such as Digital Object Identifier, PubMed Identifier, SICI, and ISBN; network location (e.g. URL); number of references; number of citations; and the language of the page.

    The methodology behind determining the primary version goes further:

    The Rule of Priority

    The Rule of Priority establishes a numeric value (for example, a score) that reflects the authoritativeness, completeness, or ease of access of a document version. The priority rule determines the authority priority assigned to a document version from the source of that version, based on a source-priority list. The source-priority list contains the authority priority that corresponds to each source. The priority of a source can be based on editorial choice, taking into account the reputation of the source, the size of the source publisher, the frequency of updates, or other external factors. Thus each document version is associated with an authority priority; this association can be stored in various data structures.

    What is Qualified Priority?

    A qualified priority value is a threshold used to determine whether a version of a document is authoritative, complete, or easy to access, depending on the priority rule. If the priority of the document version is equal to or greater than the qualified priority value, the document is considered authoritative, complete, or easy to access, depending on the priority rule.
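    A source-priority list and a qualified priority threshold could look something like the sketch below. The source names, the numeric values, and the default for unknown sources are invented for illustration; the patent does not publish an actual list.

```python
# Hypothetical source-priority list: source name -> authority priority.
SOURCE_PRIORITY = {
    "publisher": 10,
    "university_repository": 7,
    "personal_page": 3,
}
DEFAULT_PRIORITY = 1    # assumption: sources not on the list get a low priority
QUALIFIED_PRIORITY = 7  # assumption: example threshold value


def authority_priority(source: str) -> int:
    """Look up the authority priority assigned to a version by its source."""
    return SOURCE_PRIORITY.get(source, DEFAULT_PRIORITY)


def is_qualified(priority: int, threshold: int = QUALIFIED_PRIORITY) -> bool:
    """A version qualifies when its priority is equal to or greater than the threshold."""
    return priority >= threshold
```

    With these example values, a version from `"publisher"` or `"university_repository"` qualifies, while one from `"personal_page"` or from an unlisted source does not.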
    Alternatively, the qualified priority may be based on a relative measurement across a number of document versions, with only the highest priority considered a qualified priority.

    The patent on identifying the primary version of duplicate pages helps us understand which version Google considers the most important among duplicate documents. I don't know if this information will help you move your website to higher positions in search results, but it's nice to see that SEO is handled in a very serious way. Google patents are available here. The original article, in Turkish, is here.

