Understanding How Google Finds Your Content Part 2: Linked Data

Carlos Abiera
Mar 5, 2022

The card catalog of library science is an artifact we can use to describe how the major search engines manage and organize massive numbers of resources on the semantic web.

Some people argue that writing is the origin of metadata. Metadata is essentially information about data, and it provides meaningful insight into that data. The earliest writing discovered so far is cuneiform, and some of the oldest tablets are believed to hold accounts of livestock. This record-keeping method is a form of metadata because each record represents a thing.

Library of Alexandria

Thousands of years later, the Library of Alexandria was one of the ancient libraries that attempted to collect and store information on a large scale, long before Google. In this context, it is reasonable to think that there was some kind of shelf list or subject cataloging scheme to manage the sizable collection of scrolls inside the library.

Fast forward, and we have the card catalog, or library catalog: the key to finding information in a library. A catalog can be organized by subject, author, or title. This is the holy trinity of library metadata (subject, author, title). These catalogs were printed on small 3"x5" cards and stored in drawers.

  • If you are looking for a specific book by Sigmund Freud, for example, you would go to the title catalog and look up the book by its title.
  • If you are looking for books by Viktor Frankl, you would go to the author catalog and look up the available materials by the author’s last name.
  • If you are looking for any information in the library’s collections dealing with the topic “philosophy”, you would go to the library’s subject catalog and look for that subject or for cross‐references to that subject.

For materials on a particular subject, however, the search would begin under one subject heading, for example “child development”, and then proceed to related subject headings such as “early childhood”, “stages of development”, or “emotional changes in childhood”. You could only search one subject at a time, which could take quite a bit of time.
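The three catalogs can be pictured as three lookup tables built over the same set of books. The sketch below is only an illustration: the sample records, their subject headings, and the function names are assumptions made for this example, not a real library system.

```python
from collections import defaultdict

# Illustrative sample records; the subject headings are assumptions.
BOOKS = [
    {"title": "The Interpretation of Dreams", "author": "Freud, Sigmund",
     "subjects": ["psychology", "dreams"]},
    {"title": "Man's Search for Meaning", "author": "Frankl, Viktor",
     "subjects": ["psychology", "philosophy"]},
    {"title": "The Republic", "author": "Plato",
     "subjects": ["philosophy"]},
]


def build_catalogs(books):
    """Build the three card-catalog drawers: title, author, and subject."""
    by_title = {}
    by_author = defaultdict(list)
    by_subject = defaultdict(list)
    for book in books:
        by_title[book["title"].lower()] = book
        by_author[book["author"].lower()].append(book["title"])
        for subject in book["subjects"]:
            by_subject[subject.lower()].append(book["title"])
    return by_title, by_author, by_subject


def search_subjects(by_subject, headings):
    """Look up one heading at a time, as a patron would, and merge the results."""
    found = []
    for heading in headings:
        for title in by_subject.get(heading.lower(), []):
            if title not in found:
                found.append(title)
    return found


by_title, by_author, by_subject = build_catalogs(BOOKS)
print(by_author["frankl, viktor"])
print(search_subjects(by_subject, ["philosophy", "psychology"]))
```

Each drawer answers only one kind of question, which is why a subject search that spans several related headings has to walk through them one by one.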

From Card Catalog to the Book on the Shelf

With the advent of computers, the process of finding information sped up. Online library catalog databases also link to other libraries, so users can query one system for information from a network of online catalogs. The same information shown on the catalog card is also shown in the online record with the help of MARC (Machine-Readable Cataloging). A user can now access hundreds or thousands of pieces of information in a matter of seconds.
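To give a feel for how a catalog card maps onto a machine-readable record, here is a heavily simplified, MARC-like structure in Python. It uses three well-known MARC 21 tags (100 for the main personal-name entry, 245 for the title statement, 650 for a topical subject), but the list-of-tuples layout, the helper names, and the subject headings are assumptions for illustration; real MARC records also carry a leader, indicators, and many more fields.

```python
# Simplified, MARC-like record: (tag, subfields). Not the real MARC wire format.
record = [
    ("100", {"a": "Freud, Sigmund"}),                # main entry: personal name
    ("245", {"a": "The Interpretation of Dreams"}),  # title statement
    ("650", {"a": "Dreams"}),                        # subject: topical term (illustrative)
    ("650", {"a": "Psychoanalysis"}),                # repeated tags are allowed
]


def field_values(record, tag, subfield="a"):
    """Return every value stored under a tag/subfield pair."""
    return [
        subfields[subfield]
        for field_tag, subfields in record
        if field_tag == tag and subfield in subfields
    ]


def catalog_card(record):
    """Project the record onto the three classic access points."""
    authors = field_values(record, "100")
    titles = field_values(record, "245")
    return {
        "author": authors[0] if authors else None,
        "title": titles[0] if titles else None,
        "subjects": field_values(record, "650"),
    }


card = catalog_card(record)
print(card)
```

The same author, title, and subject that once sat on a paper card are now fields a program can filter, index, and exchange with other catalogs.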

With all of this in place, metadata records emerged that were made specifically for library science and designed so that machines could understand, organize, and manage them.

The word “metadata” was coined in 1969 by Jack E. Myers and is defined as “a set of data that describes and gives information about other data”, or simply “data about data”. It is important to understand metadata because it helps us find and identify relevant resources.

In ancient times, a “resource” was limited to the scrolls on the shelves. At present it can be a physical object, a person, a document, an abstract concept, or a data object, all interconnected on the web. RDF was developed to express the relationships between these different resources.

The Resource Description Framework (RDF) is a framework for describing web resources. It is a data model that provides a logic for describing a resource and its relationships to other resources.

To express an actual relationship between resources, RDF uses the concept of a triple, which has three parts:

  • The subject is the thing you are making a statement about,
  • the predicate is the relationship between the subject and the object, and
  • the object is the value, or the other resource, that the statement assigns to the subject.

Triples can be combined to express complex relationships between subjects and objects. A collection of interrelated RDF triples forms an RDF graph, which shows how subjects and objects are related and connected across triples.
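A small graph can be sketched with plain Python tuples, each written as (subject, predicate, object), where the subject is the resource being described. This is a minimal illustration rather than a real RDF library: the `example.org` URIs and the helper functions are hypothetical, and the predicates borrow property names from the schema.org vocabulary.

```python
# Hypothetical identifiers under example.org; predicates use schema.org names.
EX = "https://example.org/"
SCHEMA = "https://schema.org/"

BOOK = EX + "book/interpretation-of-dreams"
FREUD = EX + "person/sigmund-freud"

triples = [
    (BOOK, SCHEMA + "name", "The Interpretation of Dreams"),
    (BOOK, SCHEMA + "author", FREUD),   # the object is another resource
    (BOOK, SCHEMA + "about", "dreams"),  # the object is a plain value
    (FREUD, SCHEMA + "name", "Sigmund Freud"),
]


def objects_of(triples, subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def linked_resources(triples, node):
    """Objects of `node` that are themselves subjects of other triples."""
    subjects = {s for s, _, _ in triples}
    return [o for s, _, o in triples if s == node and o in subjects]


# Follow the graph: book -> author resource -> author's name.
author = objects_of(triples, BOOK, SCHEMA + "author")[0]
print(objects_of(triples, author, SCHEMA + "name"))
```

Because the object of one triple can be the subject of another, following those links is what turns a flat list of statements into a graph.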

RDF graph

In computer science, a resource is anything that can be identified by a name, a location, or both. A Uniform Resource Identifier (URI) tags a resource by name, by location, or by both, much like the way a card catalog locates a resource. To avoid confusion among web identifiers, URIs were split into two kinds in the mid-90s: a Uniform Resource Locator (URL) identifies a resource by its location, and a Uniform Resource Name (URN) identifies it by name. Technically, both URNs and URLs are forms of URI. Since a name alone does not tell you where to retrieve a resource, URNs are less practical on the web, which is why we embrace URLs to identify web resources.
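The difference is visible in the identifier itself. The sketch below uses Python's standard `urllib.parse.urlparse` to split an identifier into its parts and applies a rough rule of thumb: a `urn` scheme means a name, while a scheme plus a network location means a locator. The `classify_uri` helper and the sample identifiers are assumptions for illustration, not a complete URI validator.

```python
from urllib.parse import urlparse


def classify_uri(uri):
    """Roughly classify an identifier as a URN, a URL, or something else."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        return "URN"    # names the resource, e.g. by ISBN
    if parts.scheme and parts.netloc:
        return "URL"    # says where the resource lives
    return "other"      # e.g. a relative path with no scheme


# Illustrative identifiers only.
samples = [
    "urn:isbn:0451450523",
    "https://example.org/books/interpretation-of-dreams",
    "books/interpretation-of-dreams",
]
for sample in samples:
    print(sample, "->", classify_uri(sample))
```

Only the URL carries a host to contact, which is what lets a browser or a crawler actually fetch the resource.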

URL VS URN and URI

On the semantic web, the meta description, title tag, headings, structured data, and image alt values are helpful embedded elements we can use to tag or identify our web resources. The more elements we use, the more access points a search engine has for finding our resources. The richer the content of each element, the better it serves resource discovery; the thinner the content, the less well it serves. The faster a search engine understands our web resources, the higher the chance it can connect them to a much larger network of related resources, and only then can we start competing in the SERP.
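To see these access points the way a crawler might, the sketch below pulls them out of a page with Python's standard `html.parser` module. It is a minimal illustration under stated assumptions: the `AccessPointParser` class and the sample page are made up for this example, and real search engines do far more than this.

```python
import json
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class AccessPointParser(HTMLParser):
    """Collect title, meta description, headings, alt text, and JSON-LD."""

    def __init__(self):
        super().__init__()
        self.found = {"title": "", "description": "", "headings": [],
                      "image_alts": [], "structured_data": []}
        self._capture = None  # element whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.found["description"] = attrs.get("content") or ""
        elif tag == "img" and attrs.get("alt"):
            self.found["image_alts"].append(attrs["alt"])
        elif tag == "title" or tag in HEADINGS:
            self._capture, self._buffer = tag, []
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._capture, self._buffer = "jsonld", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capture is None:
            return
        text = "".join(self._buffer).strip()
        if tag == "title" and self._capture == "title":
            self.found["title"] = text
        elif tag in HEADINGS and tag == self._capture:
            self.found["headings"].append(text)
        elif tag == "script" and self._capture == "jsonld":
            self.found["structured_data"].append(json.loads(text))
        else:
            return  # closing tag of a nested element; keep capturing
        self._capture = None


# Made-up sample page for illustration.
SAMPLE = """
<html><head>
<title>Linked Data Basics</title>
<meta name="description" content="How catalogs became linked data.">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Linked Data Basics"}
</script>
</head><body>
<h1>Linked Data Basics</h1>
<img src="graph.png" alt="An RDF graph of books and authors">
<h2>From cards to triples</h2>
</body></html>
"""

parser = AccessPointParser()
parser.feed(SAMPLE)
access_points = parser.found
print(access_points)
```

An empty value in any of these slots is one less access point, the digital equivalent of a book missing from the subject drawer.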

Summary

Historical resource discovery methods help us understand how modern resource discovery works, from pulling a book off a library shelf to showing search results for a query. The web and search engines are arguably descendants of the record-keeping and scroll retrieval of the Library of Alexandria, and they have evolved into tools that use artificial intelligence to help people answer queries and access products and services.

Machines have to translate and locate web resources because computers don’t know anything on their own. They only consume numbers to represent human language, but they can be trained. In the words of John Giannandrea, “we’re taking a baby step in teaching all our computers at Google something about our human world.” Google built tools to import data from other sources and acquired several large, specialized databases of items and objects to educate its artificial intelligence.

Google depends heavily on content and quality signals across your entire web presence: your site, social media platforms, third-party mentions and reviews, the content you craft, and so on. In short, everything you make available online. Those web connections between datasets, and the relationships between entities (subjects and objects), are what make up linked data at scale. This is an important consideration every time we plan to create a new web resource.

Written by Carlos Abiera

Carlos C. Abiera currently manages the operations of Montani Int. Inc. and leads the REV365 data team. He has keen interests in data and behavioral sciences.

No responses yet

Write a response