LibGuides: Google and Search Engines: Search Engines & Results

What are Search Engines?

Web Search engines are automated systems designed to locate material on the Web. They do not use Human editors to vet sources (as do Library Databases), and are limited to material found on the Open Web.

Are they the same as a Library?

No, they're not. There are two main differences between a Library system and a Search engine: Authority and Access.

Authority: A Search engine trawls the open Web, and so you'll find information from numerous sources (expert and amateur), whereas a Library consists of curated collections of materials that better reflect quality information;
Access: Search engines only search the open Web (Web Crawlers are excluded from indexing proprietary, Deep Web sources). This makes for a very broad search, but one that does not reach the deeper level of material that Library databases pay subscriptions to access.

To use an analogy: Library systems introduce you to good quality friends at a small party, while Search engines allow you to speak to anyone off the street. This does not mean you cannot find good quality friends off the street, but you're more likely to find good conversation at a party!

What are Web Search Engines Good for?

Library databases are best to use if you're looking for specialist, peer-reviewed, scholarly information. However, these sources may not reflect current industry practices. Web Search engines are better for less expert, but more up-to-date material.

For example, a Web Search engine would be best to find news on a political party, but for discussion of their political theories, a Library Database would be better.

[Figure: Hierarchy of Uses for Web Search Engines Versus Library Databases]

How Search Engines Work

Crawling

Before you search, Web Crawlers gather information from across hundreds of billions of webpages and organise it in an Index. The Crawlers or "Spiders" start with a list of web addresses from past crawls, and sitemaps provided by website owners, then they use links on those sites to discover other pages. The software pays special attention to new sites, changes to existing sites and dead links.
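The link-following process described above can be sketched as a breadth-first walk. This is only an illustration under stated assumptions: the `LINKS` dictionary is a made-up, in-memory stand-in for the Web, and `crawl` and its seed list are hypothetical names, not any real Search engine's code.

```python
from collections import deque

# Hypothetical, in-memory stand-in for the Web: page -> pages it links to.
LINKS = {
    "a.example/home": ["a.example/about", "b.example/news"],
    "a.example/about": ["a.example/home"],
    "b.example/news": ["c.example/post", "a.example/home"],
    "c.example/post": [],
    "d.example/orphan": [],  # nothing links here, so the crawl never finds it
}

def crawl(seeds):
    """Visit pages breadth-first, starting from already-known addresses."""
    queue = deque(seeds)
    seen = set(seeds)
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        # Follow each link on the page; unknown pages yield no further links.
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

discovered = crawl(["a.example/home"])
print(discovered)
```

The orphan page is never discovered because no crawled page links to it, which is one reason un-linked material stays out of the Index.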

Note: A site can opt out of crawling altogether using a file called “robots.txt”. This is one way Deep Web sites are hidden from Search engines. Library subscription Databases opt out in this way, as well as through a combination of password-protected content and intranets wholly un-linked from the public Web.
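As a rough illustration of the opt-out, Python's standard `urllib.robotparser` module can read robots.txt rules and report whether a Crawler may fetch a page. The two rule sets, the `database.example` address and the "ExampleBot" Crawler name below are invented for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (assumptions for illustration only).
OPTED_OUT = "User-agent: *\nDisallow: /\n"   # asks all Crawlers to stay away
OPEN_SITE = "User-agent: *\nDisallow:\n"     # empty Disallow permits everything

def may_crawl(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

url = "https://database.example/articles/123"
print(may_crawl(OPTED_OUT, url))
print(may_crawl(OPEN_SITE, url))
```

Note that robots.txt is a request that well-behaved Crawlers honour, not a lock, which is why password protection is used alongside it.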

Indexing

When Crawlers find a webpage, Search engines index each page according to any number of identifiers, ranging from keywords to website "freshness."

A Search engine Index is a bit like one in the back of a book, except there is an entry for every word seen on every webpage in the Index. When a webpage is indexed, it is cross-referenced to the entries for all of the words it contains. This means every single word is as "important" as every other word.
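The book-index analogy above corresponds to what is commonly called an inverted index. The sketch below is a simplified illustration: the three pages are made up, and `build_index` and `lookup` are hypothetical helper names.

```python
from collections import defaultdict

# Made-up pages for illustration.
PAGES = {
    "page1": "library databases hold scholarly articles",
    "page2": "search engines crawl the open web",
    "page3": "library search tools and web search engines",
}

def build_index(pages):
    """Map every word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def lookup(index, query):
    """Return the pages that contain every word of the query."""
    word_sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*word_sets) if word_sets else set()

index = build_index(PAGES)
print(sorted(lookup(index, "library search")))
```

Every word gets its own entry here, with no judgement about which words are significant topic terms.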

Note: These Search engine Indexes use “keywords” (and synonyms) NOT indexed Subject headings (like a Library Database does). These controlled Subject headings are indexed to a Database entry only when they are significant topic terms.

An Introduction to Search Engines and Web Navigation [e-version] by Levene, M.
ISBN: 9780470872703
Publication Date: 2010

Academic Search Engines [e-version] by Ortega, J. L. & Aguillo, I. F.
ISBN: 9781780634722
Publication Date: 2014

The Deep Web

The Deep Web comprises the parts of the Web that are not crawled and indexed by Search engines.* This material includes streaming media, company Intranets, and content held behind Paywalls. The Deep Web is not the same as the "Dark Web," which consists of websites that are not accessible via standard Web browsers.

The Surface Web is accessible via Search engines such as Google and Bing.

Library systems are a kind of Deep Web Search engine (as is Google Scholar)!

*The Deep Web is thought to be between 400 and 500 times the size of the Surface Web (Association of Internet Research Specialists [AIRS], 2018).

How to Access the Dark Web. From C. Johnson, 2016, Association of Internet Research Specialists: Knowledge Network Articles. Retrieved from https://www.airsassociation.org/airs-articles/item/16220-how-to-access-the-dark-web. Copyright 2013-2018 by Association of Internet Research Specialists | AIRS. Used under Fair Dealing

Search Algorithms

When you perform a search, it is not merely a brute-force keyword search. Search engines use many different Algorithms (sets of rules) to "weigh" the value of results and show you the best-ranked pages first.

These ranking systems look at many factors, including the words of your query, relevance and usability of pages, expertise of sources, and your location and settings. The weight applied to each factor varies depending on the nature of your query—for example, the "freshness" of the content plays a bigger role in answering queries about current news than it does for dictionary definitions.
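The idea of weights that shift with the kind of query can be sketched as a weighted sum. All numbers, factor names and page names below are invented for illustration; real ranking systems are far more elaborate and their weights are not published.

```python
# Hypothetical weights: how much each factor counts for each kind of query.
WEIGHTS = {
    "news":       {"keyword_match": 0.4, "source_expertise": 0.2, "freshness": 0.4},
    "definition": {"keyword_match": 0.5, "source_expertise": 0.4, "freshness": 0.1},
}

# Made-up pages with factor scores between 0 and 1.
PAGES = {
    "old-dictionary-entry": {"keyword_match": 0.9, "source_expertise": 0.9, "freshness": 0.1},
    "todays-news-story":    {"keyword_match": 0.8, "source_expertise": 0.5, "freshness": 1.0},
}

def score(factors, weights):
    """Weighted sum of a page's factor scores."""
    return sum(weights[name] * factors[name] for name in weights)

def rank(pages, query_type):
    """Order pages from highest to lowest score for this kind of query."""
    weights = WEIGHTS[query_type]
    return sorted(pages, key=lambda page: score(pages[page], weights), reverse=True)

print(rank(PAGES, "news"))
print(rank(PAGES, "definition"))
```

The same two pages swap places depending on the query type, because "freshness" carries more weight for news than for definitions.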

Note: Different Search engines use different Algorithms. For example, Google generally values "freshness" overall, whereas Bing tends toward placing more value on greater age as a sign of reliability. Google prioritises text content and is less adept with multimedia, whereas Bing is better tuned to search audio and visual material.

Meaning

Search engines do not only return matches for your input keywords; they attempt to decipher what you mean. This means a Search engine often corrects spelling, searches synonyms, looks for semantic relationships between terms, and interprets descriptors appended to a search, such as "image" or "review."
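Spelling correction and synonym searching can be sketched with Python's standard `difflib` module. The vocabulary, the synonym table and the `interpret` function are hypothetical; real Search engines rely on much larger, data-driven models.

```python
import difflib

# Hypothetical vocabulary and synonym table (assumptions for illustration).
VOCABULARY = ["library", "database", "search", "engine", "review", "image", "car"]
SYNONYMS = {"car": ["automobile"], "image": ["picture", "photo"]}

def interpret(query):
    """Correct likely misspellings, then add synonyms for each term."""
    terms = []
    for word in query.lower().split():
        # Pick the closest known word, if any is similar enough.
        matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
        corrected = matches[0] if matches else word
        terms.append(corrected)
        terms.extend(SYNONYMS.get(corrected, []))
    return terms

print(interpret("libary image"))
```

Words with no close match in the vocabulary are passed through unchanged rather than guessed at.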

Relevance

It's not enough merely to return the pages containing the greatest number of matching terms; results must also be meaningfully relevant. Algorithms help assess whether a page contains other relevant content relating to the keywords, such as images, videos, and related topic information.

It is also important where the terms are found: in the main text of a page, in its HTML header, in the alt-text of media, etc. Keywords found in hyperlinks are also highly relevant, as they are somewhat analogous to an Academic source's References, showing the working behind a text's conclusions and helping the reader assess the reliability of a source.
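One way to picture the importance of where a keyword appears is to give each part of a page its own weight. The field names, weights and sample pages below are assumptions made for this sketch, not values any Search engine has published.

```python
# Hypothetical weights for where a keyword appears on a page.
FIELD_WEIGHTS = {"title": 3.0, "link_text": 2.0, "body": 1.0, "alt_text": 0.5}

def location_score(page, keyword):
    """Sum the weights of every field that contains the keyword."""
    keyword = keyword.lower()
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if keyword in page.get(field, "").lower().split()
    )

# Made-up pages for illustration.
page_a = {
    "title": "Deep Web explained",
    "body": "the deep web is not indexed",
    "alt_text": "diagram of the deep web",
}
page_b = {"title": "Holiday photos", "body": "a deep lake"}

print(location_score(page_a, "deep"))
print(location_score(page_b, "deep"))
```

Both pages contain the keyword, but the page that uses it in its title scores higher than the page that only mentions it in passing.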

Quality

Beyond matching the words in your query with relevant documents on the web, Search algorithms also aim to prioritize the most reliable sources available. One of the main ways Search engines achieve this is via other sources backlinking to a webpage. "Backlinks" help show that other sources value the content of a webpage, and are somewhat analogous to the "Citation searches" and "Journal rankings" of Academic materials.
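A very simple way to picture backlinks is to count how many other pages link to each page. Counting alone is an assumption made for brevity here, and the link graph is made up; real ranking systems also consider the quality of the linking sources.

```python
# Made-up link graph: page -> pages it links out to.
OUTLINKS = {
    "blog.example": ["journal.example", "news.example"],
    "news.example": ["journal.example"],
    "forum.example": ["journal.example", "blog.example"],
    "journal.example": [],
}

def backlink_counts(outlinks):
    """Count how many distinct other pages link to each page."""
    counts = {page: 0 for page in outlinks}
    for source, targets in outlinks.items():
        for target in set(targets):  # repeated links from one page count once
            if target != source:     # ignore pages linking to themselves
                counts[target] = counts.get(target, 0) + 1
    return counts

counts = backlink_counts(OUTLINKS)
ranked = sorted(counts, key=counts.get, reverse=True)
print(ranked)
```

The page with the most backlinks comes first, much as a heavily cited article stands out in a Citation search.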

Usability

Search engines also rank webpages depending upon load times, mobile- and desktop-friendly designs, etc. Search engines prefer webpages that are reliably usable for most users.

Webmaster Guidelines

See Bing's Webmaster Guidelines and Google's Webmaster Guidelines for up-to-date advice on the two main Search engines and how their algorithms function.

Web Search Engine Research [e-version] by Lewandowski, D. & Spink, A. (Eds.)
ISBN: 9781780526379
Publication Date: 2012

Why are my results different?

Unlike a Library Database, Search engines can return very different results depending on different settings and circumstances. This may be very valuable when you are looking for a place to eat "nearby," but not so good if you are looking for unbiased research.

Context, Settings and Personalisation all affect the results you see.

Context

Search engines use your country and location to deliver content relevant to your area. For instance, if you’re in Chicago and you search “football”, Google will most likely show you results about American football and the Chicago Bears first, whereas if you search “football” in London, Google will rank results about soccer and the Premier League higher.

Settings

Search settings such as Language and Safe search preferences can affect the results you receive.

Different devices will also receive different results. For example, mobile devices will preferentially receive results for mobile-optimised sites.

Personalisation

Search engines often tailor results based on your search history and on tracking you across the Web. Google says it does not use Social media data in its rankings. However, this simply means it will not track implicit social media cues; it may still serve results based on personal details such as "Likes / +1" and explicitly stated "interests." Bing states that it does return results based on social cues like "influence." However, this may simply be a signal boost, as more popular results are more commonly shared.

Note: Library Databases specifically DO NOT learn your search preferences and search styles. This is in order to minimise bias.

You can save searches in a Library Database if you create an account, but these saved searches will not modify your results. You may simply re-run and / or combine saved searches.

Search Engine Optimisation (SEO)

You must also be aware that Web developers can influence the ways in which Search engines serve you results (to better capture market share or advertising revenue). There are numerous ways developers and content creators achieve this: just search for "search engine optimisation"! Search engine providers even give developers data and ways to explore how users search (see Google's Keyword planner and Bing's Keyword research tool). This allows a developer to tailor the metadata of their site to help direct searches toward their page(s). These "Targeted keywords" are like Author-supplied keywords in a Library database: if your search can be matched to one or more of these terms, it will help manage your results. However, as they are Author-supplied, they are more open to Bias than the catalogued "Subject headings" of a Library database.

Note: Such optimisation used to be much easier to abuse, and Search engine providers do attempt to return relevant results over inaccurate spam.

The important point is to be aware that Search engine results are the product of a dynamic process, and not unbiased, static "truth."