
The Evil Side of Google? Exploring Google's User Data Collection

Danny Dover

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.


Update: You can now download the complete list of Google User Data by clicking here.


Google Inc. is first and foremost a data company. In the past, it competed on a level playing field by manipulating publicly available data better than its competition. By doing this, it had unprecedented success.



Enter Web 2.0. Hard drives, processors, bandwidth, and even workers are now all relatively inexpensive. This has drastically lowered the barriers to entry in the search field. As Google’s competition has started to catch up (MSN Image Search) and new competitors have arisen (Cuill), the search engine is looking for some kind of advantage. Since everyone has reasonably equal access to the internet’s content, the leaders have been striving to gain access to private data. The most cost-effective way for the engines to do this is to collect data from the people who already use their services. Google has increasingly been serving its users by using their personal data to manipulate public data in individualized ways. These methods are impossible to copy without the necessary personal data.


The Methods Google Uses to Get Data


Click Tracking - Google logs all the navigational clicks (ads, actions, feature clicks, etc.) of all of its users on all of its services.



Forms - Along with the data the user enters directly into its forms (username, password, etc.), Google logs the date, time, and location of each submission.



Code from the Google Account sign-up form:

1. Input type is hidden so user doesn't see or enter data into given field
2. Location to send user after submitting (hidden)
3. Input type is hidden so user doesn't see or enter data into given field
4. User's referrer data is used and sent via the form so Google knows where user clicked "Sign Up" (hidden)
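To see hidden fields like these for yourself, you can pull them out of any form's markup. Below is a minimal sketch using Python's standard library. The form markup, field names, and URLs are made up for illustration and are not Google's actual sign-up code.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Only keep inputs the user never sees or fills in.
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden_fields[attributes.get("name")] = attributes.get("value")


# Hypothetical markup written for this example (an assumption, not real code).
SAMPLE_FORM = """
<form action="/accounts/create" method="post">
  <input type="hidden" name="continue" value="http://www.example.com/after-signup">
  <input type="hidden" name="referrer" value="http://www.example.com/landing-page">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

collector = HiddenFieldCollector()
collector.feed(SAMPLE_FORM)
for name, value in collector.hidden_fields.items():
    print(f"{name} = {value}")
```

The visible username and password inputs are skipped. Only the two fields the user never typed are reported.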


Cookies - Google uses cookies on all of its web properties. Additionally, it leaves advertising (DoubleClick) cookies to track users' movement around the web. By doing this, Google can track individual users on any page that has either DoubleClick or AdSense ads. That includes millions of pages that are not on Google’s web properties.



Unique cookies stored on user's computer from multiple Google web properties
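The reason a single advertising cookie is so powerful is that the same ID comes back with every ad request, no matter which site the ad is on. Here is a rough sketch of that idea. The cookie name, the ID values, and the page URLs are all invented for the example, and a real ad server's bookkeeping is surely more involved.

```python
from http.cookies import SimpleCookie


def cookie_id(cookie_header, name="id"):
    """Return the value of one cookie from a raw Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel else None


# Hypothetical ad requests arriving from pages on three unrelated sites.
requests_seen = [
    {"page": "http://news.example.com/story", "cookie": "id=abc123; lang=en"},
    {"page": "http://shop.example.org/cart", "cookie": "id=abc123"},
    {"page": "http://blog.example.net/post", "cookie": "id=zzz999"},
]

# Group the pages by cookie ID to rebuild each browser's trail.
history = {}
for request in requests_seen:
    history.setdefault(cookie_id(request["cookie"]), []).append(request["page"])

for visitor, pages in history.items():
    print(visitor, pages)
```

Two different websites end up in the same visitor's trail simply because the browser sent the same cookie along with both ad requests.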


Server Requests Stored in Log Files - Every request made to any of Google's servers (e.g. GET http://www.google.com) is stored in log files. The content stored depends on the type of request. (See ‘normal search’ below for more details.)



Example of a log file

URL - "http://www.google.com/search?hl=en&q=seomoz&ie=UTF-8"
1. IP Address from user making request. This can be used to geo-locate the user
2. Date, time, and time zone offset of user
3. Language of requested result (in this case, English)
4. Search query
5. Operating system of user
6. Browser of user

The remaining information is less important, but it details the type of request, the server response, and the rendering engine.
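To show how little work it takes to turn one log entry into the fields listed above, here is a small sketch. The search URL is the one from the example. The IP address, timestamp, browser string, and the " - " separated layout are assumptions made up for illustration, not Google's real log format.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical log entry; only the URL comes from the example above.
LOG_LINE = (
    "203.0.113.7 - 25/Mar/2008 10:15:32 -0700 - "
    "http://www.google.com/search?hl=en&q=seomoz&ie=UTF-8 - "
    "Firefox 3.0; Windows NT 5.1"
)


def parse_log_line(line):
    """Split one log entry into the pieces a search engine could store."""
    ip, timestamp, url, client = line.split(" - ")
    params = parse_qs(urlsplit(url).query)
    browser, operating_system = [part.strip() for part in client.split(";")]
    return {
        "ip": ip,                       # can be used to geo-locate the user
        "timestamp": timestamp,         # date, time, and time zone offset
        "language": params["hl"][0],    # language of the requested results
        "query": params["q"][0],        # what the user searched for
        "os": operating_system,
        "browser": browser,
    }


entry = parse_log_line(LOG_LINE)
for field, value in entry.items():
    print(f"{field}: {value}")
```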


JavaScript - Google has small amounts of JavaScript embedded in websites all over the internet. When a user’s browser executes the script in the background, Google is able to learn a lot about that person’s setup and browsing habits (location, operating system, browser type and version, etc.).



Web Beacons - Google embeds small (1 pixel by 1 pixel) transparent .gifs into many of its checkout screens. As with the JavaScript, when a user's browser downloads the invisible image, it sends information about the user's computer to Google.



Example of a Web Beacon (What, you can't see it? That is the point.)
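For the curious, a web beacon really is just a tiny image file. The image itself carries no information. The request for it does. The sketch below decodes a commonly used 1x1 transparent GIF and then shows the kind of record a server could keep each time the pixel is fetched. The function name, header values, and IP address are hypothetical, and this is not a description of Google's own setup.

```python
import base64
import struct

# A commonly used 1x1 transparent GIF, base64-encoded (42 bytes decoded).
PIXEL_B64 = "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
PIXEL = base64.b64decode(PIXEL_B64)

# The GIF header holds a 6-byte signature, then width and height
# as little-endian 16-bit integers.
signature = PIXEL[:6]
width, height = struct.unpack("<HH", PIXEL[6:10])
print(signature, width, height, len(PIXEL))


def record_beacon_hit(client_ip, headers):
    """Return what a server could log when a browser requests the pixel."""
    return {
        "ip": client_ip,
        "user_agent": headers.get("User-Agent"),  # browser and operating system
        "referer": headers.get("Referer"),        # page that embedded the pixel
        "cookie": headers.get("Cookie"),          # ties the hit to a known browser
    }


# Hypothetical request headers for illustration.
hit = record_beacon_hit(
    "203.0.113.7",
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 5.1) Firefox/3.0",
        "Referer": "http://shop.example.org/checkout",
        "Cookie": "id=abc123",
    },
)
print(hit)
```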



Understanding What Google Does with the Data


Store - Google uses an internal database called Bigtable, spread over approximately one million servers.



Google Data In 2006

Data                   Size (TB)
Crawl Index            800
Google Analytics       200
Google Base            2
Google Earth           70
Orkut                  9
Personalized Search    4

(Source: Bigtable: A Distributed Storage System for Structured Data)


These are the sizes of the compressed data in terabytes (1 TB = 1,024 GB). Added together, that puts Google's disclosed data size at over 1 petabyte (1,048,576 GB). GREAT GOOGLEY MOOGLEY! This doesn't even consider AdSense, Gmail, Google Maps, Street View, Google Images, or other private databases. That is considered a lot of data even now, and these stats are from over two years ago, before the Web 2.0 Data Rush.
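If you want to check the petabyte claim, the arithmetic is quick. This sketch just adds up the six figures from the table above, using 1,024 as the conversion factor between units as the article does.

```python
# Compressed sizes in terabytes, copied from the table above.
SIZES_TB = {
    "Crawl Index": 800,
    "Google Analytics": 200,
    "Google Base": 2,
    "Google Earth": 70,
    "Orkut": 9,
    "Personalized Search": 4,
}

total_tb = sum(SIZES_TB.values())
total_gb = total_tb * 1024   # 1 TB = 1,024 GB
total_pb = total_tb / 1024   # 1 PB = 1,024 TB

print(f"{total_tb:,} TB = {total_gb:,} GB = {total_pb:.2f} PB")
# prints "1,085 TB = 1,111,040 GB = 1.06 PB"
```

So the disclosed total is 1,085 TB, which clears the 1,024 TB petabyte mark by a modest margin.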


Massive Data Analysis - This is a little like Charlie and the Chocolate Factory. We know that a lot of data goes into Google, and we know a lot of useful manipulated data comes out. We just don't know what happens in between.



Oompa Loompas working hard at Google, writing pretty primary-colored code.


We know that Google has many algorithms to sort and organize its data. PageRank is the most well known. It is also known that Google has many complicated spam filters, duplicate content filters, pattern detection algorithms, natural language interpreters, image recognition software, and loads of other complicated software.


Permanent Backup - The final resting place for data at Google is likely in permanent storage. Google's privacy policies hint that some user data can never be completely deleted because of permanent backups.




Understanding What Specific User Data Google Collects


Below is a list of every self-declared piece of data that Google collects when a user interacts with its many web services. Because it covers only what Google discloses, there may be even more user data gathered by Google that is unknown to the public. Be forewarned: ignorance is bliss. After you read this, you may feel inclined to wear a tinfoil hat.



The Comprehensive List of All the Data Google Admits to Collecting from Users



Cookies and logs (described above) are used to track users in addition to the methods listed below. Note: a few of the items below require a user to opt in.

Google (Normal Search)

  • Search Engine Result Pages
  • Country code domain
  • Query
  • IP address
  • Language
  • Number of results
  • Safe search
  • Additional preferences can include:
    • Street Address
    • City
    • State
    • Zip/postal code
  • Server log
    • Query
    • URL
    • IP address
    • Cookie
    • Browser
    • Date
    • Time
  • Clicks

Google Personalized Search

  • Logs every website visited as a result of a Google search.

Google Web History

Google's data on me while I researched this article

  • Content analysis of visited websites

Google Account

  • Used as resource to compile information on individual users
  • Sign up
    • Sign up date
    • Username
    • Password
    • Alternate e-mail
    • Location (country)
  • Personal picture
  • Usage
    • Friends
    • Google Services usage
    • Number of logins

Toolbar

  • All websites visited
  • Unique application number
  • Sends all visited 404s to Google
  • Toolbar synchronization function
    • Stores autofill info with Google account
    • Sends structure of web forms to Google
  • Safe browsing
    • Stores response to security warnings
  • Stores autofill forms data
  • Spellcheck sends data to Google servers

Web History

  • Every website visited from Google SERP
  • Date
  • Time
  • Search query
  • Ads clicked
  • Which service

Translate

  • All text sent to Google servers

Google Finance

  • Stock portfolio
    • User’s stocks
    • Number of shares
    • Date/time bought
    • Bought at price

Google Checkout

  • Buyers
    • Full legal name
    • Credit card number
    • Debit card number
    • Card expiration date
    • Card Verification Number (CVN)
    • Billing address
    • Phone number
    • E-mail address
  • Sellers
    • Bank account number
  • Personal address
  • Business category
    • Government-issued identification number
      • Social Security Number
      • Taxpayer Identification Number
    • Sales Volume
  • Transaction volume
  • Business information from Dun & Bradstreet
  • Transactions
    • Amount
    • Description of product
    • Name of seller
    • Name of buyer
    • Type of payment used
  • User trend data
    • Web Beacons
  • Referrer data

YouTube

  • YouTube SERP data
  • Registered user data
    • Videos uploaded
    • Comments posted
    • Videos flagged
    • Subscriptions
      • Channels
      • Groups
      • Favorites
    • Contacts
    • All videos watched
    • Frequency of data transfers
    • Size of data transfers
    • Click location data
    • Information display data
  • E-mail
    • Web Beacons for tracking
      • E-mail opened or discarded
  • Account basics
    • E-mail
    • Password
    • Username
    • Location (country)
    • Postal code
    • Birthdate
    • Gender

Gmail

  • Stores, processes, and maintains all messages
  • Account activity
    • Storage usage
    • Number of log-ins
  • Data displayed
  • Links clicked
  • Stores all e-mails
  • Contact lists
  • Spam trends
    • Gchat
      • All conversations and who they involve.
      • When service is used
      • Size of contact list
      • Contacts communicated with
  • Frequency of data transfers
  • Size of data transfers
  • Clicks

Calendar

  • Name
  • Default language
  • Time zone
  • Usage statistics
    • How long the service is used for
    • Frequency of data transfers
    • Size of data transfers
    • Number of events
    • Number of calendars
    • Clicks
    • Deletes every 90 days
  • All events
    • Who is going
    • Who was invited
    • Comments
    • Descriptions
    • Date
    • Time

Desktop

  • Indexes and stores
    • Versions of your files
    • Computer activity
      • E-mails
      • Chats
      • Web history
  • Mixed with web search results
  • Content analysis of data on computer for integration into SERPs (opt-in)
  • Unique application number
  • Application interacts with Google’s servers
  • Number of searches and response times

GOOG-411

  • Phone number
  • Time of call
  • Duration of call
  • Options selected
  • Phone number used as identifier
  • Records all voice commands

iGoogle

  • Settings stored in Cookies
  • Settings linked to Google Account

Blogger

  • User photo
  • Birth date
  • Location
  • Frequency of data transfers
  • Size of data transfers
  • Clicks
  • Blogger Mobile
    • Phone number
    • Associates with Google Account
    • Device identifiers
    • Hardware Identifiers

Google Docs

  • E-mail address
  • Number of logins
  • Actions taken
  • Storage usage
  • Clicks
  • All collaborators
  • All text
  • All images
  • All changes (previous versions)

Groups

  • E-mail password
  • Contents of posts
  • Contents of custom pages
  • Contents of external files
  • Account activity
    • Groups joined
    • Groups managed
    • List of members
    • List of invitees
    • Ratings made
    • Preferred settings

Orkut

  • Name
  • Gender
  • Age
  • Location
  • Occupation
  • Religion
  • Friend graph
  • Hobbies
  • Interests
  • Photos
  • Invites
  • Messages
  • Orkut Mobile
    • Phone number
    • Wireless carrier
    • Content of message
    • Date
    • Time
  • Everything a user writes
  • Every blog post a user reads

Picasa

  • Friend graph
  • Favorite lists
  • Clicks (almost all Google services track all clicks)
  • All photos
  • Geotags (Exif data)
  • People who subscribe to albums

Mobile

  • Phone number
  • Device type
  • Request type
  • Carrier
  • Carrier user ID
  • Content of request
  • Maps for mobile
    • Location information (GPS)
    • Address
  • Websites visited if user asks Google to transcode
  • Voice commands

Web Accelerator

  • Web requests
  • Cache of websites before you go to them

DoubleClick/AdWords

  • Ads clicked
  • Age
  • Sex
  • Location
  • Trends of past visited websites
  • IP address

Health

  • Medical records
    • Doctors
    • Conditions
    • Prescriptions
    • Age
    • Sex
    • Race
    • Blood type
    • Weight
    • Height
    • Allergies
    • Procedures
    • Test results
    • Immunizations

Postini

  • E-mail address
  • Traffic patterns
  • Clicks

GrandCentral

  • Credit card
  • Credit card expiration date
  • Credit card verification number
  • Billing address
  • Stores, processes, and maintains
    • Voicemail messages
    • Recorded conversations
    • Contact lists
  • Storage usage
  • Number of log-ins
  • Data displayed
  • Clicks
  • Telephony log information
    • Calling-party phone number
    • Forwarding numbers
    • Time of calls
    • Date of calls
    • Duration of calls
    • Types of calls

Google Merchant Search

  • Name
  • Contact information
    • E-mail address
    • Phone number

Notebook

  • Stores, processes and maintains
    • All content in notebook
    • Nickname
    • Storage usage
    • Number of log-ins

Google Web Services That Conveniently Don't Have Individual Privacy Policies Disclosing What User Data is Collected

  • Webmaster Tools
  • Google Analytics
  • AdWords
  • AdSense
  • Alerts
  • Reader
  • Earth
  • FeedBurner (technically has one, but it is useless)

Search Verticals

  • Image search
  • Map search
  • Blog search
  • Book search
  • News search
  • Patent search
  • Product search
  • Scholar search
  • Special search
  • Video search
  • Code search

By the way Google...


I found some broken links and errors on your website. On your main privacy policy page, the link anchored with "Video Player" is broken. Additionally, you capitalized your own product incorrectly: "GMail" should be "Gmail." Lastly, the Google Store has text encoding issues on its homepage, and the link to download SketchUp is broken.



Please send my check in the mail (I am sure you already have my address).


Sources:



3D Warehouse
Advertising
Apps
Blogger
Calendar
Desktop
Docs
Firefox Extensions
Gmail
GOOG-411
GrandCentral
Groups
Health
iGoogle
Maps
Merchant Search Test
Mobile
Orkut
Personalized Search
Picasa
Postini
Store
Talk
Toolbar
Web Accelerator
YouTube
YouTube Google Privacy Channel

Additional Information:



Can you trust Google to obey the rules? - Excellent analysis of the darker side of Google Inc. as a web giant.



If you have any other advice that you think is worth sharing, feel free to post it in the comments. This post is very much a work in progress. As always, feel free to e-mail me or send me a private message if you have any suggestions on how I can make my posts more useful. All of my contact information is available on my profile: Danny. Thanks!


Danny Dover is a passionate online marketer, influential writer and obsessed bucket list completer. He is the author of the bestselling book Search Engine Optimization Secrets and the founder of Intriguing Ideas LLC. Before starting his own company, Danny was the Senior SEO Manager at AT&T and the Lead SEO at SEOmoz.org.
