Last Updated: 08 November 2017
As discussed in more detail in an earlier guide, Google Analytics is a service that tracks the usage of a website or web application, and provides an interface for the viewing and analysis of that data. In this guide, we will take a look at how Google Analytics collects the information it does, and how that data is then displayed to the website owner through the Google Analytics dashboard.
The Google Analytics tracking code - This piece of code, or a reference to it, must be present on each page of a website in order for data to be collected about that page.
The HTTP request - When you view a page on the internet, your browser makes what is called an HTTP request to the server that is hosting the website. It does this in order to receive the information that your browser needs to display the page you want to view. When your browser makes that request, it provides certain points of information about your system, such as the browser type, the referrer if there is one, and the language settings of the browser. In addition, most browsers will also provide access to more detailed browser and system information, such as Java and Flash support and screen resolution.
First-party cookies - The first-party cookies in this case are cookies placed on your computer by Google Analytics to track information about your interactions with the website, including things such as:
From these pieces of information, almost all the metrics that are available for viewing on the Google Analytics dashboard can be calculated.
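To make the HTTP request part of this a little more concrete, here is a minimal Python sketch of how a server could read the browser type, referrer, and language out of a request. The request text, the site name, and the function names are all made up for illustration. This is not Google's code, just a picture of the idea.

```python
from typing import Dict, Optional

# Hypothetical raw request, roughly what a browser might send when loading a page.
RAW_REQUEST = (
    "GET /pricing HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0\r\n"
    "Referer: https://www.facebook.com/\r\n"
    "Accept-Language: en-GB,en;q=0.9\r\n"
    "\r\n"
)


def parse_headers(raw_request: str) -> Dict[str, str]:
    """Return the request headers as a dict keyed by lower-cased header name."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:  # lines[0] is the request line ("GET /pricing HTTP/1.1")
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def visitor_info(raw_request: str) -> Dict[str, Optional[str]]:
    """Pick out the three pieces of information mentioned in the text."""
    headers = parse_headers(raw_request)
    return {
        "browser": headers.get("user-agent"),
        "referrer": headers.get("referer"),  # the header name really is spelled "Referer"
        "language": headers.get("accept-language"),
    }


if __name__ == "__main__":
    for key, value in visitor_info(RAW_REQUEST).items():
        print(key, "=", value)
```

Details such as screen resolution are not part of the request headers. They have to be read by code running in the browser, which is one reason the tracking code is needed at all.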
So now that we have all this data, how does it get from your browser back to Google Analytics to be compiled and presented to the website owner? Since the 2014 update, Google Analytics has three options for how it can collect the data from your browser. The three options are rather cryptically labelled 'beacon', 'xhr', and 'image'. In this guide we'll focus on 'image', as it is perhaps the most common method, and definitely the most interesting. But to understand that, first we need a little background.
At the start we briefly talked about how when you try to view a page on a website, your browser will make a request to the server that hosts that website. In return the server will send a bunch of data files back to the browser, which the browser interprets to display the page that you see. Adding an additional level of complexity, when the browser tries to create that page, the files it uses will often have references to other files that are in different places on the internet, and in order to create the page, the browser has to make additional requests to other web servers to get those files.
It is this functionality that Google Analytics uses to send the information back to Google's servers. When your browser opens a new page, Google Analytics tells your browser to make a request for a GIF (yep, one of those moving image files) from Google's server. In this case, though, the intention is not to receive the GIF and display it on the page. The point is that when the browser makes the request for that file, it passes all the data it has collected to the server by adding it to the request URL.
About now I can hear you saying "whoa, whoa, slow down egghead." Ok, let's go back one step: what is a URL and how does one add data to it? A URL (Uniform Resource Locator) is simply a fancy name for a web address. When your browser makes the request for the GIF, it uses a URL, just like you do when you go to a website. In this case the URL for the GIF is simply http://www.google-analytics.com/collect. You can even click on that link like any other link. The only difference is that instead of a proper webpage, the only thing there is that single one-pixel GIF, which you cannot see because in addition to being tiny, it is also transparent.
Now, the next step, adding data to a URL. Sometimes, when you go to a URL, you may notice that at the end of the URL, there will be some extra stuff, like:
For example, often when you click on a link someone posts on Facebook or Twitter, if you check the URL in your address bar, you will see something like:
The bolded part of the URL shown above is called a URL parameter, and it is a way of passing data to the server hosting the website. In the case shown above, this URL parameter is likely telling the people at example.com that I got to their website by clicking a link on Facebook, information they will use to work out how many people came from that source.
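If you want to see how a server could pull URL parameters apart, here is a small Python sketch using the standard library. The example URL and the parameter names ("source" and "campaign") are invented for illustration and are not taken from any real site.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL with two URL parameters after the "?".
EXAMPLE_URL = "https://www.example.com/article?source=facebook&campaign=spring"


def url_parameters(url: str) -> Dict[str, str]:
    """Return the URL parameters as a name -> value dict (first value wins)."""
    query = urlsplit(url).query          # everything after the "?"
    parsed = parse_qs(query)             # e.g. {"source": ["facebook"], ...}
    return {name: values[0] for name, values in parsed.items()}


if __name__ == "__main__":
    params = url_parameters(EXAMPLE_URL)
    print(params["source"])  # prints: facebook
```

Everything after the "?" is read as name=value pairs, and each pair is separated from the next with an "&".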
You may be able to see where I am going with this now. When your browser makes the request for that little one-pixel GIF, Google Analytics adds URL parameters, lots of them, separated with '&'s, to the URL http://www.google-analytics.com/collect. The URL that the GIF is requested from actually ends up looking more like this:
These URL parameters are passing all the information we talked about being collected to Google Analytics. For a full explanation of what all the parameters are, check out the Google Analytics documentation. However, just from reading the URL above we can make out several pieces of information that have been sent to Google Analytics:
Many of the more personal details (browser type and device type for example) are also provided here, but are encoded for privacy reasons.
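For the curious, the way such a URL gets put together can be sketched in a few lines of Python. The parameter names used here (v, t, tid, cid, dp, ul, sr) follow Google's published Measurement Protocol, but the function name and every value passed in are placeholders, and the real tracking code sends many more parameters than this.

```python
from urllib.parse import urlencode

COLLECT_ENDPOINT = "http://www.google-analytics.com/collect"


def build_hit_url(tracking_id: str, client_id: str, page_path: str,
                  language: str, screen_resolution: str) -> str:
    """Build a pageview request URL by appending URL parameters to the endpoint."""
    params = {
        "v": "1",                 # protocol version
        "t": "pageview",          # hit type
        "tid": tracking_id,       # which website (property) the hit belongs to
        "cid": client_id,         # anonymous ID stored in the first-party cookie
        "dp": page_path,          # path of the page being viewed
        "ul": language,           # browser language
        "sr": screen_resolution,  # screen resolution
    }
    # urlencode joins the pairs with "&" and escapes characters such as "/".
    return COLLECT_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # All values below are made-up placeholders.
    print(build_hit_url("UA-XXXXX-Y", "35009a79-1a05-49d7-b876-2b884d0f825b",
                        "/pricing", "en-gb", "1920x1080"))
```

Requesting the resulting URL is all it takes to deliver the data: the server reads the parameters and hands back the tiny GIF.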
Now that we understand how Google Analytics is receiving the data, the next step is to understand how they process it into a format that website owners find useful.
Firstly, let's recap what this data will look like as Google receives it. For a given website with the Google Analytics tracking code installed, Google will receive a request using one parameter-filled URL (a 'datapoint') for each page visited, and another whenever the current page is refreshed. In addition to telling Google about the computer and which page sent that particular datapoint, each request also provides an ID which allows Google to determine which datapoints were created by the same user and which were created by other users. This is important, not because they want to know who you are, but because it allows them to group all the different datapoints according to the user who created them. Once the datapoints for a given user in a given session (see our Basic Terminology Guide if you are unsure what a session is) can be identified, then some very useful information can be created, including:
We won't go through all the different metrics and how you could calculate them using the information above, but hopefully you get the idea!
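To give a feel for the grouping step, here is a rough Python sketch that takes a handful of datapoints, groups them by user ID, splits them into sessions, and works out a few simple numbers for each session. The datapoints, the field layout, and the chosen metrics are invented for illustration, and the sketch assumes a session ends after 30 minutes of inactivity. Google's actual processing is far more involved.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SESSION_TIMEOUT_SECONDS = 30 * 60  # assumed inactivity cutoff between sessions

# A datapoint here is (user ID, time in seconds, page path). Hypothetical layout.
Hit = Tuple[str, int, str]

HITS: List[Hit] = [
    ("user-a", 0, "/home"),
    ("user-b", 30, "/pricing"),
    ("user-a", 120, "/pricing"),
    ("user-a", 300, "/signup"),
    ("user-a", 7200, "/home"),  # two hours later, so this starts a new session
]


def group_into_sessions(hits: List[Hit]) -> Dict[str, List[List[Hit]]]:
    """Group datapoints by user ID, then split each user's hits into sessions."""
    by_user: Dict[str, List[Hit]] = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h[1]):  # oldest first
        by_user[hit[0]].append(hit)

    sessions: Dict[str, List[List[Hit]]] = {}
    for user_id, user_hits in by_user.items():
        user_sessions: List[List[Hit]] = []
        for hit in user_hits:
            last_time = user_sessions[-1][-1][1] if user_sessions else None
            if last_time is not None and hit[1] - last_time <= SESSION_TIMEOUT_SECONDS:
                user_sessions[-1].append(hit)   # still the same session
            else:
                user_sessions.append([hit])     # first hit, or gap too long
        sessions[user_id] = user_sessions
    return sessions


def session_metrics(session: List[Hit]) -> Dict[str, object]:
    """Work out a few simple numbers for one session."""
    return {
        "pageviews": len(session),
        "duration_seconds": session[-1][1] - session[0][1],
        "entry_page": session[0][2],
        "exit_page": session[-1][2],
    }


if __name__ == "__main__":
    for user_id, user_sessions in group_into_sessions(HITS).items():
        for session in user_sessions:
            print(user_id, session_metrics(session))
```

Notice that a session with only one datapoint gets a duration of zero, because there is no later datapoint to measure against.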
Google Analytics is, in many ways, a complex piece of software, and understanding all the technical details of how it works requires more than a couple of pages of text. As such, the above is really just a high-level look at some of the basics, and in places does simplify to avoid getting overly technical. However, for the less technically minded, hopefully it has given you a basic understanding of how Google Analytics works.