Google Analytics Reporting

How to Figure Out Origin of Direct Traffic

Original question from Quora:

How do I figure out where direct traffic in Google Analytics found my website in the first place? No one just enters a URL of a website first time visiting it; they need to know about the website through ads, organic search, referrals etc.

The short answer is: you can’t. The fact that the traffic is being classified under “Direct” by definition means that there is no referring information for Google Analytics to use to determine where the user came from.

You’re right that it seems strange and unlikely that someone’s first visit to a website would start by them typing in the URL, so here are a couple of possible explanations for that:

  • It’s not actually the user’s first visit. These days browsers delete cookies more aggressively, and more users are adopting ad/tracking blockers. GA’s new vs. returning visitor dimension depends on the GA cookie being present. If the GA cookie gets deleted for whatever reason, when that user comes back (even if they have been to your site dozens of times), they will look like a new visitor in the GA data.

  • Referral data or UTM parameters are getting stripped. It’s not unheard of for those details to get lost if there are a lot of redirects in between the origin site and the destination site. There are also some sites that are configured to strip any non-whitelisted URL parameters before the page (and therefore the GA tracking code) loads.

  • Vanity URLs. If you’re doing offline marketing with vanity URLs and you don’t do anything special to flag those visits in GA, they’ll come through as direct.
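The usual fix for the vanity-URL case is to have the redirect point at a UTM-tagged destination URL so GA can attribute the visit to the offline campaign. Here's a minimal Python sketch of building such a URL; the function name and parameter values are illustrative, not from any GA library:

```python
from urllib.parse import urlencode

# Hypothetical helper: build the destination URL a vanity-URL redirect
# should point at, so GA credits the offline campaign instead of Direct.
def utm_redirect_target(base_url, source, medium, campaign):
    params = urlencode({
        "utm_source": source,      # e.g. the offline channel
        "utm_medium": medium,      # e.g. "offline" or "print"
        "utm_campaign": campaign,  # campaign name
    })
    return f"{base_url}?{params}"

print(utm_redirect_target("https://example.com/landing",
                          "billboard", "offline", "spring-promo"))
# -> https://example.com/landing?utm_source=billboard&utm_medium=offline&utm_campaign=spring-promo
```

As long as the redirect preserves these parameters, the visit shows up under the campaign rather than under Direct.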

My heart always sinks when I’m digging into some anomaly and find that it’s related to direct traffic because that’s virtually a dead end for my investigation. You can try breaking down direct by landing page to get some sense of which page the user started on. If it’s a page deeper in the site, it might give you a hint to how/why they got there. But odds are it’s the homepage, and you’ll be stuck in another analytical dead end.

I wish I had a different and more helpful answer for you, but this is one of the painful and annoying realities of digital analytics!


Explanations for Data Discrepancies Between Reporting Tools

If you’ve ever been dragged kicking and screaming into a data discrepancy discussion or forced to spend your time investigating a 2% variance between your ad-side reporting and your digital analytics reporting, this post is for you.

I’m not saying that discrepancies aren’t a concern or shouldn’t be investigated; it’s important that everyone feels confident in the quality of data. But I am asking you NOT to make me spend 10 hours looking into why your Facebook clicks in Facebook Analytics don’t 100% match your Facebook referral sessions in Google Analytics. Here’s why…

Different tools process data differently

No two reporting tools will ever be a 100% match. Data collection methodologies vary, the way data is processed varies, how sessions are defined can vary, etc. My general rule of thumb is that if it’s within 10-15% and the trendlines are the same, you’re in excellent shape!
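As a rough sketch of that rule of thumb, here's how you might compute the variance between two tools' totals (the numbers are illustrative, and the 15% threshold is the rule of thumb above, not a standard):

```python
# Quick sketch of the 10-15% rule of thumb: compare totals from two
# reporting tools and flag whether the variance is within range.
def variance_pct(tool_a, tool_b):
    return abs(tool_a - tool_b) / max(tool_a, tool_b) * 100

v = variance_pct(10500, 9800)   # e.g. ad-side clicks vs site-side sessions
print(f"{v:.1f}% variance -> {'fine' if v <= 15 else 'investigate'}")
```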

Ad-side click data will always be higher than site-side page load data

Be aware that most ad-side reporting tools (Google AdWords, Facebook, Twitter, Pardot) collect data on the click, where site-side digital analytics tools (Adobe Analytics and Google Analytics) collect data on page load. If the user abandons the landing page or clicks to another page before the landing page fully loads, the digital analytics tracking code with the campaign parameters might not execute. In that case, the user will be counted in the ad-side reporting but not the digital analytics reporting.

Ad-side reporting tools will always show higher numbers compared to site-side reporting tools, and this can be exacerbated if you have slow page load time.

Not all metrics are comparable

Different metrics are processed and calculated differently so be aware of which metrics you’re trying to compare. Don’t mix and match simple counter metrics with ones that are deduplicated across a session or user:

  • Clicks and pageviews are typically simple counters and will increment each time they happen. These types of metrics are always higher than the ones that follow in this list.
  • Unique clicks, unique pageviews, sessions, and visits are deduplicated at the session level, so even if someone clicks or looks at that page multiple times in a session, it would only be counted once. These metrics are lower than clicks and pageviews for that reason.
  • Users and visitors are deduplicated across the user’s lifespan (or until their visitor ID cookie expires or is cleared). These metrics are lower than sessions.
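The relationship between these three tiers can be sketched with a toy event log (the data is made up purely for illustration):

```python
# Toy event log of pageview events, one tuple per hit: (user_id, session_id).
events = [
    ("u1", "s1"), ("u1", "s1"), ("u1", "s2"),  # u1: 3 views across 2 sessions
    ("u2", "s3"), ("u2", "s3"),                # u2: 2 views in 1 session
]

pageviews = len(events)                       # simple counter
sessions = len({sid for _, sid in events})    # deduplicated at session level
users = len({uid for uid, _ in events})       # deduplicated per user

print(pageviews, sessions, users)  # 5 3 2
```

The ordering pageviews ≥ sessions ≥ users always holds, which is why comparing a counter metric in one tool to a deduplicated metric in another guarantees a discrepancy.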

Not all conversion rates are comparable

The definition of a “conversion rate” can vary depending on the context. Be sure you know what metrics are being used in the conversion rate calculation before trying to compare them.

Ad-side reporting often shows “clicks ÷ impressions” while site-side reporting typically shows “orders ÷ visits”.
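To see how far apart those two definitions can land for the very same campaign, here's a quick sketch with made-up numbers:

```python
# Same campaign, two "conversion rates" (numbers are illustrative only).
impressions, clicks, visits, orders = 100_000, 2_000, 1_800, 90

ad_side_cr = clicks / impressions   # ad-side: clicks / impressions
site_side_cr = orders / visits      # site-side: orders / visits

print(f"ad-side: {ad_side_cr:.1%}, site-side: {site_side_cr:.1%}")
# -> ad-side: 2.0%, site-side: 5.0%
```

Neither number is wrong; they simply measure different things, so putting them side by side in a report invites a pointless discrepancy investigation.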

Not all tools use the same attribution

Different tools use different attribution models which can skew data. Typically ad-side reporting tools will give 100% of all conversions to themselves because they are not taking other marketing touchpoints into account.

Digital analytics tools have to share attribution across multiple marketing channel touchpoints, so the data per channel or campaign is often lower than what the ad-side reporting tool shows.

For example, consider this user journey:

  1. Two days ago, a visitor came to your site via a Facebook ad
  2. One day ago, the visitor returned to your site via a Google paid search ad
  3. Today, the visitor returned to your site via an email and made a $100 purchase

Here’s how the data will look in the various reporting tools:

  • Facebook reporting is going to claim $100 for itself
  • Google Ads reporting is going to claim $100 for itself
  • Email reporting is going to claim $100 for itself
  • Digital analytics reporting has to share that $100 across the 3 touchpoints, and the allocation will vary depending on your digital analytics tool’s attribution defaults or configuration. Assuming it’s configured for last non-direct click attribution, the digital analytics reporting will show:
    • Facebook ad = $0
    • Google ad = $0
    • Email = $100

Ad blockers can affect data collection

Some users block tracking in their browsers which could prevent your digital analytics tool from collecting data. (Presumably it would also prevent the ad-side data from being collected, but that completely depends on what blocker utility the user is using and what it does/doesn’t block.)

Time zones can skew daily numbers

If two reporting tools are configured for different time zones, the data will not align when you’re looking at daily numbers.

Revenue can be defined in many different ways

Digital analytics revenue data is typically pure product demand revenue, collected the moment the order is placed. It is not fair to compare it to shipped revenue, which accounts for situations like fraudulent orders or order cancellations. Digital analytics data also doesn’t normally account for returned purchases.

Feel free to copy / paste that next time you get the dreaded data discrepancy questions!


Adding Historical and Statistical Context to Your Trended Reports

Traffic and conversion numbers go up and down every day. When looking at trended data, it can be difficult to know when an increase or decrease is truly significant. Sometimes our stakeholders can unnecessarily panic about a dip, or overly congratulate themselves about a spike. 

This post shows how to add historical and statistical context to trended data using a simple standard deviation calculation in Excel. There are also tips for how to visualize this data to make the statistical concepts very simple for the report recipients to read and understand.

Why this is helpful… 

  • Provides context for trended data
  • Accounts for seasonality
  • Removes the guesswork from deciding whether an increase or decrease requires action
  • Especially useful in post-launch scorecards to help stakeholders decide whether or not to roll back changes

How to do it… 

For a given metric, pull historical data as far back as possible. For example, if you’re analyzing weekly homepage visits, pull historical weekly data back at least 53 weeks, but ideally 2-5 years if you have the data to support it. The more historical data you have, the better.

In Excel, use the STDEV function to find the standard deviation across all the historical values for that metric. This calculation will yield a number that gives the normal range of variance for that set of values:
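If you'd rather do this step outside Excel, Python's standard library gives the same result: `statistics.stdev` computes the sample standard deviation, which is what Excel's STDEV returns. The weekly values below are made up for illustration:

```python
import statistics

# Sample standard deviation over historical weekly values
# (equivalent to Excel's STDEV; numbers are illustrative).
weekly_visits = [1200, 1350, 1100, 1280, 1500, 1420, 1190, 1330]
sd = statistics.stdev(weekly_visits)
print(f"standard deviation: {sd:.1f}")
```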


Using the standard deviation in conjunction with the data points from the prior year, it is possible to create a series of contextual bands on your chart. Then when the current year’s data is overlaid on top of those bands, it makes it very clear whether or not this year’s performance is within the normal range of variance:

  • 1 standard deviation above or below = acceptable
  • 2 standard deviations above or below = outperforming/underperforming
  • 3 standard deviations above or below = requires attention

To make the banded chart, you will need a summary table that looks something like this:

Summary Data Table

  • Prior Year – Data point from the previous year for the same week.
  • This Year – Data point from this year which is the central point of this report.
  • Attention Required – A formula that subtracts two standard deviations from the prior year data point for that week. This will be the bottom threshold for the banded chart.
  • Underperforming / Outperforming / Great – The standard deviation value. We will use this value to make a stacked area chart.
  • Acceptable – Also used for the stacked area chart, but because the acceptable band spans one standard deviation above and one standard deviation below the prior year value, this value needs to be the standard deviation multiplied by 2.
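The column formulas above can be sketched for a single week like this; `prior_year` and `sd` stand in for that week's prior-year value and the historical standard deviation, and the function name is just for illustration:

```python
# Sketch of the summary-table values for one week of the banded chart.
# The bottom threshold sits at prior_year - 2*sd; each band stacked on
# top of it is expressed as a height, matching the stacked-area setup.
def band_rows(prior_year, sd):
    return {
        "attention_required": prior_year - 2 * sd,  # bottom threshold
        "underperforming": sd,   # band from -2 sd up to -1 sd
        "acceptable": 2 * sd,    # band from -1 sd up to +1 sd
        "outperforming": sd,     # band from +1 sd up to +2 sd
        "great": sd,             # band from +2 sd up to +3 sd
    }

print(band_rows(prior_year=1300, sd=130))
```

Stacking those heights in order reproduces the contextual bands, with the current year's line plotted over the top.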

When configuring the chart in Excel, the prior year and current year series should use line chart types. To create the bands, configure all the standard deviation-related series as stacked area charts. All the data series should remain on the same primary axis.


What to do with the information…  

If a key performance indicator dips into the “Attention Required” zone, that means performance has been very negatively affected and it should be investigated and addressed immediately. In the case of a site or page overhaul or campaign launch, the team should consider rolling back.