So how many Users do these evil Ad Blockers kill? A Checklist & Guide for Comparing Apples to Apples
Part II of “Ad Blockers & Server-Side Tracking”
I struggled with finding a decent title for this second part of my series on Ad Blockers, ITP & other browser restrictions, and Server-Side Tracking. I tried “When an Order is not an Order” and “Help! Ad Blockers eat up 60 Percent of our Revenue in Analytics!”. As I studied Cultural Sciences about 70 years ago, I finally wanted to go for a more academic branding: “Comparing Apples to Apples or: A Primer in Constructivism”.
But that’s not “SEO-/Social-Media-friendly”. Best of/Pest of Lists are. Users getting wiped out (from your Analytics data), too. So you get the headline you deserve. Full of blood and exaggerations. You still want to know what Constructivism has to do with Ad Blockers? Great. Read on!
In Part 1, we talked about Ad Blockers, ITP and other Ad-Blocking/Restricting methods, and how this may lead more and more tech-savvy companies to go for “invisible” (to Ad Blockers) methods of tracking. Because Ad-Blocker-free data is more complete data.
The first step in analyzing how much you would gain from an Ad-Blocker-free implementation is answering the question:
“How much does that Analytics data actually differ from the ‘real’ data?”
This sounds so simple that you may wonder why I even write about such a triviality! 1000 Users here, 900 there. -10 % difference. Done! Right? Well… no.
So will the ‘real data’ please stand up?
I have had about a dozen cases in the last couple of years where I had to engage in this exercise of “matching the numbers”, mostly because the clients suspected “tracking errors”, which of course is the number one reason for everything evil in this world, right after Donald Trump.
It is hard enough to find tracking (or website implementation) errors. But I had never thought it would also be so hard (in some cases even impossible) to get a sensible approximation of the “real” data.
Just to be clear: By “real data” I mean the data that Analytics supposedly does not match.
I now understand the discussions about Constructivism from my Cultural Sciences seminars better. There is no such thing as “real” data. Everyone builds their own reality!
But let’s start with some best-practice basics on how to embark on your comparison journey:
How to Compare Analytics and “Real” Data: The Basics
a) Do NOT compare “traffic” metrics: Pageviews, Visits, Unique Visitors etc.
Your backend / IT / DataSomething team may have their own log-based “traffic numbers”. But what they call “Pageviews” or “Sessions” will usually be far, far away from your Web Analytics equivalents, sometimes more than 50% or even 100% higher.
At a former client, the Data Science team used to work with the data extracted from the application logs. But their data looked really weird. The reason? It was so full of bots that they had to stop many of their data initiatives until they had built their own bot filter. That took half a year — just to come close to the Web Analytics traffic data. For them, the Ad-Blocker-ridden Adobe Analytics data now ironically became the “real data” benchmark.
Why are there all those bots in logs, but not (that many) in Analytics? To keep it short: Bots often do not execute JavaScript or set cookies, and some of them are even kind enough not to execute the most common JS libraries (like a Google Analytics script). And Web Analytics tools usually come with some basic bot filtering (which you have to continually enhance, however, to maintain sensible data quality).
But my point is: Even if you put a lot of effort into getting bots out of both the Analytics and the “backend” data, it will cost you a lot of time, and in the end you will still not feel certain enough about the result. So: Instead of wasting all your time on getting the “real count of Unique Visitors”, …
b) … Compare data that can only be produced by humans
Compare metrics like Orders (GA calls them “Transactions”). No Orders on your sites? Then use other interactions that can only be done by humans, e.g. submissions of forms that require some heavier validation. Almost no bot cares enough about your site to correctly fill in all these form fields. Very few will even add products to the basket. These “human actions” can also be used nicely in bot detection logic to exclude false positives (Visitors you think are bots because they e.g. use a Firefox version from 1927, but who are people after all). Be sure, though, to use an interaction that happens often enough to give you a sufficiently large sample.
c) An Order is not an Order is not an Order is …
Ok, now we are talking Orders and Revenue. Should be easy to compare that with the “backend” data, shouldn’t it? Apples and apples, right?
Thou shalt not think anything is easy. After all, we are NOT comparing apples to apples…
Wait? You are saying an Order/Transaction in Adobe/Google Analytics is not the same as an Order in Magento/Navision/Shopify/Hybris/SAP/YourShopOrERPSystem???
You got it! Your shop system / backend is very likely going to count Orders differently! That is why, in most cases, the alleged “tracking problem” could be resolved by understanding how the client’s backend system actually counts Orders (it is even more complicated with Revenue, see further down).
So when comparing Analytics Orders to “real” (😅) Orders, try to get as close as possible to COMPARABLE Orders — do not expect to get to 100% comparable data, but 90% should be doable!
So I am exaggerating, right? Why should that be such a big challenge? Read on!
Comparing Analytics Apples to ‘real’ Apples
The following examples are all real (this time I mean real for real) examples! Comparing apples to apples was never harder.
1. An Order in the shop system is actually a delivered package (which becomes a problem when orders get split up into multiple packages) or every SKU in an Order gets its own Order ID etc…
2. The shop system immediately deletes cancelled orders (very common). These orders however show up in Analytics because Analytics usually does not know about that cancellation. And if the cancellation is not even tracked in the shop system, there is also no way to do a returns/cancellations import into Analytics.
3. Returned orders are usually not a problem (at least they were not in my cases), because they are booked separately, but watch out, they could be!
4. The shop system does not show orders until they have been processed by a sales agent/financial check.
A client measures their orders in their BI system by means of orders that have arrived in the ERP system — AFTER a financial solvency check. Thus, when a user orders on the website, that order is immediately sent to their Adobe Analytics, but many orders (e.g. by guests or new clients) do not show up in BI reports until the order has been manually approved by some human agent (e.g. it can be cancelled if the client is deemed unlikely to pay). So on Monday mornings, their Analytics shows about 10–20% MORE orders for the weekend than their BI tool (because people do not work on the weekends and only start processing the orders on Monday morning).
5. When does a shop generate its Order/Transaction ID?
I had this problem recently with Magento shops and earlier with a Spryker shop: Here the shop generated a Transaction ID as soon as the user started the payment process through a third-party payment provider like SaferPay (which offers an interface for payments via PayPal, credit cards etc.). If the payment provider rejected the user (e.g. likely fraud, not solvent, wrong password etc.), the shop system still saved the order and the Transaction ID. So when the client did a default transaction export, those rejected transactions were still in there (in Magento, you can easily filter them out via a field called “cancelled”).
Since the Transaction ID is sent to Analytics only once the order is really complete (the user sees the “thank you” page), you can see quite a gap there. I remember a company celebrating a couple million Swiss Francs more of revenue on a Black Friday and hearing quotes like “something must be wrong with that Analytics data”. I started preparing for suicide, but in the end, the culprit was that other system which generated the real-time revenue dashboards, all the while not giving a damn about whether a payment had actually been accepted.
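If you run into this, the filtering itself is usually the easy part. Here is a minimal sketch in Python/pandas, assuming an order export with (hypothetical) columns “increment_id” and “status”, and that rejected payments end up with a cancelled status as described above:

```python
import pandas as pd

# Sketch: remove cancelled/rejected orders from a shop export before
# comparing it to Analytics. Column names and status values are assumptions
# based on a typical Magento-style export; adjust them to your system.
orders = pd.read_csv("backend_order_export.csv")

completed = orders[~orders["status"].str.lower().isin(["canceled", "cancelled"])]

print("All exported orders:", orders["increment_id"].nunique())
print("Orders after removing cancellations:", completed["increment_id"].nunique())
```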
6. Orders through APIs
Some shops have a server-side Order API, e.g. for resellers or business clients that want to order frequently and in large volumes. These Orders have to be excluded from the backend data before comparing it with Analytics.
7. Orders from different systems
It often happens that the shop backend does not make it easy to filter down to only those orders that were placed on the website, as opposed to those placed by people calling or faxing in their orders (yes, that still happens in 2019!). Sometimes even brick-and-mortar shop orders have to be filtered out first.
8. Product bundles
This usually just affects the quantity of products sold and sometimes the revenue. Bundles are often counted as one product in the shop (and thus usually tracked as one product in Analytics), but because they in fact consist of multiple products, the “backend” splits them up into multiple products with different prices.
9. Trial Orders
A client sold something with a trial period, but her backend system did not count that Order until the trial was over. At that point, Analytics had no clue about that user anymore, but it had already tracked the trial order like a real one, including revenue.
10. Timestamps
One client’s shop system was storing and exporting the order date in UTC, which of course never matched the Analytics data for a particular date. Even worse: because both systems exported 24 hours of data, the totals were very similar, but the individual IDs often did not match.
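Here is a minimal sketch of the fix, assuming the backend export has a UTC timestamp column (here called “order_ts_utc”) and your Analytics View/Report Suite runs on, say, Europe/Zurich time:

```python
import pandas as pd

# Sketch: re-bucket a UTC-based backend export into the Analytics time zone
# before comparing daily order counts. Column names and the time zone are
# assumptions; use whatever your Analytics property is configured with.
orders = pd.read_csv("backend_order_export.csv", parse_dates=["order_ts_utc"])

orders["local_date"] = (
    orders["order_ts_utc"]
    .dt.tz_localize("UTC")
    .dt.tz_convert("Europe/Zurich")
    .dt.date
)

print(orders.groupby("local_date")["order_id"].nunique())
```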
Choosing the right Analytics Apples for Comparison
So much for things to keep in mind from the “backend” side (the “reality”).
But there are also things to keep in mind from the Analytics side, especially if you are using Google Analytics, which still has not fixed the age-old bug (no, this cannot be a feature!) that allows tracking multiple transactions with one and the same Transaction ID (Adobe and others usually make sure one ID cannot be used multiple times):
a) Do not compare the number of Transactions
Instead, compare the count of distinct Transaction IDs within a particular date range. And in the case of GA, if you are one of the many clients with a duplicate Transaction tracking issue, make sure you do not count a Transaction ID that was already counted on date X again for date Y: If you want to compare data for March, take e.g. all Transaction IDs from January to March, then remove all IDs tracked in March that had already shown up in January or February, and only then count the distinct IDs from March. Tedious.
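To illustrate, here is a minimal sketch of that deduplication in Python/pandas, assuming a GA export with one row per tracked transaction and (hypothetical) columns “date” and “transaction_id” covering January through March:

```python
import pandas as pd

# Minimal sketch of the deduplication described above. The file and column
# names ("date", "transaction_id") are assumptions for illustration.
ga = pd.read_csv("ga_transactions_jan_to_mar.csv", parse_dates=["date"])

march = ga[ga["date"].dt.month == 3]
before_march = ga[ga["date"].dt.month < 3]

# Drop March IDs that were already tracked in January or February,
# then count only the distinct remaining IDs.
march_clean = march[~march["transaction_id"].isin(before_march["transaction_id"])]
print("Distinct Transaction IDs in March:", march_clean["transaction_id"].nunique())
```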
b) Use an unfiltered View / Report Suite with no segments applied
One client compared against a GA View from which co-workers’ traffic was filtered out. At least that one was easy to fix…
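Once both sides are cleaned up (comparable Order definitions, deduplicated Transaction IDs, an unfiltered View), the comparison itself is best done on the ID level rather than on totals, so you can see WHICH orders are missing on which side. Here is a minimal sketch, again in pandas, with file and column names as pure assumptions:

```python
import pandas as pd

# Sketch: compare on the ID level, not just on totals.
# File and column names are assumptions for illustration.
backend = pd.read_csv("backend_orders_march.csv")            # column: order_id
analytics = pd.read_csv("analytics_transactions_march.csv")  # column: transaction_id

backend_ids = set(backend["order_id"].astype(str))
analytics_ids = set(analytics["transaction_id"].astype(str))

print("Matched in both systems:", len(backend_ids & analytics_ids))
print("Backend only (Ad Blockers & friends?):", len(backend_ids - analytics_ids))
print("Analytics only (duplicates, rejected payments?):", len(analytics_ids - backend_ids))
```

The “backend only” bucket is the interesting one for the Ad Blocker question further down; the “Analytics only” bucket usually points to the issues from the list above.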
Comparing Revenue to Revenue: Stop it already!
So far we have talked about the complexity of comparing Orders to Orders. It only gets worse with things like Revenue, so let me just briefly touch on this before you fall asleep:
a) Tax: Does the shop/backend data use revenue before or after applying VAT?
b) Vouchers / discounts:
How does the backend calculate/display revenue if the user has paid with a voucher or got another form of discount (and how does that calculation compare to your Analytics calculation)? Sometimes you even have to distinguish between product-specific vouchers/discounts and order-specific ones (discounts on the whole order => Revenue per product may still be displayed as if no voucher had been applied, but Order Revenue is lowered by the voucher amount) — see the sketch after this list.
The rationale behind these different calculations on the “backend” side sometimes lies in “who pays for the voucher”: Order discounts may come from the Marketing budget (so they are effectively a cost for the shop owner), while product-specific ones may be paid by the brand behind the product (so financially they do not mean real costs for the shop owner)! Yes, if you want to scare someone away from a career in Analytics, this example should wipe out any motivation.
c) Currency conversion: In case people can order in multiple currencies, is your Analytics tool able to correctly convert the revenue into one currency? Is it the same currency that the backend is showing? A client e.g. tracked Roubles as Euros in Google Analytics and then wrote the article “Hack your life and Become a Millionaire by Age 30”.
d) Prices of refunded goods (and the tax handling for those) can be another pain to understand, so it is easier to exclude everything refunded from revenue comparisons (at least in the first step).
e) Real prices are sometimes assigned later.
This is my favorite: A client with a lot of B2B customers sometimes makes special deals with individual clients: If the client e.g. orders laptops in a certain volume, she gets a special price — but only after the fact! So the shop shows super-HIGH prices for these laptops (e.g. 15'000 EUR per unit) for some hours so the agent can be sure no one else orders these laptops while his customer is shopping for them (I think one reason is to not run out of stock by accident).
Once the sales agent receives his overpriced 500k EUR order, he manually changes the price in the ERP system to e.g. 50k, so the customer gets her special deal. So 500k becomes 50k, and Adobe Analytics, Google Analytics, Google Ads, Facebook, and just about every other tool kept showing huge gaps on days when such deals happened...
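To make the tax and voucher points (a and b above) a bit more tangible, here is a minimal sketch of what a “comparable revenue” definition could look like on the backend side. All column names are assumptions; the important part is writing the definition down explicitly and applying the same one to both systems:

```python
import pandas as pd

# Sketch: derive one explicit, documented revenue definition from a backend
# export before comparing it to Analytics. "gross_revenue", "vat_amount" and
# "order_discount" are hypothetical column names.
orders = pd.read_csv("backend_order_export.csv")

orders["comparable_revenue"] = (
    orders["gross_revenue"]
    - orders["vat_amount"]      # a) compare net of VAT (or gross, but be consistent)
    - orders["order_discount"]  # b) order-level vouchers lower the Order Revenue
)

print("Backend revenue (comparable definition):",
      round(orders["comparable_revenue"].sum(), 2))
```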
So how many users do those Ad Blockers kill after all?
The things mentioned above are not only important to understand when you start thinking about implementing tracking mechanisms that circumvent Ad Blockers (e.g. some forms of Server-Side Tracking). You need to be aware of them for any implementation. Otherwise you will keep running into “I-cannot-explain-the-difference” problems -> mistrust in your data -> risk of failure for your entire data endeavour.
So thanks for holding out until here. You still want to know how much data loss these Ad Blockers usually cause? Having done this exercise multiple times now, in Switzerland I usually land at between 10 and 20% of missing human-generated interactions (see above), with Ad Blockers being the most likely reason.
Depending on your region and your customers’ profile, this can differ (some countries are supposedly more tracking-conscious, and the “techier” your clients are or the more they surf from company networks, the bigger the impact you should see).
Up next:
You have identified the “Ad Blocker gap” between “real” (lol) and Analytics data. Now you want to close it. And now that even Google Tag Manager has announced “Server-Side Tag Management” for some time in the future: Is Server-Side Tracking a method you should try? And what does “Server-Side Tracking” actually mean?
And: Please add to my “Pest of” list!
Have you had other funny, awkward, or quite common reasons for non-matching data between Analytics and “reality”? Share them in the comments and I will add them to the list! Thank you!