Bots & Analytics, II: From Filtering out Bots to Filtering in Humans

“Hard” & “Soft” Bots, Human Signals — and Vacation in Cleveland

Recap: Why common Bot combat methods aren’t good enough for Analytics

  • Their detection comes too late => By the time they know it is a Bot, the data is already in your Analytics/Google Ads/Facebook Pixel/Floodlight/Hotjar/Adobe Target/Server-Side Tag Management System. That hurts you twice: you usually pay these tools per Server Call (= for each Hit of data you send to them), and filtering that data out after the fact ranges from ginormously tedious to impossible.
  • They still produce too many false negatives and false positives, i.e. they either miss too many Bots or flag humans as Bots too often => Relying on their judgment would lose you too much human traffic, and you would still have too much Bot traffic in your data.

The Bot Rush and the Desperation

The Bot rush before and after the “fix”
  1. 40% Bots meant 40% more Server Calls => higher license fees for Adobe Analytics, Tealium Event Stream (Server-Side Tag Management), Hotjar, Optimizely, etc.
  2. New Bots (or rather, new “disguises”) appeared every day. We would have had to update the VRS (Virtual Report Suite) filters constantly (e.g. add 100 new ISP domains on Monday, another 100 on Tuesday, and so on).
  3. Eventually, the segment behind the Bot Filters would have become too complex, slowing down reporting or triggering the “segment too long” error.

IT: “These Bots are not our focus”

The 2 Layers of Human / Bot Filtering: Layer 1 in the Tag Management System, i.e. in the browser, makes sure that as much Bot traffic as possible never gets tracked into Analytics & Co. in the first place, with the “Soft Bot” logic being the most important part. Layer 2, the filtering logic in Analytics, is still needed, however, to clean up the Bots that slip through Layer 1.
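To make the two layers more tangible, here is a minimal sketch of what a Layer 1 gate in the browser/TMS could look like. Everything in it (the isHardBot and isBottyByDefault checks, the example country/channel inputs, the helper names) is an illustrative assumption, not the implementation the article describes further down.

```typescript
// Minimal sketch of a Layer 1 gate in the browser/TMS.
// All checks and inputs are illustrative placeholders.

type Verdict = "human" | "hardBot" | "bottyByDefault";

// "Hard Bots" (assumed here): bots that are recognizable directly, e.g. via the user agent.
function isHardBot(userAgent: string): boolean {
  return /bot|crawl|spider|headless/i.test(userAgent);
}

// "Botty by default": a traffic chunk so contaminated (>95% Bots) that we
// assume "Bot" until proven otherwise. Hypothetical condition only.
function isBottyByDefault(country: string, channel: string): boolean {
  return country === "US" && channel === "Direct";
}

function classify(userAgent: string, country: string, channel: string): Verdict {
  if (isHardBot(userAgent)) return "hardBot";
  if (isBottyByDefault(country, channel)) return "bottyByDefault";
  return "human";
}

// Layer 1 decides in the browser whether Analytics & Co. receive any data at all.
function layer1Gate(
  fireAnalyticsTags: () => void,
  waitForHumanSignal: (onHuman: () => void) => void,
  country: string,
  channel: string
): void {
  switch (classify(navigator.userAgent, country, channel)) {
    case "hardBot":
      return; // never track: no Server Calls, no license fees
    case "bottyByDefault":
      waitForHumanSignal(fireAnalyticsTags); // track only once a human signal arrives
      return;
    case "human":
      fireAnalyticsTags(); // track normally; Layer 2 cleans up whatever still slips through
  }
}
```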

Layer 1: Soft Bots, Hard Bots, and Human Signals

The Consent Manager as an inadvertent Bot Prevention System

Impact of an opt-in Consent Manager on the share of human traffic

What do Humans do that Bots don’t?

  1. Let’s find a clearly defined, large chunk of the traffic that is so contaminated by Bots (>95%) that we can risk declaring it “botty by default” and not tracking it.
  2. If, however, a visitor from that botty chunk gives us a “human signal”, tracking starts (see the sketch after this list).
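As a sketch of step 2, assuming human signals such as scrolling, pointer movement, or key presses (the article's actual signal set is discussed below), the TMS could defer all tag firing for the “botty by default” chunk until one of those events occurs:

```typescript
// Sketch: defer tracking for "botty by default" traffic until a human signal fires.
// The chosen events (scroll, pointermove, touchstart, keydown) are assumptions.

function waitForHumanSignal(onHuman: () => void): void {
  const signals = ["scroll", "pointermove", "touchstart", "keydown"] as const;

  const handler = (): void => {
    // Fire tracking exactly once, then remove all listeners again.
    signals.forEach((evt) => window.removeEventListener(evt, handler));
    onHuman();
  };

  signals.forEach((evt) => window.addEventListener(evt, handler, { passive: true }));
}

// Usage in the TMS: instead of firing tags right away for a "botty by default"
// visitor, hand the tag-firing function to the waiter:
// waitForHumanSignal(() => fireAnalyticsTags());
```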

Let’s analyze — First, get the “botty by default” candidate!

Traffic by Country, split by Direct/SEA/All, Bottiness Rate (% Visits by Bots), and Counter Metrics “% Visits with Login” and “Conversion Rate”.
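For illustration only, here is a rough sketch of how such a table could be computed from raw visit records. The Visit shape and the isBot flag are assumptions; in practice these numbers come straight out of the Analytics UI rather than custom code.

```typescript
// Rough sketch: Bottiness Rate and counter metrics per country/channel.
// The Visit shape and the isBot flag are assumptions for illustration.

interface Visit {
  country: string;
  channel: "Direct" | "SEA" | "Other";
  isBot: boolean;      // verdict from whatever bot detection is available
  hasLogin: boolean;   // counter-metric signal: logins are a strong human hint
  converted: boolean;  // counter-metric signal: Bots rarely convert
}

interface SegmentStats {
  visits: number;
  bottinessRate: number;   // % of visits flagged as Bots
  loginRate: number;       // % of visits with login
  conversionRate: number;  // % of visits that converted
}

function statsByCountryAndChannel(visits: Visit[]): Map<string, SegmentStats> {
  const groups = new Map<string, Visit[]>();
  for (const v of visits) {
    const key = `${v.country} / ${v.channel}`;
    let group = groups.get(key);
    if (!group) {
      group = [];
      groups.set(key, group);
    }
    group.push(v);
  }

  const stats = new Map<string, SegmentStats>();
  for (const [key, group] of groups) {
    const n = group.length;
    const pct = (count: number) => (100 * count) / n;
    stats.set(key, {
      visits: n,
      bottinessRate: pct(group.filter((v) => v.isBot).length),
      loginRate: pct(group.filter((v) => v.hasLogin).length),
      conversionRate: pct(group.filter((v) => v.converted).length),
    });
  }
  return stats;
}
```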

Americans are Bots — but what about Cleveland?

Secondly, find human signals

How is this implemented in the TMS?

Don’t we lose real human traffic this way?

Dry-running the system takes away the unknown unknowns

Monitor how often the Bot Detection fails, i.e. how often presumed Bots turn out to be humans after all. A peak of just 0.38% (was it Swiss tourists in Cleveland?) looks pretty solid.
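Expressed as a formula, the monitored failure rate is just a ratio; a minimal sketch, assuming the denominator is the number of visits presumed to be Bots:

```typescript
// Sketch: share of presumed Bots that later turned out to be human after all,
// assuming both counts are available from the monitoring report.
function botDetectionFailureRate(presumedBotVisits: number, presumedBotsWithHumanSignal: number): number {
  if (presumedBotVisits === 0) return 0;
  return (100 * presumedBotsWithHumanSignal) / presumedBotVisits; // in percent
}

// e.g. 380 "Bots" showing human signals out of 100,000 presumed Bot visits:
// botDetectionFailureRate(100_000, 380) === 0.38
```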

And what about Bots from Switzerland?

Layer 2: Fall-Through Filtering in Analytics

Epilogue

Monitor your Engines

A little help from the vendors would be nice…

Share your approach!

Also: Check out the video of the presentation David Hermann and I gave at Superweek 2023 on this topic.

--

Lukas Oldenburg

Digital Analytics Expert. Owner of dim28.ch. Creator of the Adobe Analytics Component Manager for Google Sheets: https://bit.ly/component-manager