Use Google Tag Manager Server-Side for Visitor Stitching by Querying a GCP Database in Real-Time

How to use Server-Side GTM + PubSub + Python + Firestore to get attributes from a database about a User, Products, or anything

Recently I have found some time to finally get going (thx to Simo Ahava’s TeamSimmer) with Google’s late, but intriguing arrival to the Server-Side Tag Management Zoo (SSTMZLOL). Tealium, Signal (no, not the messenger) and probably some others have had their cages there for almost a decade now. Adobe put its own elephant into that zoo last year: Launch server-side properties which leverage the Adobe Experience Platform Web SDK. So no more social distancing there. The battle is on.

The visitor attributes are on the other side of the edge. GTM Server-Side, PubSub, Cloud Functions and Firestore help you get them.

(Too) much of Server-Side Tagging revolves around circumventing ad blockers & ITP

Server-Side Tag Managers were introduced to solve typical “problems” of client-side tracking. So most applications of Google Tag Manager Server-Side (GTM SS) have circled around things like:

  • making cookies circumvent Apple’s “Intelligent Tracking Prevention” (ITP)
  • getting all that user data to Facebook via the Facebook Conversions API after all, no matter the browser or iOS app restrictions
  • hiding the tracking from users and ad blockers altogether. And even though Simo Ahava, while showing how to do that, keeps repeating that this is not why we should use Server-Side tagging, it is what I thought would happen.

GTM SS sits in the GCP, but is (so far) barely integrated with GCP

Anyway, in all this frenzy of “how can we track everything and more after all with the usual lip-service consent management”, I was missing use cases that leveraged the Google Cloud Platform (GCP). After all, SS GTM is built on top of GCP’s App Engine, and I have been using GCP for quite some things these last two years (automatically scheduled Python QA & ETL scripts for various data pipes, or the Adobe Analytics Bulk Component Manager for Google Sheets). I only remember one case (you know I write this only so you can spam me with more): Simo showed us how you can write to BigQuery directly from GTM SS.

The use case: Visitor Stitching

So with almost no GCP APIs at our disposal, let’s try a custom approach! For this, we will first look at a concept from another long-time zoo member, Tealium’s Audience Stream (part of its server-side “Customer Data Hub”). One thing that has always intrigued me is how it does Visitor Stitching in real-time:

Tealium Audience Stream Visitor Stitching. I drew this image in 2018, but it is still up-to-date (apart from “DoubleClick Bid Manager” now being “Google Campaign Manager” and the “Universal Data Hub” having become the “Customer Data Hub”).
  1. Audience Stream then “revives” all attributes for the Visitor profile that this ID belongs to (e.g. think of attributes like gender, number of purchases, device types used, number of visits, last Net Promoter Score, favorite brand etc.).
  2. The now available “stitched” visitor profile can then be sent in real-time to vendor endpoints via “connectors” (e.g. Salesforce Marketing Cloud). There it can be used e.g. for automated email/SMS marketing journeys. Audience Stream can even send the stitched attributes back into the browser via “Data Layer Enrichment”. This allows you to bring data that was calculated and stored server-side into the browser and then send it to any client-side tag (e.g. Google Analytics) (see some examples).

So why not try a featherweight version of visitor stitching with GTM SS?

Disclaimer: This article is an exercise for myself. I am still learning GTM SS, I still feel like a loser in the GCP, and my Python is clumsy. Thus, my solution may leave a lot to be desired. The main focus should be on the mechanics, because I think similar approaches (especially with PubSub and Google Cloud Functions) can be useful for a lot of cases. Thank you for constructive suggestions!

The Ingredients

  1. Optional: To make things easier, a GTM Web Container or any other client-side TMS (Tealium, Launch etc.) to trigger the initial stitching request to GTM SS.
  2. A basic GTM SS Setup with your own custom domain endpoint (myGtmSubdomain.myDomain.com) configured.
  3. A Custom “Stitcher” Client in GTM SS to pick up the stitching request and forward it to GCP in the form of an HTTP request to a …
  4. … Google Cloud Function (the “Stitching Web Service” so to say), which is basically a Python script. Cloud Functions are the Olympic racing dogs in the GCP. They can run no longer than nine minutes and have limited memory, but they start sprinting right away (no wait for scaling up etc.) and are very cheap.
  5. If the request wants a synchronous response (the GTM Server Client says: “I won’t do anything until you return something to me”), the Cloud Function directly queries …
  6. GCP’s Cloud Firestore, a schema-flexible (compared to the very strict BigQuery) document database. Here we have stored the user attributes, with every user representing a single document (think: a row), see a screenshot further below. What I like about Firestore is that it is super-fast, and you can query and work with the data just like with normal dictionaries (the Python equivalent of JS objects), so it is built for applications. Moreover it is cheap: 50,000 reads (a read is more or less a query to a single document) per day are free. If you prefer some other GCP storage mechanism, maybe you can extend this idea to e.g. a Python script that reads from BigQuery.
  7. If the request desires an asynchronous response (GTM’s Server Client says “I will not wait for you and do other things in the meantime, just ping me when you are done.”), the Cloud Function triggers a message to PubSub, GCP’s messaging service. When PubSub receives the message, it triggers another cloud function which does the query to Firestore and then returns the result to the GTM Server Container via an HTTP request to GTM’s endpoint URL.
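To make the sync/async branching from steps 5 to 7 more tangible, here is a minimal sketch of how the first Cloud Function could route a request. It is not the code from my setup: the collection name "users", the topic "stitch-requests", the project ID "my-gcp-project" and the "mode" field in the payload are all assumptions for illustration. The Firestore and PubSub calls are passed in as plain functions so that the routing logic can be tried out locally with fakes.

```python
import json
from typing import Callable, Optional, Tuple


def firestore_lookup(user_id: str) -> Optional[dict]:
    """Read one user document from Firestore; None if it does not exist.

    Assumption: users live in a collection called "users".
    """
    from google.cloud import firestore  # lazy import, needs google-cloud-firestore
    snapshot = firestore.Client().collection("users").document(user_id).get()
    return snapshot.to_dict() if snapshot.exists else None


def pubsub_publish(message: dict) -> None:
    """Publish the request to PubSub so a second function can answer later.

    Assumption: project "my-gcp-project" and topic "stitch-requests".
    """
    from google.cloud import pubsub_v1  # lazy import, needs google-cloud-pubsub
    publisher = pubsub_v1.PublisherClient()
    topic = publisher.topic_path("my-gcp-project", "stitch-requests")
    publisher.publish(topic, json.dumps(message).encode("utf-8")).result()


def handle_stitch_request(
    payload: dict,
    lookup: Callable[[str], Optional[dict]] = firestore_lookup,
    publish: Callable[[dict], None] = pubsub_publish,
) -> Tuple[int, dict]:
    """Return (HTTP status, response body) for one stitching request."""
    user_id = payload.get("id")
    if not user_id:
        return 400, {"error": "missing id"}

    # Asynchronous: hand over to PubSub and answer right away.
    if payload.get("mode") == "async":
        publish({"id": user_id})
        return 202, {"status": "queued", "id": user_id}

    # Synchronous: query the database and return the attributes directly.
    attributes = lookup(user_id)
    if attributes is None:
        return 404, {"error": "unknown id", "id": user_id}
    return 200, {"id": user_id, "attributes": attributes}
```

In a deployed Cloud Function, an HTTP entry point would parse the incoming request into `payload` and turn the returned tuple into the response. The second, PubSub-triggered function would then call the same kind of lookup and send the result to the GTM endpoint URL.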

Example

The example works as follows:

A POST Request with the “id” (“R210319-…”) in the body => the ID of the user whose attributes we want to get.
The same in the GET variety.
A Firestore database for script monitoring. The highlighted example script run ID was a script where a Component Manager client refreshed the list of Adobe Analytics workspaces.
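Since the user ID can arrive either in a POST body or as a GET parameter, the receiving function needs to read it from both. A small sketch of that, assuming the POST body is JSON and the GET variant uses a query parameter named "id" (the function name and the example ID in the usage lines are made up):

```python
import json
from typing import Optional
from urllib.parse import parse_qs


def extract_user_id(method: str, query_string: str = "", body: bytes = b"") -> Optional[str]:
    """Return the "id" from a GET query string or a JSON POST body, else None."""
    method = method.upper()
    if method == "GET":
        values = parse_qs(query_string).get("id", [])
        return values[0] if values else None
    if method == "POST":
        try:
            data = json.loads(body.decode("utf-8") or "{}")
        except ValueError:  # covers invalid JSON and invalid UTF-8
            return None
        user_id = data.get("id") if isinstance(data, dict) else None
        return str(user_id) if user_id else None
    return None


# Usage with a hypothetical ID:
print(extract_user_id("GET", "id=R-example-1"))
print(extract_user_id("POST", body=b'{"id": "R-example-1"}'))
```

Both calls return the same ID, so everything after this step can ignore which request type was used.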

Anyway, let’s see this in action now:

Video: From the request to the response (when I say “pubsub endpoint”, I mean “Cloud Function endpoint” btw. Told you I am still learning…)

Summary

I hope this gave you an idea of how you can use Server-Side Google Tag Manager to enrich client-side data (e.g. from a data layer) by querying a database in GCP in real-time and returning the output to the (server or browser) client. You can of course query any other data in a similar fashion, e.g. sensitive product data (such as profit margins) or anything else. Thanks for reading!

Digital Analytics Expert in Zurich. Owner of dim28.ch. Older posts are at https://www.webanalyticsworld.net/author/loldenburg
