Use Google Tag Manager Server-Side for Visitor Stitching by Querying a GCP Database in Real-Time
How to use Server-Side GTM + Pub/Sub + Python + Firestore to get attributes from a database about a user, products, or anything else
Recently I have found some time to finally get going (thx to Simo Ahava’s TeamSimmer) with Google’s late, but intriguing arrival to the Server-Side Tag Management Zoo (SSTMZLOL). Tealium, Signal (no, not the messenger) and probably some others have had their cages there for almost a decade now. Adobe put its own elephant into that zoo last year: Launch server-side properties which leverage the Adobe Experience Platform Web SDK. So no more social distancing there. The battle is on.
Even though the offers by Tealium, Adobe and Google are quite different, their most popular use cases have employed the browser or mobile app variety of the Client-to-Server-to-Vendor (CSV) model, so in GTM’s terminology we could call it the “Browser/App Client -> Server Client -> Vendor Endpoint” model. One reason for the browser still being the focal point — even though we thought we were in server land now — is that the raw materials for such browser-initiated integrations are right at our fingertips: Client-side Tag Management Systems and data layers are well-established. Now we just take part of that stuff server-side.
(Too) much of Server-Side Tagging revolves around circumventing ad blockers & ITP
Another reason is that Server-Side Tag Managers were introduced to solve typical “problems” of client-side tracking. So most applications of Google Tag Manager Server-Side (GTM SS) have circled around things like:
- reducing the bloat of client-side “pixels” by implementing them server-side and consolidating data streams (should become anybody’s aim)
- setting cookies server-side so they circumvent Apple’s “Intelligent Tracking Prevention” (ITP)
- getting all that user data to Facebook via the Facebook Conversions API after all, no matter the browser or iOS app restrictions
- hiding the tracking from users and ad blockers altogether. And even though Simo Ahava, while showing how to do that, keeps repeating that this is not why we should use Server-Side tagging, it is what I thought would happen.
The irony, of course, is that such opaque tracking methods (compared to traditional client-side methods) are the result of the spread of well-intentioned tracking prevention technologies and legal frameworks like the GDPR (which envisioned more transparency). And what makes it even more ironic and unfair is that only organizations with a certain level of expertise and money can afford server-side tracking.
GTM SS sits in the GCP, but is (so far) barely integrated with GCP
Anyway, in all this frenzy of “how can we track everything and more after all with the usual lip-service consent management”, I was missing use cases that leveraged the Google Cloud Platform (GCP). After all, GTM SS is built on top of GCP’s App Engine, and I have been using GCP for quite a few things over the last two years (automatically scheduled Python QA & ETL scripts for various data pipes, or the Adobe Analytics Bulk Component Manager for Google Sheets). I only remember one case (you know I write this only so you can spam me with more): Simo showed us how you can write to BigQuery directly from GTM SS.
[UPDATE: Mark Edmondson (thx!) “spammed” me indeed with an example using Pub/Sub via the logging for Webhooks and one using BigQuery, Cloud Run and more where he ends up building his own Analytics tool (ok, POC-style, but impressive).]
The main reason for this scarcity of GCP-GTM use cases may be simple: GTM SS has almost no APIs to other GCP components yet. Writing to BigQuery seems to be the only one. You cannot even read from BigQuery. Long term, such GCP-GTM integrations will hopefully become the biggest selling point of GTM SS.
The use case: Visitor Stitching
So with almost no GCP APIs at our disposal, let’s try a custom approach! For this, we will first look at a concept from another long-time zoo member, Tealium’s Audience Stream (part of its server-side “Customer Data Hub”). One thing that has always intrigued me is how it does Visitor Stitching in real-time:
- You send in any Visitor ID (e.g. you may have only the user’s email hash at one stop of her customer journey (e.g. in the checkout), but her account ID or hashed phone number at another stop).
- Audience Stream then “revives” all attributes for the Visitor profile that this ID belongs to (e.g. think of attributes like gender, number of purchases, device types used, number of visits, last Net Promoter Score, favorite brand etc.).
- The now available “stitched” visitor profile can then be sent in real-time to vendor endpoints via “connectors” (e.g. Salesforce Marketing Cloud). There it can be used e.g. for automated email/SMS marketing journeys. Audience Stream can even send the stitched attributes back into the browser via “Data Layer Enrichment”. This allows you to bring data that was calculated and stored server-side into the browser and then send it to any client-side tag (e.g. Google Analytics) (see some examples).
So why not try a featherweight version of visitor stitching with GTM SS?
Disclaimer: This article is an exercise for myself. I am still learning GTM SS, I still feel like a loser in the GCP, and my Python is clumsy. Thus, my solution may leave a lot to be desired. The main focus should be on the mechanics, because I think similar approaches (especially with PubSub and Google Cloud Functions) can be useful for a lot of cases. Thank you for constructive suggestions!
The setup consists of the following ingredients:
- Optional: To make things easier, a GTM Web Container or any other client-side TMS (Tealium, Launch etc.) to trigger the initial stitching request to GTM SS.
- A basic GTM SS Setup with your own custom domain endpoint (myGtmSubdomain.myDomain.com) configured.
- A Custom “Stitcher” Client in GTM SS to pick up the stitching request and forward it to GCP in the form of an HTTP request to a …
- … Google Cloud Function (the “Stitching Web Service” so to say), which is basically a Python script. Cloud Functions are the olympic racing dogs in the GCP. They can run no longer than nine minutes and have limited memory, but they start sprinting right away (no wait for scaling up etc.) and are very cheap.
- If the request wants a synchronous response (the GTM Server Client says: “I won’t do anything until you return something to me”), the Cloud Function directly queries …
- GCP’s Cloud Firestore, a schema-flexible (compared to the very strict BigQuery) document database. Here we have stored the user attributes, with every user representing a single document (think: a row) — see a screenshot further below. What I like about Firestore is that it is super-fast, and you can query and work with the data just like with normal dictionaries (the Python equivalent of JS objects), so it is built for applications. Moreover, it is cheap: 50,000 document reads (more or less: queries to a single document) per day are free. If you prefer some other GCP storage mechanism, maybe you can extend this idea to e.g. a Python script that reads from BigQuery.
- If the request desires an asynchronous response (GTM’s Server Client says: “I will not wait for you and will do other things in the meantime, just ping me when you are done.”), the Cloud Function triggers a message to Pub/Sub, GCP’s messaging service. When Pub/Sub receives the message, it triggers another Cloud Function which does the query to Firestore and then returns the result to the GTM Server Container via an HTTP request to GTM’s endpoint URL.
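To make the synchronous branch concrete, here is a minimal Python sketch of such a Firestore lookup. The collection name "users", the helper name `get_user_attributes` and the in-memory stub are my own illustrative assumptions; with the real `google.cloud.firestore` library, `db` would simply be a `firestore.Client()` instance.

```python
# Sketch of the synchronous lookup path. The "users" collection and the
# helper name are illustrative assumptions; the client object is injected,
# so the same function works with google.cloud.firestore or, as here, a
# tiny in-memory stand-in used purely for demonstration.

class _Doc:
    def __init__(self, data):
        self._data = data
    def get(self):
        return self  # mimic DocumentReference.get() returning a snapshot
    @property
    def exists(self):
        return self._data is not None
    def to_dict(self):
        return dict(self._data)

class _Collection:
    def __init__(self, docs):
        self._docs = docs
    def document(self, doc_id):
        return _Doc(self._docs.get(doc_id))

class FakeFirestore:
    """In-memory stand-in mimicking the few client calls we need."""
    def __init__(self, data):
        self._data = data
    def collection(self, name):
        return _Collection(self._data.get(name, {}))

def get_user_attributes(db, user_id, collection="users"):
    """Fetch one user document; return its attributes, or {} if unknown."""
    snap = db.collection(collection).document(user_id).get()
    return snap.to_dict() if snap.exists else {}

db = FakeFirestore({"users": {"u-123": {"visits": 7, "favorite_brand": "Acme"}}})
print(get_user_attributes(db, "u-123"))   # {'visits': 7, 'favorite_brand': 'Acme'}
print(get_user_attributes(db, "nobody"))  # {}
```

Returning an empty profile for an unknown ID (instead of raising) keeps the GTM Server Client simple: it can always treat the response as a dictionary.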
The example works as follows:
I. I send a GET request from the browser (or from anywhere else) to my GTM SS endpoint with some parameters, one of them being the user ID for which I want to get attributes. Since this is a demo, I just do it with Fiddler’s Request Composer and leave the web container untouched (I assume you know how to create a request via JS).
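For illustration, such a request URL could also be composed in Python. The `/stitch` path and the parameter names are placeholders I made up for this sketch — use whatever path your Client claims:

```python
from urllib.parse import urlencode

# Build the stitching request URL. The subdomain, the /stitch path and the
# parameter names are illustrative assumptions, not anything fixed by GTM SS.
def build_stitch_url(user_id, async_response=False,
                     endpoint="https://myGtmSubdomain.myDomain.com/stitch"):
    params = {"user_id": user_id, "async_response": str(async_response).lower()}
    return f"{endpoint}?{urlencode(params)}"

print(build_stitch_url("u-123"))
# https://myGtmSubdomain.myDomain.com/stitch?user_id=u-123&async_response=false
```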
Note that I actually do not query a real user database — because I don’t have one on my GCP :). Instead, I query my script monitoring database where I store the run times, durations and outcomes of all those regularly scheduled scripts which I run in the GCP (mostly via Cloud Scheduler). This helps me monitor slow scripts and failures, and it answers simple questions like “did this import run today already?”. So in my example, the User ID is actually a “Script Run ID”. This is what the database looks like in Firestore:
II. The Custom “Stitcher” Client in GTM SS picks up that request, parses it and forwards it as an “outgoing” request to a Google Cloud Function, together with a secret token in the data payload. This is a good case for Server-Side Tagging because neither my Cloud Function endpoint nor the token should be exposed to outsiders. We submit all values as strings because Pub/Sub (which we need for the async part) accepts only byte strings for data and string values for attribute maps (the “attrs” property).
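A quick sketch of that stringification, since it trips people up: every value destined for a Pub/Sub attribute map must be a string, and the message data must be a byte string. The helper names here are mine:

```python
import json

def stringify_payload(payload):
    """Coerce every value to a string so it is valid in a Pub/Sub attribute map."""
    return {k: str(v) for k, v in payload.items()}

def to_pubsub_data(payload):
    """Serialize the payload to the UTF-8 byte string Pub/Sub expects as data."""
    return json.dumps(stringify_payload(payload)).encode("utf-8")

msg = to_pubsub_data({"user_id": "u-123", "async_response": True})
print(msg)  # b'{"user_id": "u-123", "async_response": "True"}'
```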
III. Now we have left GTM: the Cloud Function runs. The following Python script is that Cloud Function. It checks for the token and whether the request comes from an allowed GTM container. Then, depending on the async_response parameter, it either gets and returns the visitor attributes from Firestore right away (synchronously) or submits yet another Pub/Sub message to do that asynchronously. The asynchronous method is the way to go for tasks in the cloud that take longer, where you don’t want the GTM server client to wait until they are done. When they are done, they simply return the result to the GTM SS container (via an HTTPS POST request to the GTM SS tagging endpoint URL). GTM SS can then cache the attributes via the templateDataStorage API to make them available for subsequent requests.
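To show just the skeleton of that logic, here is a reduced sketch of the function’s dispatch step. All names are my assumptions, and the Firestore query and the Pub/Sub publish are passed in as plain callables so the control flow stands on its own:

```python
# Sketch of the Cloud Function's dispatch logic (names are illustrative).
# query_firestore(user_id) -> dict and publish_async(params) -> None stand in
# for the actual Firestore read and Pub/Sub publish calls.

def handle_stitch_request(params, secret_token, allowed_containers,
                          query_firestore, publish_async):
    # Reject requests that do not carry the shared secret token.
    if params.get("token") != secret_token:
        return {"status": 403, "body": "invalid token"}
    # Reject requests from GTM containers we do not know.
    if params.get("container_id") not in allowed_containers:
        return {"status": 403, "body": "unknown container"}
    if params.get("async_response") == "true":
        # Hand off to Pub/Sub; a second function will POST the result
        # back to the GTM SS endpoint later.
        publish_async(params)
        return {"status": 202, "body": "queued"}
    # Synchronous path: query Firestore and answer right away.
    return {"status": 200, "body": query_firestore(params["user_id"])}

# Example with stub callables:
ok = handle_stitch_request(
    {"token": "s3cret", "container_id": "GTM-XYZ",
     "async_response": "false", "user_id": "u-123"},
    secret_token="s3cret", allowed_containers={"GTM-XYZ"},
    query_firestore=lambda uid: {"visits": 7},
    publish_async=lambda p: None)
print(ok)  # {'status': 200, 'body': {'visits': 7}}
```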
Now how do you set all this up? Well, I am cheating a bit by saying that my focus here is the GTM part. Explaining the GCP setup would be a whole e-book of its own. Also, most GCP setups are pretty different, so my case may only apply to a few of you. But if you have a decent GCP admin who knows a bit about Cloud Functions and Pub/Sub, she should be able to get you going. Another reason is that I am the wrong person to explain GCP stuff. I did build this GTM example myself, but it uses a lot of templates from my existing GCP infrastructure for which the credit goes to Yurii Serhiichuk.
Anyway, let’s see this in action now:
So what do we do with this response now? Well, you could fire a (non-interaction) event to Google Analytics from your browser or from the server client to add the user attributes as user-scoped custom dimensions or GA4 user properties. Or you could use them for targeting users with personalized messages based on certain attributes (e.g. via Adobe Target or Google Optimize). Or you could even not return the response to the (browser) client at all, but instead directly fire that GA Event Tag from a GTM Server tag. This way, no user data needs to be exposed to the browser! Your imagination is the limit (isn’t that cheesy?)!
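For the server-side GA variant, the stitched attributes could be mapped to GA4 user properties, e.g. as a Measurement Protocol payload. The event name and attribute values here are made up; `client_id`, `user_properties` and `events` are the fields the GA4 Measurement Protocol expects in its JSON body:

```python
import json

# Sketch: turn stitched attributes into a GA4 Measurement Protocol payload
# with user properties, as a server-side alternative to exposing the data in
# the browser. The event name "user_enriched" is an illustrative assumption.
def build_ga4_payload(client_id, attributes, event_name="user_enriched"):
    return {
        "client_id": client_id,
        "user_properties": {k: {"value": v} for k, v in attributes.items()},
        "events": [{"name": event_name}],
    }

payload = build_ga4_payload("123.456", {"favorite_brand": "Acme", "visits": 7})
print(json.dumps(payload))
```

The payload would then be POSTed to the GA4 `/mp/collect` endpoint with your `measurement_id` and `api_secret` as query parameters.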
I hope this gave you an idea of how you can use Server-Side Google Tag Manager to enrich client-side data (e.g. from a data layer) by querying a database in GCP in real-time and returning the output to the (server or browser) client. You can of course query any other data in a similar fashion, e.g. sensitive product data (such as profit margins) or anything else. Thanks for reading!