3 Lessons I Learned in Acquisition Cohort Analysis Using Shopify Data

Carlos Abiera
4 min readJul 14, 2021

A cohort study in e-commerce is a borrowed discipline in medical research to identify and quantify the relationship between exposure and outcome. In e-commerce, it is the interest of quantifying the audience behavior after being exposed to Facebook, Google, or Email campaigns.

What are the purchase habit of the returning customer after targeting them with specific campaigns? What campaigns work better? what’s not? How frequent is the customer coming back after the first purchase? when did they start to lose interest? how many days before they came back for another purchase? These are some of the questions that can be answered when performing Cohort Analysis.

I’ll be using the work of Prachi Gopalani and Greg Reda to show the lessons I’ve learned in applying acquisition cohorts. I want to know the stickiness of the customer's overtime from their acquisition month up to the period they cease to purchase using Shopify data.

I’ll be using two attributes: customer_id (individual customer) and day (period of purchase) to walk you through the 3 things I learned in performing acquisition Cohort Analysis using Shopify data.

Figure 1: Shopify Report Columns

1. Adding Constraint to your cohort

Your cohorts must be time-bounded. If you are interested in understanding how a group of customers behave over time, for example, in the period of 12 months, remove all customers acquired before that period. This will give you a better grasp to know how individual customers are coming back.

Figure 2: Removing all customers acquired before target cohort

2. Understanding the Axes: CohortGroup and CohortPeriod

Figure 3: Cohort Overview

The CohortGroup is the group of customers we are trying to examine. It is represented in a calendar month. The CohortPeriod is the sustaining relationship we have with the customers measured by month.

CohortPeriod 0 is the month we acquire the customers. Your first-time customer count should match the number in this column.

  • In the CohortGroup 2019–02, we have acquired 219 new customers. The following month which is CohortPeriod 1, 7 of the new customers returned. In CohortPeriod 2 on the same CohortGroup, 14 out of 219 customers returned.
  • In CohortGroup 2019–03, we have acquired 87 new customers, the following month which is CohortPeriod 1, 10 of them returned. In CohortPeriod 2 on the same CohortGroup, 5 out of 87 customers returned.

So, in February 2019, we have overall 232 unique customers: 219 First-time and 13 Returning customers. How come? it is because the CohortPeriod 1 of CohortGroup 2019–01 in February. This means that out of 78 new customers from this cohort, 13 came back after a month.

But this is not always the case, there are times that a first-time customer will return after 2 or 3 months, but still, it will fall on the same CohortPeriod 1 column because it is still the customer's first month.

These instances are common when you are working with large cohorts or when the company is not established enough to generate more returning customers.

3. Time Between Purchase

It is interesting that from this cohort, we can also extract when a customer coming back. Computing the time between a purchase from CohortPeriod 0 to 1 or more can unlock more insights. This will help the decision-maker to tailor their marketing campaigns in getting back more customers.

What I did is first, get the mean of all purchase date from each CohortPeriod. I converted the date data type to int64. This is how it looks like when you’re able to get the mean of date for each cohort.

Figure 4: Cohort with the Date of purchase Mean

From here we can pull the cohort 0 to 2 to show the time between a purchase from the acquisition date to 1st month then 2nd month of purchase to the third month.

  • first_purchase is our first touch with the customer
  • second_purchase is when they returned after their first purchase
  • third_purchase is when they returned after their second purchase
Figure 5: Mean of the date ranges in int64 form

then we can convert each column back to date format and subtract to get the difference.

Figure 6: Columns in date format and the time between purchase

This is just a snapshot of my journey in digging deeper into Ecommerce data. This is one of my favorite tools to analyze the effectiveness of the company’s efforts in acquiring new customers and their retention. I hope these insights helped you in your data science journey.

--

--

Carlos Abiera

Carlos C. Abiera currently manages the operations of Montani Int. Inc. and leads the REV365 data team. He has keen interests in data and behavioral sciences.