We believe that anyone can learn to do predictive marketing with the right foundation. In our series, “Core Concepts of Predictive Marketing”, Acquia’s Chief Science Officer, Omer Artun shares excerpts from his book: “Predictive Marketing: Easy Ways Every Marketer Can Use Customer Analytics and Big Data.”
This series will be a guide to everything you need to know about relationship marketing and predictive analytics in marketing. Dive in to learn how to activate your customer data and tap into unlimited opportunity.
Recently, we looked at examples of predictive marketing use cases, and ways marketers can use predictive analytics to understand customer loyalty and lifetime value. Now that we’ve explored the insights these techniques can generate, let’s walk through the different steps a data scientist, or analytics software, goes through to make accurate predictions or recommendations.
Figure 1 gives an overview of what happens under the hood, whether in out-of-the-box predictive analytics software or in the steps your internal data scientists will go through if you are building your own predictive analytics models. A lot goes into developing and deploying predictive algorithms for marketing. If you are starting out and want our advice, we strongly recommend using an off-the-shelf software package suitable for your industry that takes care of the steps described here automatically. When you exhaust off-the-shelf models, and you have the budget and need for a data scientist, you can evaluate the cost-benefit equation with the experience you have gained (and you'll find it easier to convince the CFO to make the incremental investment).
Figure 1: Overview of the Predictive Analytics Process
Data Collection, Cleansing and Preparation
Data cleansing and preparation is the most important and most ignored stage in predictive analytics. In some cases, the data might be missing or incorrect as collected. Data cleansing corrects fields like names and addresses to make sure the computer knows that a customer lives in California when her state is listed as "CA." (You can learn more about the processes used to unify and standardize individual customer profiles in our e-book: Identity Resolution 101.)
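A minimal sketch of this kind of standardization might look like the following. The lookup table and function name are illustrative assumptions, and a real cleansing pipeline would use a complete mapping plus address-verification services:

```python
# Tiny illustrative sample of a standardization table, not a full mapping.
STATE_NAMES = {
    "CA": "California",
    "Calif.": "California",
    "NY": "New York",
}

def normalize_state(raw):
    """Standardize a state field so 'CA', 'Calif.' and 'California'
    all resolve to the same value for matching and analysis."""
    cleaned = raw.strip()
    return STATE_NAMES.get(cleaned, cleaned)
```

With a table like this, " CA " and "California" end up as the same value, so the two records can be recognized as the same customer location.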
However, even after building 360-degree customer profiles, there is still significant work to be done getting your data ready for analysis. Not all data collected is immediately usable, and the results can be skewed by missing data or outliers: measurements that are too low, too high, or that do not fit the underlying data-generating system.
If you are considering building your own predictive analytics capabilities, make sure you address who will do the data collection, integration, cleansing and preparation. Chances are, your typical data scientist is not going to be satisfied doing this work and is going to expect that you hire a separate software or integration engineer to do this work.
Outlier detection often makes a big difference in the accuracy of predictive models. For example, if a customer at an electronics retailer came in and bought 50 televisions for $50,000 when the average customer at the retailer spends $500, this high-spender will skew the average order value metric. In electronics retailing, these types of outliers, where there are few users making large purchases, are indeed quite common. People making such large purchases could be middlemen who are buying items like televisions to take out of the country and resell. These are not normal consumer customers, but rather gray market resellers. If this situation wasn’t recognized and corrected, the retailer would think these are great VIP customers. Not recognizing this creates two problems: distorting the definition of VIP customers so the true VIP customers would be left out and masking an opportunity to market to this group of resellers in a more profitable way.
To correct for the outlier, your data analyst or predictive marketing software will need to detect and either remove the outlier or replace it with a number at the high end of the distribution (e.g., the lowest spend of the top 10 percent customers is $2,400, so replace the $50,000 with $2,400). This replacement is only done for modeling purposes. Alternatively, you can treat these customers as a separate group altogether and create specialty programs for this one segment.
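The capping rule described above can be sketched in a few lines of plain Python. The function name and the 10-percent cutoff are assumptions for illustration; the idea is simply to replace any value above the "lowest spend of the top group" with that cutoff, for modeling purposes only:

```python
def cap_high_outliers(order_values, top_fraction=0.10):
    """Cap spend values at the lowest spend of the top `top_fraction`
    of customers, so e.g. a $50,000 order is replaced by $2,400."""
    ordered = sorted(order_values)
    cutoff_index = min(int(len(ordered) * (1 - top_fraction)),
                       len(ordered) - 1)
    cap = ordered[cutoff_index]  # lowest spend of the top group
    return [min(v, cap) for v in order_values]
```

For a store with eighteen $500 customers, one $2,400 customer and one $50,000 reseller, the $50,000 value is capped at $2,400 while every other value is left alone.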
In another example, one retailer was measuring foot traffic at each store but would miss out on data for certain days whenever the measuring device would get knocked off by the cleaning crew. To correct for the missing data, the retailer applied an imputation based on the three-week average for the same days in the week as the missing days.
Imputation is the art and science of replacing wrong or missing information. Depending on the specific data elements, there are various techniques for this:
- Replace with static or temporal averages.
- Model the data based on other variables available. For example, you can model a vitamin store customer’s age based on whether she buys vitamins geared toward women above age 50.
- Random selection from the underlying distribution. For example, if the foot traffic data is missing and this data usually follows a bell curve, then randomly generate a number from the underlying distribution.
Imputation is a great way to make up for missing data until the problem is corrected at the source. Another example of imputation is asking customers for their birthdays. This is a great piece of information for modeling and action purposes but not all customers want to provide this information. In such cases, the predictive model would either discard the birthday as an input or discard customers with no birthday.
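The foot-traffic example above, where a missing day is replaced with the average of the same weekday over recent weeks, can be sketched as follows. The function name and the `None`-for-missing convention are assumptions for illustration:

```python
import statistics

def impute_missing_traffic(daily_counts, weeks_back=3):
    """Fill missing daily foot-traffic values (None) with the average of
    the same weekday over the previous `weeks_back` weeks. `daily_counts`
    is ordered by day, so index i - 7, i - 14, ... are the same weekday."""
    filled = list(daily_counts)
    for i, value in enumerate(filled):
        if value is None:
            prior = [filled[i - 7 * w]
                     for w in range(1, weeks_back + 1)
                     if i - 7 * w >= 0 and filled[i - 7 * w] is not None]
            if prior:
                filled[i] = statistics.mean(prior)
    return filled
```

This is the "temporal average" technique from the list above; the "model from other variables" and "draw from the underlying distribution" techniques would replace the `statistics.mean(prior)` line with a regression estimate or a random draw, respectively.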
Feature Generation and Extraction
Once your data scientist or predictive marketing software has cleansed the data for missing information and outliers, there are two other factors to consider: (1) the data may be too large to use as is, or (2) data in its current representation may not be suitable for the models. Feature generation and extraction deals with turning data into information that the models can digest and throwing away unnecessary or redundant information.
Think of feature generation and extraction as separating the signal from the noise. Feature extraction removes unnecessary information by either throwing it away or transforming it to eliminate the noise. There are quite a few mathematical methods available, but the short explanation is that algorithms are used to extract the maximum amount of information from the data, regardless of what you will use it for later on. This optimal extraction leads to less noisy data, hence increasing the accuracy of predictive analytics.
There are tricks you can use to make your data easier to work with. For example, when analyzing the number of orders from a customer, you can look at the number in absolute terms, or you can take its logarithm, creating a new variable in which the difference between 1 order and 10 orders is the same as the difference between 10 orders and 100 orders. It is a simple transformation that can have a powerful impact.
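Concretely, the log transform compresses the heavy right tail of order counts:

```python
import math

# Base-10 log makes each tenfold jump in order count one unit apart:
# 1 order -> 0, 10 orders -> 1, 100 orders -> 2.
orders = [1, 10, 100]
log_orders = [math.log10(n) for n in orders]
```

After the transform, a model treats the step from 1 to 10 orders the same as the step from 10 to 100, which usually matches how customer behavior actually differs.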
Another example is taking the ratio of certain variables instead of using absolute numbers. For instance, instead of tracking return revenue and shipped revenue per customer separately, you can calculate the ratio or percentage of revenue that comes back through returns.
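A minimal sketch of that ratio feature, with an assumed function name and a zero-guard for customers who have not yet been shipped anything:

```python
def return_rate(returned_revenue, shipped_revenue):
    """Ratio feature: the share of shipped revenue that comes back as
    returns, replacing two absolute revenue figures with one rate."""
    if shipped_revenue == 0:
        return 0.0
    return returned_revenue / shipped_revenue
```

A customer who returned $50 of $200 shipped gets a return rate of 0.25, a single number a model can compare across customers of very different sizes.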
Classifier and System Design
The next stage in the process used by data scientists or predictive marketing software is choosing, architecting and fine-tuning the correct algorithm. In machine learning, there are two important concepts that need to be understood.
One is the “no free lunch” theorem, which states that there is no inherently better algorithm for all problems out there. This is important to understand so the data scientist chooses the right algorithm for the right problem, rather than using the same algorithm for every problem.
The other concept is the “bias-variance dilemma,” which states that the more deeply you tailor an approach and algorithm to one specific problem, the worse the resulting system performs on “other” problems out there. The lesson is the same: no algorithm is inherently better than another. If you develop your own algorithms, you will probably have to develop multiple algorithms for multiple situations. If you buy off-the-shelf predictive marketing software, make sure to choose a vendor that focuses on your specific vertical and business problems, and/or one with self-learning algorithms that adjust to your specific situation automatically. Correctly architected software solutions usually have multiple models competing with each other, and the “champion” is selected against “challengers” unique to the customer’s domain of data. This maximizes performance and eliminates the need for hand-coded custom models.
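The champion/challenger idea can be sketched as a simple tournament: score every candidate model on held-out data and keep the best. All names and interfaces here are hypothetical, for illustration only:

```python
def pick_champion(models, validation_set, score):
    """Evaluate each candidate model on held-out data and return the
    best scorer. `models` maps name -> predict function; `score`
    compares predictions to actuals (higher is better)."""
    best_name, best_score = None, float("-inf")
    for name, predict in models.items():
        predictions = [predict(x) for x, _ in validation_set]
        actuals = [y for _, y in validation_set]
        s = score(predictions, actuals)
        if s > best_score:
            best_name, best_score = name, s
    return best_name, best_score
```

In practice this loop reruns periodically on fresh data, so the champion is always the model that currently fits this customer's domain of data rather than a hand-picked one.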
Just writing an algorithm is not enough when it comes to predictive analytics. Before you can start to use an algorithm, you need to backtest that it actually works. If you use a predictive marketing software package, your vendor will have done this for you already. However, if you are developing your own predictive analytics models in-house you will have to worry about training, testing and validating your models before you can start to use them. The time needed to develop predictive algorithms can be divided into 80% training, 10% testing and 10% validation. This means that after writing the algorithm, data scientists need to spend considerable time training and testing the algorithm to make sure it works accurately.
For example, when developing a likelihood-to-buy model, if 1% of 10 million customers buy within the next 30 days, then for training we use the 100,000 customers who bought in the past month and randomly select 100,000 customers who didn’t buy anything in that period, so that the total dataset has 200,000 customers, of which 50% bought and 50% didn’t. This oversampling produces better results because it focuses the model on distinguishing between potential buyers and non-buyers.
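That balancing step can be sketched in a few lines. The function name and the `(customer, label)` representation are assumptions for illustration:

```python
import random

def balanced_training_set(buyers, non_buyers, seed=0):
    """Keep all buyers and randomly sample an equal number of
    non-buyers, so the training data is 50/50 even though buyers
    are only ~1% of the full customer base."""
    rng = random.Random(seed)
    sampled = rng.sample(non_buyers, len(buyers))
    dataset = [(c, 1) for c in buyers] + [(c, 0) for c in sampled]
    rng.shuffle(dataset)  # avoid all-buyers-first ordering
    return dataset
```

Given 100 buyers and 10,000 non-buyers, the result is a shuffled 200-customer set with exactly 100 positive labels, which is then split for training, testing and validation.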
The Last Mile Problem of Predictive Analytics
Most data scientists don’t worry about how marketers will use their predictions. Frankly, most data scientists don’t know enough about marketing and marketing systems to embed predictions into the daily routine of marketers. An email marketer at a large national department store once told us: “Brides register with us on our website and leave a lot of personal information. That information is in our customer data warehouse somewhere and we probably even analyze it. However, as an email-marketing manager, I am unable to run a simple campaign that takes into account some of the preferences or dates that the bride has shared with us.” We call this the last mile problem of predictive analytics.
Especially in organizations with in-house data scientists, the outcomes from predictive models are often not easily digestible or usable by marketers. It is often very difficult for marketers to put predictive analytics into action — to connect the dots from analytics to the daily campaign management of email, web, social, mobile, direct mail, store marketing and customer interactions in the call center.
For customer predictions to be profitable, predictions need to be put in the hands of all the customer-facing personnel in your organization. If you can’t surface recommendations to the personnel in your call center, the upsell might never happen. If you can’t use likelihood to buy segments to decide whether to send an abandoned cart holder a discount or a reminder, you are leaving a lot of profit on the table.
Now that you have a basic understanding of predictive models and how they can be used for marketing, our next blog post in this series will cover the most important questions marketers need to ask when building a successful data-driven marketing strategy.