Bayesian Econometrics in Python

August 31, 2021
Michael Taylor

This post is the latest in a series that started with ‘Econometrics in GSheets’. It was created in partnership with Recast, who wrote the code and published a companion article that goes into more depth. In this post we’ll be building the model in a Google Colab notebook using Python, rather than using GSheets, similar to what I did in the post ‘Econometrics in Python’. If you’re not already familiar with Econometrics, I recommend you go back and read the first post.

Bayesian methods are growing in popularity as the more ‘modern’ approach to marketing mix modeling... even though Thomas Bayes was born in 1701, almost 300 years before marketers started using Linear Regression. One superpower Bayesian methods have over traditional ones is that they let you make prior assumptions about how the model should work! For example, I’m always seeing models that show marketing drove negative sales (which makes no sense!).

There’s no conceivable reason to expect all that money you spent on ads would have decreased sales (it’d have to be a pretty awful ad!). The real reason this happens is usually that the variables are correlated with each other (multicollinearity), which makes it hard for the algorithm to tell their effects apart. I can tell you, the client won’t trust your model if you tell them their marketing drove negative sales, no matter how ‘accurate’ the model looks.

So I enlisted the help of Michael Kaminsky, formerly of the Marketing Science team at Harry’s, who uses Bayesian methods to automate marketing mix modeling at his new startup, Recast. He shared the following code to demonstrate how you can take advantage of Bayesian methods without much more effort. If you stick with us to the end, we’ll also go under the hood and explain how and why Bayesian methods work.

If you’d like to follow along, here’s the Google Colab Notebook to copy:

Also check out the companion article on the Recast blog which goes into more depth.

How can you do Bayesian regression in Python?

The library we’ll be using is PyMC3, which does all the complex Markov chain Monte Carlo stuff under the hood. Don’t worry if you don’t understand any of this (I didn’t until recently!); bear with us and by the end you should have a bit more intuition for what’s going on. The actual code to run the model looks like this:
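(A minimal sketch of that kind of model: the stand-in data, priors and variable names below are illustrative assumptions, not the exact code from the Colab notebook.)

```python
import numpy as np
import pymc3 as pm

# Stand-in for the notebook's fake data: 5 correlated marketing channels and
# sales generated from all-positive 'true' coefficients. (Correlation between
# channels is what can push a regression coefficient negative.)
rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=(n, 1))                    # common trend across channels
X = 0.8 * shared + 0.2 * rng.normal(size=(n, 5))    # heavily correlated spend data
true_coefs = np.array([0.5, 1.2, 0.8, 0.3, 0.9])
y = 2.0 + X @ true_coefs + rng.normal(scale=0.5, size=n)

with pm.Model() as model:
    # Standard, normally distributed priors for the intercept and coefficients
    intercept = pm.Normal("intercept", mu=0, sigma=5)
    coefs = pm.Normal("coefs", mu=0, sigma=1, shape=5)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Expected sales: a linear combination of the 5 channels
    mu = intercept + pm.math.dot(X, coefs)

    # Likelihood of the observed sales
    pm.Normal("sales", mu=mu, sigma=sigma, observed=y)

    # Map out the coefficients' potential values with 2000 MCMC samples
    trace = pm.sample(2000)
```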

This creates a model with a few standard assumptions (normally distributed priors and errors) and draws 2,000 samples to map out all the potential values of the coefficients. The next step is really just to take the average of those samples to get the final coefficients. This example uses 5 hypothetical marketing channels, and one of them comes out negative.
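To pull out those averages, you could summarize the posterior draws like this (a sketch, assuming the trace from the model above):

```python
# Average of the posterior draws = the final coefficient estimates
print(trace["coefs"].mean(axis=0))

# Fuller summary (mean, sd and credible intervals) per coefficient
with model:
    print(pm.summary(trace, var_names=["intercept", "coefs"]))
```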

What we just did was replicate the same assumptions and results we’d get from Linear Regression. If you take a look at the code, we also ran a Linear Regression the normal way and got essentially the same coefficients. For example, x_5 has a negative coefficient of around -0.86 in both models (there will always be slight differences).
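For reference, the ‘normal way’ looks something like this with statsmodels (reusing the X and y from the sketch above; the notebook’s exact code may differ):

```python
import statsmodels.api as sm

# Ordinary least squares on the same 5 channels, for comparison
ols_fit = sm.OLS(y, sm.add_constant(X)).fit()
print(ols_fit.params)  # with correlated channels, a coefficient can flip negative
```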

Now let’s see what flexibility Bayesian methods can offer us that Linear Regression can’t. Let’s tell the model we need the marketing coefficients to all be positive. We can do that with just a small change to the model, as you can see below.
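A sketch of that change, assuming the same setup as before; the one difference is the bounded prior on the channel coefficients:

```python
with pm.Model() as bounded_model:
    intercept = pm.Normal("intercept", mu=0, sigma=5)

    # The small change: truncate the channel priors at zero (lower=0.0),
    # so no marketing coefficient can come out negative
    BoundedNormal = pm.Bound(pm.Normal, lower=0.0)
    coefs = BoundedNormal("coefs", mu=0, sigma=1, shape=5)

    sigma = pm.HalfNormal("sigma", sigma=1)
    mu = intercept + pm.math.dot(X, coefs)
    pm.Normal("sales", mu=mu, sigma=sigma, observed=y)

    trace_bounded = pm.sample(2000)
```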

This just changes our prior assumptions going into the model, truncating the range of potential marketing coefficients so they can’t fall below zero (lower=0.0). That means when the model simulation runs, we’ll get more realistic output for our marketing variables, based on our knowledge of how marketing works.

This is looking much better: the negative coefficient has disappeared and some of the numbers have moved around a bit. But how do we know how accurate this is? Well, because this is fake data that we generated ourselves, we know what the ‘truth’ is!
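With simulated data you can line the estimates up against the coefficients you generated it with, something like this (using the stand-in true_coefs, ols_fit and trace_bounded from the sketches above):

```python
# Compare the known truth against both models' estimates
print("truth:   ", true_coefs)
print("ols:     ", ols_fit.params[1:])                   # drop the intercept
print("bayesian:", trace_bounded["coefs"].mean(axis=0))
```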

As you can see, the last model, with marketing coefficients bounded below at zero, did a lot better than the plain OLS Linear Regression. It’s still not perfect, and there are plenty more things you could do to improve the model, but we got substantially closer just by using some common sense and domain expertise, which is exactly the kind of knowledge Bayesian methods let us build into the model.

If you’re looking for more information on how this code works, or Bayesian techniques in general, check out the companion post on the Recast blog.

Why should you use Bayesian Regression?

This method is a little more complicated, and can only really be done with code rather than in Excel, so is it worth it? For most people this might be overkill, or at least over their heads. However if you’re spending six figures or more a month on marketing, it’s really important that you get your model right. The great hope of Bayesian techniques is that if you set your model priors sensibly, the actual modeling can be more reliable, even when automated. 

That’s not too important when you do one model a year with an army of statisticians, like most Fortune 500 brands, but for fast-growing startups who change budgets monthly and strategies weekly, annual planning just won’t cut it. You need to be able to automate the modeling process, so that new variables show up as you change tactics, and coefficients change based on the rapidly evolving landscape of your market. This is something that Recast is solving, having seen its effectiveness at Harry’s, and I recommend talking to them if you’re interested.

So why weren’t Bayesian techniques adopted before? Well, Linear Regression is based on a very efficient mathematical formula that is relatively cheap to calculate – this was important in the 1980s, when it was adopted and a computer the size of your house had less computing power than the phone in your pocket! Sure, it requires a few important assumptions, but in most cases those assumptions hold and data can be manipulated to fit them. However, with the falling cost of computing power, more resource-intensive methods like Markov Chain Monte Carlo (what we just used) have become more accessible.

How do Bayesian techniques work?

Say you were on the basketball court, trying to throw the ball through the hoop. Linear Regression would be the equivalent of measuring the distance to the hoop and its elevation off the ground, then using trigonometry to calculate the perfect throw.

(Image: ‘The math behind the perfect free throw’ – https://theconversation.com/the-math-behind-the-perfect-free-throw-91727)

Bayesian inference sounds complicated, but in practice on the basketball court it’d do what any of us would do: throw the ball lots of times, and adjust until you’re getting it in the hoop more often than not. That’s why it can be resource intensive – it simulates lots of potential outcomes, i.e. throws the ball lots of times, until it has a good understanding of which throw has the highest probability of making the basket.

To stretch the analogy further, when we bounded our marketing coefficients so they couldn’t be negative, that’s the equivalent of telling the Bayesian basketball thrower that they can’t throw from behind the net. It saves them some time, and eliminates the chance of them finding some interesting way to reliably score from the bleachers.

The algorithm we used is called ‘Markov Chain Monte Carlo’. The Markov Chain part you can imagine as a notepad where you write down the sequence of actions you took to throw the ball. This helps you keep track of which precise movements are more likely to score a basket (though in reality you do this in your head). The term ‘Monte Carlo’ refers to the technique of just trying thousands or millions of times with a computer until you find the answer. It was invented by Stanislaw Ulam, one of the scientists behind the Manhattan Project (the atomic bomb), when he struggled to derive a formula to predict his odds of winning solitaire.
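To make the Monte Carlo part concrete, here’s a toy example (my own, not from the original post): instead of deriving the odds of an event, just simulate it many times and count.

```python
import random

# Estimate the probability that two dice sum to 7 (exact answer: 1/6 ≈ 0.1667)
trials = 100_000
hits = sum(
    random.randint(1, 6) + random.randint(1, 6) == 7
    for _ in range(trials)
)
print(hits / trials)  # gets closer to 0.1667 the more trials you run
```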

Of course it’s more complicated than that in practice, but this should give you a good idea of what’s going on under the hood. It might seem complicated, but thankfully all of the hard math has been worked out for you and packaged into libraries like PyMC3. In fact, as someone who failed math in high school, one of the really appealing things about Bayesian techniques is that they require very little math! You don’t have to derive or integrate anything; you simply throw a lot of shots and record what works! If you’re interested in learning more, check out Recast’s companion article and the book Bayesian Methods for Hackers.