
“I know that direct mail gets higher response rates than FSIs in the newspapers, but can I justify the higher cost of DM?” or “If we launch our new signature product, how do I know it will work and outperform our current product, especially at a steep 20% discount?”
The former question is one that we hear often from our clients. The latter question was probably uttered by Jeff Moody, the CEO of Subway, before he decided to launch their foot-long sandwiches at $5. The launch was a resounding success: it generated $3.8 billion in sales and catapulted Subway past Wendy’s and Burger King in terms of market share. In fact, the success of the “$5 FOOTLONGS” campaign has positioned Subway to overtake the market leader, McDonald’s, in terms of the number of stores worldwide in early 2010!
Picking the next “$5 FOOTLONGS”
How does one answer the above questions? More importantly, how can you be sure that you have found something akin to the “$5 FOOTLONGS”, and how can you convince your boss to launch it before your competitors get to it? The answer lies in using a data- and analytics-driven “Test and Learn” methodology, commonly known as Design of Experiments (DOE).
In essence, DOE has been used for a long time in scientific and medical research to establish the incremental effect caused by a particular “treatment” while eliminating the effects of any other factors. So in the case of Subway, how would Jeff Moody determine that the increased traffic observed at the trial store was caused solely by the $5 price and wasn’t due to other confounding factors, such as local conditions specific to Miami or the way the store owner, Stuart Frankel, managed his stores and/or the promotion?
To answer this question, one should first address whether the increase in traffic was caused by the promotion at all (the question of “causality”), and then isolate the effect of the price promotion from the other factors that may also have affected Mr. Frankel’s Subway sales.
Ensuring Causality
It is a well-known fact that simple correlation does not prove causality. When I was very young, growing up in rural Borneo, I rather fancifully imagined that since the sun came up after I awoke every morning and went down after I went to bed, I had “caused” the sun to rise and set!
To prove my youthful conjecture, I would have needed more than observed correlation. I would have needed to establish:
- Time-Order. The cause must happen before the effect (my waking and going to bed must always precede the sun’s rising and setting)
- Rationale. There must be a logical and compelling explanation of the cause (in this case there is none except my youthful hubris)
- Non-Spuriousness. The effect must be caused by the changes in the cause and every other possible factor must be eliminated (see below).
The first two conditions are relatively easy to satisfy but are not sufficient to prove causality. That is why we need DOE to make sure the third condition is met. In my example above, to test the third condition, I would have to change the times at which I woke up and went to bed and see whether the sun rose and set according to my changed schedule. Obviously, the sun kept rising and setting on its own schedule regardless of when I got up or went to bed, and hence I “learned” that I did not cause the sun to rise or set!
Checklist for a successful DOE
In my experience, most tests fail because erroneous conclusions are drawn from poorly constructed control groups that do not meet the above conditions. I have found, in my many years of doing analytics, that to execute a DOE properly one must make sure the checklist below is followed:
- Matching: The treatment and hold-out groups must be matched to ensure that it is truly an “apples-to-apples” comparison.
- Similarity: Both groups must show similar behavior before the treatment starts.
- Randomness: The control group must be randomly selected before the treatment is launched.
- Incrementality: The effects of the other factors must be isolated, using an A/B test for a single factor or a full or fractional factorial test design when several factors are involved.
- Maturity: Observations must be taken periodically to ensure that the effects of the various factors have been allowed to develop in full.
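To make the randomness, similarity, and incrementality items concrete, here is a minimal sketch in Python. It is not Subway's method or any particular vendor's: the function names, the 20% hold-out share, and the store-level sales figures are all hypothetical, and the lift calculation is a simple difference-in-differences (the change in the treatment group minus the change in the control group), which is one common way to net out factors that affect both groups alike.

```python
import random
from statistics import mean


def assign_groups(store_ids, holdout_share=0.2, seed=42):
    """Randomly split stores into (treatment, control) BEFORE the launch.

    holdout_share and seed are illustrative defaults, not recommendations.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(store_ids)
    rng.shuffle(shuffled)
    n_control = max(1, round(len(shuffled) * holdout_share))
    return shuffled[n_control:], shuffled[:n_control]


def pre_period_gap(pre, treatment, control):
    """Similarity check: gap in average pre-launch sales between the groups."""
    return mean(pre[s] for s in treatment) - mean(pre[s] for s in control)


def incremental_lift(pre, post, treatment, control):
    """Difference-in-differences: treatment change minus control change."""
    t_change = mean(post[s] for s in treatment) - mean(pre[s] for s in treatment)
    c_change = mean(post[s] for s in control) - mean(pre[s] for s in control)
    return t_change - c_change


if __name__ == "__main__":
    # Made-up weekly sales per store, purely for illustration.
    stores = list(range(10))
    treatment, control = assign_groups(stores)
    pre = {s: 100 for s in stores}
    # Everyone grows by 10 (seasonality, say); treated stores gain 20 more.
    post = {s: 130 if s in treatment else 110 for s in stores}
    print("pre-period gap:", pre_period_gap(pre, treatment, control))
    print("incremental lift:", incremental_lift(pre, post, treatment, control))
```

In this toy data every store grows for reasons unrelated to the promotion, so a naive before/after comparison on the treated stores alone would overstate the effect; subtracting the control group's change removes that shared growth.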
In the case of Subway, they conducted a proper DOE before their March 23, 2008 launch of the “$5 FOOTLONGS” campaign across the US and the rest, as they say, is history!
Subway was able to accurately test the effect of the “$5 FOOTLONGS” promotion before launching the campaign nationwide. However, we don’t always have this luxury. What can be done if you either (1) cannot hold out a random control sample or (2) need to compute the treatment effect after the test has been carried out and, hence, no prior hold-out sample is available? I will address these types of challenges in my next blog post, “How to know it works, Part II – Measuring causality & incremental ROI’s in the absence of a random control sample.”

January 28th, 2010 at 3:52 pm
Well said. I find the incrementality the hardest part. It is hard for a Client, while committed to the idea of testing the outcomes, to hold back on what everyone believes to be a great plan and risk revenue for the sake of the test.
I often find myself managing this by using multiple controls to establish baselines thus allowing me to identify the incremental impact. Sometimes if I segment the control groups I can identify the other factors I would have tested and compare performance among controls in light of the true test (marketing target group).
Way off base or a reasonable compromise? Welcome your thoughts.
Chris
@portma
February 2nd, 2010 at 3:49 pm
I agree that achieving incrementality by holding out a representative random sample from the “treatment” is hard to sell to business stakeholders, and sometimes the opportunity cost is impossible to justify because of the size of the revenue at risk, as you have rightly pointed out!
This, however, should not stop one from assessing the incremental impact of a marketing “treatment”. It is, in fact, a great segue to my next blog post on the use of Propensity Score Matching (PSM) to assess incremental impact when no random hold-out sample is available or even possible. Rather than trying to remove sample biases by creating different control groups, PSM allows one to remove the biases by using a single propensity score. The PSM method is still relatively novel to marketers but is gaining popularity for the very reasons that you have alluded to!