Data-Driven Blunders and How to Avoid Them
“We are a data-driven company”. You’ll travel far and wide before you find a tech startup that doesn’t pride itself on this claim. It’s become a tech staple – alongside free beer on Fridays, table football, and all the fruit you could dream of. And, while the logic behind a data-driven approach is undeniable, too often the expectations that come with it aren’t met. This data-driven approach permeates events, dashboards, metrics, and reports, and leaves most of us feeling less like Neo at the end of The Matrix and more like a dog whose owner just hid a tennis ball after pretending to throw it – confused, our excitement transmuted into frustration so deep we feel like chewing on our favorite plush toy.
Let’s be clear: good tracking and hypothesis validation with data are essential for any product manager. The problem arises when we expect data to be the “secret sauce” that will immediately improve all aspects of our product, and when we assume that the answer to every question is always more (events, dashboards, tests). This borderline zealous belief in “data” as the answer to all our prayers is dangerous for many reasons.
For starters, no system (for example all our fancy algorithms, events, tests…) can accurately predict the outcomes of a system more complex than itself (in this case, us humans deciding what to buy and how). This is a principle of the universe, so there is little we can do other than attach a healthy amount of scepticism to our test results, and repeat our tests when possible.
On the bright side, sometimes we aren’t facing complexity problems but simply flawed practices. Thankfully there’s plenty we can do to identify and tackle these. There are probably as many flawed data practices as there are product managers in the world, but in my experience they can be roughly grouped into three categories.
Have you ever run an A/B test and ended up with more questions than answers? More often than not, I see this happening as a byproduct of the sheer number of variables involved in human interactions. The more variables, the more data you need for a statistically significant result. A simple test to determine whether colour A or colour B has a higher conversion rate on a call to action (CTA) requires thousands of impressions to yield consistent results. Any added variable (shape of the CTA, copy, position) multiplies the number of impressions you’ll need, and that is before considering the other metrics that might be affected: What if sign-up goes down even though conversion goes up for that particular CTA? Or conversion drops slightly for the other CTA but the size of the basket doubles? Every extra layer of complexity added to a test raises the number of impressions needed for statistical significance.
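To get a rough feel for these numbers, here is a minimal sketch using the standard normal-approximation formula for comparing two conversion rates. The 5% baseline, the one-point lift, the significance and power levels, and the function names are all illustrative assumptions, not figures from a real test. It also assumes a full-factorial design, in which every combination of options becomes its own variant, and it ignores corrections for multiple comparisons.

```python
import math
from statistics import NormalDist


def impressions_per_variant(baseline, lift, alpha=0.05, power=0.8):
    """Approximate impressions needed in EACH variant to detect `lift`.

    Uses the normal approximation for a two-sided test on two proportions.
    `baseline` and `lift` are absolute rates (0.05 means 5%).
    """
    p1 = baseline
    p2 = baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def total_impressions(options_per_variable, baseline, lift):
    """Total impressions if every combination of options is its own variant."""
    variants = math.prod(options_per_variable)
    return variants * impressions_per_variant(baseline, lift)


# Hypothetical scenario: 5% baseline conversion, hoping to detect +1 point.
colour_only = total_impressions([2], baseline=0.05, lift=0.01)
colour_shape_copy = total_impressions([2, 2, 2], baseline=0.05, lift=0.01)

print("Colour only (2 variants):", colour_only)
print("Colour x shape x copy (8 variants):", colour_shape_copy)
```

Under these assumptions, each variable added with two options doubles the number of variants, and the total number of impressions doubles with it.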
So we need to be extremely precise about what we are testing and the associated impacts we can foresee: if we suspect (and we should) that our CTA test might impact not just conversion, but also basket size and retention, we should include those metrics in our test reporting. We also need to design our test in such a way that, should there be a clear winner, we are able to say with certainty exactly which element had the impact. If A is red, with rounded corners, and has different copy from B, which is blue and has square corners, you might get a clear winner, but you will have a hard time determining which of those three different elements delivered the impact.
Keep it simple and test one variable at a time. You will have to run multiple tests or create more than two variants, which might feel like a waste of valuable time, but compared to getting consistently unactionable results or, worse, false positives, you’ll be happy you put in the extra effort. If the test you want to run requires too many variables then you’re probably better off doing some qualitative testing first (focus groups, user interviews, guerrilla testing…) and shaving off the excess variables before setting up an A/B test.
Be sure, also, to measure not simply the metric that you’re aiming to move, but also those that you think might be affected. But be careful to not overdo it, otherwise you might fall into the second deep data hole we all want to avoid.
Too Much Data, Too Often
Imagine you are trying to lose a few kilos. You set yourself realistic goals and provide the means to reach them. You start to eat healthier, exercise at least three times a week, and vow to cut down on those midnight grilled-cheese sandwiches.
You could stay true to your new habits and only check the scales every week, giving yourself and your body enough time to adapt and start shedding weight consistently, and only making any changes after you’re sure of the impact.
Or, you could start weighing yourself multiple times a day, overreacting to even the slightest weight gain. You try to compensate by skipping meals and going out to run multiple times a day, only to then become too exhausted to do any sport for the rest of the week and binge on snacks every time you skip a meal. By the end of the month you will weigh exactly the same as on day one, and you’ll be extremely frustrated with yourself for the wasted sacrifice.
Just like diets, product improvements often take time to have any visible effects. If you release a product or feature and start looking at the KPIs every waking hour, changing things on the fly when the KPI doesn’t immediately move in a positive direction, you will be reacting to background noise and ruining any results by course-correcting against it. Oscillation of metrics (within reasonable margins) is perfectly normal and expected; there are lots of external factors that can have a small influence on any given KPI: weather, day of the week, bank holidays, available stock, website outages, discounts from the competition, seasonality… Furthermore, users might need some time to adjust to the feature or learn how to use it, thus delaying the visibility of its impact.
Back to our CTA example: if conversion drops on the first day of testing, and you immediately add more variables to the test in an attempt to “fix” it, you will not only forfeit any measurable results for the test, you might also be killing a change that would have made a positive impact later on. Sometimes users have a weird day, sometimes they need time to get used to change, sometimes you just get unlucky on the first few days and get the “bad” users in your test. Wait it out, check the data only when you might learn something from it, and check it at the level of granularity that makes the most sense, often weekly (or even monthly), to prevent daily fluctuations from throwing you off. Very few metrics are consistent on a daily basis, and paying attention to background noise is only a waste of effort.
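As a small illustration of why granularity matters, the sketch below compares the spread of a daily conversion rate with the spread of its weekly averages. The four weeks of figures are made up for this example (note the weekend bump), and the helper names are hypothetical.

```python
# Made-up daily conversion rates (%) for four weeks, Monday to Sunday.
# These numbers are purely illustrative, not real data.
daily_conversion = [
    4.1, 3.8, 4.4, 3.9, 4.6, 5.2, 5.0,  # week 1
    3.7, 4.2, 4.0, 4.3, 4.5, 5.4, 4.9,  # week 2
    4.3, 3.6, 4.1, 4.2, 4.8, 5.1, 5.3,  # week 3
    3.9, 4.0, 4.5, 3.8, 4.7, 5.3, 5.1,  # week 4
]


def weekly_means(daily_rates, days_per_week=7):
    """Average the rates over complete weeks; a trailing partial week is dropped."""
    return [
        sum(daily_rates[i:i + days_per_week]) / days_per_week
        for i in range(0, len(daily_rates) - days_per_week + 1, days_per_week)
    ]


def spread(values):
    """Distance between the highest and the lowest value."""
    return max(values) - min(values)


weekly = weekly_means(daily_conversion)

print(f"Daily spread:  {spread(daily_conversion):.2f} points")
print(f"Weekly spread: {spread(weekly):.2f} points")
```

In this invented data set the daily figure swings by almost two points while the weekly average barely moves, so a product manager watching the daily number would be tempted to react to changes that are not really there.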
Picking the Wrong North Star Metric
A North Star metric is the single KPI that works best as a proxy for success for your product and business. North Star metrics are popular because they avoid the problem of having too many data points to measure. A single KPI that you know gives you an indication of success means you don’t need to dig deeper into other KPIs. As long as that North Star is pointing in the right direction, you will save time and cognitive load for you and your team.
But North Star metrics are not easy to set. Products are complex and multifaceted and it’s hard to stick to a single overarching metric. On top of that, truly good North Star metrics are often lagging, and that interferes with their purpose of serving as a simplified proxy for understanding, day to day, whether your business is going well or not. So, while a metric like Customer Lifetime Value might be a much better North Star than Conversion to Order, for example, many companies and PMs tend to choose the latter: conversion is easy to grasp, it’s a good enough proxy for success, and it’s real-time. Conversely, Customer Lifetime Value can be very complex and take months to establish, but it is a far better proxy for success than conversion.
Let’s be clear, there’s no perfect North Star metric: no single metric can capture all the ins and outs of even the simplest of businesses. But there are better and worse North Star metrics. To start, the metric should be tied to the value you bring to your customers. If you’re bringing true value to users then all other metrics will fall into place. Additionally, your North Star shouldn’t be easily “hacked”: if you can artificially boost a metric then it’s not as good a proxy for success as you thought. There are a few ways of boosting conversion while having a horrible impact on the business: fill your website with dark patterns, make every button a CTA to confirm your purchase, give discounts to customers to the point of making them unprofitable, halt your acquisition ads and let only organic users reach your website, sign your customers up automatically (and inconspicuously) to a monthly subscription… All these initiatives would earn you a bonus if it were attached to raising conversion, but they would also probably bring your company down and you with it. While there may be ways to raise Customer Lifetime Value that have negative side effects, they’re probably fewer than the ways to do so with Conversion to Order.
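To make the “hacking” point concrete, here is a deliberately simplified sketch. It assumes a naive Customer Lifetime Value formula (margin per order times orders per year times years retained), which is far cruder than what a real business would use, and every figure and function name in it is hypothetical.

```python
def conversion_rate(orders, visits):
    """Share of visits that end in an order."""
    return orders / visits


def simple_clv(margin_per_order, orders_per_year, years_retained):
    """Naive lifetime value: margin earned per customer over the relationship.

    An assumed simplification; real CLV models also handle churn,
    discounting, and acquisition costs.
    """
    return margin_per_order * orders_per_year * years_retained


# Hypothetical scenario A: normal pricing.
baseline = {
    "conversion": conversion_rate(orders=300, visits=10_000),
    "clv": simple_clv(margin_per_order=12.0, orders_per_year=4, years_retained=2),
}

# Hypothetical scenario B: deep discounts lift orders but erode the margin.
heavy_discounts = {
    "conversion": conversion_rate(orders=450, visits=10_000),
    "clv": simple_clv(margin_per_order=2.0, orders_per_year=4, years_retained=2),
}

for name, scenario in [("Baseline", baseline), ("Heavy discounts", heavy_discounts)]:
    print(f"{name}: conversion {scenario['conversion']:.1%}, CLV {scenario['clv']:.2f}")
```

In this invented scenario the discount campaign looks like a win on conversion while the value of each customer collapses, which is the kind of trade-off a conversion-only North Star would hide.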
As an alternative to using your North Star as the single go-to daily metric, try to select a few “health” metrics within your product that you check daily. Remember that slight fluctuations are normal and that other departments might be running initiatives (like offering discounts or reducing acquisition) that influence them. Keep your North Star as a proxy for the long-term health of your product, not as an indicator of the current status. As with our diet example, your North Star could be your weight: it’s a good indicator of your general success, but it takes time to move and is influenced by many factors. The amount of exercise you’ve done that week, along with how healthily you’ve eaten, would be the daily KPIs you can check and course-correct on.
We live in a time of abundance of data and data processing capacity, but even if we can add an event to every single action in our app or website, and create a dashboard for every single KPI in our business, we are still limited by our own cognitive capacity.
With data, quality is even more important than quantity. You only have the capacity to process and draw conclusions from a finite amount of data, so make sure it’s the bit that will help you improve your product and get a better understanding of your users, not the bit that will make you feel like your human made the tennis ball vanish mid-throw one more time.