Key Ingredients to Being Data Driven

PSA: if you're still showing data in pie charts, stop.

Companies love to exclaim "we're data driven". There are obvious benefits to being a data driven organization, and everyone nowadays has more data than they can shake a stick at. But what exactly does an organization need to be "data driven"?

Just because you have a ton of data, and you've hired people to analyze it or build models, does that make you data driven? No. That's not enough.

Although we think a lot about data and how to use it, being data driven needs to be a priority at the executive level and become part of the culture of the organization; it takes more than simply having a team with the necessary capabilities.

Here are the baseline qualities that I believe are necessary to be effective in your "data driven-ness". Now I'm making up words.

To be data driven:

  • Test design and analysis is owned by analytics/data science teams.
  • Dashboards are already in place that give stakeholders self-serve access to key metrics. (Otherwise you'll have low value ad-hoc requests to pull these metrics, and it'll be a time sink.)
  • Analytics/Data Science teams collaborate with the business to understand the problem and devise an appropriate methodology.
  • Data governance is in place, with data definitions used consistently across departments/the organization.
  • You have a data strategy.

You'll notice that there is a lack of fancy hype buzzwords above. You don't need to be "leveraging AI" or calling things AI that are in fact hypothesis tests, business logic, or simple regression.

I don’t believe fancy models are required to consider yourself data driven. A number of the points listed above refer to the attitudes of the organization and how it partners and collaborates with analytics and data science teams. I love building models as much as the next data scientist, but you can't build next level intelligence on a non-existent foundation.

To clarify, I'm not saying every decision in the organization needs to be driven by data to be data driven. In particular, if you're going to make a strategic decision regardless of the results of a test or analysis, then you should skip doing that test. I'm a big advocate of only allocating the resources to a project if you're actually going to USE the results to inform the decision.

Let's take a look at the points from above.

Test design and analysis is owned by analytics/data science teams:

Although data science and analytics teams often come up with fantastic ideas for testing, many ideas also come out of departments outside of analytics. For instance, in eCommerce the marketing team will have many ideas for new offers. The site team may want to test a change to the UI. This sometimes gets communicated to the data teams as "we'd like to test this thing, this way." And although these non-analytics teams have tremendous skill in marketing and site design, and understand the power of an A/B test, they often do not understand the trade-offs between effect size, sample size, solid test design, etc.

I've been in the situation more than once, at more than one company, where I'm told "we understand your concerns, but we're going to do it our way anyways." And this is their call to make, since in these instances those departments have technically "owned" test design. However, the data resulting from these tests often can't be analyzed. So although we did it their way, the end result did not answer any questions. Time was wasted.

Dashboarding is in place:

This is a true foundational step. So much time is wasted if you have analysts pulling the same numbers every month manually, or on an ad-hoc basis. This information can be automated, stakeholders can be given a tour of the dashboards, and then you won't be receiving questions like "what does attrition look like month over month by acquisition channel?" It's in the dashboard and stakeholders can look at it themselves. The time saved can be allocated to diving deep into much more interesting and thought-provoking questions rather than pulling simple KPIs.

Analytics/Data Science teams collaborate with the business on defining the problems:

This relationship takes work, because it is a relationship. Senior leaders need to make it clear that a data-driven approach is a priority for this to work. In addition, analytics often needs to invite themselves to meetings that they weren't originally invited to. Analytics needs to be asking the right questions and guiding analysis in the right direction to earn this seat at the table. No relationship builds overnight, but this is a win-win for everyone. Nothing is more frustrating than pulling data when you're not sure what problem the business is trying to solve. It's Pandora's box. You pull the data they asked for, it doesn't answer the question, so the business asks you to pull them more data. Stop. Sit down, discuss the problem, and let the business know that you're here to help.

Data governance and consistent usage of data definitions across departments/the organization:

This one may require a huge overhaul of how things are currently being calculated. The channel team, the product team, the site team, and others may all be calculating things differently if the business hasn't communicated an accepted definition. These definitions aren't necessarily determined by analytics themselves; they're agreed upon. An established business that has done a lot of growing but not as much governance can feel the pain of trying to wrangle everyone into using consistent definitions. But if two people try to do the same analysis and come up with different numbers, you've got problems. This is again a foundation that is required to move forward and work on cooler, higher-value projects, which you can't do if you're spending your time reconciling numbers between teams.

You have a data strategy:

This data strategy is going to be driven by the business strategy. The strategy is going to have goals and be measurable. The analyses you plan for have a strong use case. People don't just come out of the woodwork asking for analysis that doesn't align with the larger priorities of the business. Questions like "do we optimize our ad spend or try to tackle our retention problem first?" come down to expected dollars for the business. Analytics doesn't get side-tracked answering lower-value questions when they should be working on the problems that will save the business the most money.

In Summary:

I hope you found this article helpful. Being data driven will obviously help you to make better use of your data. However, becoming data driven involves putting processes into place and having agreement about who owns what at the executive level. It's worth it, but it doesn't happen overnight. If you're not yet data driven, I wish you luck on your journey to get there. Your analysts and data scientists will thank you.

If you have suggestions on what else is required to be data driven, please let me know your thoughts!

 


Setting Your Hypothesis Test Up For Success

Setting up your hypothesis test for success as a data scientist is critical. I want to go deep with you on exactly how I work with stakeholders ahead of launching a test.  This step is crucial to make sure that once a test is done running, we'll actually be able to analyze it.  This includes:

  • A well defined hypothesis

  • A solid test design

  • Knowing your sample size

  • Understanding potential conflicts

  • Population criteria (who are we testing)

  • Test duration (it's like the cousin of sample size)

  • Success metrics

  • Decisions that will be made based on results

This is obviously a lot of information. Before we jump in, here is how I keep it all organized: I recently created a Google Doc at work so that stakeholders and analytics could align on all the information to fully scope a test upfront. This also gives you (the analyst/data scientist) a bit of an insurance policy. It's possible the business decides to go with a design or a sample size that wasn't your recommendation. If things end up less than stellar (not enough data, a design that is basically impossible to analyze), you have your original suggestions documented. In my previous article I wrote:

"Sit down with marketing and other stakeholders before the launch of the A/B test to understand the business implications, what they’re hoping to learn, who they’re testing, and how they’re testing.  In my experience, everyone is set up for success when you’re viewed as a thought partner in helping to construct the test design, and have agreed upon the scope of the analysis ahead of launch."

Well, this is literally what I'm talking about. This document was born of things that we often see in industry:

Hypothesis: I've seen scenarios that look like "we're going to make this change, and then we'd like you to read out on the results." So, your hypothesis is what? You're going to make this change, and what do you expect to happen? Why are we doing this? A hypothesis clearly states the change that is being made, the impact you expect it to have, and why you think it will have that impact. It's not an open-ended statement. You are testing a measurable response to a change. It's ok to be a stickler, this is your foundation.

Test Design: The test design needs to be solid, so you'll want to have an understanding of exactly what change is being made between test and control. If you're approached by a stakeholder with a design that won't allow you to accurately measure the success criteria, you'll want to coach them on how they could design the test more effectively to read out on the results. I cover test design a bit in my article here.

Sample Size: You need to understand the sample size ahead of launch, and your expected effect size. If you run with a small sample and need an unreasonable effect size for it to be significant, it's most likely not worth running. Time to rethink your sample and your design. Sarah Nooravi recently wrote a great article on determining sample size for a test. You can find Sarah's article here.

  • An example might be that you want to test the effect of offering a service credit to select customers. You have a certain budget worth of credits you're allowed to give out. So you're hoping you can have 1,500 in test and 1,500 in control (this is small). The test experience sees the service along with a coupon, and the control experience sees content advertising the service but does not see any mention of the credit. If the average purchase rate is 13.3%, you would need roughly a 2.6 point increase (to 15.9%) in the test group over the control to see significance at 0.95 confidence. This is a large effect size that we probably won't achieve (unless the credit is AMAZING). It's good to know these things upfront so that you can make changes (for instance, reduce the amount of the credit to allow for additional sample size, ask for extra budget, etc.). A rough sketch of this kind of upfront calculation is below.
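
To make that kind of upfront check concrete, here's a minimal sketch using statsmodels. The 1,500-per-arm sample and 13.3% baseline come from the example above; the candidate lifts, the 0.05 alpha, and the two-sided test are assumptions I'm adding for illustration, so the exact thresholds it reports may differ from the numbers quoted above depending on the power you require.

```python
# Scan a few candidate lifts and report the power we'd have to detect each one,
# given 1,500 customers per arm and a 13.3% baseline purchase rate.
import numpy as np
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.133   # control purchase rate from the example
n_per_arm = 1500        # planned sample size in each of test and control

analysis = NormalIndPower()
for lift in np.arange(0.01, 0.05, 0.005):
    effect = proportion_effectsize(baseline_rate + lift, baseline_rate)  # Cohen's h
    power = analysis.power(effect_size=effect, nobs1=n_per_arm, alpha=0.05, ratio=1.0)
    print(f"lift of {lift:.1%} -> power {power:.2f}")
```

If the lift you'd realistically expect shows up with very low power, that's your cue to rethink the budget, the sample, or the design before launch rather than after.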

Potential Conflicts: It's possible that two different groups in your organization could be running tests at the same time that conflict with each other, resulting in data that is junk for potentially both cases. (I actually used to run a "testing governance" meeting at my previous job to proactively identify these cases; this might be something you want to consider.)

  • An example of a conflict might be that the acquisition team is running an ad in Google advertising 500 business cards for $10. But if, at the same time, another team is running a pricing test on the business card product page that doesn't respect the ad driving the traffic, the acquisition team's test is not delivering the experience they thought it was! Customers will see a different price than what is advertised, and this has negative implications all around.

  • It is so important in a large analytics organization to be collaborating across teams and have an understanding of the tests in flight and how they could impact your test.

Population criteria: Obviously you want to target the correct people. But often I've seen criteria so specific that the results of the test need to be caveated with "These results are not representative of our customer base; this effect is for people who [[lists criteria here]]." If your test targeted super performers, you know that it doesn't apply to everyone in the base, but you want to make sure that is spelled out and doesn't get miscommunicated to a broader audience.

Test duration: This is often directly related to sample size. (see Sarah's article) You'll want to estimate how long you'll need to run the test to achieve the required sample size.  Maybe you're randomly sampling from the base and already have sufficient population to choose from.  But often we're testing an experience for new customers, or we're testing a change on the website and we need to wait for traffic to visit the site and view the change.  If it's going to take 6 months of running to get the required sample size, you probably want to rethink your population criteria or what you're testing.  And better to know that upfront.
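
As a back-of-the-envelope illustration of that duration estimate (the traffic and sample size numbers below are made up, not from the article):

```python
# Rough duration estimate: how many weeks until we reach the required sample?
import math

required_per_arm = 15000            # from your power / sample size calculation
arms = 2                            # test and control
eligible_visitors_per_week = 4000   # weekly traffic that qualifies for the test

weeks_needed = math.ceil(required_per_arm * arms / eligible_visitors_per_week)
print(f"Roughly {weeks_needed} weeks to reach the required sample size")  # ~8 weeks here
```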

Success Metrics: This is an important one to talk through. If you've been running tests previously, I'm sure you've had stakeholders ask you for the kitchen sink in terms of analysis. If your hypothesis is that a message about a new feature on the website will drive people to go see that feature, it is reasonable to check how many people visited that page and whether or not people downloaded/used that feature. That change would probably be too benign to cause cancellations or affect upsell/cross-sell metrics, so make sure you're clear about what the analysis will and will not include. And try not to make a mountain out of a molehill unless you're testing something that is a dramatic change and has large implications for the business.

Decisions! Getting agreement ahead of time on what decisions will be made based on the results of the test is imperative. Have you ever been in a situation where the business tests something, it's not significant, and then they roll it out anyways? Well then that really didn't need to be a test; they could have just rolled it out. There are endless opportunities for tests that will guide the direction of the business, so don't get caught up in a test that isn't actually a test.

Conclusion: Of course, each of these areas could have been explained in much more depth. But the main point is that there are a number of items that you want to have a discussion about before a test launches. Especially if you're on the hook for doing the analysis, you want to have the complete picture and context so that you can analyze the test appropriately. I hope this helps you to be more collaborative with your business partners and potentially be more "proactive" rather than "reactive".

No one has any fun when you run a test and then later find out it should have been scoped differently. Adding a little extra work and clarification upfront can save you some heartache later on. Consider creating a document like the one described above for scoping your future tests, and you'll have a full understanding of the goals and implications of your next test ahead of launch. :)


Designing and Learning With A/B Testing

I've spent the last 6 years of my life heavily involved in A/B testing and other testing methodologies, whether it was the performance of an email campaign to drive health outcomes, product changes, website changes; the example list goes on. A few of these tests have been full factorial MVT tests (my fave). I wanted to share some testing best practices and examples in marketing, so that you can feel confident about how you're designing and thinking about A/B testing.

As a Data Scientist, you may be expected to be the subject matter expert on how to test correctly. Or it may be that you've just built a product recommendation engine (or some other model), and you want to see how much better you're performing compared to the previously used model or business logic, so you'll test the new model vs. whatever is currently in production.

There is SO MUCH more to the world of testing than is contained here, but what I'm looking to cover is:

  • Determining test and control populations

  • Scoping the test ahead of launch

  • Test design that will allow us to read the results we’re hoping to measure

  • Test Analysis

  • Thoughts on automating test analysis

Choosing Test and Control Populations

This is where the magic starts. The only way to determine a causal relationship is by having randomized populations (and a correct test design), so it's imperative that our populations are drawn correctly if we want to learn anything from our A/B test. In general, the population you want to target will be specific to what you're testing. If this is a site test for an eCommerce company, you hope that visitors are randomized to test and control upon visiting the website. If you're running an email campaign or some other type of test, then you'll pull all of the relevant customers/people from a database or big data environment who meet the criteria for being involved in your A/B test. If this is a large list, you'll probably want to take a random sample of customers over some time period. This is called a simple random sample. A simple random sample is a subset of your population, where every member had an equal probability of being chosen to be in the sample.

Here is a great example on how to pull a random sample from Hive: here

Also, just to be clear, writing a "select top 1000 * from table" in SQL is NOT A RANDOM SAMPLE. There are a couple different ways to get a random sample in SQL, but how to do it will depend on the "flavor" of SQL you're using.

Here is an example pulling a random sample in SQL server: here

Now that you have your sample, you'll randomly assign these people to test and control groups. There are times when we'll need to be a little more sophisticated. Let's say that the marketing team wants to learn about the ability to drive engagement by industry (and that you have industry data). Some of the industries are probably going to contain fewer members than others, meaning that if you just split a portion of your population into two groups, you might not have a high enough sample size in certain industries that you care about to determine statistical significance. Rather than putting in all the effort of running the A/B test only to find out that you can't learn about an industry you care about, use stratified sampling (this would involve doing a simple random sample within each group of interest). A sketch of both approaches is below.
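
Here's a minimal sketch of what those sampling and assignment steps could look like in pandas. The `customers` DataFrame and its `industry` column are hypothetical stand-ins for whatever you actually pull from your database; the sizes are arbitrary.

```python
# Simple random sampling, stratified sampling, and random test/control assignment.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Stand-in for the eligible population you'd pull from your database.
customers = pd.DataFrame({
    "customer_id": range(10000),
    "industry": rng.choice(["retail", "finance", "healthcare"], size=10000, p=[0.7, 0.2, 0.1]),
})

# Simple random sample: every customer has an equal probability of being chosen.
sample = customers.sample(n=3000, random_state=42)

# Stratified sample: a simple random sample within each industry, so smaller
# industries still end up with enough members to analyze.
stratified = customers.groupby("industry").sample(n=500, random_state=42)

# Random assignment to test and control (independently within each stratum).
stratified["group"] = rng.choice(["test", "control"], size=len(stratified))
print(stratified.groupby(["industry", "group"]).size())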

Scoping Ahead of Launch

I've seen in practice that when the marketing team doesn't see the results they want, they'll say "we're going to let this A/B test run for two more weeks to see what happens." Especially for site tests, if you run anything long enough, tiny effect sizes can become statistically significant. You should have an idea of how much traffic you're getting to the particular webpage, and how long the A/B test should run, before you launch. Otherwise, what is to stop us from just running the A/B test until we get the result that we want?

Sit down with marketing and other stakeholders before the launch of the A/B test to understand the business implications, what they're hoping to learn, who they're testing, and how they're testing. In my experience, everyone is set up for success when you're viewed as a thought partner in helping to construct the test design, and have agreed upon the scope of the analysis ahead of launch.

Test Design

For each cell in an A/B test, you can only make ONE change. For instance, if we have:

  • Cell A: $15 price point

  • Cell B: $25 price point

  • Cell C: UI change and a $30 price point

You just lost valuable information. Adding a UI change AND a different price option makes it impossible to parse out what effect was due to the UI change and what was due to the $30 price point. We'll only know how that cell performed in aggregate.

Iterative A/B testing is when you take the winner from one test and make it the control for a subsequent A/B test. This method is going to result in a loss of information. What if the combination of the loser from test 1 and the winner from test 2 is actually the winner? We'd never know! Sometimes iterating like this makes sense (maybe you don't have enough traffic for more test cells), but we'd want to talk about all potential concessions ahead of time.

Another type of test design is MVT (multivariate). Here we'll look at a full-factorial MVT. There are more types of multivariate tests, but full-factorial is the easiest to analyze.

  • MVT is better for more subtle optimizations (A/B testing should be used if you think the test will have a huge impact)

  • Rule of thumb is at least 100,000 unique visitors per month.

  • You'll need to know how to use ANOVA to analyze (I will provide a follow-up article with code and explanation for how to do this analysis and link it here later)


One illustrative example of an MVT test: on the left would be the control experiences, and on the right the 3 test treatments. This results in 2^3 = 8 treatments, because we'll look at each possible combination of test and control.

On the left: The controls would be the current experience

On the right: Cell A could be new photography (ex: friendly waving stick figure), Cell B could reference a sale, and Cell C could show new content.


We can learn about all the interactions! Understanding the interactions and finding the optimal treatment when changing multiple items is the big benefit of MVT testing. Each person would be randomly assigned to one of the 8 treatments in this example; a quick sketch of generating the cells, assigning visitors, and analyzing the results is below.
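
To make the full-factorial idea concrete, here's a minimal sketch in Python (not the R write-up mentioned below, just an illustration) of generating the 8 cells, randomly assigning visitors, and running an ANOVA with all interactions via statsmodels. The factor names follow the example above; the simulated purchase outcome and visitor counts are purely made up so the code runs end to end.

```python
# Generate the 2^3 = 8 full-factorial cells, assign visitors, and analyze with ANOVA.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)

factors = {
    "photography": ["control", "new"],    # Cell A
    "sale_message": ["control", "sale"],  # Cell B
    "content": ["control", "new"],        # Cell C
}

# All 8 combinations of the three factors (the treatment cells).
cells = pd.DataFrame(list(itertools.product(*factors.values())), columns=list(factors.keys()))
cells["cell_id"] = range(len(cells))
print(cells)

# Randomly assign each visitor to one of the 8 cells with equal probability.
visitors = pd.DataFrame({"visitor_id": range(80000)})
visitors["cell_id"] = rng.integers(0, len(cells), size=len(visitors))
visitors = visitors.merge(cells, on="cell_id")

# Simulate a purchase outcome (illustrative only) so the analysis below runs.
base_rate = 0.10 + 0.01 * (visitors["sale_message"] == "sale")
visitors["purchased"] = rng.binomial(1, base_rate)

# Full-factorial ANOVA: main effects plus every interaction.
model = smf.ols("purchased ~ photography * sale_message * content", data=visitors).fit()
print(anova_lm(model, typ=2))
```

The interaction terms in the ANOVA table are exactly what iterative A/B testing can't give you: they tell you whether combinations of changes behave differently than the individual changes would suggest.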

In a future article I'll write up one of my previous MVT tests that I've analyzed, with R code.

A/B Test Analysis

One of the most important parts of test analysis is to have consistency across the business in how we analyze tests. You don't want to say something had a causal effect when, if another person had analyzed the same test, they might have reached a different conclusion. In addition to having consistent ways of determining conclusions, you'll also want to have a consistent way of communicating these results to the rest of the business. For example, "Do we share results we find with a p-value greater than .05?" Maybe we do, maybe we don't, but make sure the whole team is being consistent in their communication with marketing and other teams. Confidence intervals should always be given! You don't want to say "Wow! This is worth $700k a year" when really it's worth somewhere between $100k and $1.3m. That's a big difference and could have an impact on the decision of whether to roll out the change or not.

Let's Automate Our A/B Test Analysis!

Why spend multiple hours analyzing each A/B test, when we can:

  • Automate removal of outliers

  • Hold off on calculating statistical significance if the sample is not quite large enough yet

  • Determine statistical significance of metrics with confidence intervals and engaging graphs

  • See how A/B tests are performing soon after launch to make sure there aren’t any bugs messing with our results or large drops in revenue.

  • This also reduces opportunity for error in analysis

With a couple data entries and button pushes! This would take a while to build, and will not be one size fits all for all of your tests. Automating even a portion could greatly reduce the amount of time spent analyzing tests!

I hope this article gave you some things to be on the lookout for when testing. If you're still in school to become a Data Scientist, taking a general statistics class that covers which statistics to use and how to calculate confidence intervals is something that will benefit you throughout your career in Data Science. Otherwise, there is certainly tons of information on the internet to give you an overview of how to calculate these statistics. I personally prefer Coursera, because it's nice to sit back and watch videos on the content, knowing that the content is from well known universities. You can learn a ton through properly executed testing. Happy learning!
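
To close with something concrete, here's a minimal sketch of the kind of consistent, reusable readout discussed above for a single conversion metric: a two-proportion z-test plus a confidence interval on the lift, using statsmodels. The counts are made up for illustration; a real automated pipeline would wrap something like this with outlier handling, sample-size checks, and plots.

```python
# A reusable readout for a conversion-style A/B test: z-test plus a CI on the lift.
from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

conversions = [410, 360]   # test, control (made-up counts)
visitors = [2500, 2500]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
ci_low, ci_high = confint_proportions_2indep(
    count1=conversions[0], nobs1=visitors[0],
    count2=conversions[1], nobs2=visitors[1],
)

lift = conversions[0] / visitors[0] - conversions[1] / visitors[1]
print(f"lift = {lift:.3%}, p = {p_value:.3f}, 95% CI = ({ci_low:.3%}, {ci_high:.3%})")
```

Reporting the interval alongside the point estimate is what keeps you from saying "this is worth $700k a year" when it's really worth somewhere between $100k and $1.3m.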

