The Ultimate Guide to A/B Testing


A/B testing, better known as the thing every entrepreneur should be doing (and taking seriously) but probably isn’t. Why is that? Is it because it feels like a massive waste of time? Or is it simply because there’s only so much time in a day, and getting pressing deadlines, product launches, leads, and event planning done seems more… productive?

Well, there are many reasons, these among them. But what if I told you that A/B testing isn’t a waste? As it turns out, it’s more of a checkup for your business, which makes it an essential, inescapable part of being an entrepreneur. Sure, it’s tedious and time-consuming, but it’s also how you discover how well things are running, what you could be doing better, and what desperately needs to change to turn things around quickly if you find yourself in a rocky situation. It’s vital for anyone trying to grow their business.

This is why I’ve created a checklist, of sorts, which is designed to make your A/B testing more fruitful. Because it shouldn’t feel like a hassle, and you should be getting as much out of it as you possibly can each and every time.

Think of it as a way to make the whole thing much more enjoyable. Or at the very least, far more useful to you than it probably has been.

We’ll cover the basics, the more in-depth concepts, and the advanced tactics that cover fringe cases rather than the everyday needs of the common entrepreneur. Let’s get started.


A/B Testing Basics

Think of it this way: A/B testing is one of those things that you really do benefit from planning first. There are a lot of these things in business, but A/B testing is on a whole new level. Without that planning step, you kind of have… no direction. You’re just experimenting for the sake of experimenting, to see what happens. Sure, you can gain knowledge this way, but will it be information you can genuinely use?

Maybe. Maybe not.

Still, that’s not the brightest thing to gamble on. That’s how you wind up wasting valuable time and resources. Remember, the purpose of A/B testing is to find weak spots in your company. It’s to find the areas you could improve on.

Keep this advice in mind as we go over the basics of A/B testing. It may help everything make more sense.


Decide Your Measure of Success Before Anything


Success is defined in many different ways. On one hand, you have those who consider wealth, and all that comes with it, success, even if it means sacrificing family time, time to grow as a person, attending important events, you get the idea. On the other hand, you have those who consider sticking to their annual goals success, whatever those goals may be.

But regardless of your definition, everyone shares one thing in common: you set your sights on your idea of success, and you try to get as close to it as you possibly can. We do this in business, we do this in our personal lives. We do it with everything, down to the clothes that we choose to buy.

If we want to get more emails, then that’s the goal. That’s our idea of success for an A/B test. And so we try to get as close to meeting that goal as we possibly can. We test out different tweet formulas, see which phrasing works better, which types of posts get shared the most, and so on.

Do this, and you’ll know why you’re conducting the test. More importantly, you’ll know which variables to change with every test.


Determine the Numbers

The common misconception when it comes to A/B testing is that the bulk of the work comes after it’s live. In reality, there are many steps before it goes live. Once it does, it’s smooth sailing, because you’re just watching things run their course, taking notes, and seeing if your theories were right.

That means before hitting that button, you should have a few factors figured out, including:

  1. The best time to try out the test, considering your high-volume times

  2. The length of time that the test will run

  3. The minimum result you need to see to conclude the change has potential (often called the minimum detectable effect)

  4. And the sample size you need to reach conclusive results

Luckily, there’s online help with this, so don’t freak out. It is a lot of work, which is exactly why, as I said at the beginning, most people assume A/B testing is more of a hassle than it’s worth.

But in reality, it’s more about being tactical. If you can be meticulous about your steps taken, you’ll get better, more useful results. And so measuring these four things before going live makes all the difference. Use this spreadsheet if you need help keeping track of it all.
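If you’re curious what those online calculators are doing under the hood, here’s a minimal Python sketch of the usual normal-approximation formula for comparing two conversion rates. The function names and the example numbers (4% baseline conversion, a 10% relative lift, 5,000 daily visitors) are hypothetical, and it assumes a 50/50 split, a two-sided test at 95% confidence, and 80% power; a dedicated calculator may use a slightly different formula. It also rounds the duration up to whole weeks so that every day of the week is represented.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variant (two-sided, two-proportion z-test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)        # the smallest lift worth detecting
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # chance of catching a real lift
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_in_days(n_per_variant: int, n_variants: int, daily_visitors: int) -> int:
    """Days needed to collect the full sample, rounded up to whole weeks."""
    days = math.ceil(n_per_variant * n_variants / daily_visitors)
    return math.ceil(days / 7) * 7


# Hypothetical inputs: 4% baseline conversion, 10% relative lift, 5,000 visitors a day.
n = sample_size_per_variant(0.04, 0.10)
print(n, "visitors per variant,", duration_in_days(n, 2, 5_000), "days")
```

With these made-up numbers, it works out to roughly 39,500 visitors per variant, or three weeks of traffic. Notice how quickly the required sample grows if you try to detect a smaller lift.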

And remember, you have the option to hire a consultant, should you need one. Consultants will assess what you have, what your goals are, and what you need in order to get there. From there, they select which A/B tests to focus on, how to go about them, and what you should do with the results to get the most out of them. If you ever feel like you lack the time, or direction, this option may be useful to you.


Select Your Audience


When you first started your business, you probably (hopefully) developed buyer personas. These were fictional descriptions of your ideal customers: fictional in the sense that they weren’t based on a single real person, but generalized to put people in a box.

It’s how you kick-started your business, because it allowed you to be very particular about how you chose to market your business, and every product you ever launched. You developed your brand, and decided to cater to a very specific audience that would find your content relatable.

Well, the same does not apply in an A/B test. Although you’ll largely be working within your designated box, so to speak, you’ll also be targeting those on desktop vs. mobile, or customers who read FAQ pages, and those who don’t.

In layman’s terms, you want to learn about a specific section of your audience, which means you need the least polluted sample you can possibly get. If you need data on your mobile audience exclusively, you shouldn’t A/B test the whole of your audience.
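One common way to keep a sample clean is to check eligibility before a visitor is ever assigned to a variant. The sketch below assumes a hypothetical user record with `id` and `device` fields and a made-up experiment name; it hashes the user ID so a returning visitor always lands in the same bucket. Your testing tool almost certainly handles assignment for you, so treat this as an illustration of the idea, not a replacement.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user: the same user always sees the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def enroll(user: dict, experiment: str):
    """Return a variant for eligible (mobile) users, or None to keep them out of the test."""
    if user.get("device") != "mobile":
        return None  # desktop visitors would only pollute a mobile-only sample
    return assign_variant(user["id"], experiment)


print(enroll({"id": "u-123", "device": "desktop"}, "checkout-copy"))  # None
print(enroll({"id": "u-123", "device": "mobile"}, "checkout-copy"))   # "A" or "B"
```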


Plan It All Out

In A/B testing you’re bound to get a lot of different results. Some will be spot on, easy to interpret. Others will seem… useless, or inconclusive at best.

So, what do you do in these situations?

Well, that’s why you have to consider all the possible outcomes before jumping into a live test. You need to plan out what you’ll do if the results are inconclusive, or only seemingly useful.

Because it’s not as simple as moving on to the next test. What if your results show potential, and possible benefits for your company, even though the test hasn’t reached a statistically significant level?
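Writing your plan down as explicit rules before launch keeps you honest when the numbers come in. Here’s one hypothetical way to do it in Python, assuming you’ll judge the test by the confidence interval for the lift (variant minus control, in conversion-rate points) and that you’ve already picked the smallest lift worth acting on. The four outcomes and their labels are illustrative; yours may differ.

```python
def planned_decision(ci_low: float, ci_high: float, min_worthwhile_lift: float) -> str:
    """Map the lift's confidence interval (variant minus control) to a pre-agreed action."""
    if ci_low > 0:
        return "ship"                       # clearly better than control
    if ci_high < 0:
        return "roll back"                  # clearly worse than control
    if ci_high < min_worthwhile_lift:
        return "stop: no meaningful gain"   # even the best case is too small to matter
    return "extend or retest"               # inconclusive, but a worthwhile lift is still possible


# Hypothetical interval in conversion-rate points, with a 0.4-point minimum worthwhile lift.
print(planned_decision(-0.001, 0.008, 0.004))  # extend or retest
```

The last branch is exactly the “shows potential but isn’t there yet” case: deciding in advance whether that means more traffic or a retest saves an awkward debate later.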


Remember Hypothesis?


Do you remember when you learned the scientific method? As a refresher, here are the steps:

  1. Ask a question

  2. Do some background research on it

  3. Form a hypothesis

  4. Perform an experiment

  5. Then determine if your hypothesis was correct

If your hypothesis is wrong, you simply note the results you did get, and potentially perform the experiment again after altering a variable.

It’s likely that you learned the scientific method in grade school, during science class, and thought “I’ll never use this outside of class.” But in reality, we use this all the time.

Think about baking: you try out a recipe, predict that it will produce buttery, flaky biscuits, and then judge the results when they come out of the oven. If they fall short of expectations, you alter the recipe. Maybe less flour, or more butter, and so on.

You should be doing the same thing with your A/B test, if you can help it. You want to go into it with an educated guess on the results. Here’s why: it forces you to sit down and ask yourself what you’re doing with this test. What is the purpose of even caring about this particular test?

It helps to focus on the benefits for your company, and it helps you determine the right combination of factors that it takes to get x, y, or z results.


Test One Variant, Unless Corrections Are Being Made

When you’re testing many variants at once, you run into a messy problem: a higher probability of false positives. Every extra comparison is one more chance for random noise to look like a winner. On top of that, you’re juggling a lot of data at one time, and it’s hard to see the forest for the trees. You wind up mixing the wrong data, not knowing which variables led to which results, and so on.

And soon, you see some results that may look promising, but… are they accurate? Who knows!

The good news is that many testing tools have built-in mechanisms that keep track of everything for you, so you don’t have to lose your hair. Some will even correct for multiple comparisons, sometimes across as many as ten variants.

But you don’t need to take this approach if it seems too complex, even with the right tools. You can just test one variant, if that’s more up your alley. It pays to keep things simple and start small. You can add more variables as time goes on, once you’re more comfortable.

Remember, it doesn’t matter what your results are, or how much you’re testing, if you have no idea what’s going on, or how accurate any of the data is.
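To see why extra comparisons are risky, and what a correction looks like, here’s a small Python sketch. The first function shows how fast the chance of at least one false positive grows, assuming independent comparisons at a 0.05 threshold. The second implements the Holm-Bonferroni correction, one standard method (not necessarily the one your tool uses) that tightens the threshold for each comparison. The p-values in the example are made up.

```python
def family_wise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent comparisons."""
    return 1 - (1 - alpha) ** k


def holm_bonferroni(p_values: list, alpha: float = 0.05) -> list:
    """True where a comparison stays significant after the Holm-Bonferroni correction."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p-value first
    significant = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):  # threshold loosens as we move down the list
            significant[i] = True
        else:
            break  # the procedure stops at the first comparison that fails
    return significant


print(f"{family_wise_error(0.05, 10):.2f}")              # 0.40
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))        # [True, False, False, True]
```

With ten comparisons, there’s roughly a 40% chance that at least one “winner” is pure noise, which is why keeping it to one variant is such a reasonable starting point.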


Don’t Call It Too Early


Think of all the days in the week. Seven days, 24 hours each. Each one is different, full of its own ups and downs. Well, this carries over into business. Each day brings unique visitors and recurring ones. And each one therefore has its own conversion rate, number of transactions, and revenue.

So even though we may set a goal for every A/B test, it may not be a bright idea to call a test once that goal is met.

Here’s what I mean: say I decide to run an A/B test and my goal is a 4.46% conversion rate on retweets. Suppose I start my test on a Monday, and then hit that conversion rate on Tuesday, or even Wednesday. Should I call the test?

No, because I’ve only reached that goal on one day out of the time I planned to run the test. The result pool is much too small. I’d let the test run its course for the full week and see how many times I reach that goal. By the end of the seven days, I’d see which days were successful, and why. And I’d have the rest of the week to compare against, so I could see why things didn’t work on those days and which variables changed.
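Here’s a quick simulation that shows the cost of calling a test the moment it looks good. It runs thousands of A/A tests (no real difference between the variants), checks each one daily for a week, and compares two habits: stopping at the first “significant” day versus judging only at the end. To keep it fast, it simulates the test statistic directly, assuming equal traffic each day and a large-sample normal approximation; the function name and defaults are hypothetical.

```python
import math
import random


def simulate_peeking(n_experiments: int = 20_000, n_looks: int = 7,
                     z_crit: float = 1.96, seed: int = 42):
    """Simulate A/A tests (no real difference) checked once a day.

    Returns (false-positive rate if you stop at the first 'significant' look,
             false-positive rate if you only judge after the final look).
    """
    rng = random.Random(seed)
    peeked_wins = 0
    final_wins = 0
    for _ in range(n_experiments):
        running_sum = 0.0
        crossed = False
        z = 0.0
        for look in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)  # one day's worth of noise, equal traffic per day
            z = running_sum / math.sqrt(look)   # cumulative z-statistic after `look` days
            if abs(z) > z_crit:
                crossed = True                  # a peeker would call the test here
        peeked_wins += crossed
        final_wins += abs(z) > z_crit
    return peeked_wins / n_experiments, final_wins / n_experiments


peeking, fixed = simulate_peeking()
print(f"stop at first significant day: {peeking:.1%}, wait the full week: {fixed:.1%}")
```

The “wait the full week” rate should land near the 5% you signed up for, while stopping early flags winners far more often, even though nothing actually changed.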


Avoid Pauses

This one may go without saying, but avoid shutting off variants mid-test. You should also never shift traffic allocation during this time.

In fact, just try not to pause or shift things around mid-test, period. Otherwise, it’s like you’re running two half-tests in the time it takes to run one full one.

And that does nothing good. It only wastes time and resources.

Instead, run your test for the full length, and always start with one variant against the control experiment.

If you’re really rusty on your scientific method knowledge, remember the control experiment is a base experiment with no alterations of any kind. It’s untainted, untouched, unfixed. This allows you to compare the results of the tweaked test more accurately.
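Shifting traffic mid-test can make identical variants look different, because not every day converts equally (a version of Simpson’s paradox). The hypothetical numbers below give both variants the same conversion rate every single day; the only change is that most of Saturday’s lower-converting traffic is sent to B.

```python
# Hypothetical counts as (visitors, conversions). Both variants convert at exactly the
# same rate on each day; the only change is that traffic shifts toward B on Saturday.
days = {
    "Monday":   {"A": (500, 25), "B": (500, 25)},  # 5% for both
    "Saturday": {"A": (100, 2),  "B": (900, 18)},  # 2% for both, but 90% of traffic goes to B
}


def pooled_rate(arm: str) -> float:
    """Conversion rate for one arm with all days lumped together."""
    visitors = sum(day[arm][0] for day in days.values())
    conversions = sum(day[arm][1] for day in days.values())
    return conversions / visitors


print(f"A: {pooled_rate('A'):.1%}, B: {pooled_rate('B'):.1%}")  # A: 4.5%, B: 3.1%
```

Lumped together, B looks like a clear loser, purely because of when its traffic arrived.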


Too Good to Be True


We’ve all heard of the saying. And you know what? It’s actually scarily accurate. Anytime something is just much too good… there’s something beneath the surface.

What’s a myth, however, is that this is always a bad thing. For instance, you meet somebody new, and things seem to go really well. It’s all good, and you’re getting concerned that it’s much too good to be true.

Well, you’re right, because human nature has taught us over and over again that we are prone to mistakes. There’s something under the surface, the surface being your honeymoon phase. Once you’re done with it, you see your new partner clear as day, flaws and all.

But is that bad? Not at all. Seeing flaws is what keeps things honest. It allows you to know whether this is something you want to keep pursuing, or not.

What is bad, however, is not listening to that base instinct we all have.

For instance, imagine seeing major red flags after a honeymoon phase and… justifying them, rather than listening to that little voice in your head going “That’s not right.”

The same applies to your A/B test. It’s okay if something seems much too good to be true. It means that there’s something under the surface for you to find. It isn’t a bad thing, because finding those discrepancies is what makes the next A/B test more accurate. It’s how you learn.

But you’d be in a world of hurt if you didn’t pick up on the fact that your results are fishy. If you chose to ignore it, push aside that gut instinct, then you’d be moving forward based on justifications, denial, and inaccurate information.


In-Depth Concepts

Consider these next tips slightly more advanced than the basics covered prior. They are aimed towards those who have already implemented and mastered the basics, so try not to skip directly to these unless you’ve done so.

Otherwise, you may find yourself a little lost, and full of inaccurate data that you can’t do anything with.


Controlling External Factors And Variables


Regardless of how much you plan, control, and meticulously figure out all the details of your A/B tests, there will always be external factors that you simply can’t control.

Some of the common ones are things like weather, holidays and other temporary spikes, bot traffic and bugs, competitor promotions, and even cross-device tracking, to name a few.

And needless to say, it’s frustrating. You’re just trying to run an experiment that you’ve worked hard on, and everything seems to be in your way.

But as annoying as it may be, it’s important to remember that A/B testing is risky business, which means it does you a great service to document anything and everything that may have contributed to the results obtained, including those pesky external factors.

As a side note, it also pays to time things correctly. Maybe you’re tracking your traffic and sales conversion rates, two very obvious things. But then you get a wave of news coverage: you wind up in some news articles, maybe on some front pages, and boom, your A/B tests on traffic and sales are no longer accurate. Your numbers will surge temporarily until things die down again, so you might as well stop the testing and pick it back up later.
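Documenting external factors is easier when something flags the odd days for you. This is a minimal sketch, assuming you can export daily visitor counts; the 2× median threshold and the sample week are arbitrary choices for illustration, not a standard.

```python
from statistics import median


def flag_unusual_days(daily_visitors: dict, multiple: float = 2.0) -> list:
    """Return days whose traffic exceeds `multiple` times the median day."""
    typical = median(daily_visitors.values())  # the median isn't dragged up by the spike itself
    return [day for day, visits in daily_visitors.items() if visits > multiple * typical]


# Hypothetical week: Thursday is the day the press coverage hit.
week = {"Mon": 1_000, "Tue": 1_100, "Wed": 950, "Thu": 3_000, "Fri": 1_050}
print(flag_unusual_days(week))  # ['Thu']
```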


Confidence Intervals And P-Values

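A test is generally called significant when its p-value is below 0.05. That threshold doesn’t remove risk from the decision-making process; it caps it at an acceptable point. If there is truly no difference between your variants, you’ll wrongly declare a winner only about 5% of the time.

In essence, depending on whether you work at 95% or 90% confidence, it’s like saying that 5-10% of your no-difference experiments may show results that are completely due to chance. That’s okay, because at the end of the day, you can trust the other 90-95%.

But see, many people don’t like to use confidence intervals, because they don’t like the risk involved. The idea that you should allot an acceptable amount of error in your test just doesn’t sit right with most, but it’s the truth. There will be mistakes.

A confidence interval is the range of values the true conversion rate (or the true difference between variants) plausibly falls within. The idea here is that if the interval for the difference between your variants includes zero, or the two variants’ intervals overlap heavily, you probably don’t have a true winner.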

Obviously, the larger your sample size, the lower the margin of error in an A/B test. There’s more to work with, you get a bigger perspective, the works. But not everyone has the traffic to accommodate that. Consider it a luxury. If you want that to be your reality, seek to improve your digital marketing strategies, video content, landing pages, and of course, your lead generation. Don’t forget to check your copy as well, as that’s something super important that often gets overlooked.
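For the curious, here’s a minimal Python sketch of a two-proportion z-test, a common way to get both a p-value and a confidence interval for the difference in conversion rates between two variants. It relies on a normal approximation, so it assumes reasonably large samples; the function name and example counts are hypothetical, and your testing tool may use a different method.

```python
import math
from statistics import NormalDist


def compare_conversion(conv_a: int, n_a: int, conv_b: int, n_b: int, confidence: float = 0.95):
    """Return (difference B minus A, two-sided p-value, confidence interval for the difference)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error for the hypothesis test (which assumes no difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval around the observed difference.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, p_value, (diff - z_crit * se_diff, diff + z_crit * se_diff)


# Hypothetical results: 4.0% vs 4.8% conversion on 10,000 visitors each.
print(compare_conversion(400, 10_000, 480, 10_000))
```

If the interval for the difference stays entirely above zero, B is likely a real improvement; if it straddles zero, you don’t have a clear winner yet. Running the same numbers with a smaller sample widens the interval, which is the margin-of-error point above in action.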


Do No Harm vs. Go For It


Over time, you’ll come to find that your focus will shift from each individual test, toward the whole picture. When this happens, remember to categorize your experiments into two main categories:

  1. Do No Harm - where you care about the potential risk involved, and you need to avoid it.

  2. Go For It - where you know there’s no risk when making a given decision.

It’s easiest to explain with an illustration.

Suppose there’s a publishing house and they’re working with several successful novels at a time. They find themselves having to choose which of this year’s novels to feature at an upcoming promotional event, and they’re trying to figure out which one would be better received.

But every novel they’re considering has done well, so ultimately it doesn’t matter which one they choose. They’re all novels, successful ones. This is a Go For It question.

If there isn’t a best option, then there’s no risk of just selecting one at random. It will cost the same amount regardless of which they choose.


Control for Flicker Effect

Flicker effect is what happens when your testing tool causes a minimal delay in applying the experiment variation. The result of the delay, however small, is a brief flash of the original content before the variation is served.

Obviously, it’s harmful to any experiment. It can taint results, essentially wasting your time and energy. The good news is that there are ways to reduce flicker effect.

Here are some solutions:

  1. Speeding up your website to avoid long waits and flickering alike.

  2. If you’re using your testing tool to fix your website, roll those changes into product code instead of running them in another experiment.

  3. Measure twice, cut once.

  4. QA test all of your major devices and categories before going live.

  5. Use CSS as much as possible.


Know When to Trigger


Let’s say that you’re running an experiment on people who have subscribed to your newsletter. Anyone else, any other audience member who hasn’t, wouldn’t be included in the test.

Filtering out unaffected users helps improve the statistical power of your test. It cuts back on irrelevant information, and makes it easy to find valuable data within the selected audience.

So, how do you measure it? You need to trigger an event at the precise moment you want your analysis to start: in this case, when they hit that subscribe button.
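Here’s what triggered analysis can look like on a raw event log. This is a sketch under assumptions: the event names (`subscribe`, `purchase`), the record fields, and the timestamps are all hypothetical. Only users who hit the trigger count toward the sample, and only their activity after that moment counts toward the conversion.

```python
def triggered_metrics(events: list) -> tuple:
    """Return (users who hit the trigger, users who converted after their trigger)."""
    trigger_time = {}
    for e in events:
        if e["type"] == "subscribe":
            first = trigger_time.get(e["user"])
            if first is None or e["time"] < first:
                trigger_time[e["user"]] = e["time"]  # earliest subscribe = start of analysis
    converted = {
        e["user"]
        for e in events
        if e["type"] == "purchase"
        and e["user"] in trigger_time
        and e["time"] >= trigger_time[e["user"]]  # only count activity after the trigger
    }
    return len(trigger_time), len(converted)


# Hypothetical event log; "time" is a Unix timestamp.
events = [
    {"user": "u1", "type": "subscribe", "time": 100},
    {"user": "u1", "type": "purchase",  "time": 250},  # counts
    {"user": "u2", "type": "purchase",  "time": 120},  # never subscribed: excluded
    {"user": "u3", "type": "purchase",  "time": 90},   # bought before subscribing: not counted
    {"user": "u3", "type": "subscribe", "time": 300},
]
print(triggered_metrics(events))  # (2, 1)
```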


Consider Using A/A Tests

So many people consider this an utter waste of time, but before you cast judgment, let’s consider the reasoning behind it.

First of all, there’s no one-size-fits-all answer. If you want to use an A/A test, it should be because of your scale and what you’re hoping to learn. Not everyone will benefit.

And as for the purpose, it’s to test the original vs the original. It essentially helps to establish trust in your testing platform.

A/A tests can drag errors and software bugs into the light, so you know exactly what you’re dealing with. This is critical if you’re running really advanced experiments.

Hence my earlier point, there’s no one size fits all here. Unless you’re working at a really high level, you probably don’t need to worry about A/A.
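As a rough benchmark for A/A results, here’s a small Python simulation. When both arms are truly identical, a test at the 0.05 level should call about 5% of A/A runs “significant.” The sketch assumes a hypothetical 10% conversion rate and 1,000 users per arm, and the function names are made up; if your platform’s own A/A tests flag winners far more often than this, that’s a sign something in assignment or tracking deserves a closer look.

```python
import math
import random
from statistics import NormalDist


def two_sided_p(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test p-value (pooled standard error)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both arms: nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_tests: int = 400, users_per_arm: int = 1_000,
                      true_rate: float = 0.10, seed: int = 7) -> float:
    """Share of A/A tests that come out 'significant' at the 0.05 level."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_tests):
        # Both arms draw from the exact same conversion rate.
        conv_a = sum(rng.random() < true_rate for _ in range(users_per_arm))
        conv_b = sum(rng.random() < true_rate for _ in range(users_per_arm))
        if two_sided_p(conv_a, users_per_arm, conv_b, users_per_arm) < 0.05:
            significant += 1
    return significant / n_tests


print(f"{simulate_aa_tests():.1%} of A/A tests looked significant")
```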


Advanced A/B Tactics

Once you reach a point where you’re comfortable with A/B testing in general, have the basics down pat, and feel confident with intermediate tactics, it’s time to turn the heat up.

It’s time for advanced methods.


Consider Non-Inferiority Tests


If you want to test in order to mitigate risk and avoid implementing a mediocre experience, you should try out non-inferiority testing. They help with easy decision tests, or those with side effects outside of measurement capability.

You may be wondering, however, why you would ever want to implement something that’s not proven to be better than your current state of affairs.

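Well, let’s go back to an earlier point: the greater your sample size, the greater the power of the test. And the smaller the difference you want to be able to detect, the lower the power. That means even if you have a lot of traffic, your A/B tests may not flag small improvements as statistically significant. A non-inferiority test sidesteps this by asking a different question: is the variant no worse than the control by more than a margin you’ve decided you can live with? That’s useful when a change brings benefits the test can’t measure, like lower costs or easier maintenance, and you just need to rule out meaningful harm.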
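A non-inferiority test can be sketched as a one-sided z-test against a margin. The code below is a minimal Python version, assuming large samples and a margin expressed in absolute conversion-rate points (0.5 points in the example); the function name, margin, and counts are hypothetical. A low p-value means the data rule out the variant being worse than the control by more than that margin.

```python
import math
from statistics import NormalDist


def non_inferiority_test(conv_ctrl: int, n_ctrl: int, conv_var: int, n_var: int,
                         margin: float, alpha: float = 0.05):
    """One-sided test. H0: the variant is worse than control by at least `margin`."""
    p_c, p_v = conv_ctrl / n_ctrl, conv_var / n_var
    se = math.sqrt(p_c * (1 - p_c) / n_ctrl + p_v * (1 - p_v) / n_var)
    z = (p_v - p_c + margin) / se  # how far the observed difference sits above -margin
    p_value = 1 - NormalDist().cdf(z)
    return p_value, p_value < alpha  # True means "not meaningfully worse"


# Hypothetical: control converts at 4.00%, variant at 3.95%, margin of 0.5 points.
print(non_inferiority_test(800, 20_000, 790, 20_000, margin=0.005))
```

Note that the variant here is slightly worse on paper, yet it still passes, because the question is only whether it’s worse by more than the margin you chose.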


Use Predictive Targeting

If you’re working on advanced things, then you’ll likely get to a point where it’s just impossible to manage targeting rules for every segment you’re trying to reach.

This is why so many businesses that run high-level testing opt for predictive targeting engines, which find segments that respond better to a given experience than the average user does.


Look Out for Ratio Mismatches


It may seem simple, but a ratio mismatch is something to really look out for. You should be hoping for your traffic to be randomly and evenly allocated among two variants, assuming your A/B test has two.

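But when the observed split is off by more than chance would explain, that’s a sample ratio mismatch (SRM), and it usually points to a bug in how visitors are assigned or tracked. If you’re experiencing a bug like this, try using this handy calculator.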
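The calculation behind a typical SRM check is a chi-square goodness-of-fit test, which is simple enough to sketch in a few lines of Python. The function name and counts below are hypothetical, and it assumes an intended 50/50 split unless you pass another share. Many practitioners treat a p-value below a strict cutoff such as 0.001 as a sign of SRM.

```python
import math


def srm_p_value(count_a: int, count_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for the observed traffic split (1 degree of freedom)."""
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = (count_a - expected_a) ** 2 / expected_a + (count_b - expected_b) ** 2 / expected_b
    # For 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical splits intended as 50/50.
print(srm_p_value(50_100, 49_900))  # well above 0.001: no sign of a mismatch
print(srm_p_value(50_000, 48_500))  # far below 0.001: investigate before trusting results
```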


Use A Futility Boundary

A futility boundary is a testing methodology used to improve the efficiency of your test. It also allows you to stop the A/B test earlier.

Why would you want to do that though?

Because stopping it early when the data suggests a very low chance that any of the variants are going to prove noteworthy is useful. It gives you the chance to call the experiment failed and move right along with the next one, should you want to salvage some time back.

Every test is an investment. When you’re running QA, analyzing and reporting on these tests, and having meetings, only for the results to turn out to be nothing substantial, that’s a colossal waste of time. Being able to stop early lets you quit spending resources on things that ultimately don’t matter, so you can focus on what does.
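One common way to define a futility boundary is conditional power: given the results so far, how likely is the test to reach significance by its planned end if the current trend continues? Below is a minimal Python sketch of that calculation, assuming a normally distributed test statistic and that you’re only looking for an improvement; the 10% cutoff and the function names are hypothetical choices, and many testing tools use other boundary designs.

```python
import math
from statistics import NormalDist


def conditional_power(z_now: float, info_fraction: float, alpha: float = 0.05) -> float:
    """Chance of ending significant (in the variant's favor) if the current trend holds.

    z_now: the z-statistic so far; info_fraction: share of the planned sample collected (0-1).
    """
    if not 0 < info_fraction < 1:
        raise ValueError("info_fraction must be between 0 and 1")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)          # final significance threshold
    projected_final_z = z_now / math.sqrt(info_fraction)  # where the current trend is heading
    spread = math.sqrt(1 - info_fraction)                 # uncertainty left in the remaining data
    return 1 - NormalDist().cdf((z_crit - projected_final_z) / spread)


def hits_futility_boundary(z_now: float, info_fraction: float, cutoff: float = 0.10) -> bool:
    """Hypothetical rule: stop if conditional power has dropped below `cutoff`."""
    return conditional_power(z_now, info_fraction) < cutoff


print(hits_futility_boundary(0.0, 0.5))  # True: a flat result halfway through is unlikely to recover
print(hits_futility_boundary(2.5, 0.5))  # False: trending toward a win
```

Because a rule like this can only end a test early as a non-winner, it doesn’t add false winners; the trade-off is a small chance of abandoning a variant that would have come good.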


In Conclusion


A/B testing is a lot of time-consuming hard work. There’s a lot to be done, and frankly, it can be overwhelming: keeping tabs on your variables, your business cycles, days of the week, and external factors, and documenting everything while keeping an eye out for results that are too good to be true. Even deciding on an acceptable margin of error takes work.

This is why so many entrepreneurs choose to skip A/B testing altogether, opting to essentially fumble in the dark. They purposefully choose to make uninformed business decisions and hope for the best, because it’s just easier than testing things out.

But the reality of the situation is that if you’re running a business, it’s your duty to run it the best that you can. Because you’re the one in charge. Not to get too dark, but without taking the proper steps, you’re risking the future of your company, and therefore, your staff’s livelihood as well.

Testing may be tedious, but when done correctly, it can prove beneficial for all your business endeavors. It can teach you more about your audience, let you know what they find appealing and what they could do without, and therefore it even helps you stay on track with your branding.

A/B tests can also assess the business tactics that work the best for you. Maybe certain social media posts you’ve been making have been falling flat, and costing you followers. Or perhaps your Google or Facebook ads have failed to align with the landing pages connected to them, so people feel misled.

Do yourself a favor and at least consider the possibility of A/B testing more. Stick to one or two variables at first. And if you need help, remember that there are always tools, resources, and even consultants who can help.

So, which of these tips did you find the most helpful to you, and why?

Let me know in the comments section below, I love hearing from you!