How to design self-experiments

A spirit of experimentation is afoot. Science has always given us great stories about people who experimented upon themselves. Whether we’re entertained by Santorio Santorio weighing his own poo, or by Albert Hofmann testing out LSD on himself, we could too easily come away with the idea that self-experimentation is dangerous, extreme, and just a touch silly.

In the last few years, popular self-improvement writers like James Clear, Tim Ferriss, and Mark Sisson have been near the front of a movement. Thousands upon thousands of ordinary geeks are running their own lives like science experiments in a common mission to make their lives better. Nowadays this activity is pretty much always accompanied by the use of shiny new gadgets and apps, and as a result, quite a market is growing for those who make these shiny things.

Why bother with self-experimentation?

To a large extent, the answer is the same as to the question, “Why bother with science?” Answer: Because human insight is hugely fallible. It’s only by placing carefully worked-out strictures on our knowledge-gathering apparatus that we can stop ourselves from getting completely the wrong end of the stick about all sorts of things.

For one, human memory is fairly rubbish. What did you eat for lunch three weeks ago last Saturday? Dinner, the Wednesday before that? I thought not. We forget things, and often we forget to notice things so that we don’t even put them into our memories in the first place. Worse than this, by far, is that we misremember things. Perhaps the most persuasive work ever done on this was by Elizabeth Loftus, who has shown again and again that features of the question we’re asked at the point of recall affect what we remember. If someone asks you “how fast was the car going when it smashed into the lamppost?” you’re likely to give a higher estimate than if you’re asked “how fast was the car going when it bumped into the lamppost?”

Here’s a great TED talk Loftus gave, for your delectation.



Nobel winner Daniel Kahneman has done amazing work on how we tend to remember pertinent points in our experiences, rather than the average. His book, Thinking, Fast and Slow, reflects the work of a genius. His description of how longer colonoscopies can be preferable, because we remember the less painful bit at the end of the procedure, is very persuasive…



As a scientist, I’m simultaneously delighted that huge numbers of people are getting excited about applying the scientific method to their own lives, and horrified at the level of scientific illiteracy revealed when I trundle around some of the websites dedicated to these pursuits. I could sit here in my office at the university and bemoan the state of science education, or I could provide a simple how-to guide. I figured I’d do the latter.

Measuring stuff

The basic requirements for self-experimentation would seem simply to be measuring stuff (instead of relying on our crappy memories) and having something prompt us to take the measurement (instead of relying on our crappy memories).

There is now quite a market in gadgets for measuring aspects of your own behaviour. The Quantified Self movement has a lovely guide to toys and web services that will help. That’s a great place to start, but often you don’t need much more than your smartphone. If you’re a technophobe, you could manage with a handful of index cards or sticky notes, a pen or pencil, and a watch that beeps.

Now if you’re wading into self-experimentation you might be inclined to start measuring all sorts of things, and you might fall prey to the modern method of adopting a new activity: spending a boat-load of money on the toys marketers tell you that you’ll need for the expedition.


In the words of Henry David Thoreau in Walden,

beware of all enterprises that require new clothes, and not rather a new wearer of clothes.

Here are my golden rules for measurement in self-experimentation (they’re pretty much the same ones I stick to when conducting proper experiments in the lab):

  • Keep it simple. The more complex a system, the less likely you are to stick with it long enough to get usable results. If you have to open an app, link it via Bluetooth to some gadget, sync, swipe left, right, and round and round … you’re soon going to decide to skip this one measurement because you’re in a rush. Then you’ll skip a few more. And soon your data will look like Swiss cheese.
  • Choose a measure that comes out the same each time. We call this reliability. If your pedometer sometimes registers 5000 steps because you were tapping your foot to a particularly gnarly album, then it’s officially a Pointless Pedometer. If it gets things nearly right, plus or minus a hundred steps, it’s probably OK (see below).
  • Choose a measure that actually measures the thing you care about. We call this validity, and a measure can be reliable without being valid. I once had a wristwatch that claimed to measure body temperature, but the reading was influenced quite a lot by the temperature of the surrounding air. I don’t mean that if I went into an air-conditioned room it’d go down gradually as my body cooled. I mean that the second I walked in the door it’d drop five degrees. That’s not really a measure of my body temperature anymore. It’s some weird composite of “me + outside”. Not helpful.
  • Aim for good enough, not perfect. If your pedometer costs less than a trip to Starbucks, perhaps you need to spend more money, but not much more. GPS systems, heart rate monitors … you name it, you can get decent measurement kit for very little money. Excellent kit costs a fortune. Stick with good enough. You only need high-precision measurements if you want to detect tiny changes, and surely you want good noticeable changes in your life, so extremely high precision is redundant.
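If you want to put a number on the reliability rule, one rough approach is to measure something whose true value you already know, several times over, and look at how much the readings wobble. The sketch below assumes you walked exactly 1000 hand-counted steps five times and wrote down what the pedometer said each time. The readings and the function name `reliability_summary` are made up for illustration.

```python
from statistics import mean, stdev

def reliability_summary(readings, true_value):
    """Summarise repeated readings of a quantity whose true value is known."""
    avg = mean(readings)
    return {
        "mean": avg,
        "spread": stdev(readings),   # sample standard deviation: the wobble
        "bias": avg - true_value,    # consistent over- or under-counting
    }

# Hypothetical data: five walks of exactly 1000 hand-counted steps each.
pedometer_readings = [1012, 985, 1003, 990, 1010]
summary = reliability_summary(pedometer_readings, true_value=1000)
print(summary)
```

A small spread suggests the gadget is reliable enough. A large bias with a small spread means it is consistently wrong, which matters less if all you care about is change over time.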

Designing an experiment

Here’s where those without a science background tend to go off the rails. If you want to find out whether getting up at 5am boosts your productivity, what do you do? You measure your productivity over the week (by a means of your choosing) and generate a Week 1 Productivity Total, then on Sunday you start getting up at 5am. After a week of screaming irrationally at your alarm clock, you calculate a Week 2 Productivity Total and you compare the numbers, right?


Humans are messy. And so are the environments we live in. Behaviours vary over time. They drift up and down. Some days you’ll be really unproductive, even though you got up at 5am, because your boss calls you into a three-hour emergency meeting, and when you get home the cat’s been sick all over the kitchen floor. All this stuff is noise. The ups and downs in productivity (or whatever you’re measuring) caused by sick cats, demanding bosses, bad weather, or whatever else, have nothing to do with the life change we’re testing out: getting up early. We have to find a way to detect the signal amongst the noise.

In large scientific studies, we use all sorts of mathematical procedures, ranging from “showing off with a pencil and paper” to “why is my laptop overheating doing these sums?” Luckily, there are simpler ways, especially when the signal you’re hoping to detect is fairly strong — when the life change has had a fairly clear impact on the outcome.

The simplest method is called taking multiple baselines. The idea is that instead of generating one big number to reflect Week 1 and another for Week 2, you plot daily figures on a graph, and see whether there is an obvious departure from the prior trend at the point where you made the change. As they say, a picture is worth a thousand words, so I drew you one…

Multiple baseline
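If you’d rather check the graph with a few lines of code, here is a minimal sketch of the same idea: work out the normal wobble before the change, then count how many days after the change fall outside it. The productivity scores are invented, the function name `departure_from_baseline` is hypothetical, and the two-standard-deviation band is an assumed rule of thumb rather than anything official.

```python
from statistics import mean, stdev

def departure_from_baseline(daily, change_day, k=2.0):
    """Compare days after a change with the normal variation before it.

    daily: one number per day, in date order.
    change_day: index of the first day on the new routine.
    k: half-width of the "normal" band in baseline standard deviations.
    """
    baseline = daily[:change_day]
    after = daily[change_day:]
    centre = mean(baseline)
    spread = stdev(baseline)
    lower, upper = centre - k * spread, centre + k * spread
    outside = [x for x in after if x < lower or x > upper]
    return {
        "baseline_mean": centre,
        "after_mean": mean(after),
        "band": (lower, upper),
        "days_outside": len(outside),
        "days_after": len(after),
    }

# Hypothetical daily productivity scores: 7 days before, 7 days after.
scores = [5, 6, 4, 5, 6, 5, 4, 8, 9, 8, 9, 7, 9, 8]
result = departure_from_baseline(scores, change_day=7)
print(result)
```

If most of the days after the change sit outside the band, that is the code equivalent of seeing the line jump on the graph. If only one or two do, you are probably looking at noise.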

Multiple baseline designs are sometimes used in psychological studies, but they’re not perfect. There’s another problem too. We tend to become good boys and girls when we think something is being measured, perhaps even if we’re measuring it ourselves. This type of phenomenon is called a Hawthorne effect (named after the Hawthorne factory where it was first noticed). These effects can occur because we’re happy to see that someone is taking an interest in us, so we put on our best show, but they can also happen just because of the novelty of the situation. “Oooh! I’ve got a shiny new pedometer, I wonder if it works well when I’m climbing stairs?” (Climb.)

To combat the Hawthorne effect, and most other problems too, we can design what’s called an ABAB trial. Here’s another picture I lovingly crafted for you…

ABAB trial

ABAB trials simply involve going back and forth between the two things you want to compare. You choose a period of time, say a week, when you’ll do thing A (wake up at 7am), followed by another week when you’ll do thing B (wake up at 5am), and then you run through A and B again. Ideally, we plot more than one time point in each phase, either A or B, so in this case we’d plot our mood/productivity/amount-of-money-spent-on-coffee on a daily basis, and we’d have seven measures for each phase. If the change has the effect we’re hoping for, we’ll see the graph line wiggle up and down a bit due to noise during each phase, but between phases, we’ll see a pronounced shift as the change has its effect. If the shift turns up every time you switch, it’s much harder to blame novelty or one unusually bad week.
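As a rough sketch of how an ABAB record might be summarised, the code below keeps a log of (phase, value) pairs, averages each phase in the order it happened, and checks whether every B phase sits on the same side of every A phase. The scores are invented, and `phase_means` and `shift_is_consistent` are hypothetical helper names, not part of any app or library.

```python
from statistics import mean

def phase_means(log):
    """Average the daily values within each consecutive phase.

    log: list of (phase_label, value) pairs in date order.
    Returns (phase_label, mean) pairs in the order the phases occurred.
    """
    phases = []
    for label, value in log:
        if phases and phases[-1][0] == label:
            phases[-1][1].append(value)      # same phase continues
        else:
            phases.append((label, [value]))  # a switch starts a new phase
    return [(label, mean(values)) for label, values in phases]

def shift_is_consistent(means):
    """True if every B phase mean is above (or every one below) every A phase mean."""
    a = [m for label, m in means if label == "A"]
    b = [m for label, m in means if label == "B"]
    higher = all(x > y for x in b for y in a)
    lower = all(x < y for x in b for y in a)
    return higher or lower

# Hypothetical daily scores: A = wake at 7am, B = wake at 5am, a week each.
log = (
    [("A", v) for v in [5, 6, 4, 5, 6, 5, 4]]
    + [("B", v) for v in [8, 9, 8, 9, 7, 9, 8]]
    + [("A", v) for v in [5, 4, 6, 5, 5, 6, 4]]
    + [("B", v) for v in [9, 8, 8, 7, 9, 8, 9]]
)
means = phase_means(log)
print(means)
print(shift_is_consistent(means))
```

The consistency check is deliberately crude. It ignores how big the shift is, so it works best alongside the graph, not instead of it.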

With both multiple baseline and ABAB trials, there’s some fancy math you can do, but in my experience, if the change is really there, and is big enough to worry about, you can simply see it on the graph. Lots of apps and gadgets will do all this graphing for you. You just need to design the self-experiment and keep a record of when you switch from A to B to A to B. Here’s a screenshot of the stock Health app on my iPhone. You can pretty clearly see the day I came back from vacation and started sitting behind a desk more.
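For the curious, one example of the fancier math is a permutation test. It asks: if the A and B labels were shuffled at random across the same days, how often would the gap between the averages be at least as big as the one actually seen? This is only one possible choice, picked here for illustration, and it assumes each day is independent of the next, which real daily data often isn’t. The scores and the function name `permutation_p_value` are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a_days, b_days):
    """Exact two-sided permutation test on the difference in means.

    Tries every way of relabelling the pooled days as A or B and returns
    the fraction of relabellings with a gap at least as large as observed.
    """
    pooled = a_days + b_days
    n_a, n_b = len(a_days), len(b_days)
    observed = abs(mean(b_days) - mean(a_days))
    total = sum(pooled)
    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        gap = abs((total - a_sum) / n_b - a_sum / n_a)
        count += 1
        if gap >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
    return extreme / count

# Hypothetical scores: one week of A (7am) and one week of B (5am).
a_week = [5, 6, 4, 5, 6, 5, 4]
b_week = [8, 9, 8, 9, 7, 9, 8]
p = permutation_p_value(a_week, b_week)
print(p)
```

A small value suggests the gap is unlikely to be label-shuffling luck. Trying every relabelling gets slow quickly, so this sketch only suits short phases like a week apiece.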

Lee's health data


Go on, give it a try!

I haven’t been doing all that much self-experimentation for about a year now, but a post by James Clear this morning made me realise how many great changes I’ve brought about in my life as a result of self-experimentation. I think it’s time I get back to it. If you struggle with motivation to make changes in your life, and you can stomach drawing a few graphs, it’s probably the ideal thing for you. Seeing the effects of a simple daily choice graphed out in front of you is pretty powerful.

One last thing. There’s a teeny tiny lawyer on my shoulder weeping softly into my t-shirt, so… I’m not suggesting you try out illicit drugs, or do anything dangerous. Use your judgement. Waking up earlier, trying out keeping a journal, exercising before lunch, or some other such habit is, however, unlikely to do you irreparable harm.