Hyuna Yang and colleagues seem to have at least part of the answer. They had five different research centers analyse the exact same RNA samples, and collected the raw fluorescence values—before normalisation or any other kind of analysis. After a long (and, dare I say, tedious) analysis, they found that batch processing significantly affected which genes were detected as differentially expressed. The authors do a good job of explaining what batch effects are, so I'll open the floor to them:
Due to personnel and equipment constraints, all samples may not be processed at one time. For instance, one fluidics station used for the wash and staining step is able to process only up to four samples [at a time].
Because of this, some samples are of necessity processed at different times. If the experimenters are not careful when deciding how to group the samples for processing, the batch can end up confounded with the biological factors of interest:
Both centers 2 and 3 stored male arrays at 4 degrees while female samples were washed and stained. These centers have the longest lists of differentially expressed genes between sexes. Similarly, centers 4 and 5 grouped samples according to mouse strain, and it showed in their long lists of genes differentially expressed between strains. What's happening is that the list of genes actually affected by a particular biological condition (sex or strain) is being contaminated by genes affected by sample processing order, a factor that, I think you'll agree, is extremely uninteresting from a biological perspective. This is very bad news for whoever wants to analyse the data after the fact (including your humble correspondent, actually!). The authors conclude that the batch effect "cannot be removed from the data without compromising the biological signal."
The discussion section of the paper is required reading for anyone who will be designing and running microarray experiments in the future, or any kind of experiment, for that matter. The gist of it is this: processing in batches is inevitable, but confounding batches and biological factors is not! When deciding in what order to process your samples, assign them randomly to batches, not systematically (as we are all wont to do). (Sample stratification would also work, though the authors don't mention it.)
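To make the advice concrete, here's a minimal Python sketch of the two designs: a randomized assignment of samples to batches, and a stratified one that balances the biological groups within each batch. This is my own illustration, not code from the paper; the sample labels, the two-group setup, and the batch size of four (matching the fluidics-station example above) are all assumptions.

```python
import random
from collections import Counter

def randomized_batches(samples, batch_size, seed=None):
    """Shuffle samples, then split them into processing batches in order.

    `samples` is a list of (sample_id, group) pairs, where `group` is the
    biological factor (e.g. sex or strain) that must not be confounded
    with batch.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]

def stratified_batches(samples, batch_size, seed=None):
    """Shuffle within each group, then deal samples round-robin so every
    batch gets a balanced mix of groups (assumes equal group sizes)."""
    rng = random.Random(seed)
    by_group = {}
    for sample in samples:
        by_group.setdefault(sample[1], []).append(sample)
    for members in by_group.values():
        rng.shuffle(members)
    # zip interleaves the groups: M, F, M, F, ... for two equal groups
    interleaved = [s for tier in zip(*by_group.values()) for s in tier]
    return [interleaved[i:i + batch_size]
            for i in range(0, len(interleaved), batch_size)]

# 16 hypothetical mouse samples: 8 male, 8 female
samples = [(f"s{i}", "M" if i < 8 else "F") for i in range(16)]

# Confounded design (what the centers effectively did): batches grouped by sex
confounded = [samples[i:i + 4] for i in range(0, 16, 4)]

# Randomized and stratified designs spread the sexes across batches
randomized = randomized_batches(samples, batch_size=4, seed=42)
stratified = stratified_batches(samples, batch_size=4, seed=42)

for batch in stratified:
    print(Counter(group for _, group in batch))  # 2 M and 2 F per batch
```

In the confounded design every batch is all-male or all-female, so any wash-and-stain artifact lands squarely on the sex comparison; randomization breaks that link on average, and stratification guarantees the balance within every batch.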
So, finally, what should we think about published microarray results? I'd have to agree with critics that single experiments found in the literature to date are not trustworthy. Most published microarray studies, however, follow up with targeted experiments. And one hopes that future microarray experiments (or whichever expression technology replaces them) will take heed of the recommendations of Yang et al.'s PLoS ONE paper. That would certainly go a long way to improving the reproducibility and trustworthiness of genome-wide expression studies.
[ This post was part of the PLoS ONE @ Two synchroblogging celebration! ]
Hyuna Yang, Christina A. Harrington, Kristina Vartanian, Christopher D. Coldren, Rob Hall, Gary A. Churchill (2008). Randomization in Laboratory Procedure Is Key to Obtaining Reproducible Microarray Results. PLoS ONE, 3 (11). DOI: 10.1371/journal.pone.0003724
E. Blalock, K. Chen, A. Stromberg, C. Norris, I. Kadish, S. Kraner, N. Porter, P. Landfield (2005). Harnessing the power of gene microarrays for the study of brain aging and Alzheimer's disease: Statistical reliability and functional correlation. Ageing Research Reviews. DOI: 10.1016/j.arr.2005.06.006