Multi-model ensembles are commonly used in climate prediction to create a set of independent estimates, and so better gauge the likelihood of particular outcomes and better quantify prediction uncertainty. Yet researchers share literature, datasets and model code; to what extent, then, do different simulations constitute independent estimates? What is the relationship between model performance and independence? We show that error correlation provides a natural empirical basis for defining model dependence, and derive a weighting strategy that accounts for dependence in experiments where the multi-model mean would otherwise be used. We introduce the "replicate Earth" ensemble interpretation framework, based on theoretically derived statistical relationships between ensembles of perfect models (replicate Earths) and observations. We transform an ensemble of (imperfect) climate projections into an ensemble whose mean and variance have the same statistical relationship to observations as an ensemble of replicate Earths. The approach can be used with multi-model ensembles that have varying numbers of simulations from different models, accounting for model dependence. We use HadCRUT3 data and the CMIP3 models to show that in out-of-sample tests, the transformed ensemble has an ensemble mean with significantly lower error and much flatter rank frequency histograms than the original ensemble.
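The idea of weighting models by their error dependence can be illustrated with a small sketch. This is not the paper's exact derivation; it uses synthetic data and the standard minimum-error-variance combination, in which weights are obtained from the inverse of the model error (co)variance matrix so that models sharing correlated errors are collectively down-weighted relative to an equal-weight multi-model mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_times = 5, 200

# Hypothetical synthetic "truth" (standing in for observations).
truth = rng.normal(size=n_times)

# Hypothetical model ensemble: models 0-3 share a common error component
# (dependence), model 4 has only independent error.
shared = rng.normal(size=n_times)
sims = np.array(
    [truth + 0.7 * shared + 0.5 * rng.normal(size=n_times) for _ in range(4)]
    + [truth + 0.5 * rng.normal(size=n_times)]
)

errors = sims - truth                      # per-model error time series
A = errors @ errors.T / n_times            # error second-moment matrix

# Minimum-error-variance weights: w proportional to A^{-1} 1, normalised
# to sum to 1. Dependent models end up sharing a smaller total weight.
w = np.linalg.solve(A, np.ones(n_models))
w /= w.sum()

weighted_mean = w @ sims
equal_mean = sims.mean(axis=0)

rmse = lambda x: np.sqrt(np.mean((x - truth) ** 2))
print(f"equal-weight RMSE:     {rmse(equal_mean):.3f}")
print(f"dependence-aware RMSE: {rmse(weighted_mean):.3f}")
```

In this construction the dependence-aware weighted mean cannot have larger in-sample error than the equal-weight mean, since the weights minimise the weighted-mean error variance subject to summing to one; the out-of-sample behaviour reported in the paper is the stronger, empirical claim.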