Causal Inference Challenge



A causal inference challenge will be held in conjunction with the CRM Statistical Causal Inference and Applications to Genetics workshop. Many approaches have been developed for estimation and causal interpretation when both observational and experimental data are available. Although the performance of each approach varies from setting to setting, we hope that this competition will provide a realistic setting in which to compare a broad range of methods. We encourage participants to use either existing or novel methods.

You may compete as an individual or as a team, but each individual can participate in at most one challenge submission. At least one member of each team will be expected to attend the workshop. Every participating team is invited to present its work at a poster session during the CRM workshop, where the submissions will be judged. The winning team will receive a C$300 prize and will also be invited to give an oral presentation during the workshop.

Register for the competition


The International Mouse Phenotyping Consortium (IMPC) is a major collaboration between 18 research institutions worldwide aimed at discovering functional insight for every gene through the systematic phenotyping of 20,000 knockout mouse strains. Over 300 measurements are taken on each animal, in procedures ranging from clinical blood chemistry, through calorimetry and body composition to behavioural phenotypes. Specifically, the data provided for the challenge comprise 22 phenotypic measurements collected by the IMPC.

In the data available to participants, we have randomly selected 5 of the 'knock-out' conditions and, in each of them, removed the observations for one randomly selected variable. These 'missing values' are indicated as NA.

Get the data here!

Competition Tasks

Each submission will be judged on two main aspects:

  • Inferring missing data: For 5 randomly selected 'knock-out' conditions, the observations for a randomly selected variable (different for each condition) have been removed. Participants should use the available data to predict the values of the "missing data". Prediction accuracy will be measured via mean squared error.
  • Broader causal interpretation: A panel of judges will assess the general analysis framework used by each team. Teams will be judged on appropriateness of the methods applied, scientific insight and statistical novelty.
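
For teams who want to check their predictions locally before submitting, mean squared error can be computed as in the minimal sketch below. The function name and the toy numbers are illustrative assumptions and are not part of the challenge data. The sketch also does not settle whether the official score averages within each 'knock-out' condition or pools all removed observations.

```python
def mean_squared_error(predicted, observed):
    """Average squared difference between paired predictions and true values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one value is required")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy values for illustration only (not IMPC measurements).
held_out = [2.0, 4.0, 6.0]      # values a team sets aside from the observed data
predictions = [2.5, 4.0, 5.0]   # the team's predictions for those values
mse = mean_squared_error(predictions, held_out)
print(mse)
```

Because the true values behind the NA entries are not available to participants, a local check like this has to rely on observed entries that a team hides from its own model.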

The final submission should include a poster detailing the methods and analysis. Details for submission will be provided after registration.



  • June 15: Data Released
  • July 18: Registration closes
  • TBD: Results released

Further Questions

Please contact Sam Wang if you have any further questions.