Centre de recherches mathématiques

Colloque de statistique de Montréal

CRM/ISM/GERAD

Calendrier des conférences CRM/ISM/GERAD - Année 2008-2009 / CRM/ISM/GERAD Colloquium Calendar - 2008-2009


Le vendredi 1er mai 2009 / Friday, May 1st, 2009
15h30 / 3:30 p.m.

Constantine Frangakis (Johns Hopkins University)

The role of principal stratification in instrumental variables in case-control designs - an application to Mendelian randomization

McGill, Burnside Hall, 805 Sherbrooke O., salle 1B36

RESUME/ABSTRACT:

We are motivated by studying the effect that the expression of inflammatory genes has on the risk of colorectal cancer. Gene expression, however, is likely confounded with other risk factors for cancer. But because meiosis within families is considered a random process, the genotypes can potentially be used as instruments for the actual inflammation levels. The problem we address here is that, in these settings, designs are typically based on case-control sampling. We show first that, in contrast to settings with no confounding that are modeled with conditional logistic regression, instrumental-variable causal effects are generally estimated incorrectly if the design is ignored, as they are not invariant under such designs. We show, second, how the framework of principal stratification is useful in general for validly estimating the causal effects under such designs. We demonstrate these results with the effect of inflammation on colorectal cancer.


Le vendredi 24 avril 2009 / Friday, April 24, 2009
15h30 / 3:30 p.m.

Jinko Graham (Simon Fraser University)

Graphical Displays to Uncover Gene-environment Interaction from Data on Case-parent Trios

UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, salle PK-5115

RESUME/ABSTRACT:

In genetic association studies of complex diseases, case-parent-trio designs involve the collection of data from affected offspring and their parents. This design is well-suited to diseases of early onset, such as type 1 diabetes and childhood leukemia. Unlike the case-control design, the case-parent design is robust to bias from ethnic differences between cases and controls, and it enables investigation of parent-of-origin effects for genetic risk factors. While the use of the case-parent design for finding genetic associations has been well studied, its use for uncovering gene-environment interactions is less well understood. We review two existing ad hoc approaches to exploring gene-environment interaction from case-parent trios and illustrate their potential bias. We propose an alternative penalized likelihood approach that does not suffer from such bias and illustrate its use on simulated data. We conclude with some directions for future research. This is joint work with Ji-Hyung Shin and Brad McNeney.


Le vendredi 17 avril 2009 / Friday, April 17, 2009
15h30 / 3:30 p.m.

David Dunson (Duke University)

Bayesian Density Regression with Epidemiology Applications

McGill, Burnside Hall, 805 Sherbrooke O., Salle 1B24

RESUME/ABSTRACT:

In assessing relationships between a response and multiple predictors, it is appealing to allow the conditional response distribution to vary flexibly, allowing non-linear and varying relationships with the different quantiles and predictors. Such flexibility is of critical importance in applications in which the tails of the distribution are of primary interest. For example, in epidemiology studies of continuous health responses, the tails of the distribution typically correspond to those individuals having the most adverse health conditions. We would like a method that can allow an environmental exposure, genetic factor or demographic covariate to flexibly impact risk of an adverse response, with adverse corresponding to values in the tails of the distribution. Values further in the tails vary in their severity, so it is important to avoid categorization or grouping. Motivated by studies of pregnancy outcomes and premature delivery, this talk proposes Bayesian nonparametric methods for density regression. I will also describe applications to molecular epidemiology studies. The talk is designed to be accessible to a general audience of biostatisticians and epidemiologists, so technical details will be kept to a minimum.


Le vendredi 3 avril 2009 / Friday, April 3, 2009
15h30 / 3:30 p.m.

Mary Lesperance (Victoria)

Testing for Benford's Law and Possible Fraud Detection

CRM, UdeM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 6214

RESUME/ABSTRACT:

Recent high-profile accounting scandals have revealed the need for automated methods that can quickly analyze large amounts of financial data and signal when unusual observations are present. The literature suggests that many financial (and other) data sets conform to the first digit frequency distribution known as Benford's Law. In this paper, various methods of testing whether observed frequencies of first significant digits agree with Benford's Law are presented and compared in terms of their power. Theoretical and empirical results are used to compare these methods. Some recommendations are given on how these procedures may be employed in the field of accounting to detect unusual observations, fraud or error.
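As a baseline illustration (not one of the procedures compared in the talk), the simplest such test is a Pearson chi-square comparison of the observed first-digit frequencies with the Benford probabilities P(d) = log10(1 + 1/d). A minimal sketch in Python, with illustrative function names:

    import numpy as np
    from scipy.stats import chisquare

    def first_digits(x):
        """First significant digit of each nonzero value."""
        x = np.abs(np.asarray(x, dtype=float))
        x = x[x > 0]
        return (x / 10.0 ** np.floor(np.log10(x))).astype(int)

    def benford_chisquare(x):
        """Pearson chi-square test of first digits against Benford's law."""
        digits = first_digits(x)
        observed = np.bincount(digits, minlength=10)[1:10]
        benford_p = np.log10(1.0 + 1.0 / np.arange(1, 10))  # P(d) = log10(1 + 1/d)
        return chisquare(observed, benford_p * observed.sum())

    # Log-uniform amounts follow Benford's law, so this should not reject.
    rng = np.random.default_rng(1)
    print(benford_chisquare(10 ** rng.uniform(0, 4, size=2000)))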


Le vendredi 27 mars 2009 / Friday, March 27, 2009
15h30 / 3:30 p.m.

Lei Sun (University of Toronto)

Unifying Stratified and Weighted FDR Methods with Applications to Large-Scale Genetic Studies

Concordia University, salle LB-921.04, Library Bldg., 1400 de Maisonneuve West

RESUME/ABSTRACT:

A central issue in high-dimensional genetic studies is how to assess statistical significance while taking into account the inherent large-scale multiple hypothesis testing. To improve power, a number of studies have investigated the benefits of utilizing available prior information; however, the relative merits of the different methods remain unknown. We focus on the stratified FDR (Sun et al., 2006) and weighted FDR (Genovese et al., 2006; Roeder et al., 2006) control methods. The two approaches model the prior information in distinct ways. The weighted FDR converts the available prior information into test-specific weighting factors and adjusts the p-values accordingly. In contrast, the stratified FDR divides the tests into several disjoint strata based on the prior information and applies FDR control separately in each stratum. We first unify the two approaches in one framework, and we show the trade-off between power and robustness through theoretical, simulation, and application studies. Robustness is desirable to safeguard against potentially uninformative or even misleading prior information. We demonstrate the practical relevance by applying the two methods to three genome-wide association studies of diabetes and diabetes-related complications, using previous genome-wide linkage results as the available prior information. This is joint work with Yun Joo Yoo, Shelley Bull, Andrew Paterson and Daryl Waggott.
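A minimal sketch of the two ideas, using the Benjamini-Hochberg (BH) step-up rule as the base FDR procedure; the weighted version follows the generic recipe of dividing each p-value by a mean-one weight before applying BH, which is in the spirit of, though not necessarily identical to, the cited methods. Function names are illustrative:

    import numpy as np

    def bh_reject(pvals, q=0.05):
        """Benjamini-Hochberg step-up rule; returns a boolean rejection mask."""
        p = np.asarray(pvals)
        m = p.size
        order = np.argsort(p)
        below = p[order] <= q * np.arange(1, m + 1) / m
        k = np.nonzero(below)[0].max() + 1 if below.any() else 0
        reject = np.zeros(m, dtype=bool)
        reject[order[:k]] = True
        return reject

    def weighted_fdr(pvals, weights, q=0.05):
        """Weighted FDR: divide each p-value by its mean-one weight, then run BH."""
        w = np.asarray(weights, dtype=float)
        return bh_reject(np.asarray(pvals) / (w / w.mean()), q)

    def stratified_fdr(pvals, strata, q=0.05):
        """Stratified FDR: apply BH separately within each disjoint stratum."""
        p, s = np.asarray(pvals), np.asarray(strata)
        reject = np.zeros(p.size, dtype=bool)
        for label in np.unique(s):
            idx = np.where(s == label)[0]
            reject[idx] = bh_reject(p[idx], q)
        return reject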


Le vendredi 20 mars 2009 / Friday, March 20, 2009
15h30 / 3:30 p.m.

Susan Shortreed (McGill University)

Learning in Spectral Clustering

McGill, Burnside Hall, 805 Sherbrooke O., salle 1B36

RESUME/ABSTRACT:

Spectral clustering is a pairwise clustering technique that uses the eigenvectors and eigenvalues of a normalized similarity matrix to cluster the data. While it is a popular clustering method, a limiting factor in spectral segmentation is that the similarity matrix is not usually known a priori. In this talk we will review spectral clustering and present our method for learning the similarity matrix. We introduce the idea of optimizing a cost function composed of a clustering quality term, the gap, regularized by a clustering stability term, the eigengap. We will present our supervised learning method in detail, which assumes that a training set with known cluster labels is available for learning the similarity matrix. We will also discuss how we can extend our methodology to the unsupervised and semi-supervised frameworks.
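For readers unfamiliar with the technique, here is a minimal sketch of standard normalized spectral clustering, together with the eigengap on which the stability term above is built; this is the textbook procedure, not the learning method of the talk, and it assumes a symmetric similarity matrix with positive row sums:

    import numpy as np
    from scipy.linalg import eigh
    from scipy.cluster.vq import kmeans2

    def spectral_cluster(S, k):
        """Normalized spectral clustering of a symmetric similarity matrix S.

        Returns cluster labels and the eigengap lambda_k - lambda_{k+1} of the
        normalized similarity matrix, the stability quantity mentioned above."""
        d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))       # assumes positive row sums
        L = S * np.outer(d_inv_sqrt, d_inv_sqrt)        # D^{-1/2} S D^{-1/2}
        vals, vecs = eigh(L)                            # ascending eigenvalues
        vals, vecs = vals[::-1], vecs[:, ::-1]          # reorder descending
        X = vecs[:, :k]
        X /= np.linalg.norm(X, axis=1, keepdims=True)   # row-normalize embeddings
        _, labels = kmeans2(X, k, minit='++')
        return labels, vals[k - 1] - vals[k]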


Le vendredi 13 mars 2009 / Friday, March 13, 2009
15h30 / 3:30 p.m.

Fernando Camacho (Damos Inc., Toronto)

Statistical Analysis for Life Cycle Management of Steam Generators

Concordia University, LB-921.04, Library Bldg., 1400 de Maisonneuve West

RESUME/ABSTRACT:

As equipment and systems age, Life Cycle Management (LCM) analysis becomes an important tool in assessing and managing potentially life-limiting degradation mechanisms. Adequate LCM analysis usually considers a range of inspection and mitigation strategies aiming to maintain or extend the technical and economic life of the equipment. The assessment of these strategies needs to reflect not only the deterioration rate of the equipment, but also the impact the mitigation strategies have on the equipment. Deterioration rates can be assessed based on historical inspection trends, but in general it is much harder to assess how different mitigation strategies may affect the deterioration. This talk will describe some of the statistical analyses carried out to develop models that could be used in the LCM of steam generators of nuclear reactors. In particular, we will discuss the data collection, parameter estimation and variable selection used to select a model suitable to assess the effect of different mitigation actions on the deterioration rate of tube pitting in the steam generators. (Joint work with Sandra Pagan, Ontario Power Generation Inc., Pickering, Canada)


Le vendredi 6 mars 2009 / Friday, March 6, 2009
15h30 / 3:30 p.m.

Román Viveros-Aguilera (McMaster University)

Quality Control in Health Care

McGill, Burnside Hall, 805 Sherbrooke O., salle 1B36

RESUME/ABSTRACT:

While many types of health care processes and services are similar to those in other organizations, others such as medical interventions show notable differences. For instance, in industrial applications where quality control methods have a long history, the units sampled are products with a high degree of homogeneity as they are manufactured under largely controlled conditions. By contrast, patients subject to medical procedures exhibit extensive variety in their health profiles. This calls for new monitoring methods or major adjustments to existing industrial quality control methods to make them effective in the new situations. Risk-adjustment is a term recently introduced to describe some of the adaptations. In this talk we examine the issues, discuss some of the challenges as well as some of the solutions. The technical elements will be kept at a low volume.
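To give the flavour of risk adjustment, here is a sketch of one standard tool, the risk-adjusted CUSUM of Steiner et al. (2000) for monitoring binary outcomes; it is offered as background, not as the speaker's own proposal. Each patient contributes a log-likelihood-ratio weight that depends on that patient's predicted risk:

    import numpy as np

    def risk_adjusted_cusum(outcomes, risks, odds_ratio=2.0, h=4.5):
        """Risk-adjusted CUSUM for binary outcomes (Steiner et al., 2000).

        outcomes:   0/1 adverse-outcome indicators, in patient order
        risks:      patient-specific predicted probabilities of the outcome
        odds_ratio: deterioration in the odds that the chart is tuned to detect
        h:          control limit; the chart signals when the CUSUM exceeds h"""
        s, path, signal = 0.0, [], None
        for t, (y, p) in enumerate(zip(outcomes, risks)):
            # log-likelihood-ratio weight for odds ratio R versus R = 1:
            # W = y*log(R) - log(1 - p + R*p)
            w = y * np.log(odds_ratio) - np.log(1.0 - p + odds_ratio * p)
            s = max(0.0, s + w)
            path.append(s)
            if signal is None and s > h:
                signal = t
        return np.array(path), signal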


***** ATTENTION : Cette conférence est annulée / This talk has been cancelled *****

Le vendredi 27 février 2009 / Friday, February 27, 2009
15h30 / 3:30 p.m.

Sayan Mukherjee (Duke University)/Dept of Statistical Science; Dept of Computer Science; Institute for Genome Sciences & Policy; Dept of Biostatistics and Bioinformatics
Joint work with Justin Guinney, Simon Lunagomez, Mauro Maggioni, Robert Wolpert, and Phillip Febbo

Two Representations of Graphical Models

CRM, Amphi 5340. Pause café / Coffee break : salle / room 4361

RESUME/ABSTRACT:

In this talk I will discuss two problems: decomposition of gene networks and inference of conditional dependencies. The first part of the talk describes a method to decompose pathways or gene networks into sub-networks and infer the relevance of these sub-components in explaining phenotypic variation. The approach, which we call multiscale graphical models, is strongly related to old ideas such as path analysis. Specifically, it is based on the idea of diffusion wavelets, which in our application provide a multiscale decomposition of a partial correlation matrix or of the generator of a Markov chain. We describe results on yeast gene expression data to illustrate the method and then provide preliminary data on prostate cancer. The second part of the talk formulates a novel approach to inferring the conditional independence model, or Markov structure, of a multivariate distribution. Specifically, an informative prior distribution is placed over decomposable graphs and the induced posterior distribution is sampled. The key idea developed is a parametrization of decomposable hypergraphs using the geometry of points in Euclidean space. This allows for the specification of informative priors on decomposable graphs through priors on a finite set of points. This construction has been well studied in the fields of computational topology and random geometric graphs.


Le vendredi 20 février 2009 / Friday, February 20, 2009
15h30 / 3:30 p.m.

Marina Meila (University of Washington)

Consensus Ranking under the Exponential Model

CRM, salle 6214

RESUME/ABSTRACT:

This talk is concerned with summarizing, by means of statistical models, data that express preferences. Such data are typically a set of rankings of n items by a panel of experts; the simplest summary is the "consensus ranking", or the "centroid", of the set of rankings. Such problems appear in many tasks, ranging from combining voter preferences to boosting of search engines. We study the problem in its more general form of estimating a parametric model over permutations, known as the Generalized Mallows (GM) model. The talk will present a new exact estimation algorithm, non-polynomial in theory but extremely effective in comparison with existing algorithms. From a statistical point of view, we show that the GM model is an exponential family and introduce the conjugate prior for this model class. Then we introduce the infinite GM model, corresponding to "rankings" over an infinite set of items, and show that this model is both elegant and of practical significance. Finally, the talk will touch upon the subject of multimodal distributions and clustering. Joint work with Bhushan Mandhani, Le Bao, Kapil Phadnis, Arthur Patterson and Jeff Bilmes.
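As a concrete baseline (not the exact algorithm of the talk, which searches this space far more cleverly), the consensus ranking under Kendall's tau distance can be computed by brute force for a small number of items:

    import numpy as np
    from itertools import permutations

    def kendall_distance(a, b):
        """Number of item pairs on which the two rankings disagree."""
        pos_b = {item: i for i, item in enumerate(b)}
        return sum(1 for i in range(len(a)) for j in range(i + 1, len(a))
                   if pos_b[a[i]] > pos_b[a[j]])

    def consensus_ranking(rankings):
        """Brute-force Kendall consensus (the Mallows centroid), small n only."""
        best, best_cost = None, np.inf
        for perm in permutations(rankings[0]):
            cost = sum(kendall_distance(perm, r) for r in rankings)
            if cost < best_cost:
                best, best_cost = perm, cost
        return best, best_cost

    rankings = [(1, 2, 3, 4), (2, 1, 3, 4), (1, 3, 2, 4)]
    print(consensus_ranking(rankings))   # (1, 2, 3, 4) at total distance 2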


Le vendredi 13 février 2009 / Friday, February 13, 2009
15h30 / 3:30 p.m.

Thomas A. Louis (Johns Hopkins Bloomberg School of Public Health)

Trend Tests that Accommodate Genotyping Errors

McGill University Burnside Hall, 805 Sherbrooke O., Salle/Room 1B36

RESUME/ABSTRACT:

High-throughput SNP arrays provide estimates of genotypes for up to one million loci. These estimates are used, for example, in genome-wide association studies that relate genotype and phenotype (e.g., disease) for a sample of individuals. Common practice is to rank SNPs using test statistics, p-values or Bayesian structuring. While genotype calls are typically very accurate, genotyping errors do occur, and these can greatly influence statistical analysis of genotype/phenotype associations. However, estimates of genotype uncertainty are available for some platforms. Currently, they are used to identify, for each individual, SNPs with a sufficiently uncertain call; these are set aside in evaluating associations. This approach unnecessarily reduces information and can be biased. As an improvement, we derive and study a trend test statistic for genotype/phenotype association that takes genotype uncertainty into account, thus avoiding the need to set aside uncertain SNPs and thereby making the best use of the available information. Using simulations informed by the HapMap dataset, we show the effectiveness of this approach compared to setting aside uncertain genotype calls and to making deterministic calls. Effectiveness depends on an accurate assessment of uncertainty; with accurate assessment, the approach can substantially improve identification of causal SNPs. In addition, we present a mathematical representation that reduces the need for simulation to assess performance in identifying a single causal SNP in the context of a large number of comparator SNPs.
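To convey the basic idea, here is a sketch of a standard score/trend test written so that it accepts expected genotype dosages in place of hard calls; the statistic actually derived in the talk differs in its details:

    import numpy as np
    from scipy.stats import norm

    def dosage_trend_test(y, dosage):
        """Score/trend test of a 0/1 phenotype against genotype dosage.

        dosage can hold hard calls (0, 1, 2) or posterior expected dosages
        E[g | intensity data], so that call uncertainty enters the statistic."""
        y = np.asarray(y, dtype=float)
        g = np.asarray(dosage, dtype=float)
        u = np.sum((y - y.mean()) * (g - g.mean()))                 # score
        v = y.mean() * (1 - y.mean()) * np.sum((g - g.mean())**2)   # null variance
        z = u / np.sqrt(v)
        return z, 2 * norm.sf(abs(z))                               # two-sided p-value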


Le vendredi 6 février 2009 / Friday, February 6, 2009
15h30 / 3:30 p.m.

Taoufik Bouezmarni (Université de Montréal)

A Nonparametric Test for Conditional Independence using Bernstein Density Copulas

UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, salle PK-5115

RESUME/ABSTRACT:

This paper proposes a new nonparametric test for conditional independence which is based on the comparison of Bernstein copula densities using the Hellinger distance. The test is easy to implement because it does not involve a weighting function in the test statistic, and it can be applied in general settings since there is no restriction on the dimension of the data. We prove that the test statistic is asymptotically pivotal under the null hypothesis, establish local power properties, and motivate the validity of the bootstrap technique that we use in finite-sample settings. A simulation study illustrates the good size and power properties of the test. We illustrate the empirical relevance of our test by focusing on Granger non-causality using financial data to test for nonlinear leverage versus volatility feedback effects.
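For reference, the squared Hellinger distance between two copula densities $c$ and $c_0$ on the unit hypercube, the discrepancy on which the test statistic is built, is

$$ \mathcal{H}^2(c, c_0) \;=\; \frac{1}{2} \int_{[0,1]^d} \left\{ \sqrt{c(u)} - \sqrt{c_0(u)} \right\}^2 du; $$

in the test, $c$ is the Bernstein estimate of the copula density of the data and $c_0$ is the corresponding estimate under the null of conditional independence (the exact centering and standardization of the statistic are omitted here).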


Le vendredi 30 janvier 2009 / Friday, January 30, 2009
15h30 / 3:30 p.m.

Christian Robert (CEREMADE - Université Paris-Dauphine)

Computational Approaches to Bayesian Model Choice

CRM, UdeM, Pav. André-Aisenstadt, 2920 ch. de la Tour, salle 6214

RESUME/ABSTRACT:

In this talk, we will cover recent developments of ours and of others in the computation of marginal distributions for the comparison of statistical models in a Bayesian framework. While the introduction of reversible jump MCMC by Green in 1995 is rightly perceived as the 'second MCMC revolution', its implementation is often too complex for the problems at stake. When the number of models under study is of a reasonable magnitude, there exist computational alternatives that avoid model exploration with reasonable efficiency, and we will discuss here the pros and cons of several of those methods. Joint work with Jean-Michel Marin (Université Montpellier 2) and Nicolas Chopin (CREST-INSEE).
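As one concrete example of such an alternative, here is a textbook importance-sampling estimator of the marginal likelihood m(x) = ∫ f(x|θ) π(θ) dθ; it is a generic sketch, not one of the specific methods surveyed in the talk:

    import numpy as np
    from scipy import stats

    def log_marginal_is(loglik, log_prior, proposal, n_draws=10_000, seed=0):
        """Importance-sampling estimate of log m(x) = log int f(x|t) pi(t) dt.

        proposal is a frozen scipy distribution; it should be heavier-tailed
        than the posterior for the weights to be well behaved."""
        rng = np.random.default_rng(seed)
        theta = proposal.rvs(size=n_draws, random_state=rng)
        logw = loglik(theta) + log_prior(theta) - proposal.logpdf(theta)
        m = logw.max()                       # log-mean-exp for stability
        return m + np.log(np.mean(np.exp(logw - m)))

    # Toy check: N(0,1) prior, one observation x ~ N(theta, 1) => m(x) = N(0,2) at x.
    x = 1.3
    est = log_marginal_is(lambda t: stats.norm.logpdf(x, loc=t),
                          lambda t: stats.norm.logpdf(t),
                          stats.norm(0, 3))
    print(est, stats.norm.logpdf(x, scale=np.sqrt(2.0)))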


Le vendredi 23 janvier 2009 / Friday, January 23, 2009
15h30 / 3:30 p.m.

Andreas Kyprianou (The University of Bath)

Refracted Lévy Processes

Concordia University, LB-921.04, Library Bldg., 1400 de Maisonneuve West

RESUME/ABSTRACT:

We discuss solutions to a very elementary, but nonetheless degenerate, SDE which describes the aggregate path of a Lévy process when it is perturbed by a linear drift every time it spends time above a fixed level. Despite the simple nature of the SDE, some work is required to establish the existence and uniqueness of a solution. This problem is put in context by an application in insurance mathematics.
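In symbols, the SDE in question, as it is usually written in the refracted Lévy process literature, is

$$ dU_t \;=\; dX_t - \delta\, \mathbf{1}_{\{U_t > b\}}\, dt, $$

where $X$ is the driving Lévy process, $b$ is the fixed level and $\delta > 0$ is the rate of the perturbing linear drift.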


Le vendredi 5 décembre 2008 / Friday, December 5, 2008
15h30 / 3:30 p.m.

Peter Hoff, University of Washington

Hierarchical Eigenmodels for Pooled Covariance Estimation

UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 5340

RESUME/ABSTRACT:

While a set of covariance matrices corresponding to different populations are unlikely to be exactly equal, they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, while other pairs may be consistently negatively correlated. In such cases the similarities across covariance matrices can be described by similarities in their principal axes, the axes defined by the eigenvectors of the covariance matrices. Estimating the degree of across-population eigenvector heterogeneity can be helpful for a variety of estimation tasks. Similar eigenvector matrices can be pooled to form a central set of principal axes, and covariance estimation for populations having small sample sizes can be stabilized by shrinking estimates of their population-specific principal axes towards the across-population center. To this end, in this talk we'll discuss a hierarchical model and estimation procedure for pooling principal axes across several populations. The model for the across-group heterogeneity is based on a matrix-valued antipodally symmetric Bingham distribution that can flexibly describe notions of center and spread for a population of orthonormal matrices.
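Schematically, the model pools the spectral decompositions $\Sigma_j = U_j \Lambda_j U_j^\top$ across populations $j$, with the orthonormal eigenvector matrices $U_j$ drawn from a matrix Bingham distribution, whose density is proportional to $\operatorname{etr}(B\, U^\top A\, U)$; the parameters $A$ and $B$ encode the common center and the degree of spread of the population-specific principal axes. (This is only a schematic of the hierarchical eigenmodel described above; details are as in the speaker's work.)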



Le vendredi 21 novembre 2008 / Friday, November 21, 2008
15h30 / 3:30 p.m.

Duncan Murdoch, University of Western Ontario

Two Recursive Simulation Schemes

McGill University, Burnside Hall, 805 Sherbrooke O., 1B39

RESUME/ABSTRACT:

In this talk I will present two recursive simulation schemes. In the first, our aim is to perform exact simulation of functionals of diffusions defined as solutions of stochastic differential equations: the times of events such as extremes and barrier crossings, or multivariate outcomes such as the joint times and values of the minimum and maximum. The second scheme is an adaptive rejection sampler targeted at relatively high-dimensional densities. Using recursive partitioning and proposals which are locally independent in each component, we construct samplers with high acceptance rates. This is joint work with Tingting Gou and John Braun.


Le vendredi 7 novembre 2008 / Friday, November 7, 2008
15h30 / 3:30 p.m.

Peter McCullagh, University of Chicago

Sampling bias in logistic models

McGill University, Burnside Hall, 805 Sherbrooke O., 1B39

RESUME/ABSTRACT:

This talk is concerned with regression models for the effect of covariates on correlated binary and correlated polytomous responses. In a generalized linear mixed model, correlations are induced by a random effect, additive on the logistic scale, so that the joint distribution $p_{\mathbf{x}}(\mathbf{y})$ obtained by integration depends on the covariate values $\mathbf{x}$ on the sampled units. The thrust of this talk is that the conventional formulation is inappropriate for most natural sampling schemes in which the sampled units arise from a random process. The conventional analysis incorrectly predicts parameter attenuation due to the random effect, thereby giving a misleading impression of the magnitude of treatment effects. The error in the conventional analysis is a subtle consequence of selection bias that arises from random sampling of units. This talk will describe a non-standard but mathematically natural formulation in which the units are auto-generated by an explicit process and sampled following a well-determined plan. For a quota sample in which the covariate configuration $\mathbf{x}$ is pre-specified, the model distribution coincides with $p_{\mathbf{x}}(\mathbf{y})$ in the GLMM. However, if the sample units are selected at random, either by sequential recruitment or by simple random sampling from the available population, the conditional distribution $p(\mathbf{y} \mid \mathbf{x})$ is different from $p_{\mathbf{x}}(\mathbf{y})$. By contrast with conventional models, conditioning on $\mathbf{x}$ is not equivalent to stratification by $\mathbf{x}$. The implications for likelihood computations and estimating equations will be discussed.
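Concretely, the conventional generalized linear mixed model referred to here takes the form

$$ p_{\mathbf{x}}(\mathbf{y}) \;=\; \int \prod_{i} \frac{\exp\{ y_i (x_i^\top \beta + u) \}}{1 + \exp( x_i^\top \beta + u )} \, dF(u), $$

with the random effect $u$ additive on the logistic scale and integrated over its distribution $F$ (cluster-specific random effects work the same way, with one integral per cluster).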


Le vendredi 31 octobre 2008 / Friday, October 31, 2008
15h30 / 3:30 p.m.

Surajit Ray, Boston University

Clustering and classification of functional data

UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 5340

RESUME/ABSTRACT:

Functional approaches to modeling the dynamics of biological systems, trends in financial cycles, or seasonal measurements of spectral bands in remote sensing are becoming increasingly popular as data analysis tools. A related recent approach aims at reducing the dimension of large-$p$, small-$n$ problems using a functional embedding of the $p$-dimensional vector (talk by Hans-Georg Müller at JSM 2008). Clustering and classification are often an important final objective of functional data analysis, but most current techniques rely on a two-step approach of first finding the functional basis and then performing clustering or classification based on these functions. In this research we will discuss the challenges and provide directions towards developing a comprehensive functional clustering approach. Applications to land-class classification using remote sensing data will be presented during the talk.


Le vendredi 24 octobre 2008 / Friday, October 24, 2008
15h30 / 3:30 p.m.

Paul McNicholas, University of Guelph

Model-Based Clustering of Longitudinal Data

McGill University, Burnside Hall 1B39

RESUME/ABSTRACT:

A new family of mixture models for the model-based clustering of longitudinal data is introduced. The covariance structures of eight members of this new family of models are given, and the associated maximum likelihood estimates of the parameters are computed using expectation-maximization (EM) algorithms. The Bayesian information criterion is used for model selection, and Aitken's acceleration is used to determine convergence of these EM algorithms. This family of models is then applied to two toy data sets and to the famous yeast sporulation time course data of Chu et al., where the models display good clustering performance. Finally, further constraints are imposed on the decomposition to allow a deeper investigation of the correlation structure of these yeast sporulation data.
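For reference, the Aitken-based stopping rule commonly used with such EM algorithms, and the BIC in the form usually reported in this literature, can be sketched as follows (one common variant; the paper's exact criterion may differ):

    import numpy as np

    def aitken_converged(l_prev, l_curr, l_next, tol=1e-5):
        """Aitken-acceleration stopping rule for an EM log-likelihood sequence.

        Given three successive log-likelihood values, estimate the asymptotic
        maximum l_inf and stop when it is within tol of the latest value."""
        a = (l_next - l_curr) / (l_curr - l_prev)       # Aitken acceleration factor
        l_inf = l_curr + (l_next - l_curr) / (1.0 - a)  # estimated limit
        return abs(l_inf - l_next) < tol

    def bic(max_loglik, n_params, n_obs):
        """BIC in the form 2*log-likelihood - (number of parameters)*log(n);
        larger values indicate a better model under this convention."""
        return 2.0 * max_loglik - n_params * np.log(n_obs)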


Le vendredi 10 octobre 2008 / Friday, October 10, 2008
15h30 / 3:30 p.m.

Pierre-Jérôme Bergeron (Universite d'Ottawa)

Régression et biais de longueur en analyse de durées de vie
Studying the natural history of diseases through prevalent cases: can one exploit untapped features of length-biased data?

UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, Salle / Room PK-5115

Présentation en français avec diapositives en anglais / Presentation in French with slides in English

RESUME/ABSTRACT:

Dans la plupart des analyses de régression, bien que l'échantillonnage se fasse à partir de la loi jointe de la variable d'intérêt et des covariables, l'analyse est effectuée en conditionnant sur les valeurs des covariables, parce que la loi marginale des covariables ne contient aucune information sur les paramètres étudiés lorsque l'échantillonnage se fait sans biais. Lorsqu'il y a échantillonnage avec biais de longueur, comme ce peut être le cas en analyse de durées de vie sur des données provenant d'une cohorte prévalente, les covariables souffrent également d'un biais et leur distribution dépend des paramètres de régression. La question se pose : est-il possible d'extraire l'information sur ces paramètres contenue dans la loi marginale des covariables de l'échantillon? En utilisant des méthodes basées sur la vraisemblance pour des données tronquées à gauche et censurées à droite, on démontre que l'on peut obtenir des estimateurs de moindre variance par l'approche jointe (tenant compte des covariables) en comparaison à l'approche conditionnelle. Les résultats sont illustrés avec des données sur la démence provenant de l'Étude canadienne sur la santé et le vieillissement. Les répercussions possibles de ces idées vers d'autres formes d'échantillonnage biaisé et sur l'étude d'évènements récurrents seront discutées si l'horaire le permet.

In standard linear regression, though one samples from the joint distribution of the variable of interest and covariates, the analysis is carried out conditionally because the marginal distribution of the covariates is considered ancillary to the parameters of interest. When sampling is done with length bias with respect to the response variable, as can be the case with survival data from prevalent cohorts, the covariates are also sampled with a bias. The question is whether the marginal distribution holds any information about the parameters and, if so, should one adapt the usual methods of analysis to account for it? We present an adjusted (joint) likelihood approach for length-biased survival data with left truncation and right censoring and compare it with a conditional approach which ignores the information in the sampling distribution of the covariates. It is shown that taking the covariates into consideration yields more efficient estimates. The methods are applied to data on survival with dementia from the Canadian Study of Health and Aging (CSHA). If time permits, extension of these ideas to data on recurrent events will be addressed.
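In symbols: if $f(t \mid x)$ is the population density of the survival time given covariates $x$, with covariate density $g(x)$, then under length-biased sampling an observed pair $(T, X)$ has joint density

$$ f_{LB}(t, x) \;=\; \frac{t\, f(t \mid x)\, g(x)}{\iint s\, f(s \mid z)\, g(z)\, ds\, dz} \;=\; \frac{t\, f(t \mid x)\, g(x)}{E[T]}, $$

so the observed covariate distribution is proportional to $E[T \mid X = x]\, g(x)$ and thus depends on the regression parameters; this is precisely the information that the joint likelihood exploits and the conditional analysis discards.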


Le vendredi 3 octobre 2008 / Friday, October 3, 2008
15h30 / 3:30 p.m.

Ranjan Maitra, Iowa State University

Assessing Significance in Finite Mixture Models

Salle / Room LB-921.04, Library Bldg., Conc.U., 1400 de Maisonneuve West

RESUME/ABSTRACT:

Finite mixture models are useful in a wide variety of applications such as astronomy, botany, genetics, medicine and zoology. One appeal of such models is that they provide a convenient model-based statistical framework for clustering. Further, parameter estimation is made computationally feasible by application of the expectation-maximization (EM) algorithm, which can also help in estimating dispersions for these parameter estimates. We have used these estimated dispersions along with first- and second-order multivariate asymptotics to develop approaches to determining significance of various aspects of such models. These include determining the number of significant components in the mixture model, variable selection, quantifying the uncertainty in the derived grouping, and determining significantly influential and outlying observations. In this talk, I will outline the development of such methods and illustrate their performance on both simulation and classification datasets. This work is joint with Volodymyr Melnykov and is supported in part by the US National Science Foundation under its CAREER grant DMS-0437555.


Le vendredi 26 septembre 2008 / Friday, September 26, 2008
15h30 / 3:30 p.m.

Jon A. Wellner, University of Washington

Testing for sparse normal means: is there a signal?

McGill University, Burnside Hall, salle 1B39

RESUME/ABSTRACT:

Donoho and Jin (2004), following work of Ingster (1999), studied the problem of testing for a signal in a sparse normal means model and showed that there is a "detection boundary" above which the signal can be detected and below which no test has any power. They showed that Tukey's "higher criticism" statistic achieves the detection boundary. I will introduce a new family of test statistics based on phi-divergences (indexed by a real number s with values between -1 and 2) which all achieve the Donoho-Jin-Ingster detection boundary. I will also briefly review recent work on estimating the proportion of non-zero means.
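For reference, with ordered p-values $p_{(1)} \le \dots \le p_{(n)}$, one common version of the higher criticism statistic is

$$ HC_n^* \;=\; \max_{1 \le i \le n/2} \sqrt{n}\; \frac{i/n - p_{(i)}}{\sqrt{p_{(i)} (1 - p_{(i)})}}, $$

and in the usual calibration (signals of height $\sqrt{2 r \log n}$ at sparsity $n^{-\beta}$) the Donoho-Jin-Ingster detection boundary is $\rho^*(\beta) = \beta - 1/2$ for $1/2 < \beta \le 3/4$ and $\rho^*(\beta) = (1 - \sqrt{1 - \beta})^2$ for $3/4 < \beta < 1$: detection is possible when $r > \rho^*(\beta)$ and impossible when $r < \rho^*(\beta)$.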


Trimestre d'hiver 2008 / Winter Semester 2008

Le vendredi 18 avril 2008 / Friday, April 18, 2008
15h30 / 3:30 p.m.

Mary Sara McPeek (University of Chicago)

Genetic Association Studies with Known and Unknown Population Structure

UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, salle PK-5115

RESUME/ABSTRACT:

Common diseases such as asthma, diabetes, and hypertension, which currently account for a large portion of the health care burden, are complex in the sense that they are influenced by many factors, both environmental and genetic. One fundamental problem of interest is to understand what the genetic risk factors are that predispose some people to get a particular complex disease. Technological advances have made it feasible to perform case-control association studies on a genome-wide basis. The observations in these studies can have several sources of dependence, including population structure and relatedness among the sampled individuals, where some of this structure may be known and some unknown. Other characteristics of the data include missing information, and the need to analyze hundreds of thousands or millions of markers in a single study, which puts a premium on computational speed of the methods. We describe a combined approach to these problems which incorporates quasi-likelihood methods for known structure with principal components analysis for unknown structure.

Le vendredi 11 avril 2008 / Friday, April 11, 2008
15h30 / 3:30 p.m.

Yves Atchade, University of Michigan

Bayesian computation for statistical models with intractable normalizing constants

UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 5340

RESUME/ABSTRACT:

This talk will discuss methods to deal with the problem of sampling from posterior distributions in statistical models with intractable normalizing constants. In the presence of intractable normalizing constants in the likelihood function, traditional MCMC methods cannot be applied. I will review the literature on this issue and present a new general and asymptotically consistent approach to deal with it. I will illustrate the method with examples from image segmentation and social network modeling. Joint work with Nicolas Lartillot and Christian Robert. Technical report available at: http://www.stat.lsa.umich.edu/~yvesa/ncmcmc2.pdf

Le vendredi 4 avril 2008 / Friday, April 4, 2008
15h30 / 3:30 p.m.

Jenny Bryan, University of British Columbia

Statistical methods for high-throughput reverse genetic studies

McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room BH-708

RESUME/ABSTRACT:

Traditionally in genetics, researchers would identify a remarkable phenotype and then work to uncover the associated genotype; this is called 'forward genetics'. More recently, new techniques have made it possible to systematically make an enormous number of changes to a genome and, for each such change individually, observe the phenotypic consequences. This is what I mean by 'high-throughput reverse genetics' and the best example is the yeast deletion set, a collection of ~6K yeast strains, each of which is characterized by the deletion (or knockout) of a single gene. In other organisms, such as worms and human cells, similar approaches are possible by inhibiting the expression of specific genes, generally through RNA interference (RNAi). I will present some statistical methods appropriate for the analysis of data from high-throughput reverse genetics studies, with some coverage of low-level issues, such as normalization, and high-level analyses, such as clustering and growth curve modelling on a large scale. This talk will probably appeal most to people with an existing background or interest in genomics, particularly gene expression data, and who are interested in hearing about other platforms for the genome-scale study of gene function.

Le vendredi 28 mars 2008 / Friday, March 28, 2008
15h30 / 3:30 p.m.

Stephan Morgenthaler, Chaire de statistique appliquée, École polytechnique fédérale de Lausanne, Switzerland

Modéliser la forme d'une distribution
Modelling the shape of a distribution

HEC, 3000, chemin de la Côte-Sainte-Catherine, salle Cogeco (1er étage, section bleue)

RESUME/ABSTRACT:

Comment peut-on décrire la déviation de la normalité d'une loi de probabilité ? C'est une question avec de profondes racines historiques. Nous connaissons tous le diagramme normal comme outil pratique pour juger de la normalité d'une v.a. X. La formalisation mathématique de ce diagramme passe par l'écriture de X en fonction de Z, une v.a. normale centrée et réduite. Le développement de Cornish et Fisher ainsi que les transformations g/h de John Tukey sont de ce type. Nous allons expliquer ces méthodes et discuter de quelques applications.

The question of how to describe non-normal distributions is a statistical problem with deep historical roots. A powerful graphical tool for detecting non-normality of a random variable X is the normal probability plot. Its natural generalization consists in writing X as a function of a unit Gaussian variable Z. The Cornish-Fisher expansion is of this type, as are John W. Tukey's g-h-transformations. We will explain these methods and discuss some applications.
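For reference, Tukey's g-and-h family writes a random variable as

$$ X \;=\; A + B\, \frac{e^{gZ} - 1}{g}\, e^{h Z^2 / 2}, \qquad Z \sim N(0, 1), $$

where $g$ controls skewness and $h$ controls the heaviness of the tails; letting $g \to 0$ gives the purely elongated form $X = A + B Z e^{h Z^2 / 2}$, and $g = h = 0$ recovers the normal.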

Les transparents seront en anglais et la conférence sera donnée en français. / The slides will be in English and the talk will be given in French.

Le vendredi 14 mars 2008 / Friday, March 14, 2008
15h30 / 3:30 p.m.

J. Steve Marron, University of North Carolina, USA

Object Oriented Data Analysis

Concordia, Library Building, 1400 de Maisonneuve O., salle LB-921.04

RESUME/ABSTRACT:

Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.

Le vendredi 7 mars 2008 / Friday, March 7, 2008
15h30 / 3:30 p.m.

Radu Craiu, University of Toronto

Learn from Thy Neighbour: Parallel-Chain Adaptive MCMC

McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room BH-708

RESUME/ABSTRACT:

A considerable amount of effort has been recently invested in developing a comprehensive theory for adaptive MCMC. In comparison, there are fewer adaptive algorithms designed for practical situations. I will review some of the theoretical approaches used for proving convergence of non-Markovian adaptation schemes and will discuss scenarios for which the original adaptive Random-Walk Metropolis is unsuitable. Alternative adaptive schemes involving inter-chain and regional adaptation are discussed. Some of the proposed solutions involve theoretical questions that are still open.
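For concreteness, here is a sketch of the classical adaptive random-walk Metropolis scheme of Haario, Saksman and Tamminen (2001), the prototype of the algorithms whose convergence theory is discussed above; the parameter choices below are illustrative:

    import numpy as np

    def adaptive_metropolis(logpost, x0, n_iter=5000, adapt_start=500, eps=1e-6, seed=0):
        """Haario-style adaptive random-walk Metropolis: the Gaussian proposal
        covariance is the scaled running covariance of the chain's own history."""
        rng = np.random.default_rng(seed)
        d = len(x0)
        scale = 2.38**2 / d                       # classical optimal scaling
        chain = np.empty((n_iter, d))
        x, lp = np.asarray(x0, dtype=float), logpost(x0)
        cov = np.eye(d)
        for t in range(n_iter):
            if t >= adapt_start:                  # adapt from accumulated history
                cov = np.cov(chain[:t].T) + eps * np.eye(d)
            prop = rng.multivariate_normal(x, scale * cov)
            lp_prop = logpost(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
            chain[t] = x
        return chain

    # Toy target: a correlated bivariate normal (log-density up to a constant).
    logpost = lambda th: -0.5 * (th[0]**2 + (th[1] - th[0])**2)
    samples = adaptive_metropolis(logpost, np.zeros(2))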

Le vendredi 29 février 2008 / Friday, February 29, 2008
15h30 / 3:30 p.m.

Matthew Stephens, University of Chicago

Bayesian Imputation-based Association Mapping

UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, salle PK-5115

RESUME/ABSTRACT:

Ongoing large-scale genetic association studies, in an attempt to identify variants and genes affecting susceptibility to common diseases, are typing hundreds of thousands of SNPs in thousands of individuals, and testing these SNPs for association with phenotypes. Although this is a large number of SNPs, an even larger number of SNPs remain untyped. For example, the International HapMap Project contains genotype data on more than 3 million SNPs, many of which will not be typed in current studies. In this talk we will describe an approach that allows these untyped SNPs to be tested for association with phenotype. The basic idea is to exploit the fact that untyped SNPs are often correlated with typed SNPs, so genotype data on typed SNPs can be used to indirectly test untyped SNPs for association with phenotypes. Specifically, our approach exploits available information about patterns of correlation among typed and untyped SNPs in a panel of densely-genotyped individuals (e.g. the HapMap samples) to explicitly predict, or "impute", the genotypes at untyped SNPs in a study sample, and then tests these imputed genotypes for association with a phenotype. By using Bayesian statistical methods we are able to take account of potential errors in these imputed genotypes. We illustrate the benefits of this approach in terms of both gain in power, and improved interpretability of association signals, particularly when comparing results across studies that have typed different SNP markers.

Le vendredi 22 février 2008 / Friday, February 22, 2008
15h30 / 3:30 p.m.

Ayesha Ali, University of Guelph

Equivalence Class Searches Across Directed Acyclic Graphs with and without Latent Variables

McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room BH 708

RESUME/ABSTRACT:

Graphical models are graphs with vertices (variables) and edges that encode the conditional independence relations holding among the set of variables of some process. Directed acyclic graphs (DAGs) are commonly used to represent processes in (not exclusively) the biological, econometric, and social sciences. However, there are often many graphs that can encode the same set of conditional independence relations, thus forming a Markov equivalence class. Furthermore, the likelihoods of Markov equivalent graphs are equal. Hence, when performing a model search, it may be more efficient to search across equivalence classes rather than across individual graphs. In this talk we will review how equivalence classes of DAG models are represented and present an equivalence class search across such graphs. We will then focus on situations where some of the variables in the process are latent, and discuss how to represent Markov equivalence classes in this setting.

Le vendredi 15 février 2008 / Friday, February 15, 2008
15h30 / 3:30 p.m.

Jason D. Nielsen, Carleton University

Adaptive Functional Models for the Analysis of Recurrent Event Panel Data

UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 6214

RESUME/ABSTRACT:

An adaptive semi-parametric model for analyzing longitudinal panel count data is presented. Panel data refers here to data collected as the number of events occurring between specific follow-up times over a period of observation of a subject. The counts are assumed to arise from a mixed non-homogeneous Poisson process where frailties account for heterogeneity common to this type of data. The generating intensity of the counting process is assumed to be a smooth function modeled with penalized splines. A main feature is that the penalization used to control the amount of smoothing, usually assumed to be time homogeneous, is allowed to be time dependent so that the spline can more easily adapt to sharp changes in curvature regimes. The finite sample properties of the proposed estimating functions are investigated and comparisons made with a simpler model assuming a time homogeneous penalty.
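Schematically, writing the log intensity as a spline $s(t) = \sum_k \beta_k B_k(t)$, the model maximizes a penalized log-likelihood of the form

$$ \ell_p(\beta) \;=\; \ell(\beta) - \frac{1}{2} \int \tau(t)\, \{ s''(t) \}^2 \, dt, $$

where $\ell$ is the log-likelihood of the frailty-mixed Poisson process; the feature highlighted above is that the smoothing weight $\tau(\cdot)$ is allowed to vary with $t$ instead of being a single constant. (This display is a sketch of the general form, not the paper's exact criterion.)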

Le vendredi 8 février 2008 / Friday, February 8, 2008
15h30 / 3:30 p.m.

Chris Paciorek, Harvard School of Public Health

Mapping Ancient Forests: Bayesian Inference for Spatio-temporal Trends in Forest Composition Using the Fossil Pollen Proxy Record

McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room: BH 708

RESUME/ABSTRACT:

Ecologists are interested in understanding changes in tree species abundances and spatial distributions over thousands of years since the last glacial maximum. To estimate forest composition and investigate how much information is available from fossil pollen deposited in lake sediments, we build a Bayesian spatio-temporal hierarchical model that predicts forest composition in southern New England, USA, based on fossilized pollen. The critical relationships between abundances of taxa in the pollen record and abundances in actual vegetation are estimated using modern data and data from colonial records, for which both pollen and direct vegetation data are available. For these time periods, the Bayesian model relates pollen and vegetation data to a latent multivariate spatial process representing forest composition, which allows estimation of several key parameters. For time periods in the past, we use only pollen data and the estimated model parameters to make predictions and assess uncertainty about the latent spatio-temporal process over the last 2000 years. A new graphical assessment of feature significance allows us to infer which spatial patterns are reliably estimated.


Pour de plus amples informations / For further information :
activites@CRM.UMontreal.CA
