Le vendredi 1er mai 2009 / Friday, May 1st, 2009
15h30 / 3:30 p.m.
Constantine Frangakis (Johns Hopkins University)
The role of principal stratification in instrumental variables
in case-control designs - an application to Mendelian
randomization
McGill, Burnside Hall, 805 Sherbrooke O., salle 1B36
RESUME/ABSTRACT:
We are motivated by studying the effect that the expression of inflammatory genes
has on the risk of colorectal cancer. Gene expression, however, is likely
confounded with other risk factors for cancer. But because meiosis within families
is considered a random process, the genotypes can potentially be used as
instruments for the actual inflammation levels. The problem we address here is that
designs are typically based on case-control sampling for these settings. We show
first that, in contrast to settings with no confounding, modeled with conditional
logistic regression, instrumental variables causal effects are generally
incorrectly estimated if the design effect is ignored, as they are not invariant
under such designs. We show, second, how in general the framework of principal
stratification is useful to validly estimate the causal effects under such designs.
We demonstrate these results with the effect of inflammation on colorectal cancer.
Le vendredi 24 avril 2009 / Friday, April 24, 2009
15h30 / 3:30 p.m.
Jinko Graham (Simon Fraser University)
Graphical Displays to Uncover Gene-environment Interaction
from Data on Case-parent Trios
UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, salle PK-5115
RESUME/ABSTRACT:
In genetic association studies of complex diseases, case-parent-trio designs
involve the collection of data from affected offspring and their parents. This
design is well-suited to diseases of early onset, such as type 1 diabetes and
childhood leukemia. Unlike the case-control design, the case-parent design is
robust to bias from ethnic differences between cases and controls and it enables
investigation of parent-of-origin effects for genetic risk factors. While the use
of the case-parent design for finding genetic associations has been well studied,
its use for uncovering gene-environment interactions is less well-understood. We
review two existing ad-hoc approaches to explore gene-environment interaction from
case-parent trios and illustrate their potential bias. We propose an alternate
penalized likelihood approach that does not suffer from such bias and illustrate
its use on simulated data. We conclude with some directions for future research.
This is joint work with Ji-Hyung Shin and Brad McNeney.
Le vendredi 17 avril 2009 / Friday, April 17, 2009
15h30 / 3:30 p.m.
David Dunson (Duke)
Bayesian Density Regression with Epidemiology Applications
McGill, Burnside Hall, 805 Sherbrooke O., Salle 1B24
RESUME/ABSTRACT:
In assessing relationships between a response and multiple predictors,
it is appealing to allow the conditional response distribution to vary
flexibly, allowing non-linear and varying relationships with the
different quantiles and predictors. Such flexibility is of critical
importance in applications in which the tails of the distribution are
of primary interest. For example, in epidemiology studies of
continuous health responses, the tails of the distribution
typically correspond to those individuals having the most adverse health
conditions. We would like a method that can allow an environmental
exposure, genetic factor or demographic covariate to flexibly impact
risk of an adverse response, with adverse corresponding to values in
the tails of the distribution. Values further in the tails vary in
their severity, so it is important to avoid categorization or
grouping. Motivated by studies of pregnancy outcomes and
premature delivery, this talk proposes Bayesian nonparametric methods for
density regression. I will also describe applications to molecular
epidemiology studies. The talk is designed to be accessible to a
general audience of biostatisticians and epidemiologists, so technical
details will be kept to a minimum.
Le vendredi 3 avril 2009 / Friday, April 3, 2009
15h30 / 3:30 p.m.
Mary Lesperance (Victoria)
Testing for Benford's Law and Possible Fraud Detection
CRM, UdeM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 6214
RESUME/ABSTRACT:
Recent high profile accounting scandals have revealed the need for automated
methods which can quickly analyze large amounts of financial data and signal
when unusual observations are present. The literature suggests that many
financial (and other) data sets conform to the first digit frequency
distribution known as Benford's Law. In this paper, various methods of
testing whether observed frequencies of first significant digits agree with
Benford's law are presented and compared in terms of their power.
Theoretical and empirical results are used to compare these methods. Some
recommendations are given on how these procedures may be employed in the
field of accounting to detect unusual observations, fraud or error.
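As a rough illustration of the kind of test the abstract refers to, the sketch below compares observed first-digit counts with the Benford probabilities P(d) = log10(1 + 1/d) using Pearson's chi-square statistic. This is only one textbook test, not necessarily one of those compared in the talk; the function names and the example data are illustrative assumptions.

```python
import math

def benford_probs():
    """First-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d)."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """First significant digit of a nonzero finite number."""
    # The first nonzero digit in repr() is the first significant digit,
    # e.g. '0.00123' -> 1 and '5e-324' -> 5.
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no first significant digit")

def benford_chi_square(values):
    """Pearson chi-square statistic of first-digit counts against Benford's Law."""
    counts = [0] * 9
    for v in values:
        counts[first_digit(v) - 1] += 1
    n = sum(counts)
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, benford_probs()))
    return stat, counts

# Hypothetical data: powers of 2 are a classic Benford-like sequence,
# while a constant column is as far from Benford as possible.
CRITICAL_5PCT_8DF = 15.507  # upper 5% point of chi-square with 8 degrees of freedom
stat_powers, _ = benford_chi_square([2 ** k for k in range(1, 101)])
stat_constant, _ = benford_chi_square([5] * 100)
print(stat_powers < CRITICAL_5PCT_8DF, stat_constant > CRITICAL_5PCT_8DF)
```

A large statistic relative to the chi-square critical value with 8 degrees of freedom flags a departure from Benford's Law; it does not by itself indicate fraud.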
Le vendredi 27 mars 2009 / Friday, March 27, 2009
15h30 / 3:30 p.m.
Lei Sun (University of Toronto)
Unifying Stratified and Weighted FDR Methods with Applications to Large-Scale Genetic Studies
Concordia University, salle LB 921-04
RESUME/ABSTRACT:
A central issue in high-dimensional genetic studies is how to assess
statistical significance taking into account the inherent large-scale
multiple hypothesis testing. To improve power, a number of studies have
investigated the benefits of utilizing available prior information;
however, the relative merits of different methods remain unknown. We focus
on the stratified FDR (Sun et al., 2006) and weighted FDR (Genovese et
al., 2006; Roeder et al., 2006) control methods. The two approaches model
the prior differently. Weighted FDR converts the available prior
information to a test-specific weighting factor and adjusts the p-values
accordingly. In contrast, stratified FDR divides tests into several
disjoint strata based on the prior information and applies FDR control
separately in each stratum. We first unify the two approaches in one
framework and we show the trade-off between power and robustness by
theoretical, simulation, and application studies. Robustness is desirable
to safeguard against potential uninformative or even misleading prior
information. We demonstrate the practical relevance by applying the two
methods to three genome-wide association studies on diabetes and
diabetes-related complications using previous genome-wide linkage results
as the available prior information. This is joint work with Yun Joo Yoo,
Shelley Bull, Andrew Paterson and Daryl Waggott.
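The two ways of using prior information described above can be sketched as follows, using the Benjamini-Hochberg step-up procedure as the underlying FDR control. This is a minimal sketch under stated assumptions, not the unified framework of the talk: the function names, the weights, the stratum labels, and the p-values are all hypothetical, and the weights are rescaled to average one before dividing the p-values.

```python
def bh_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up: indices of hypotheses rejected at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # largest rank whose p-value is under its threshold
    return set(order[:k])

def stratified_bh(pvalues, strata, alpha):
    """Stratified FDR: run BH separately inside each stratum."""
    rejected = set()
    for label in set(strata):
        idx = [i for i, s in enumerate(strata) if s == label]
        local = bh_reject([pvalues[i] for i in idx], alpha)
        rejected |= {idx[j] for j in local}
    return rejected

def weighted_bh(pvalues, weights, alpha):
    """Weighted FDR: divide each p-value by its (mean-one, positive) weight, then run BH."""
    mean_w = sum(weights) / len(weights)
    adjusted = [min(1.0, p / (w / mean_w)) for p, w in zip(pvalues, weights)]
    return bh_reject(adjusted, alpha)

# Hypothetical example: the first three tests lie in a region favoured by the prior.
pvals = [0.001, 0.012, 0.03, 0.2, 0.6, 0.9]
strata = ["linked", "linked", "linked", "other", "other", "other"]
weights = [1.5, 1.5, 1.5, 0.5, 0.5, 0.5]
print(sorted(bh_reject(pvals, 0.05)))
print(sorted(stratified_bh(pvals, strata, 0.05)))
print(sorted(weighted_bh(pvals, weights, 0.05)))
```

In this toy example both uses of the prior recover a test that plain BH misses; with an uninformative or misleading prior the comparison can go the other way, which is the robustness question raised in the abstract.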
Le vendredi 20 mars 2009 / Friday, March 20, 2009
15h30 / 3:30 p.m.
Susan Shortreed (McGill University)
Learning in Spectral Clustering
McGill, Burnside Hall, 805 Sherbrooke O., salle 1B36
RESUME/ABSTRACT:
Spectral clustering is a pairwise clustering technique that uses the eigenvectors and eigenvalues of a normalized similarity matrix to cluster the data. While it is a popular clustering method, a limiting factor in spectral segmentation is that the similarity matrix is not usually known a priori. In this talk we will review spectral clustering and present our method for learning the similarity matrix. We introduce the idea of optimizing a cost function composed of a clustering quality term, the gap, regularized by a clustering stability term, the eigengap. We will present our supervised learning method in detail, which assumes that a training set with known clustering labels is available for learning the similarity matrix. We will also discuss how we can extend our methodology to the unsupervised and semi-supervised frameworks.
Le vendredi 13 mars 2009 / Friday, March 13, 2009
15h30 / 3:30 p.m.
Fernando Camacho (Damos Inc., Toronto)
Statistical Analysis for Life Cycle Management of Steam Generators
Concordia University, LB-921.04, Library Bldg., 1400 de Maisonneuve West
RESUME/ABSTRACT:
As equipment and systems age, Life Cycle Management (LCM) analysis becomes an important tool in assessing and managing potentially life limiting degradation mechanisms. Adequate LCM analysis usually considers a range of inspection and mitigation strategies aiming to maintain or extend the technical and economic life of the equipment. The assessment of these strategies needs to reflect not only the deterioration rate of the equipment, but also the impact the mitigation strategies have on the equipment. Deterioration rates can be assessed based on historical inspection trends, but in general it is much harder to assess how different mitigation strategies may affect the deterioration. This talk will describe some of the statistical analyses carried out to develop models that could be used on the LCM of steam generators of nuclear reactors. In particular, we will discuss the data collection, parameter estimation and variable selection used to select a model suitable to assess the effect of different mitigation actions on the deterioration rate of tube pitting in the steam generators.
(Joint work with Sandra Pagan, Ontario Power Generation Inc., Pickering, CANADA)
Le vendredi 6 mars 2009 / Friday, March 6, 2009
15h30 / 3:30 p.m.
Román Viveros-Aguilera (McMaster University)
Quality Control in Health Care
McGill, Burnside Hall, 805 Sherbrooke O., salle 1B36
RESUME/ABSTRACT:
While many types of health care processes and services are similar to those in other organizations, others such as medical interventions show notable differences. For instance, in industrial applications where quality control methods have a long history, the units sampled are products with a high degree of homogeneity as they are manufactured under largely controlled conditions. By contrast, patients subject to medical procedures exhibit extensive variety in their health profiles. This calls for new monitoring methods or major adjustments to existing industrial quality control methods to make them effective in the new situations. Risk-adjustment is a term recently introduced to describe some of the adaptations. In this talk we examine the issues, discuss some of the challenges as well as some of the solutions. The technical elements will be kept at a low volume.
*****ATTENTION: Cette conférence est annulée / This conference has been cancelled ****
Le vendredi 27 février 2009 / Friday, February 27, 2009
15h30 / 3:30 p.m.
Sayan Mukherjee (Duke University)/Dept of Statistical Science; Dept of Computer Science; Institute for Genome Sciences & Policy; Dept of Biostatistics and Bioinformatics
Joint work with Justin Guinney, Simon Lunagomez, Mauro Maggioni, Robert Wolpert, and Phillip Febbo
Two Representations of Graphical Models
CRM, Amphi 5340
Pause café / Coffee break salle/room 4361
RESUME/ABSTRACT:
In this talk I will discuss two problems: decomposition of gene networks and inference of conditional dependencies.
The first part of the talk describes a method to decompose pathways or gene networks into sub-networks and infer the relevance of these sub-components in explaining phenotypic variation. The approach, which we call multiscale graphical models, is strongly related to old ideas such as path analysis. Specifically, it is based on the idea of diffusion wavelets, which in our application is a multiscale decomposition of a partial correlation matrix or the generator of a Markov chain. We describe results on yeast gene expression data to illustrate the method and then provide preliminary data on prostate cancer.
The second part of the talk formulates a novel approach to infer conditional independence models or Markov structure of a multivariate distribution. Specifically, an informative prior distribution is placed over decomposable graphs and the induced posterior distribution is sampled. The key idea developed is a parametrization of decomposable hypergraphs using the geometry of points in Euclidean space. This allows for specification of informative priors on decomposable graphs by priors on a finite set of points. This construction has been well studied in the fields of computational topology and random geometric graphs.
Le vendredi 20 février 2009 / Friday, February 20, 2009
15h30 / 3:30 p.m.
Marina Meila (University of Washington)
Consensus Ranking under the Exponential Model
CRM, salle 6214
RESUME/ABSTRACT:
This talk is concerned with summarizing, by means of statistical models, data that express preferences. Such data are typically a set of rankings of n items by a panel of experts; the simplest summary is the "consensus ranking", or the "centroid" of the set of rankings. Such problems appear in many tasks, ranging from combining voter preferences to boosting of search engines.
We study the problem in its more general form of estimating a parametric model over permutations, known as the Generalized Mallows (GM) model. The talk will present a new exact estimation algorithm, non-polynomial in theory, but extremely effective in comparison with existing algorithms. From a statistical point of view, we show that the GM model is an exponential family, and introduce the conjugate prior for this model class.
Then we introduce the infinite GM model, corresponding to "rankings" over an infinite set of items, and show that this model is both elegant and of practical significance. Finally, the talk will touch upon the subject of multimodal distributions and clustering.
Joint work with: Bhushan Mandhani, Le Bao, Kapil Phadnis, Arthur Patterson and Jeff Bilmes
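To make the notion of a consensus ranking concrete, the sketch below computes the Kendall tau distance between rankings and finds, by exhaustive search, the ranking that minimizes the total distance to a panel of rankings. This brute-force search is not the exact algorithm presented in the talk and is only usable for a handful of items; the function names and the example panel are illustrative assumptions.

```python
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of item pairs that the two rankings (best item first) order differently."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def consensus_ranking(rankings):
    """Ranking minimizing the summed Kendall tau distance to all rankings.

    Exhaustive search over all n! permutations, so for illustration only.
    """
    items = rankings[0]
    best = min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )
    return list(best)

# Hypothetical panel of three experts ranking four items.
panel = [
    ["a", "b", "c", "d"],
    ["a", "b", "d", "c"],
    ["b", "a", "c", "d"],
]
print(consensus_ranking(panel))
```

Under a Mallows model with a single dispersion parameter, the maximum likelihood estimate of the central ranking is a minimizer of this summed distance, which is why the consensus ranking is the natural starting point for the parametric models discussed in the abstract.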
Le vendredi 13 février 2009 / Friday, February 13, 2009
15h30 / 3:30 p.m.
Thomas A. Louis (Johns Hopkins Bloomberg School of Public Health)
Trend Tests that Accommodate Genotyping Errors
McGill University Burnside Hall, 805 Sherbrooke O., Salle/Room 1B36
RESUME/ABSTRACT:
High-throughput SNP arrays provide estimates of genotypes for up to one million loci. These estimates
are used, for example, in genome-wide association studies that relate genotype and phenotype (e.g.,
disease) for a sample of individuals. Common practice is to rank SNPs using test statistics, p-values
or Bayesian structuring. While genotype calls are typically very accurate, genotyping errors
do occur and these can greatly influence statistical analysis of genotype/phenotype associations.
However, estimates of genotype uncertainty are available for some platforms. Currently, they are
used to identify, for each individual, SNPs with a sufficiently uncertain call. These are set aside in
evaluating associations. This approach unnecessarily reduces information and can be biased. As
an improvement, we derive and study a trend test statistic for genotype/phenotype association
that takes genotype uncertainty into account, thus avoiding the need to set aside uncertain SNPs
and thereby making best use of available information.
Using simulations informed by the HapMap dataset, we show the effectiveness of this approach
compared to setting aside uncertain genotype calls and to making deterministic calls. Effectiveness
depends on an accurate assessment of uncertainty; with accurate assessment the approach
can substantially improve identification of causal SNPs. In addition, we present a mathematical
representation that reduces the need for simulation to assess performance in identifying a single,
causal SNP in the context of a large number of comparator SNPs.
Le vendredi 6 février 2009 / Friday, February 6, 2009
15h30 / 3:30 p.m.
Taoufik Bouezmarni (Université de Montréal)
A Nonparametric Test for Conditional Independence using Bernstein Density Copulas
UQAM, 201, ave Président Kennedy, Salle PK-5115
RESUME/ABSTRACT:
This paper proposes a new nonparametric test for conditional independence which is based on the comparison of Bernstein copula densities using the Hellinger distance. The test is easy to implement because it does not involve a weighting function in the test statistic, and it can be applied in general settings since there is no restriction on the dimension of the data. We prove that the test statistic is asymptotically pivotal under the null hypothesis, establish local power properties, and motivate the validity of the bootstrap technique that we use in finite sample settings. A simulation study illustrates the good size and power properties of the test. We illustrate the empirical relevance of our test by focusing on Granger non-causality using financial data to test for nonlinear leverage versus volatility feedback effects.
Le vendredi 30 janvier 2009 / Friday, January 30, 2009
15h30 / 3:30 p.m.
Christian Robert (CEREMADE - Université Paris-Dauphine)
Computational Approaches to Bayesian Model Choice
CRM, UdeM, Pav. André-Aisenstadt, 2920 ch. de la Tour, salle 6214
RESUME/ABSTRACT:
In this talk, we will cover recent developments of ours and of others in the computation of marginal distributions for the comparison of statistical models in a Bayesian framework. While the introduction of reversible jump MCMC by Green in 1995 is rightly perceived as the 'second MCMC revolution,' its implementation is often too complex for the problems at stake. When the number of models under study is of a reasonable magnitude, there exist computational alternatives that avoid model exploration with a reasonable efficiency and we will discuss here the pros and cons of several of those methods.
Joint work with Jean-Michel MARIN, Université Montpellier 2, Orsay, and Nicolas CHOPIN, CREST-INSEE.
Le vendredi 23 janvier 2009 / Friday, January 23, 2009
15h30 / 3:30 p.m.
Andreas Kyprianou (The University of Bath)
Refracted Lévy Processes
Concordia University, LB-921.04, Library Bldg., 1400 de Maisonneuve West
RESUME/ABSTRACT:
We discuss solutions to a very elementary, but nonetheless degenerate, SDE which describes the aggregate path of a Lévy process when it is perturbed by a linear drift every time it spends time above a fixed level. Despite the simple nature of the SDE, some work is required to establish existence and uniqueness of a solution. This problem is put in context by an application in insurance mathematics.
Le vendredi 5 décembre 2008 / Friday, December 5, 2008
15h30 / 3:30 p.m.
Peter Hoff, University of Washington
Hierarchical Eigenmodels for Pooled Covariance Estimation
UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 5340
RESUME/ABSTRACT:
While a set of covariance matrices corresponding to different populations are unlikely to be exactly equal, they can still exhibit a high degree of similarity. For example, some pairs of variables may be positively correlated across most groups, while other pairs may be consistently negative. In such cases the similarities across covariance matrices can be described by similarities in their principal axes, the axes defined by the eigenvectors of the covariance matrices. Estimating the degree of across-population eigenvector heterogeneity can be helpful for a variety of estimation tasks. Similar eigenvector matrices can be pooled to form a central set of principal axes, and covariance estimation for populations having small sample sizes can be stabilized by shrinking estimates of their population-specific principal axes towards the across-population center. To this end, in this talk we'll discuss a hierarchical model and estimation procedure for pooling principal axes across several populations. The model for the across-group heterogeneity is based on a matrix valued antipodally symmetric Bingham distribution that can flexibly describe notions of center and spread for a population of orthonormal matrices.
Le vendredi 21 novembre 2008 / Friday, November 21, 2008
15h30 / 3:30 p.m.
Duncan Murdoch, University of Western Ontario
Two Recursive Simulation Schemes
McGill University, Burnside Hall, 805 Sherbrooke O., 1B39
RESUME/ABSTRACT:
In this talk I will present two recursive simulation schemes. In the
first, our aim is to do exact simulations of functionals of diffusion
solutions of stochastic differential equations: the times of events
such as extremes and barrier crossings, or multivariate outcomes such
as the joint times and values of the minimum and maximum.
The second scheme is an adaptive rejection sampler targeted at
relatively high dimensional densities. Using recursive partitioning
and proposals which are locally independent in each component we
construct samplers with high acceptance rates.
This is joint work with Tingting Gou and John Braun.
Le vendredi 7 novembre 2008 / Friday, November 7, 2008
15h30 / 3:30 p.m.
Peter McCullagh, University of Chicago
Sampling bias in logistic models
McGill University, Burnside Hall, 805 Sherbrooke O., 1B39
RESUME/ABSTRACT:
This talk is concerned with regression models for the effect of covariates on correlated binary and correlated polytomous responses. In a generalized linear mixed model, correlations are induced by a random effect, additive on the logistic scale, so that the joint distribution $p_{\mathbf{x}}(\mathbf{y})$ obtained by integration depends on the covariate values $\mathbf{x}$ on the sampled units. The thrust of this talk is that the conventional formulation is inappropriate for most natural sampling schemes in which the sampled units arise from a random process. The conventional analysis incorrectly predicts parameter attenuation due to the random effect, thereby giving a misleading impression of the magnitude of treatment effects. The error in the conventional analysis is a subtle consequence of selection bias that arises from random sampling of units. This talk will describe a non-standard but mathematically natural formulation in which the units are auto-generated by an explicit process and sampled following a well-determined plan. For a quota sample in which the covariate configuration $\mathbf{x}$ is pre-specified, the model distribution coincides with $p_{\mathbf{x}}(\mathbf{y})$ in the GLMM. However, if the sample units are selected at random, either by sequential recruitment or by simple random sampling from the available population, the conditional distribution $p(\mathbf{y} \mid \mathbf{x})$ is different from $p_{\mathbf{x}}(\mathbf{y})$. By contrast with conventional models, conditioning on $\mathbf{x}$ is not equivalent to stratification by $\mathbf{x}$. The implications for likelihood computations and estimating equations will be discussed.
Le vendredi 31 octobre 2008 / Friday, October 31, 2008
15h30 / 3:30 p.m.
Surajit Ray, Boston University
Clustering and classification of functional data
UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 5340
RESUME/ABSTRACT:
Functional approaches to modeling the dynamics of biological systems, trends in financial cycles, and seasonal measurements of spectral bands in remote sensing are becoming increasingly popular as data analysis tools. On the other hand, a recent approach aims at reducing the dimension of large $p$, small $n$ problems using a functional embedding of the $p$-dimensional vector (talk by Hans-Georg Müller at JSM 2008). Clustering and classification are often an important final objective of functional data analysis, but most current techniques rely on a two-step approach of first finding the functional basis and then performing clustering or classification based on these functions. In this research we will discuss the challenges and provide directions towards developing a comprehensive functional clustering approach. Applications in Landclass classification using remote sensing data will be presented during the talk.
Le vendredi 24 octobre 2008 / Friday, October 24, 2008
15h30 / 3:30 p.m.
Paul McNicholas, University of Guelph
Model-Based Clustering of Longitudinal Data
McGill University, Burnside Hall 1B39
RESUME/ABSTRACT:
A new family of mixture models for the model-based clustering of longitudinal data is introduced. The covariance structures of eight members of this new family of models are given and the associated maximum likelihood estimates for the parameters are obtained using expectation-maximization (EM) algorithms. The Bayesian information criterion is used for model selection and Aitken's acceleration is used to determine convergence of these EM algorithms. This family of models is then applied to two toy data sets and to the famous yeast sporulation time course data of Chu et al., where the models display good clustering performance. Finally, further constraints are imposed on the decomposition to allow a deeper investigation of the correlation structure of these yeast sporulation data.
Le vendredi 10 octobre 2008 / Friday, October 10, 2008
15h30 / 3:30 p.m.
Pierre-Jérôme Bergeron (Université d'Ottawa)
Régression et biais de longueur en analyse de durées de vie / Studying the natural history of diseases through prevalent cases: can one exploit untapped features of length-biased data?
UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, Salle / Room PK-5115
Présentation en français avec diapositives en anglais / Presentation in French with slides in English
RESUME/ABSTRACT:
Dans la plupart des analyses de régression, bien que l'échantillonnage se fasse
à partir de la loi jointe de la variable d'intérêt et des covariables, l'analyse
est effectuée en conditionnant sur les valeurs des covariables, parce que la loi
marginale des covariables ne contient aucune information sur les paramètres étudiés
lorsque l'échantillonnage se fait sans biais. Lorsqu'il y a échantillonnage avec
biais de longueur, comme ce peut être le cas en analyse de durées de vie sur des
données provenant d'une cohorte prévalente, les covariables souffrent également
d'un biais et leur distribution dépend des paramètres de régression. La question
se pose: est-il possible d'extraire l'information sur ces paramètres contenue
dans la loi marginale des covariables de l'échantillon? En utilisant des méthodes basées
sur la vraisemblance pour des données tronquées à gauche et censurées à droite,
on démontre que l'on peut obtenir des estimateurs de moindre variance
par l'approche jointe (tenant compte des covariables) en comparaison à l'approche
conditionnelle. Les résultats sont illustrés avec des données sur la démence provenant
de l'Étude canadienne sur la santé et le vieillissement. Les répercussions possibles
de ces idées vers d'autres formes d'échantillonnage biaisé et sur l'étude
d'évènements récurrents seront discutées si l'horaire le permet.
In standard linear regression, though one samples from the joint
distribution of the variable of interest and covariates, the analysis is
carried out conditionally because the marginal distribution of the covariates
is considered ancillary to the parameters of interest. When sampling is
done with length-bias with respect to the response variable, as can be the
case with survival data from prevalent cohorts, the covariates are also
sampled with a bias. The question is whether the marginal distribution holds
any information about the parameters and, if so, should one adapt the usual
methods of analysis to account for it? We present an adjusted (joint)
likelihood approach for length-biased survival data with left truncation and
right censoring and compare it with a conditional approach which ignores the
information in the sampling distribution of the covariates. It is shown that taking the
covariates into consideration yields more efficient estimates. The methods
are applied to data on survival with dementia from the Canadian Study on
Health and Aging (CSHA). If time permits, extension of these ideas to data
on recurrent events will be addressed.
Le vendredi 3 octobre 2008 / Friday, October 3, 2008
15h30 / 3:30 p.m.
Ranjan Maitra, Iowa State University
Assessing Significance in Finite Mixture Models
Salle / Room LB-921.04, Library Bldg., Conc.U., 1400 de Maisonneuve West
RESUME/ABSTRACT:
Finite mixture models are useful in a wide variety of
applications such as astronomy, botany, genetics, medicine and zoology.
One appeal for such models is that they provide a convenient
model-based statistical framework for clustering. Further, parameter
estimation is computationally made feasible by application of the
expectation-maximization (EM) algorithm, which can also help in
estimating dispersions for these parameter estimates. We have used
these estimated dispersions along with first- and second-order
multivariate asymptotics to develop approaches to determining significance of
various aspects of such models. These include determining the number
of significant components in the mixture model, variable selection,
quantifying the uncertainty in the derived grouping, and determining
significantly influential and outlying observations. In this talk, I
will outline development of such methods and illustrate performance on
both simulation and classification datasets.
This work is joint with Volodymyr Melnykov and is supported in part by
the US National Science Foundation under its CAREER grant DMS-0437555.
Le vendredi 26 septembre 2008 / Friday, September 26, 2008
15h30 / 3:30 p.m.
Jon A. Wellner, University of Washington
Testing for sparse normal means: is there a signal?
1B39 Burnside Hall, McGill University
RESUME/ABSTRACT:
Donoho and Jin (2004), following work of Ingster (1999), studied the problem of testing for a signal in a sparse normal means model and showed that there is a "detection boundary" above which the signal can be detected and below which no test has any power. They showed that Tukey's "higher criticism" statistic achieves the detection boundary. I will introduce a new family of test statistics based on phi-divergences (indexed by a real number s with values between -1 and 2) which all achieve the Donoho-Jin-Ingster detection boundary. I will also briefly review recent work on estimating the proportion of non-zero means.
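For readers unfamiliar with the higher criticism statistic mentioned above, here is a minimal sketch of its usual p-value form: with sorted p-values p_(1) <= ... <= p_(n), it takes the maximum over i <= alpha0 * n of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))). The phi-divergence family introduced in the talk is not implemented here; the cutoff alpha0 = 0.5, the function names, and the synthetic p-values are assumptions made for illustration.

```python
import math

def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def higher_criticism(pvalues, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of p-values."""
    n = len(pvalues)
    p_sorted = sorted(pvalues)
    last = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, last + 1):
        p = p_sorted[i - 1]
        if 0.0 < p < 1.0:  # skip degenerate p-values to avoid dividing by zero
            hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1 - p))
            best = max(best, hc)
    return best

# Synthetic illustration (not data from the talk): an evenly spaced "null"
# grid of p-values, then the same grid with 20 values replaced by the
# p-value of a z-score of 4, mimicking a sparse signal.
n = 1000
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
signal_p = null_p[:-20] + [upper_tail_p(4.0)] * 20
print(higher_criticism(null_p), higher_criticism(signal_p))
```

The statistic stays small on the evenly spaced grid and becomes large once a small fraction of unusually small p-values is present, which is the behaviour the detection boundary quantifies.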
Trimestre d'hiver 2008 / Winter Semester 2008
Le vendredi 18 avril 2008 / Friday, April 18, 2008
15h30 / 3:30 p.m.
Mary Sara McPeek (University of Chicago)
Genetic Association Studies with Known and Unknown Population Structure
UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, salle PK-5115
RESUME/ABSTRACT:
Common diseases such as asthma, diabetes, and hypertension,
which currently account for a large portion of the health
care burden, are complex in the sense that they are influenced
by many factors, both environmental and genetic. One fundamental
problem of interest is to understand what the genetic risk factors are
that predispose some people to get a particular complex disease.
Technological advances have made it feasible to perform case-control
association studies on a genome-wide basis. The observations in these
studies can have several sources of dependence, including population
structure and relatedness among the sampled individuals, where some of
this structure may be known and some unknown. Other characteristics of
the data include missing information, and the need to analyze hundreds
of thousands or millions of markers in a single study, which puts a
premium on computational speed of the methods. We describe a combined
approach to these problems which incorporates quasi-likelihood methods
for known structure with principal components analysis for unknown
structure.
Le vendredi 11 avril 2008 / Friday, April 11, 2008
15h30 / 3:30 p.m.
Yves Atchade, University of Michigan
Bayesian computation for statistical models with intractable normalizing constants
UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 5340
RESUME/ABSTRACT:
This talk will discuss methods to deal with the problem of sampling from posterior distributions in statistical models with intractable normalizing constants. In the presence of intractable normalizing constants in the likelihood function, traditional MCMC methods cannot be applied. I will review the literature on this issue and present a new general and asymptotically consistent approach to deal with it. I will illustrate the method with examples from image segmentation and social network modeling.
Joint work with Nicolas Lartillot and Christian Robert.
Technical report available at: http://www.stat.lsa.umich.edu/~yvesa/ncmcmc2.pdf
Le vendredi 4 avril 2008 / Friday, April 4, 2008
15h30 / 3:30 p.m.
Jenny Bryan, University of British Columbia
Statistical methods for high-throughput reverse genetic studies
McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room BH-708
RESUME/ABSTRACT:
Traditionally in genetics, researchers would identify a remarkable phenotype and then work to uncover the associated genotype; this is called 'forward genetics'. More recently, new techniques have made it possible to systematically make an enormous number of changes to a genome and, for each such change individually, observe the phenotypic consequences. This is what I mean by 'high-throughput reverse genetics' and the best example is the yeast deletion set, a collection of ~6K yeast strains, each of which is characterized by the deletion (or knockout) of a single gene. In other organisms, such as worms and human cells, similar approaches are possible by inhibiting the expression of specific genes, generally through RNA interference (RNAi). I will present some statistical methods appropriate for the analysis of data from high-throughput reverse genetics studies, with some coverage of low-level issues, such as normalization, and high-level analyses, such as clustering and growth curve modelling on a large scale. This talk will probably appeal most to people with an existing background or interest in genomics, particularly gene expression data, and who are interested in hearing about other platforms for the genome-scale study of gene function.
Le vendredi 28 mars 2008 / Friday, March 28, 2008
15h30 / 3:30 p.m.
Stephan Morgenthaler, Chaire de statistique appliquée
Ecole polytechnique fédérale de Lausanne, Suisse / Switzerland
Modéliser la forme d'une distribution
HEC, 3000, chemin de la Côte-Sainte-Catherine, salle Cogeco (1er étage, section bleue)
RESUME/ABSTRACT:
Comment peut-on décrire la déviation de la normalité d'une loi de probabilité ? Ceci est une question avec de profondes racines historiques. Nous connaissons tous le diagramme normal comme outil pratique pour juger de la normalité d'une v.a. X. La formalisation mathématique de ce diagramme passe par l'écriture de X en fonction de Z, une v.a. normale centrée et réduite. Le développement de Cornish et Fisher ainsi que les transformations g/h de John Tukey sont de ce type. Nous allons expliquer ces méthodes et discuter de quelques applications.
The question of how to describe non-normal distributions is a statistical problem with deep historical roots. A powerful graphical tool for detecting non-normality of a random variable X is the normal probability plot. Its natural generalization consists in writing X as a function of a unit Gaussian variable Z. The Cornish-Fisher expansion is of this type, as are John W. Tukey's g-h-transformations. We will explain these methods and discuss some applications.
Les transparents seront en anglais et la conférence sera donnée en français. / The slides will be in English and the talk will be given in French.
Le vendredi 14 mars 2008 / Friday, March 14, 2008
15h30 / 3:30 p.m.
J. Steve Marron, University of North Carolina, USA
Object Oriented Data Analysis
Concordia, Library Building, 1400 de Maisonneuve O., salle LB-921.04
RESUME/ABSTRACT:
Object Oriented Data Analysis is the statistical analysis of populations of complex objects. In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful. Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects. These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics. Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.
Le vendredi 7 mars 2008 / Friday, March 7, 2008
15h30 / 3:30 p.m.
Radu Craiu, University of Toronto
Learn from Thy Neighbour: Parallel-Chain Adaptive MCMC
McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room BH-708
RESUME/ABSTRACT:
A considerable amount of effort has been recently invested in developing a comprehensive theory for adaptive MCMC. In comparison, there are fewer adaptive algorithms designed for practical situations.
I will review some of the theoretical approaches used for proving convergence of non-Markovian adaptation schemes and will discuss scenarios for which the original adaptive Random-Walk Metropolis is unsuitable. Alternative adaptive schemes involving inter-chain and regional adaptation are discussed. Some of the proposed solutions involve theoretical questions that are still open.
Le vendredi 29 février 2008 / Friday, February 29, 2008
15h30 / 3:30 p.m.
Matthew Stephens, University of Chicago
Bayesian Imputation-based Association Mapping
UQAM, Pav. Président-Kennedy, 201, av. Président-Kennedy, salle PK-5115
RESUME/ABSTRACT:
Ongoing large-scale genetic association studies, in an attempt to identify variants and genes affecting susceptibility to common diseases, are typing hundreds of thousands of SNPs in thousands of individuals, and testing these SNPs for association with phenotypes. Although this is a large number of SNPs, an even larger number of SNPs remain untyped. For example, the International HapMap Project contains genotype data on more than 3 million SNPs, many of which will not be typed in current studies. In this talk we will describe an approach that allows these untyped SNPs to be tested for association with phenotype. The basic idea is to exploit the fact that untyped SNPs are often correlated with typed SNPs, so genotype data on typed SNPs can be used to indirectly test untyped SNPs for association with phenotypes. Specifically, our approach exploits available information about patterns of correlation among typed and untyped SNPs in a panel of densely-genotyped individuals (e.g. the HapMap samples) to explicitly predict, or "impute", the genotypes at untyped SNPs in a study sample, and then tests these imputed genotypes for association with a phenotype. By using Bayesian statistical methods we are able to take account of potential errors in these imputed genotypes. We illustrate the benefits of this approach in terms of both gain in power, and improved interpretability of association signals, particularly when comparing results across studies that have typed different SNP markers.
Le vendredi 22 février 2008 / Friday, February 22, 2008
15h30 / 3:30 p.m.
Ayesha Ali, University of Guelph
Equivalence Class Searches Across Directed Acyclic Graphs with and without Latent Variables
McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room BH-708
RESUME/ABSTRACT:
Graphical models are graphs with vertices (variables) and edges that encode the conditional independence relations holding among the set of variables of some process. Directed acyclic graphs (DAGs) are commonly used to represent processes in (not exclusively) the biological, econometric, and social sciences. However, there are often many graphs that can encode the same set of conditional independence relations, thus forming a Markov equivalence class. Furthermore, the likelihoods of Markov equivalent graphs are equal. Hence, when performing a model search, it may be more efficient to search across equivalence classes rather than across individual graphs.
In this talk we will review how equivalence classes of DAG models are represented and present an equivalence class search across such graphs. We will then focus on situations where some of the variables in the process are latent, and discuss how to represent Markov equivalence classes in this setting.
Le vendredi 15 février 2008 / Friday, February 15, 2008
15h30 / 3:30 p.m.
Jason D. Nielsen, Carleton University
Adaptive Functional Models for the Analysis of Recurrent Event Panel Data
UdeM, CRM, Pav. André-Aisenstadt, 2920, ch. de la Tour, salle 6214
RESUME/ABSTRACT:
An adaptive semi-parametric model for analyzing longitudinal panel count data is presented. Panel data refers here to data collected as the number of events occurring between specific follow-up times over a period of observation of a subject. The counts are assumed to arise from a mixed non-homogeneous Poisson process where frailties account for heterogeneity common to this type of data. The generating intensity of the counting process is assumed to be a smooth function modeled with penalized splines. A main feature is that the penalization used to control the amount of smoothing, usually assumed to be time homogeneous, is allowed to be time dependent so that the spline can more easily adapt to sharp changes in curvature regimes. The finite sample properties of the proposed estimating functions are investigated and comparisons made with a simpler model assuming a time homogeneous penalty.
Le vendredi 8 février 2008 / Friday, February 8, 2008
15h30 / 3:30 p.m.
Chris Paciorek, Harvard School of Public Health
Mapping Ancient Forests: Bayesian Inference for Spatio-temporal Trends in Forest Composition Using the Fossil Pollen Proxy Record
McGill, Burnside Hall, 805 Sherbrooke O., Salle / Room BH-708
RESUME/ABSTRACT:
Ecologists are interested in understanding changes in tree species abundances and spatial distributions over thousands of years since the last glacial maximum. To estimate forest composition and investigate how much information is available from fossil pollen deposited in lake sediments, we build a Bayesian spatio-temporal hierarchical model that predicts forest composition in southern New England, USA, based on fossilized pollen. The critical relationships between abundances of taxa in the pollen record and abundances in actual vegetation are estimated using modern data and data from colonial records, for which both pollen and direct vegetation data are available. For these time periods, the Bayesian model relates pollen and vegetation data to a latent multivariate spatial process representing forest composition, which allows estimation of several key parameters. For time periods in the past, we use only pollen data and the estimated model parameters to make predictions and assess uncertainty about the latent spatio-temporal process over the last 2000 years. A new graphical assessment of feature significance allows us to infer which spatial patterns are reliably estimated.
Pour de plus amples informations / For further information:
activites@CRM.UMontreal.CA