Frontiers in Statistics
(Leader: Dr. George Yanev)
Friday, April 20, 2007
SAS Opportunities for Students and Faculty
SAS Student Program Manager
We will talk about the opportunities SAS has for students and faculty. These include software, recognition, jobs, and certification, technical or otherwise.
The speaker is also a BASE and Advanced SAS Certified Programmer.
Thursday, April 12, 2007
Subgroup Analysis: A Stylized Bayes Approach
University of Cincinnati
Subgroup analysis is recognized to be important in clinical trials but lacks a formal approach that addresses the main issues such as accounting for multiple testing and limits on the number of tests. We introduce a new approach to inference for subgroups. The main elements of the proposed approach are the use of a priority ordering on covariates to define potential subgroups and the use of the posterior probabilities to identify subgroup effects for reporting. We employ Bayesian model selection methods with objective priors to determine the posterior probabilities of subgroup effects. As usual in Bayesian clinical trial design we compute frequentist operating characteristics (OC). We achieve desired OCs by obtaining a suitable threshold for the posterior probabilities.
Monday, February 26, 2007
Mixture models in genetic research
Stony Brook University
Two different types of mixture models will be presented. My first model is a mixture model with known mixing proportions. The classical regularity conditions for the asymptotic convergence of the null distribution of the likelihood ratio test statistic (LRTS) are not satisfied because of the degeneracy of the Fisher information matrix. The talk covers a brief sketch of the proof that the asymptotic null distribution of the likelihood ratio test (LRT) for two or more components does not depend on the number of components. As an example, Gamma mixtures are applied to an F-2 breeding experiment in classical genetics to detect a major gene.
Second, the test of whether the distribution of genotypes of a single nucleotide polymorphism (SNP) in a control population is the same as the distribution in an affected population can be carried out using the \(2\times 3\) test of independence. When the genotyping is determined by an underlying continuous measure that is a mixture of three normal components, the LRT of the equality of mixing proportions is an alternative. We compare the performance of these tests by first calculating the power of the LRT and the relative efficiency of the \(2\times 3\) test to the LRT. When the minor SNP allele frequency is less than \(0.2\) in both cases and controls and the separation between genotype components is small, the LRT is more efficient than the \(2\times 3\) test. We present detailed tables of efficiencies and the limiting behavior of the relative efficiency.
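For readers unfamiliar with the \(2\times 3\) test of independence mentioned above, the following is a minimal sketch of the Pearson chi-square version applied to a case/control by genotype table. The function name and the table layout are illustrative assumptions, not taken from the talk.

```python
import math


def chi_square_2x3(table):
    """Pearson chi-square test of independence for a 2x3 table of counts.

    `table` is assumed to be laid out as
    [[cases_AA, cases_Aa, cases_aa], [controls_AA, controls_Aa, controls_aa]].
    Returns (statistic, p_value); the statistic has 2 degrees of freedom.
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2x3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(3)]
    n = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs a positive total")
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

The closed-form p-value relies on the asymptotic chi-square approximation with 2 degrees of freedom, so it should be treated with caution when expected counts are small, for example with rare minor alleles.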
Friday, February 23, 2007
A general formulation for a one-sided group sequential design
Barry K. Moser
Duke University Medical Center
This talk focuses on one-sided group sequential designs based on conditional probabilities. A general design formulation is developed. This formulation is then shown to be equivalent to the commonly used one-sided group sequential procedures developed by Pampallona and Tsiatis. The value of the unknown parameter of the conditional probability is shown to control the interpretation of the results of the design. A graphical procedure is proposed to address issues of futility or efficacy when any test statistic is used at an interim stage. An example is used to illustrate the proposed graphical procedure. Finally, the interim boundaries developed from the conditional probabilities also have implications for stochastic curtailment procedures. These implications lead to recommendations on the application of stochastic curtailment stopping rules.
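As background for the conditional probabilities discussed above, here is a minimal sketch of the textbook conditional power calculation under the Brownian-motion approximation. It is not the speaker's formulation; the function name and its parameters are illustrative assumptions. It does show how the assumed value of the unknown parameter (the drift) changes the result.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, drift, z_alpha):
    """P(final Z > z_alpha | interim Z) under a Brownian-motion approximation.

    z_interim     : standardized test statistic at the interim look
    info_fraction : t in (0, 1), fraction of total information observed so far
    drift         : assumed theta, the mean of the final standardized statistic
    z_alpha       : one-sided critical value used at the final analysis
    """
    if not 0.0 < info_fraction < 1.0:
        raise ValueError("info_fraction must lie strictly between 0 and 1")
    t = info_fraction
    b_t = z_interim * sqrt(t)            # B-value observed at the interim look
    mean_remaining = drift * (1.0 - t)   # expected future increment of B
    sd_remaining = sqrt(1.0 - t)         # standard deviation of that increment
    threshold = (z_alpha - b_t - mean_remaining) / sd_remaining
    return 1.0 - NormalDist().cdf(threshold)
```

Evaluating the function at several drift values (for example zero, the design alternative, and the current estimate) makes the dependence on the unknown parameter explicit; a low value under an optimistic drift is the usual argument for stopping for futility.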
Thursday, February 22, 2007
Screening for Differentially Expressed Genes Using Bayes Factors
University of Connecticut
A common interest in microarray data analysis is to identify genes having different expression levels between two conditions. The existing methods include two-sample t-statistics, a modified t-statistic (SAM), semiparametric hierarchical Bayesian models, and nonparametric permutation tests. All of these methods essentially compare two population means. In this talk, we consider using the Bayes factor to compare gene expression levels. The Bayes factor approach is quite attractive and flexible in evaluating the evidence for a gene to be differentially expressed, as it allows us to compare not only two population means but also the population distributions. To facilitate the use of the Bayes factor, we propose a new calibration approach that weighs two types of error probabilities differently from the prior predictive distribution of the Bayes factor for each gene and at the same time controls overall error rates for all genes under consideration. Moreover, a novel gene selection algorithm based on the calibration of the Bayes factor is developed, and the theoretical properties of the proposed method are carefully examined. Our method is shown through simulations to have a smaller false discovery rate (FDR) and false non-discovery rate (FNDR) than several existing methods. Finally, a real dataset from an Affymetrix microarray experiment to identify genes associated with the onset of osteoblast differentiation is used to further illustrate the proposed methodology.
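The two error rates used in the simulation comparison above can be illustrated with a short sketch. It computes the realized false discovery and false non-discovery proportions for a single simulated dataset in which the truly differentially expressed genes are known; averaging these over many simulated datasets would estimate FDR and FNDR. The function name and the convention of returning 0 when no gene is selected (or none is left unselected) are assumptions made for illustration.

```python
def empirical_fdr_fndr(truly_de, selected):
    """Realized false discovery and false non-discovery proportions.

    truly_de : list of booleans, True if the gene is truly differentially expressed
    selected : list of booleans, True if the method flagged the gene
    Returns (fdr, fndr) for this one dataset.
    """
    if len(truly_de) != len(selected):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(truly_de, selected))
    false_pos = sum(1 for truth, sel in pairs if sel and not truth)
    false_neg = sum(1 for truth, sel in pairs if truth and not sel)
    n_selected = sum(1 for _, sel in pairs if sel)
    n_not_selected = len(pairs) - n_selected
    # Convention (an assumption): a proportion over an empty set is reported as 0.
    fdr = false_pos / n_selected if n_selected else 0.0
    fndr = false_neg / n_not_selected if n_not_selected else 0.0
    return fdr, fndr
```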
Tuesday, February 20, 2007
Analyzing and modeling dichotomous traits in large complex pedigrees
University of Chicago
Although it is believed that many common complex disorders have a genetic basis, attempts to unravel the transmission mechanism governing such traits have met with limited success. It has been suggested that isolated founder populations with large, known pedigrees may be advantageous for complex trait mapping. However, their utility has been moderated by the extreme computational intensity involved in the analysis of such pedigrees as a whole.
We are proposing a likelihood method for modeling the transmission of dichotomous traits that can handle large pedigrees in a fast and efficient way. Using generalized linear mixed models, we extend the method of Abney et al. (2002) for mapping quantitative trait loci (QTLs), to accommodate binary traits. The high dimensionality of the integration involved in the likelihood prohibits exact computations. We show that one can overcome this hurdle and obtain the maximum likelihood estimates of the model parameters through the use of an efficient Monte Carlo expectation maximization (MCEM) algorithm.
Analysis of data from a 13-generation pedigree consisting of 1,653 Hutterites, focusing on the diabetes phenotype, reveals evidence for the existence of at least one locus with dominance mode of trait transmission.
Monday, February 19, 2007
Estimating Reaction Constants in Stochastic Intracellular Networks
Greg A. Rempala
University of Louisville
One of the key issues in analyzing stochastic kinetic models of reaction networks involving RNA and DNA molecules (e.g., gene transcription) is how to infer the values of the reaction constants. Under the mass-action kinetics assumption this is relatively straightforward when the system trajectories are fully observed; however, this is rarely the case in practice. The talk will summarize some recent developments in the area of Bayesian inference for reaction constants using MCMC methodology in “data-poor” settings.
In particular, it will attempt to indicate the benefits as well as the challenges of this approach with some examples of inference for well-known biochemical network models such as gene transcription and auto-regulation.
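To illustrate why inference is straightforward when trajectories are fully observed, the sketch below simulates a single degradation reaction X -> 0 with mass-action hazard c * x and recovers c by maximum likelihood as the number of reaction events divided by the integrated molecule count. The one-reaction network, the function names, and the parameter values are assumptions chosen for illustration; this is not the MCMC methodology of the talk, which targets the partially observed case.

```python
import random


def simulate_degradation(c, x0, t_max, rng):
    """Gillespie simulation of X -> 0 with hazard c * x, observed on [0, t_max].

    Returns (path, t_max), where path is a list of (time, count) pairs
    recorded at time 0 and at every reaction event.
    """
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while x > 0:
        wait = rng.expovariate(c * x)  # exponential waiting time with rate c * x
        if t + wait > t_max:
            break
        t += wait
        x -= 1
        path.append((t, x))
    return path, t_max


def mle_rate_constant(path, t_end):
    """MLE of c from a fully observed path: events / integral of x(t) dt."""
    n_events = len(path) - 1
    exposure = 0.0
    for (t0, x0), (t1, _) in zip(path, path[1:]):
        exposure += x0 * (t1 - t0)
    last_t, last_x = path[-1]
    exposure += last_x * (t_end - last_t)  # stretch after the last event
    if exposure <= 0.0:
        raise ValueError("no exposure observed; the MLE is undefined")
    return n_events / exposure
```

With only a few discrete-time snapshots of x(t), neither the event count nor the integral is available, which is the difficulty that motivates the Bayesian treatment described above.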
Friday, February 9, 2007
Inferential Procedures Based on the Generalized Variable Approach With Applications
University of Louisiana at Lafayette
The generalized \(p\)-value was introduced by Tsui and Weerahandi (1989, JASA) and the generalized confidence interval by Weerahandi (1993, JASA). The concepts of generalized \(p\)-values and generalized confidence intervals have turned out to be extremely fruitful for obtaining tests and confidence intervals involving non-standard parameters, such as the lognormal mean and quantiles in the one-way random model. In this talk, I will first explain a method of constructing a generalized pivotal quantity for a parameter in a general setup. Then, the construction of generalized pivotal quantities and the inferential procedures based on them will be outlined for normal parameters, for the lognormal mean, and for comparing two lognormal means. I will briefly explain the applications of the generalized variable (GV) approach to setting tolerance limits in the one-way random model, to correlation analysis in a multivariate normal distribution, and to finding one-sided limits for stress-strength reliability involving two-parameter exponential distributions. I will also compare the results based on the GV approach with those of other methods, and illustrate the results with practical examples.
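As a concrete instance of the generalized variable approach for the lognormal mean, the following sketch builds a Monte Carlo generalized confidence interval for \(\theta = \mu + \sigma^2/2\), the logarithm of the lognormal mean, from the log-transformed data. It uses the standard generalized pivotal quantities for a normal mean and variance; the function name, the number of draws, and the simple percentile rule are illustrative assumptions rather than details from the talk.

```python
import math
import random
import statistics


def gci_lognormal_mean(log_data, level=0.95, n_draws=20000, rng=None):
    """Monte Carlo generalized CI for theta = mu + sigma^2 / 2.

    log_data : natural logs of the lognormal observations
    Returns (lower, upper); exponentiate both to get limits for the mean itself.
    """
    rng = rng or random.Random()
    n = len(log_data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.fmean(log_data)
    s2 = statistics.variance(log_data)  # sample variance with divisor n - 1
    draws = []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        u2 = rng.gammavariate((n - 1) / 2.0, 2.0)   # chi-square with n - 1 df
        g_sigma2 = (n - 1) * s2 / u2                # GPQ for sigma^2
        g_mu = xbar - z * math.sqrt(g_sigma2 / n)   # GPQ for mu
        draws.append(g_mu + g_sigma2 / 2.0)         # GPQ for theta
    draws.sort()
    alpha = 1.0 - level
    lower = draws[int(math.floor(alpha / 2.0 * (n_draws - 1)))]
    upper = draws[int(math.ceil((1.0 - alpha / 2.0) * (n_draws - 1)))]
    return lower, upper
```

Because the interval is read off simulated percentiles, its endpoints vary slightly between runs unless a seeded `random.Random` instance is passed in.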
Friday, February 2, 2007
Exceedance Problems for a Family of Branching Processes
School of Mathematical Sciences
Universiti Sains Malaysia
We consider the problem of the first exceedance of a given level by a family of independent branching processes. Limit theorems will be presented for the index of the first process exceeding some fixed or increasing level in the subcritical, critical, and supercritical cases, when the processes have common or different offspring distributions.
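The quantity studied above can be explored by simulation. The sketch below runs independent Galton-Watson processes one after another and reports the index of the first one in which some generation size exceeds a given level. The abstract does not say which functional of the process is compared with the level, so using the generation size is an assumption, as are the function names and the binary-splitting offspring law.

```python
import random


def exceeds_level(offspring, level, max_generations):
    """Run one Galton-Watson process from a single ancestor.

    offspring : zero-argument callable returning one individual's offspring count
    Returns True as soon as some generation size exceeds `level`,
    False if the process dies out or max_generations is reached first.
    """
    size = 1
    for _ in range(max_generations):
        if size > level:
            return True
        if size == 0:
            return False
        size = sum(offspring() for _ in range(size))
    return size > level


def first_exceedance_index(offspring, level, max_processes=10_000,
                           max_generations=1_000):
    """Index (starting at 1) of the first process to exceed `level`, or None."""
    for index in range(1, max_processes + 1):
        if exceeds_level(offspring, level, max_generations):
            return index
    return None


def make_binary_splitting(p, rng):
    """Hypothetical offspring law: 2 children with probability p, else 0."""
    return lambda: 2 if rng.random() < p else 0
```

For example, `first_exceedance_index(make_binary_splitting(0.5, random.Random(7)), level=50)` samples the index for a critical process with a common offspring distribution; repeating the call many times gives an empirical distribution to compare with a limit theorem.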
Friday, January 26, 2007
Parameter Estimation of Record Breaking Data and Some Characterization Problems
We shall present the general problem of classical parametric inference from record-breaking data, which was first addressed by Samaniego and Whitaker (1986, 1988). Hoinkes and Padgett (1994) extended their work to the Weibull distribution. I will present results along this line for the Gumbel distribution, together with a comparison between the Weibull and Gumbel distributions.
I will also talk about some distributional properties of lower generalized order statistics (LGOS). Based on the distributional properties of LGOS some characterizations of the power function distribution will be given.
Friday, January 19, 2007
Johnson System and Mixture Modeling for Gene Expression Data Analysis
A common task in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. In recent years several statistical methods have been proposed to accomplish this goal when there are replicated samples under each condition. In this talk the Johnson system of curves will be introduced. We will discuss how the Johnson system can be used for gene expression data analysis. A mixture model approach for gene expression data will also be discussed.