Monday, February 28, 2005
Semiparametric Analysis of Longitudinal Data with Informative Observation Times
University of Missouri
Statistical analysis of longitudinal data is an important topic in a number of applied fields, including epidemiology, public health, and medicine. In general, the information contained in longitudinal data can be divided into two parts: the set of observation times, which can be regarded as realizations of an observation process, and the set of actually observed values of the response variable of interest, which can be seen as realizations of a longitudinal or response process. A number of methods have been proposed for their analysis, and most assume that the two processes are independent. This assumption greatly simplifies the analysis, since one can then rely on conditional inference procedures given the observation times.
However, the assumption may not hold in some applications. We will consider situations where it fails and propose a semiparametric regression model that allows for dependence between the observation and response processes. Inference procedures are developed using the estimating equation approach, and the asymptotic properties of the method are established. The results of simulation studies will be reported, and the method is applied to a bladder cancer study.
Friday, February 25, 2005
Robust Estimation of Mixture Complexity
University of Georgia, Athens
Developing statistical procedures to determine the number of components in a mixture, known as the mixture complexity, remains an area of intense research. In many applications, it is important to find the mixture with the fewest components that provides a satisfactory fit to the data. This talk focuses on consistent estimation of the unknown number of components in finite mixture models when the exact form of the component densities is unknown but is postulated to be close to members of some parametric family. Minimum Hellinger distances are used to develop a robust estimator of mixture complexity when all the parameters associated with the model are unknown. The estimator is shown to be consistent. When there is no model misspecification, Monte Carlo simulations for a wide variety of target mixtures illustrate the implementation and performance of the estimator. Robustness of the estimator, examined via model misspecification, shows that, in contrast to an estimator based on the Kullback-Leibler distance, its performance is unaffected by model misspecification. An example concerning hypertension is revisited to further illustrate the performance of the estimator.
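The Hellinger distance at the heart of this approach can be illustrated numerically. The sketch below is only an illustration, not the speaker's implementation: it computes the Hellinger distance between two densities on a grid; roughly, estimators of this kind select the smallest number of components whose best-fitting mixture comes within a threshold distance of a data-based density estimate.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated on the grid x.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def hellinger(f, g, x):
    # H(f, g) = sqrt(1 - BC), where BC = integral of sqrt(f * g) is the
    # Bhattacharyya coefficient, approximated here by a Riemann sum.
    dx = x[1] - x[0]
    bc = np.sum(np.sqrt(f * g)) * dx
    return np.sqrt(max(0.0, 1.0 - bc))  # guard against tiny numerical overshoot

x = np.linspace(-10, 10, 2001)
f = normal_pdf(x, 0.0, 1.0)                                   # one-component fit
g = 0.5 * normal_pdf(x, -2.0, 1.0) + 0.5 * normal_pdf(x, 2.0, 1.0)  # two-component target

print(hellinger(f, f, x))  # near zero: identical densities
print(hellinger(f, g, x))  # clearly positive: the single normal misses the mixture
```

A complexity estimator would compare such distances, for the best fit at each candidate number of components, against a vanishing threshold sequence.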
Wednesday, February 23, 2005
General Convex Stochastic Orderings and Related Martingale-type Structures
University of South Carolina
Over-dispersion of a population relative to a fitted baseline model can be accounted for in various ways. One way is by mixing over the family of baseline models. Another is via a martingale structure when the Total Time on Test (TTT) transform of the population “dominates” that of the baseline model. Here these latter ideas are extended to stochastic orderings in terms of Tchebycheff systems and related to a martingale-type structure, called a \(k\)-mart, between the population and the baseline model. These ideas are illustrated for a binomial baseline model using the Saxony 1876-85 sibship census of families with twelve siblings. In addition, the construction of a “most identical” distribution in the case of a \(1\)-mart is presented.
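For reference, the TTT transform mentioned above is a standard object: for a distribution function \(F\) with finite mean \(\mu\), it is

```latex
H_F^{-1}(t) \;=\; \int_0^{F^{-1}(t)} \bigl(1 - F(u)\bigr)\,du, \qquad 0 \le t \le 1,
```

with \(H_F^{-1}(1) = \mu\), so that \(\varphi_F(t) = H_F^{-1}(t)/\mu\) gives the scaled transform. “Domination” of one transform by another is understood pointwise on \([0,1]\).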
Friday, February 4, 2005
Analysis of Gene Expression Data and Chemosensitivity Prediction
Microarrays are part of a new class of biotechnologies that allow the expression levels of thousands of genes to be monitored simultaneously, and they can provide important insights into the underlying genetic causes of many important biological questions. We discuss computational methods for four important tasks: (1) the identification of differentially expressed genes, (2) the discovery of clusters of differentially expressed genes, (3) the identification of features from the clusters, and (4) the classification of biological samples.
The study concerns gene expression levels of 55 advanced-stage ovarian cancer patients. Thirty-three of these patients showed a complete response to chemotherapy, while the rest had progressive disease at the completion of therapy. We sought to determine whether the gene expression levels were sufficient for the prediction of chemosensitivity.
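Task (1) above, identifying differentially expressed genes, is often approached with per-gene two-sample tests. The sketch below is a generic illustration on simulated data, not the study's actual analysis; the 55-patient, 33/22 split follows the description above, but the expression matrix and thresholds are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical expression matrix: 100 genes x 55 patients (simulated, not real data).
n_resp, n_prog = 33, 22
expr = rng.normal(size=(100, n_resp + n_prog))
expr[:5, :n_resp] += 2.0  # plant a true group difference in the first 5 genes

resp, prog = expr[:, :n_resp], expr[:, n_resp:]

# Per-gene two-sample t-test between responders and progressors.
t, p = stats.ttest_ind(resp, prog, axis=1)

# Bonferroni-adjusted significance threshold across the 100 genes.
flagged = np.flatnonzero(p < 0.05 / len(p))
print(flagged)
```

With many thousands of genes, a false-discovery-rate procedure would typically replace the Bonferroni correction, but the per-gene testing structure is the same.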
Friday, January 21, 2005
Review of Extreme Value Distributions with Examples
Extreme value theory has turned out to be one of the most important statistical disciplines of the last few decades. One of its most distinctive features is the objective of quantifying the stochastic behavior of a process at unusually large (or small) levels. The central platform of extreme value theory is the three-types theorem of Fisher and Tippett, which asserts that only three types of distributions can arise as limiting distributions of extreme values from random samples.
This seminar focuses mainly on a review of extreme value distributions, especially the Generalized Extreme Value (GEV) and Generalized Pareto (GPD) distributions, with examples applied to existing real data on rainfall and sea levels.
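For reference, the three limiting types are unified by the GEV family, with distribution function

```latex
G(x) \;=\; \exp\!\left\{ -\left[ 1 + \xi\,\frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad 1 + \xi\,\frac{x-\mu}{\sigma} > 0,
```

where \(\xi > 0\) gives the Fréchet type, \(\xi < 0\) the (reversed) Weibull type, and the limit \(\xi \to 0\) the Gumbel type \(\exp\{-e^{-(x-\mu)/\sigma}\}\). Excesses over a high threshold then follow, approximately, the GPD

```latex
H(y) \;=\; 1 - \left( 1 + \frac{\xi y}{\tilde\sigma} \right)^{-1/\xi}, \qquad y > 0,
```

with the same shape parameter \(\xi\).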
Possible applications of extreme value theory to pharmacokinetics, modeling maximum drug concentrations in blood after the infusion of a drug along with appropriate covariates, will also be discussed.
Friday, January 14, 2005
How to Perform an Analysis of Variance Procedure in S-Plus
The commands in S-Plus that generate the one-way ANOVA, the factorial ANOVA, and the nonparametric ANOVA will be demonstrated using datasets from several disciplines. Illustrations of how to test the underlying assumptions will be presented. Several of the post-hoc procedures will also be reviewed. A general discussion of the rationale for and the interpretation of the Analysis of Variance procedures and their related tests will also be conducted.
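The demonstration itself uses S-Plus commands. As a rough analogue only (not the speaker's S-Plus code), the sketch below runs the same three procedures in Python with scipy on hypothetical three-group data; all names and numbers are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical data: one response measured in three treatment groups.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 20)
b = rng.normal(0.5, 1.0, 20)
c = rng.normal(2.0, 1.0, 20)  # clearly shifted group

# One-way ANOVA: tests equality of the three group means.
f_stat, p_anova = stats.f_oneway(a, b, c)

# Nonparametric counterpart (Kruskal-Wallis): rank-based, no normality assumption.
h_stat, p_kw = stats.kruskal(a, b, c)

# One underlying-assumption check: homogeneity of variances (Levene's test).
w_stat, p_levene = stats.levene(a, b, c)

print(p_anova, p_kw, p_levene)
```

In S-Plus the corresponding model would be fit with a formula interface (e.g. `aov(response ~ group)`), which is what the talk walks through, along with post-hoc comparisons.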