Center for Complex Data Systems
The Center promotes and facilitates collaborations among researchers in the natural sciences, cognitive/social sciences, data-intensive applications, advanced computing and visualization, and mathematical modeling, with an emphasis on the implementation of optimal algorithmic and numerical methods. It integrates with the advanced resources provided by the Research Computing group, implementing state-of-the-art, scalable numerical procedures developed within academic and industry-related research, and creating a dynamic “software repository” that facilitates interdisciplinary collaborations and the portability of solutions across distinct scientific areas. One of the Center's priorities is to attract and train graduate students, preparing them for the challenges of the modern workforce by combining fundamental research with industry-oriented project development.
The goal of the Center is to provide an integrated, interdisciplinary, synergistic response to the computational challenges in the quantitative sciences arising from the analysis of high-complexity data, where the complexity is due to the data's structure, its relational characteristics, or the sheer dimensionality and dynamics of the data collection.
The analysis of high-complexity data is at the core of some of the main challenges of the quantitative sciences today: studies in the physical sciences related to multi-scale complex systems (advanced material design, stability of power grids, atmospheric and ocean dynamics); clinical studies in medicine, molecular and quantitative biology, and related areas; and large-scale simulations of social systems and user-driven interactions (from economic trade to social networks).
The Center's objective is to establish a collaborative structure for addressing research topics from these areas, arising in particular from projects in SNSM and other CAS departments.
Besides serving as a hub of mathematical, computational, and software resources (covering all aspects from optimal exact algorithms to heuristic and emerging approximate methods), the Center facilitates the development of collaborative, inter-departmental research proposals, better positioning such efforts for federal funding agencies and other STEM-related opportunities.
Core and Affiliated Faculty
- Barbos, Andrei (Economics)
- Connor, Charles (School of Geosciences)
- Fawcett, Timothy (Research Computing)
- Khavinson, Dima (Mathematics & Statistics)
- Lee, Seung-Yeop (Mathematics & Statistics)
- Majchrzak, Dan (Research Computing)
- Rogers, David (Chemistry)
- Skrzypek, Lesław (Mathematics & Statistics)
- Teodorescu, Iuliana (Mathematics & Statistics)
- Woods, Lilia (Physics)
- You, Yuncheng (Mathematics & Statistics)
Distinguished Scholar Lecture Series
The Department of Mathematics and Statistics' Center for Complex Data Systems has established a new Distinguished Scholar Lecture Series, starting in Spring 2016, with support from the College of Arts and Sciences. Invited speakers will deliver a public lecture aimed at a wide audience, hold round-table discussions, and be available to meet and explore individual research and institution-level collaborations.
The following have been confirmed as speakers for this semester:
Research Projects (2014–2015)
Environmental Hazard Mitigation by Dynamical Coupled Networks Modeling
Environmental hazard situations, such as the impact of coastal flooding and storm surges on large urban communities and transportation infrastructure, are difficult to address with “static” models and emergency planning. This is due to the inherent complexity of the model (total number of parameters), the importance of random effects which cannot be accurately incorporated into a static model, and the nonlinear nature of the problem (featuring feedback and time-delayed responses). By carefully combining specific geophysical predictive models with demographic, social, and economic structures into a dynamical random network model, capable of supporting real-time data updating and optimal decision-making, we created a computational structure which is provably superior to the static contingency-planning approach. By employing advanced algorithms and optimized software implementation, this project addresses the need for portable, scalable computational structures which can be deployed in many different hazard environments. This interdisciplinary effort is the subject of a targeted National Science Foundation call for proposals.
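As a toy illustration of the dynamical-network idea (not the project's actual model), the sketch below propagates a hazard stochastically across a small graph in discrete time; the graph, spread probability, and update rule are all hypothetical simplifications:

```python
import random

def simulate_spread(adjacency, sources, steps, p_spread=0.5, seed=0):
    """Discrete-time stochastic spread of a hazard over a network.

    adjacency: dict mapping each node to a list of neighbor nodes.
    Returns the set of affected nodes after each time step, so the
    dynamics (not just a static end state) can be inspected.
    """
    rng = random.Random(seed)
    affected = set(sources)
    history = [set(affected)]
    for _ in range(steps):
        newly = set()
        for node in affected:
            for nbr in adjacency.get(node, []):
                # each exposed neighbor is affected with probability p_spread
                if nbr not in affected and rng.random() < p_spread:
                    newly.add(nbr)
        affected |= newly
        history.append(set(affected))
    return history
```

Because the random seed is an explicit input, such a model can be re-run with updated data at each time step, which is the "real-time updating" property the static approach lacks.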
Compressed Sensing and Dimensionality Reduction by Probabilistic Algorithms
Dimensional reduction for the efficient representation of large data structures, viewed as finite subsets of high-dimensional vector spaces, is an important open problem for both pure and applied mathematical sciences. Existing theoretical results point toward efficient algorithms (in the sense of complexity classes) for data compression, which are however reliant on probabilistic implementations, and whose performance is highly dependent on the norm structure selected for the embedding vector space. A specific instance of this class of problems arises in categorical data analysis and lattice-vector approximation. Using a refined computational implementation of the theoretical algorithms, this study quantifies the speed-versus-accuracy trade-off and provides concrete code for applications of this type. The planned timeline of the project includes a multidisciplinary proposal submitted to the National Institutes of Health.
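One standard probabilistic technique of the kind described is random projection in the spirit of the Johnson–Lindenstrauss lemma. The sketch below is a generic illustration (not the project's code): points in d dimensions are mapped to k dimensions by a random Gaussian matrix, and pairwise distances are approximately preserved with high probability:

```python
import math
import random

def random_projection(points, k, seed=0):
    """Project d-dimensional points down to k dimensions using a random
    Gaussian matrix with entries N(0, 1/k), so that squared distances
    are preserved in expectation (Johnson-Lindenstrauss style)."""
    rng = random.Random(seed)
    d = len(points[0])
    # k x d projection matrix, scaled so E[||Rx||^2] = ||x||^2
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(R[i][j] * p[j] for j in range(d)) for i in range(k)]
            for p in points]
```

The dependence on the chosen norm mentioned above is visible here: the distance-preservation guarantee is specific to the Euclidean norm of the embedding space.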
Optimization of Numerical Simulations for the 3-D Compressible Euler Equations
The most frequently used computational recipes for the 3-D compressible Euler equations (arising, e.g., in geophysical models for atmospheric, oceanic, or volcanic flows) are either time-efficient but too simplified to capture relevant features of the physical system, or rigorous but too inefficient for practical deployment in real-life situations. An optimization approach based on adaptive-grid successive approximations of the compressible Euler equation via incompressible short-range local dynamics is a prime candidate for a good compromise between precision and efficiency. This is due to the invariance of the incompressible equations under a group of spatial transformations, which allows for very fast numerical solvers on the short-range approximations. The method and its specific implementation are at the core of an international, multi-institutional grant application with the National Science Foundation.
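The adaptive-grid component of such an approach can be illustrated in one dimension: refine the grid only where the solution varies rapidly, keeping it coarse where the dynamics are smooth. The sketch below is a generic toy, with a hypothetical refinement criterion that is not the project's actual scheme:

```python
import math

def refine_grid(xs, f, tol):
    """One adaptive refinement pass: insert the midpoint of every
    interval where the jump in f across the interval exceeds tol.
    Regions where f is nearly flat keep the original coarse spacing."""
    out = [xs[0]]
    for a, b in zip(xs, xs[1:]):
        if abs(f(b) - f(a)) > tol:
            out.append((a + b) / 2.0)
        out.append(b)
    return out
```

In practice a pass like this would be applied repeatedly (until no interval exceeds the tolerance), concentrating the computational effort in the sharp-gradient regions where the precision-versus-efficiency trade-off actually matters.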
Improving the Convergence of Maximum Likelihood Estimates
While the maximum likelihood estimate for parametric families of distributions is known to be a very good estimator in a theoretical sense, the actual rate of convergence in specific cases (with many applications of immediate interest) may be too slow for current computational capabilities. This project addresses the problem by comparing and quantifying several algorithmic and numerical approaches in terms of computational cost, ease of implementation, and scalability. Direct applications of the study include atomistic models in materials science, computational and decision grids, and information-theoretic graphical models. The research developed under this approach is the subject of a grant proposal to the NSF's Division of Mathematical Sciences.
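As a minimal illustration of the kind of algorithmic comparison involved (a toy example, not one of the project's models), the sketch below fits the location parameter of a Cauchy sample, a classic case where the MLE has no closed form, by plain gradient ascent and by Newton's method, counting iterations to convergence:

```python
def cauchy_score(theta, xs):
    # first derivative of the Cauchy log-likelihood in the location parameter
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def cauchy_score_deriv(theta, xs):
    # second derivative of the log-likelihood (derivative of the score)
    return sum((2.0 * (x - theta) ** 2 - 2.0) / (1.0 + (x - theta) ** 2) ** 2
               for x in xs)

def mle_gradient(xs, theta=0.0, lr=0.05, tol=1e-8, max_iter=10000):
    """Gradient ascent on the log-likelihood: linear convergence."""
    for it in range(1, max_iter + 1):
        g = cauchy_score(theta, xs)
        if abs(g) < tol:
            return theta, it
        theta += lr * g
    return theta, max_iter

def mle_newton(xs, theta=0.0, tol=1e-8, max_iter=100):
    """Newton's method on the score equation: quadratic convergence
    near the maximum, at the cost of a second-derivative evaluation."""
    for it in range(1, max_iter + 1):
        g = cauchy_score(theta, xs)
        if abs(g) < tol:
            return theta, it
        theta -= g / cauchy_score_deriv(theta, xs)
    return theta, max_iter
```

Counting iterations (and the per-iteration cost of each derivative evaluation) is exactly the style of cost/accuracy accounting the project description refers to, here reduced to a one-parameter example.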
External Funding Applied For (2014–2015)
- BIGDATA: F: DKA: CSD: Complexity-reducing algorithms for efficient simulations of large-scale physical systems
- BDD: Collaborative Research: Real-Time Modeling of Volcanic Plumes
- Collaborative Research EDT: Preparing Mathematics Graduate Students for Interdisciplinary Careers in Biomedical Sciences
- Adaptive Filtering for Optimal Digital Encoding of Acoustic Waves
- Optimization in nonparametric survival analysis and large deviations theory
- Hazard SEES: Improving Hazard Prediction, Preparedness, and Response through Dynamic Network Modeling