Monitoring the health care of Veterans with chronic diseases requires longitudinal data. The VA healthcare system has extremely rich, high-dimensional longitudinal data on a large number of subjects who become users at an early age and stay with the system for the rest of their lives. These data provide unique observational longitudinal cohorts on a variety of diseases and their management strategies. Learning from these non-randomized observational data requires appropriate methodologies for reducing dimension and confounding bias.
The proposed research developed a model for estimating the effect of a grouping variable (intervention) in a dynamic setting, where the response, the intervention, and the covariates are all measured repeatedly over time. Under this setup, our objectives were to answer two questions: 1. How can the high dimension of the repeated-measure covariates be reduced so that no information is lost with respect to the outcome and the grouping variable (intervention)? 2. How can the time-dependent effect of the intervention be separated from the confounding effect of these covariates and then estimated?
The data set that motivated our methodological study is the observational data set constructed for EPID-004-06F (PI: F. Lederle), entitled: Statins, B-blockers, or ACE Inhibitors for AAA progression. The data set is a collection of repeated measurements of the maximum diameter of abdominal aortic aneurysms (AAA) for each of the 5362 study patients. The initial (first-visit) measurement, the time span between measurements, and the number of measurements per patient all vary between patients. Some patients received lipid-lowering medication together with other medications and treatments.
To achieve dimension reduction and covariate balance, we constructed sufficient summaries. A sufficient summary is a parametric function of the covariates such that, given any of its values, the covariate distribution is free of the intervention level. To find a sufficient summary, we need to know the density ratios of the covariates under the different levels of the intervention. With repeated measures, these density ratios are extremely difficult to specify realistically.
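The link between density ratios and sufficient summaries can be sketched in the simplest setting. The Python example below is only an illustration under stated assumptions, not the project's implementation: it assumes a single time point, two intervention levels, and multivariate normal covariates with a common covariance matrix. In that case the log density ratio depends on the covariates only through one linear function, so that scalar carries all the information about the intervention level. The function names and all numerical values are hypothetical.

```python
import numpy as np


def log_normal_density(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) at the point x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


def linear_summary(x, mean0, mean1, cov):
    """Scalar summary s(x) = (mean1 - mean0)' cov^{-1} x."""
    return (mean1 - mean0) @ np.linalg.solve(cov, x)


def log_density_ratio_from_summary(s, mean0, mean1, cov):
    """Log density ratio log f1(x)/f0(x), recovered from the summary s alone."""
    const = 0.5 * (mean1 @ np.linalg.solve(cov, mean1)
                   - mean0 @ np.linalg.solve(cov, mean0))
    return s - const


# Hypothetical parameters, chosen only for illustration.
mean0 = np.array([0.0, 0.0, 0.0])       # covariate mean without intervention
mean1 = np.array([1.0, -0.5, 0.25])     # covariate mean with intervention
cov = np.array([[1.0, 0.3, 0.0],
                [0.3, 2.0, 0.4],
                [0.0, 0.4, 1.5]])        # common covariance (assumption)
x = np.array([0.7, -1.2, 2.0])           # one patient's covariates

# The ratio computed from the full densities ...
direct = log_normal_density(x, mean1, cov) - log_normal_density(x, mean0, cov)
# ... agrees with the ratio computed from the one-dimensional summary.
via_summary = log_density_ratio_from_summary(
    linear_summary(x, mean0, mean1, cov), mean0, mean1, cov)
```

Here `direct` and `via_summary` agree up to floating-point error, which is the sense in which the three covariates can be replaced by one number in this simplified case. With repeated measures the ratio involves the whole covariate history, which is why further structure is needed.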
To simplify the model while keeping it realistic, we made two assumptions:
1. The response, the intervention, and the covariate values at each visit (time point) depend on their own most recent past values (the Markov property of order one). 2. The value of each of these three variables at a given visit may depend only on the status of the other variables at that visit.
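The two assumptions above can be illustrated with a small simulation. The sketch below is not the fitted model: the update order within a visit (covariate, then intervention, then response), the use of a single covariate, and every coefficient are hypothetical choices made only so that the dependence structure is visible in code.

```python
import math
import random


def simulate_patient(n_visits, seed=0):
    """Simulate one patient's visits under a first-order Markov structure.

    All coefficients below are hypothetical and chosen only for illustration.
    Returns a list of (covariate, intervention, response) tuples, one per visit.
    """
    rng = random.Random(seed)
    x, a, y = 0.0, 0, 30.0  # covariate, binary intervention, response
    visits = []
    for _ in range(n_visits):
        # Covariate depends on its own previous value.
        x = 0.8 * x + rng.gauss(0.0, 1.0)
        # Intervention depends on its previous value and the current covariate.
        p_treat = 1.0 / (1.0 + math.exp(-(-0.5 + 1.5 * a + 0.7 * x)))
        a = 1 if rng.random() < p_treat else 0
        # Response depends on its previous value through a normal increment
        # whose mean depends on the current intervention and covariate.
        y = y + 1.0 - 0.4 * a + 0.2 * x + rng.gauss(0.0, 0.3)
        visits.append((x, a, y))
    return visits


visits = simulate_patient(6, seed=1)
```

Because the covariate drives both the intervention and the response increment, a naive comparison of treated and untreated visits in such data would be confounded, which is the situation the sufficient summaries are meant to address.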
Together, these two assumptions give the model a hidden Markov chain structure. We then derived sufficient summaries for the case in which the response increments are normal, the possibly confounding covariates have a multivariate normal distribution, and the time-dependent intervention is binary. We defined the causal effect of the intervention through a linear model with a time-varying effect (Kalman filters). This is the effect the intervention would have had if the covariates were independent of the intervention. We showed that this effect can be calculated using only the sufficient summaries. Finally, because this effect depends on the parameters of the model, to estimate it at all time points we assumed that the parameters, as functions of time, have a dynamic Bayesian Markovian structure. For the general, non-parametric covariate distribution, we developed a multi-resolution wavelet approximation of the functional PCA. To quantify the information gained through this procedure, we introduced the variance of the likelihood ratios (density ratios) as a measure of information and used the approximation to estimate these ratios.
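The idea of a linear model with a time-varying effect tracked by a Kalman filter can be sketched in its simplest scalar form. The example below assumes, for illustration only, that the response increment at visit t is y_t = beta_t * a_t + noise, with a binary intervention a_t and an effect beta_t that follows a random walk. It leaves out the covariates and the sufficient summaries entirely, and the function name and parameters `q`, `r`, `beta0`, and `p0` are hypothetical.

```python
def kalman_time_varying_effect(y, a, q, r, beta0=0.0, p0=10.0):
    """Filter a random-walk coefficient beta_t in y_t = beta_t * a_t + noise.

    y: observed response increments; a: intervention indicators (0 or 1);
    q: variance of the random-walk step of beta_t; r: observation noise variance.
    Returns the filtered means and variances of beta_t at each time point.
    """
    beta, p = beta0, p0
    means, variances = [], []
    for y_t, a_t in zip(y, a):
        # Predict: under a random walk the mean carries over and variance grows.
        p = p + q
        # Update with the scalar observation y_t = beta_t * a_t + e_t.
        innovation = y_t - a_t * beta
        s = a_t * a_t * p + r  # innovation variance
        gain = p * a_t / s
        beta = beta + gain * innovation
        p = (1.0 - gain * a_t) * p
        means.append(beta)
        variances.append(p)
    return means, variances


# Hypothetical data: treatment on at every other visit, constant true effect 2.
a = [1, 0] * 25
y = [2.0 * a_t for a_t in a]
means, variances = kalman_time_varying_effect(y, a, q=0.0, r=0.01)
```

At visits where `a_t` is 0 the gain is zero, so the estimate of the effect is carried forward unchanged; only treated visits are informative about `beta_t` in this simplified form. Setting `q` above zero lets the estimated effect drift over time.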
As an illustration, consider a patient who at the first visit has an elevated cholesterol level (response) and is of a certain age (covariate). The provider prescribes a statin (treatment). Over the years, at successive visits, the patient and the provider monitor the response, the effect of the treatment on the covariates, and the effect of the covariates on the effectiveness of the treatment, and accordingly change the dose or abandon the treatment altogether. At the current stage of our research, we have developed a model to quantify the causal effect of the treatment as a function of time.
We investigated two variations of this model: one under a multivariate normality assumption, and one in a non-parametric setup that assumes the number of covariates is large. Under the first-order Markov property, a time-varying mediation structure is being developed to capture the direct causal effect of treatment on the outcome and its indirect effect via the covariates. Our main results are: 1) For the multivariate normal case, when the covariance matrix does not depend on the intervention, we showed that a three-dimensional summary (three propensity-type summaries) can be used to achieve covariate balance, and that a low-dimensional regression model can be used to estimate the effect of the grouping variable (in our case, lipid-lowering medication), even though the number of adjusting variables per patient (p) can be very large. We found that under the time-varying propensity-like adjustments, the estimated effect of statins on lowering the rate of growth of AAA tends to be larger than under the alternative analyses. 2) We developed a theory for measuring the information in sufficient summaries when, in the non-parametric case, the density ratios must be approximated. We showed that any basis of the density ratio space is a sufficient summary. Using functional principal component analysis (fPCA), we ordered the members of the basis according to the variance of the density ratios and constructed approximate sufficient summaries.
We developed methods that lead to improved, more reliable inference in epidemiological, clinical, and health services research. The study resulted in a sounder understanding of the effect of lipid-lowering medications (and of other medical interventions and health programs) that may affect the growth of AAA and other health issues of patients.
It is common knowledge that treatment effects may change over time. We showed how the effect of a binary intervention, adjusted for the effect of possible confounders, might be estimated. We developed a methodology for approximating sufficient summaries that reduces the dimensionality of our big data problems while also reducing the imbalance of certain types of covariates in observational studies.
Grant Number: I01HX001090-01
- Pourahmadi M, Noorbaloochi S. Multivariate time series analysis of neuroscience data: some challenges and opportunities. Current opinion in neurobiology. 2016 Apr 1; 37:12-15.
- Noorbaloochi S, Meeden G. On Being Bayes and Unbiasedness. Sankhya A: The Indian Journal of Statistics. 2017 May 18; 79:1-16.