Comparing brand vs. generic drugs with insurance claims data
I am working on developing a method to use insurance claims data to determine the causal effect of using a generic drug instead of its corresponding brand version.
Non-monotone missing data
I am also working on developing methods to estimate effects in the presence of non-monotone missing data. Non-monotone patterns of missingness can occur in longitudinal studies in which patients can miss a measurement without dropping out of the entire study (e.g., they miss their measurement at check-up 5 of the study, but show up to check-up 6). This is a very common scenario in longitudinal studies.
A common approach to missing data in these settings is to treat the patient as having fully dropped out (never to return) as soon as they miss their first measurement. This is inefficient, as it wastes all of the data measured on that patient later in the study.
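To make the waste concrete, here is a small sketch in Python. The attendance records are made up for illustration, and `monotonize` is a hypothetical helper name, not part of any method described here. It simply counts how many observed measurements are thrown away when the first missed check-up is treated as permanent dropout.

```python
def monotonize(observed):
    """Treat a patient as dropped out from their first missed visit onward."""
    result = []
    dropped = False
    for seen in observed:
        if not seen:
            dropped = True
        # After the first miss, every later visit is treated as missing.
        result.append(seen and not dropped)
    return result


# Hypothetical attendance for three patients over six check-ups
# (True = measurement observed, False = measurement missed).
attendance = {
    "A": [True, True, True, True, True, True],
    "B": [True, True, True, True, False, True],   # misses check-up 5, returns for 6
    "C": [True, False, True, True, True, False],  # misses check-ups 2 and 6
}

observed_total = sum(sum(visits) for visits in attendance.values())
kept_total = sum(sum(monotonize(visits)) for visits in attendance.values())
discarded = observed_total - kept_total

print(f"observed: {observed_total}, kept: {kept_total}, discarded: {discarded}")
```

In this toy data set, patient C contributes only a single measurement after monotonizing, even though four were actually recorded.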
I am working on deriving the most efficient estimator in a semiparametric model that has non-monotone patterns of missingness. This will provide the most efficient use of all of the data in a setting where we make very few assumptions.
When can you condition on a post-baseline variable?
Adjusting for, or conditioning on, post-baseline variables can induce selection bias. For example, if I condition on cancer patients who respond to treatment, then I might find that the treatment is extremely effective at eliminating cancer. However, this is a biased conclusion, because I have simply chosen patients who were responding to treatment. Of course treatment was effective in those patients!
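A small simulation can illustrate the point. This is only a sketch under assumptions I have made up for the example: treatment is randomized and has no effect at all, and both "response" and "cure" are noisy reflections of a patient's underlying health. The function name and all numeric settings are hypothetical.

```python
import random


def simulate_null_treatment(n=20000, seed=0):
    """Simulate a randomized treatment with NO true effect on cure.

    Returns (naive_diff, conditioned_diff): the difference in cure rates
    for all treated vs. controls, and for treated responders vs. controls.
    """
    rng = random.Random(seed)
    treated_cured, control_cured, responder_cured = [], [], []
    for _ in range(n):
        health = rng.gauss(0, 1)                    # unmeasured underlying health
        treated = rng.random() < 0.5                # randomized assignment
        responds = health + rng.gauss(0, 1) > 0.5   # post-baseline "response"
        cured = health + rng.gauss(0, 1) > 0        # outcome; ignores treatment
        if treated:
            treated_cured.append(cured)
            if responds:
                responder_cured.append(cured)
        else:
            control_cured.append(cured)

    def rate(xs):
        return sum(xs) / len(xs)

    naive = rate(treated_cured) - rate(control_cured)
    conditioned = rate(responder_cured) - rate(control_cured)
    return naive, conditioned


naive_diff, conditioned_diff = simulate_null_treatment()
print(f"all treated vs. controls:        {naive_diff:+.3f}")
print(f"treated responders vs. controls: {conditioned_diff:+.3f}")
```

Under these assumptions, the comparison using all treated patients should come out close to zero, while restricting to responders should show a sizable apparent benefit, even though the treatment does nothing in the simulation.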
However, there may be times when it is okay to condition on post-baseline variables--or even necessary to perform an unbiased analysis. The principle to never condition on a post-baseline variable is just a "rule of thumb" motivated by the fact that doing so puts you at a high risk of inducing selection bias. Whether it actually induces bias depends on the causal structure of the specific problem, which can be encapsulated in a Directed Acyclic Graph (DAG). I am interested in teasing out the specific conditions under which it is justified to condition on a post-baseline variable vs. when it is not. This will hopefully free up researchers to develop methods that leverage post-baseline variables.