TY - JOUR
T1 - Reducing the complexity of high-dimensional environmental data
T2 - An analytical framework using LASSO with considerations of confounding for statistical inference
AU - Frndak, Seth
AU - Yu, Guan
AU - Oulhote, Youssef
AU - Queirolo, Elena I.
AU - Barg, Gabriel
AU - Vahter, Marie
AU - Mañay, Nelly
AU - Peregalli, Fabiana
AU - Olson, James R.
AU - Ahmed, Zia
AU - Kordas, Katarzyna
N1 - Publisher Copyright: © 2023 Elsevier GmbH
PY - 2023/4
Y1 - 2023/4
N2 - Purpose: Frameworks for selecting exposures in high-dimensional environmental datasets, while considering confounding, are lacking. We present a two-step approach for exposure selection with subsequent confounder adjustment for statistical inference. Methods: We measured cognitive ability in 338 children using the Woodcock-Muñoz General Intellectual Ability (GIA) score, and potentially associated features across several environmental domains. Initially, 111 variables theoretically associated with GIA score were entered into a Least Absolute Shrinkage and Selection Operator (LASSO) model in a 50% feature-selection subsample. Effect estimates for the selected features were subsequently obtained from linear regressions in a 50% inference (hold-out) subsample, first adjusting for sex and age and later for covariates selected via directed acyclic graphs (DAGs). All models were adjusted for clustering by school. Results: Of the 15 LASSO-selected variables, 11 were not associated with GIA score following our inference modeling approach. Four variables were associated with GIA score: serum ferritin adjusted for inflammation (inversely), mother's IQ (positively), father's education (positively), and hours per day the child works on homework (positively). The association for serum ferritin was not in the expected direction. Conclusions: Our two-step approach moves high-dimensional feature selection a step further by incorporating DAG-based confounder adjustment for statistical inference.
KW - Child health
KW - Environmental epidemiology
KW - High-dimensional data
KW - LASSO
KW - Machine learning
KW - Statistical inference
UR - http://www.scopus.com/inward/record.url?scp=85148745143&partnerID=8YFLogxK
U2 - 10.1016/j.ijheh.2023.114116
DO - 10.1016/j.ijheh.2023.114116
M3 - Article
C2 - 36805184
AN - SCOPUS:85148745143
SN - 1438-4639
VL - 249
JO - International Journal of Hygiene and Environmental Health
JF - International Journal of Hygiene and Environmental Health
M1 - 114116
ER -