Predicting Eczema Flare-Ups: A Machine Learning Approach

Anousha Mukherjee

5/10/2024 · 8 min read

Abstract

Eczema (atopic dermatitis) and related atopic conditions—including hay fever, rhinitis, and asthma—have surged globally in recent decades. According to the 2021 National Health Interview Survey, 31.8% of U.S. adults suffer from allergies or eczema, with 31.6 million individuals (10.1%) diagnosed with eczema specifically.

The impact extends far beyond physical discomfort. Patients experience persistent itching, inflammation, and sleep disturbances that diminish quality of life, reduce productivity, and contribute to mental health challenges. Children face bullying and adults experience workplace discrimination. Economically, eczema costs the U.S. healthcare system approximately $5.3 billion annually and results in 5.9 million lost workdays.

Despite widespread anecdotal evidence linking environmental factors to eczema flare-ups, systematic research examining this connection remains limited. This study addresses this gap by developing a predictive machine learning model to identify key factors, including environmental variables, that influence eczema likelihood. The goal is to provide proactive, personalized support enabling patients to anticipate flare-ups and plan their activities and skincare routines.

This research integrates health data from the 2021 National Health Interview Survey (29,482 adults and 8,261 children across 30,673 households) with environmental metrics from NOAA's Climate Data Online database. A logistic regression model was developed to predict eczema status, with ROSE oversampling applied to address data imbalance (11.7% eczema prevalence). The model incorporated socio-demographic, health history, and environmental predictors.

After controlling for demographic and health factors, environmental variables (specifically temperature, humidity, and their interaction) emerged as statistically significant predictors of eczema status. The final model achieved an AUC of 0.774, indicating reasonably good discrimination and providing empirical support for the weather-eczema connection. This predictive capability can support personalized flare-up risk assessments, helping patients anticipate high-risk periods and adjust their routines, and shifting eczema management from reactive to proactive care.

1. Introduction and Problem Statement

Eczema is a common chronic inflammatory skin condition that significantly impacts quality of life for millions of individuals worldwide. The most prevalent form, Atopic Dermatitis, is characterized by persistent symptoms including itchy, red, inflamed, and dry skin that can appear anywhere on the body. The condition follows a relapsing-remitting pattern, with periods of relative calm interrupted by sudden flare-ups of varying intensity.

The prevalence of eczema represents a substantial public health concern. In the United States alone, approximately 31.6 million Americans—roughly 10% of the population—live with some form of eczema. This includes about 16.5 million adults and 9.6 million children diagnosed with Atopic Dermatitis. The global burden is similarly significant, with eczema affecting approximately 10% to 20% of children and 2% to 10% of adults worldwide. These figures underscore the widespread nature of the condition and the critical need for effective management strategies.

Beyond the physical discomfort, eczema imposes considerable economic and psychosocial burdens. Patients often experience sleep disturbances due to nighttime itching, reduced productivity, social stigma, and emotional distress. The chronic nature of the condition requires ongoing medical care, specialized skincare products, and lifestyle modifications, contributing to substantial healthcare costs and diminished quality of life.

A major daily challenge for eczema patients is managing the unpredictable nature of flare-ups. Unlike conditions with consistent symptoms, eczema can suddenly worsen without obvious warning, making it difficult for patients to plan and engage in normal daily activities. Patients must constantly anticipate potential flare-ups when scheduling physical exercise, social events, or travel. They must also maintain rigorous skincare routines—including frequent moisturizing, avoidance of known triggers, and strategic use of medications—often without certainty about when or why their symptoms might intensify. This unpredictability creates a cycle of anxiety and reactive management, where patients respond to symptoms after they appear rather than preventing them proactively. Current clinical approaches primarily focus on general trigger avoidance and symptom management, but lack personalized, predictive tools that could empower patients to anticipate and prevent flare-ups based on their individual risk profiles.

The primary objective of this study is to develop a machine learning model that identifies key factors associated with moderate to severe eczema flare-ups, proxied by the current presence of eczema symptoms and active itchy rash. This model tests the hypothesis that environmental factors—particularly temperature, humidity, and rainfall patterns—interact with individual socio-demographic and health characteristics to influence eczema activity.

2. Methodology

2.1 Data Sources

This study utilizes a combined dataset from two primary sources:

1. National Health Interview Survey (NHIS) 2021

The NHIS is an annual, nationally representative survey conducted by the CDC, providing comprehensive socio-economic, demographic, and health-related data on the U.S. population. Eczema-related questions are included every three years, with 2021 being the most recent available dataset containing this information.

Dependent Variable Construction: The binary eczema status variable (1=yes, 0=no) was derived from two key survey questions:

  • DXSKIN: "Have you ever been told by a doctor or other health professional that you had eczema or atopic dermatitis?"

  • CURSKIN: "Do you get an itchy rash due to eczema or atopic dermatitis?"
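The text does not spell out how the two answers were combined, so the following is only a minimal sketch under stated assumptions. It assumes NHIS-style response codes (1 = yes, 2 = no, other codes = refused, not ascertained, or don't know) and assumes a respondent counts as a case only when both questions are answered yes. The function name `eczema_status` is hypothetical.

```python
# Assumed NHIS-style response codes (not stated in the text above):
# 1 = yes, 2 = no, anything else (e.g. 7, 8, 9) = refused / not ascertained / don't know.
YES, NO = 1, 2


def eczema_status(dxskin, curskin):
    """Return 1, 0, or None (missing) for one respondent.

    Assumed rule: a respondent counts as a current case only when both
    the diagnosis question (DXSKIN) and the itchy-rash question (CURSKIN)
    are answered yes.
    """
    if dxskin == NO:
        return 0  # never diagnosed, so CURSKIN is not needed
    if dxskin != YES:
        return None  # diagnosis answer missing or unusable
    if curskin == YES:
        return 1
    if curskin == NO:
        return 0
    return None  # diagnosed, but current-rash answer missing


respondents = [(1, 1), (1, 2), (2, None), (9, 1), (1, 7)]
labels = [eczema_status(d, c) for d, c in respondents]
print(labels)  # [1, 0, 0, None, None]
```

Returning None for unusable answers keeps missing responses separate from true negatives, so they can be dropped or imputed explicitly before modeling.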

Sample Characteristics:

  • Adult survey: 29,482 observations and 622 variables in total, of which 56 variables spanning socio-economic, demographic, and health history factors were initially shortlisted for analysis

  • Child survey: 8,261 observations and 349 variables in total, of which 49 were initially shortlisted

  • Privacy constraint: Due to NHIS confidentiality requirements, participant location data was limited to broad regional classifications rather than specific geographic identifiers

2. NOAA Climate Data Online (CDO)

Local environmental data from NOAA's CDO database provided six weather metrics commonly reported to influence eczema flare-ups:

  • Daily maximum temperature (°F)

  • Daily minimum temperature (°F)

  • Average daytime temperature (°F)

  • Average rainfall (inches)

  • Average humidity (%)

  • Maximum humidity (%)

Integration Methodology: Environmental metrics were calculated as population-weighted monthly averages for the 5-6 most populous cities within each U.S. region. These aggregated values were then merged with the NHIS dataset based on the respondent's region and the month of survey completion.
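The integration step described above can be sketched roughly as follows. This is an illustration rather than the study's actual pipeline: the city populations, temperatures, survey rows, and the helper name `regional_weighted_means` are all hypothetical, and only one weather metric is shown.

```python
from collections import defaultdict

# Hypothetical city-level monthly records: (region, month, population, max_temp_f).
# The numbers are made up for illustration only.
city_records = [
    ("Northeast", 1, 8_000_000, 40.0),
    ("Northeast", 1, 2_000_000, 30.0),
    ("South", 1, 3_000_000, 62.0),
    ("South", 1, 1_000_000, 70.0),
]


def regional_weighted_means(records):
    """Population-weighted mean of a metric per (region, month)."""
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for region, month, population, value in records:
        weighted_sum[(region, month)] += population * value
        total_pop[(region, month)] += population
    return {key: weighted_sum[key] / total_pop[key] for key in weighted_sum}


weather = regional_weighted_means(city_records)

# Hypothetical survey rows keyed by region and interview month.
survey_rows = [
    {"id": 1, "region": "Northeast", "month": 1},
    {"id": 2, "region": "South", "month": 1},
]

# Left join: attach the regional value, or None when no match exists.
merged = [
    {**row, "temp_max": weather.get((row["region"], row["month"]))}
    for row in survey_rows
]
```

Because every respondent in a region and month receives the same weather values, the environmental predictors vary only across region-month cells, not across individuals.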

Combined Dataset

The final integrated dataset contained 43 common variables across both adult and child surveys, supplemented with the six weather variables from NOAA CDO matched by region and survey month. This structure enabled a comprehensive multivariate analysis incorporating individual-level health and demographic factors alongside environmental conditions.

2.2 Machine Learning Model

A Logistic Regression model was selected to predict the binary outcome of eczema status (1=yes, 0=no). This method uses a logistic function to estimate the relationship between predictor variables and the probability of eczema occurrence, yielding interpretable coefficients that facilitate meaningful analysis.
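As a small illustration of how a logistic model turns predictors into a probability, the sketch below applies the logistic function to a linear score. The coefficient values are placeholders chosen for the example, not the fitted values from this study, and the function names are hypothetical.

```python
import math


def logistic(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(intercept, coefficients, features):
    """Probability of the positive class for one observation."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return logistic(z)


# Placeholder coefficients, not the fitted values from this study.
p = predict_probability(-2.0, [0.8, 0.5], [1.0, 0.0])

# exp(coefficient) is the multiplicative change in the odds
# for a one-unit increase in that feature, holding the others fixed.
odds_ratio = math.exp(0.8)
```

The odds-ratio reading of each coefficient is what makes the model interpretable: a positive coefficient raises the odds of eczema, a negative one lowers them.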

The raw dataset was imbalanced, with only 11.7% of individuals having eczema. To address this imbalance and improve model performance, the ROSE (Random Over-Sampling Examples) oversampling method in R was applied to create a balanced training sample.
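The idea behind this kind of oversampling can be sketched in Python. This is a simplified stand-in, not the ROSE package itself: it picks a class with equal probability, draws a real row from that class, and perturbs its numeric features with Gaussian noise. ROSE derives its smoothing from the data, whereas the fixed `jitter` parameter and the function name `rose_like_sample` here are assumptions made for illustration.

```python
import random


def rose_like_sample(rows, labels, n_out, jitter=0.1, seed=0):
    """Draw a roughly balanced synthetic sample.

    Simplified stand-in for ROSE: choose a class with equal probability,
    pick a real row of that class, then perturb its numeric features
    with Gaussian noise of fixed standard deviation `jitter`.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    new_rows, new_labels = [], []
    for _ in range(n_out):
        label = rng.choice([0, 1])
        base = rng.choice(by_class[label])
        new_rows.append([x + rng.gauss(0.0, jitter) for x in base])
        new_labels.append(label)
    return new_rows, new_labels


# Toy data with 10% positives and two numeric features per row.
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [1 if i % 10 == 0 else 0 for i in range(100)]

balanced_rows, balanced_labels = rose_like_sample(rows, labels, n_out=2000)
share_positive = sum(balanced_labels) / len(balanced_labels)
```

Resampling of this kind is normally applied to the training split only, so that evaluation metrics are still computed on data with the original class balance.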

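The final set of independent variables combined socio-demographic and health factors with environmental data. The predictors reported in Section 3.3 include non-eczema atopic conditions, a child indicator, anxiety/nervousness (AnxNervSad), squared humidity, maximum temperature, the interaction between the two, and average rainfall.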

3. Results

3.1 Descriptive Statistics

Eczema prevalence was observed to be higher in children (14%) compared to adults (11%) and in females (13%) compared to males (10%). Prevalence was similar between urban (11.7%) and rural (11.5%) residents. The data also showed that children with eczema faced a higher prevalence of bullying (11%) compared to children without eczema (9%).

I also used t-tests to test the following four hypotheses about the connection between eczema prevalence and demographic, behavioral, or socio-economic factors.

HYPOTHESIS 1: AGE GROUP

H0: There is no difference in eczema prevalence between adults and children

HA: There is a significant difference in eczema prevalence between adults and children

HYPOTHESIS 2: GENDER

H0: There is no difference in eczema prevalence between males and females

HA: There is a significant difference in eczema prevalence between males and females

HYPOTHESIS 3: RURAL VS. URBAN

H0: There is no difference in eczema prevalence between rural and urban residents

HA: There is a significant difference in eczema prevalence between rural and urban residents

HYPOTHESIS 4: BULLYING

H0: There is no difference in bullying between children with and without eczema

HA: Children with eczema face significantly more bullying
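A two-sample test of this kind can be sketched as below. This is an illustration under stated assumptions, not a reproduction of the study's test output: it applies a Welch t statistic to two groups of 0/1 outcomes, uses the normal approximation for the p-value (reasonable at these sample sizes), and takes as inputs the rounded prevalence figures and sample sizes quoted in this article. The function name is hypothetical.

```python
import math


def welch_t_for_proportions(p1, n1, p2, n2):
    """Welch t statistic and two-sided p-value for two 0/1 samples.

    With samples this large the t distribution is effectively normal,
    so the p-value uses the normal approximation.
    """
    # Unbiased sample variance of a 0/1 variable with mean p.
    var1 = p1 * (1 - p1) * n1 / (n1 - 1)
    var2 = p2 * (1 - p2) * n2 / (n2 - 1)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    t = (p1 - p2) / standard_error
    p_value = math.erfc(abs(t) / math.sqrt(2))
    return t, p_value


# Rounded figures from the text: 14% of 8,261 children vs 11% of 29,482 adults.
t_stat, p_value = welch_t_for_proportions(0.14, 8261, 0.11, 29482)
```

For a one-sided alternative such as Hypothesis 4, the two-sided p-value is halved when the observed difference lies in the hypothesized direction.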

3.2 Logistic Regression Model Performance

The logistic regression model achieved an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.774. AUC measures how well the model ranks individuals with eczema above those without: a value of 0.5 corresponds to random guessing, and a value of 1 to perfect separation.
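To make the metric concrete, the sketch below computes AUC directly from its ranking definition: the share of positive-negative pairs in which the positive case receives the higher score. The labels and scores are made up for illustration, and the pairwise loop is meant for small examples rather than for a dataset of this study's size.

```python
def auc_score(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    This pairwise version is O(n_pos * n_neg), fine for small examples.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc_score(y_true, y_score))  # 0.75
```

Because AUC depends only on the ordering of scores, it is unaffected by the choice of classification threshold.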

3.3 Model Coefficients

The final model coefficients demonstrate that several factors, including environmental variables, have a statistically significant impact on predicting eczema. The coefficients highlight several key findings:

  • The strongest predictor of having eczema is having Non-Eczema Atopic conditions (allergies/asthma/hay fever), which is consistent with the "atopic triad".

  • Being a Child is a significant positive predictor, reflecting the higher prevalence in children.

  • Anxiety/Nervousness (AnxNervSad) is also a strong positive predictor, which aligns with the known mental health impact of eczema.

  • The coefficients for Humidity$^2$, Temp_Max, and their interaction term (Humidity$^2$:Temp_Max) are all statistically significant, each at either the 0.01 or the 0.05 level, supporting the hypothesis that environmental variables influence eczema flares.

  • Average Rainfall is a statistically significant negative predictor.
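One way to read the environmental terms listed above is to write out the linear predictor they imply. The notation is introduced here for illustration and is not taken from the original model output: $H$ is humidity, $T$ is Temp_Max, $R$ is average rainfall, $x_j$ are the remaining socio-demographic and health predictors, and the $\beta$ and $\gamma$ symbols are the corresponding coefficients. The exact set of terms is assumed from the bullets above.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_H H^2 + \beta_T T + \beta_{HT}\,(H^2 \cdot T) + \beta_R R + \sum_j \gamma_j x_j

\frac{\partial\,\operatorname{logit}(p)}{\partial T} = \beta_T + \beta_{HT} H^2
```

The second line shows why the interaction matters: under this specification, the effect of a one-degree change in maximum temperature on the log-odds of eczema is not constant but depends on the humidity level.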

4. Conclusion and Future Work

The machine learning model demonstrated that a combination of socio-demographic, health history, and environmental factors can predict the probability of an individual having eczema. The statistical significance of temperature, humidity, and their interaction indicates that environmental variables are important predictors of eczema status. While the current model shows promising performance (AUC 0.774), there is significant potential for improvement through the addition of more feature variables and further fine-tuning.

To translate these findings into practical applications, I developed a beta app and website called "Shine" that integrates the ML model with real-time weather API data and user-entered symptom severity levels. This combination enables personalized predictions of flare-up probability, empowering users to proactively track their eczema symptoms, identify potential triggers, and make informed decisions about their skincare, daily activities, and lifestyle management.

5. References

  1. Eczema Prevalence, Quality of Life and Economic Impact (https://nationaleczema.org/eczema-facts/)

  2. Laughter MR, Maymone MBC, Mashayekhi S, et al. The global burden of atopic dermatitis: lessons from the Global Burden of Disease Study 1990–2017. British Journal of Dermatology. 2021;184(2):304-309. doi:10.1111/bjd.19580

  3. Chovatiya, R., Begolka, W. S., Thibau, I. J. & Silverberg, J. I. Impact and Associations of Atopic Dermatitis Out-of-Pocket Health Care Expenses in the United States. Dermatitis (2021) doi:10.1097/DER.0000000000000795.

  4. Guillem Hurault, Jean François Stalder, Sophie Mery, Alain Delarue, Markéta Saint Aroman, Gwendal Josse, Reiko J. Tanaka, EczemaPred: A computational framework for personalised prediction of eczema severity dynamics, Clinical and Translational Allergy, 10.1002/clt2.12140, 12, 3, (2022).

  5. Silverberg, J. I. Health Care Utilization, Patient Costs, and Access to Care in US Adults With Eczema: A Population-Based Study. JAMA Dermatol. 151, 743–752 (2015).

  6. Ariane Duverdier, Adnan Custovic, Reiko J. Tanaka, Data-driven research on eczema: Systematic characterization of the field and recommendations for the future, Clinical and Translational Allergy, 10.1002/clt2.12170, 12, 6, (2022).

  7. Guillem Hurault, Valentin Delorieux, Young-Min Kim, Kangmo Ahn, Hywel C. Williams, Reiko J. Tanaka, Impact of environmental factors in predicting daily severity scores of atopic dermatitis, Clinical and Translational Allergy, 10.1002/clt2.12019, 11, 2, (2021).