Predicting Poverty Rate Across US Counties

Unmasking Poverty in America: A Data-Driven Exploration of County-Level Determinants

Poverty is a deeply rooted and complex issue that transcends borders and continues to challenge communities across the globe. Despite decades of economic progress and social interventions, millions in the United States still live below the poverty line. While traditional studies have long focused on income or consumption as key indicators, our latest research takes a broader, data-driven approach to understanding poverty. By examining various socioeconomic and demographic factors across U.S. counties, we aim to uncover the deeper roots of poverty and guide more informed policy decisions.


Why Go Beyond Income?

Historically, poverty measurement relied heavily on income thresholds. However, poverty is not solely an economic condition—it’s also a reflection of educational access, employment opportunities, health status, housing quality, and more. Recognizing this, our study takes a comprehensive approach by integrating multiple socio-economic variables, enabling a deeper analysis of what truly drives poverty across regions.


About the Dataset

Our analysis uses a rich dataset encompassing 3,142 counties across the United States. It includes:

  • Demographics: Population data from 2000, 2010, and 2017

  • Economic Indicators: Median household income, per capita income, unemployment rates

  • Housing & Education: Homeownership, multi-unit housing, education levels

  • Public Health: Smoking bans as a proxy for health-oriented policies

  • Geographic Context: Metro vs non-metro classification

This dataset provides a granular look into how different factors interact at the county level to influence poverty.


Multiple Linear Regression: Quantifying Poverty’s Predictors

To understand the magnitude and direction of these influences, we first fit a multiple linear regression model with the log of the poverty rate as the response and five predictors: unemployment rate, per capita income, median household income, metro status, and education level.


Regression Equation:

log(poverty) = 4.375
    + 0.046 * unemployment_rate
    - 0.000015 * per_capita_income
    - 0.000020 * median_hh_income
    + 0.116 * (metro=yes)
    - 0.642 * (edu_below_HS)
    - 0.548 * (HS_diploma)
    - 0.564 * (some_college)
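To make the equation concrete, here is a minimal Python sketch that plugs values into it. The coefficients are copied from the equation; everything else is an assumption made for illustration: that `log` is the natural logarithm, that the unemployment rate is in percentage points and incomes are in dollars, and that the omitted education category (labelled `"reference"` here) is whatever baseline the original model used. The example county is hypothetical.

```python
import math

# Coefficients copied from the regression equation in the article.
COEF = {
    "intercept": 4.375,
    "unemployment_rate": 0.046,
    "per_capita_income": -0.000015,
    "median_hh_income": -0.000020,
    "metro_yes": 0.116,
    "edu_below_HS": -0.642,
    "HS_diploma": -0.548,
    "some_college": -0.564,
}


def predict_log_poverty(unemployment_rate, per_capita_income,
                        median_hh_income, metro, education):
    """Return fitted log(poverty) for one county.

    `education` is "edu_below_HS", "HS_diploma", "some_college", or
    "reference" (the omitted baseline, which the article does not name).
    """
    value = COEF["intercept"]
    value += COEF["unemployment_rate"] * unemployment_rate
    value += COEF["per_capita_income"] * per_capita_income
    value += COEF["median_hh_income"] * median_hh_income
    if metro:
        value += COEF["metro_yes"]
    if education != "reference":
        value += COEF[education]
    return value


def predict_poverty(*args, **kwargs):
    """Back-transform to the poverty scale (assumes a natural log)."""
    return math.exp(predict_log_poverty(*args, **kwargs))


# Hypothetical county: 5% unemployment, $25,000 per capita income,
# $50,000 median household income, non-metro, "HS_diploma" category.
print(predict_poverty(5.0, 25_000, 50_000, metro=False, education="HS_diploma"))
```

Because the model is fit on the log scale, the back-transformed value is best read as an approximate typical poverty rate for such a county rather than an exact expected value.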


Key Insights:

  • Unemployment is positively associated with poverty—no surprise there.

  • Education plays a critical role; education level is significantly associated with poverty, with each education coefficient read relative to the omitted baseline category.

  • Per capita income and median household income both have negative coefficients: holding the other predictors fixed, higher incomes are associated with lower poverty.

  • Interestingly, after controlling for the other predictors, metro areas are associated with somewhat higher poverty rates (coefficient 0.116 on the log scale), possibly due to population density and income inequality.

  • The model achieved an adjusted R² of 0.7281, so it explains roughly 73% of the variation in log poverty rates across counties.
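Because the response is log(poverty), each coefficient is easier to read as a relative change in the poverty rate. The sketch below converts a coefficient b into 100 * (exp(b * step) - 1) percent. It assumes the model uses the natural log, and the step sizes (one unit of unemployment, $1,000 of income) are chosen here only for readability.

```python
import math


def percent_change(coef, step=1.0):
    """Percent change in the outcome for a `step` increase in a predictor
    when the response is modeled on the natural-log scale."""
    return 100.0 * (math.exp(coef * step) - 1.0)


# Coefficients copied from the regression equation in the article.
effects = {
    "unemployment_rate (+1 unit)": percent_change(0.046),
    "per_capita_income (+$1,000)": percent_change(-0.000015, step=1000),
    "median_hh_income (+$1,000)": percent_change(-0.000020, step=1000),
    "metro = yes (vs. no)": percent_change(0.116),
}

for name, pct in effects.items():
    print(f"{name}: {pct:+.2f}%")
```

Under that assumption, the metro coefficient corresponds to a predicted poverty rate roughly 12% higher in relative terms (for example, about 11.2% instead of 10%), with the other predictors held fixed.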


Logistic Regression: Classifying Counties as High or Low Poverty

We also developed a logistic regression model to classify counties based on poverty level (above or below a 12% threshold).

  • Accuracy: ~82.4% (422 of 512 counties classified correctly)

  • Confusion Matrix:

    • True Positives: 193

    • True Negatives: 229

    • False Positives: 31

    • False Negatives: 59
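The counts above are enough to recompute accuracy and to derive precision, recall, specificity, and F1. The sketch below does that in plain Python. It assumes the "positive" class is a high-poverty county, which the article does not state explicitly.

```python
# Counts copied from the confusion matrix in the article.
# Assumption: "positive" means a county classified as high poverty.
TP, TN, FP, FN = 193, 229, 31, 59


def classification_metrics(tp, tn, fp, fn):
    """Derive standard metrics from the four confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)      # of predicted positives, share correct
    recall = tp / (tp + fn)         # of actual positives, share found
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = classification_metrics(TP, TN, FP, FN)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The recomputed accuracy, 422 / 512, matches the reported ~82.4%. Under the stated assumption, precision is higher than recall, so the classifier misses more high-poverty counties than it wrongly flags.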

We evaluated performance across different predicted-probability cutoffs and found the highest accuracy at a cutoff of 0.60 (this is separate from the 12% poverty-rate threshold that defines the two classes). The model’s balance between precision and recall makes it a useful tool for policy simulations.
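A cutoff search of this kind can be sketched in a few lines. The function names, the candidate cutoffs, and the tiny set of probabilities and labels below are all hypothetical and only illustrate the procedure; they are not the study's data or code.

```python
def accuracy_at(cutoff, probs, labels):
    """Accuracy when a county is called high poverty if prob >= cutoff."""
    preds = [1 if p >= cutoff else 0 for p in probs]
    correct = sum(int(pred == y) for pred, y in zip(preds, labels))
    return correct / len(labels)


def best_cutoff(probs, labels, cutoffs):
    """Return the candidate cutoff with the highest accuracy
    (the first one wins in case of a tie)."""
    return max(cutoffs, key=lambda c: accuracy_at(c, probs, labels))


# Hypothetical predicted probabilities and true labels (1 = high poverty).
probs = [0.15, 0.35, 0.55, 0.58, 0.62, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cutoffs = [0.3, 0.4, 0.5, 0.6, 0.7]

for c in cutoffs:
    print(f"cutoff {c:.1f}: accuracy {accuracy_at(c, probs, labels):.3f}")
print("best cutoff:", best_cutoff(probs, labels, cutoffs))
```

In practice the cutoff would be chosen on a validation set rather than on the data used to report final accuracy, so the reported figure is not optimistically biased.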


Visual Diagnostics: Are the Assumptions Valid?

  • Residuals vs. Fitted Plot: No discernible pattern, which is consistent with constant error variance (homoscedasticity).

  • Q-Q Plot: Residuals follow a roughly normal distribution.

These diagnostics suggest that the linear model assumptions are reasonably met, adding credibility to our findings.
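For readers who want a numeric companion to the Q-Q plot, the sketch below pairs sorted residuals with standard normal quantiles and reports their correlation; values close to 1 indicate a nearly straight Q-Q line. It uses only the Python standard library. The residuals are simulated stand-ins, not the study's residuals, and the plotting positions (i - 0.5) / n are one common convention among several.

```python
import math
from statistics import NormalDist, mean


def qq_points(residuals):
    """Pair each sorted residual with its theoretical normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n keep probabilities inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(residuals):
    """Pearson correlation between theoretical and observed quantiles."""
    xs, ys = zip(*qq_points(residuals))
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den


# Simulated residuals standing in for the model's real residuals.
residuals = NormalDist(mu=0.0, sigma=0.3).samples(200, seed=1)
print(f"Q-Q correlation: {qq_correlation(residuals):.3f}")
```

This check complements the plot rather than replacing it: a high correlation can still hide heavy tails, which are easier to see visually.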

Conclusion: What Does This Mean for Policy?

Our study reaffirms that poverty is not a one-dimensional problem. Key takeaways:

  • Tackling unemployment is vital but insufficient alone.

  • Investments in education and income support can yield significant benefits.

  • Metropolitan areas may require tailored interventions to deal with unique challenges like housing costs and income disparity.

  • Data-driven approaches are essential to design more effective and equitable policies.


Both regression models provide practical tools for predicting and classifying county-level poverty, which can help guide targeted interventions at local, state, and national levels.

Final Thoughts

Poverty reduction isn’t just about increasing income—it’s about creating an environment where people have access to quality education, stable jobs, safe housing, and public health protections. By leveraging the power of data, we move one step closer to building resilient, equitable communities.

Whether you're a data scientist, policymaker, or social worker, we hope this research provides a fresh perspective on tackling one of society’s most enduring challenges.


 
 
 
