idblr / geo_us_lung_cancer_and_smoking

Companion for the 2023 manuscript in Cancer Epidemiology, Biomarkers & Prevention entitled "Geographic Patterns in U.S. Lung Cancer Mortality and Cigarette Smoking"

License: Apache License 2.0

R 100.00%
geospatial-analysis lung-cancer r rstats county-level mortality-rates cigarette-smoking cigarette-smoking-prevalence bivariate-analysis geospatial-clustering

Geographic Patterns in U.S. Lung Cancer Mortality and Cigarette Smoking

Date repository last updated: June 10, 2023

Authors

  • Alaina H. Shreves (1, 2)
  • Ian D. Buller (3, 4)
  • Elizabeth Chase (5, 6)
  • Hannah Creutzfeldt (3, 7)
  • Jared A. Fisher (3)
  • Barry I. Graubard (6)
  • Robert N. Hoover (8)
  • Debra T. Silverman (3)
  • Susan S. Devesa (5), Co-Senior Author
  • Rena R. Jones (3), Co-Senior Author & Corresponding Author
  1. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, 02115, USA
  2. Trans-Divisional Research Program, Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NCI), National Institutes of Health (NIH), Rockville, MD, 20850, USA
  3. Occupational and Environmental Epidemiology Branch, DCEG, NCI, Rockville, MD, 20850, USA
  4. Cancer Prevention Fellowship Program, Division of Cancer Prevention, NCI, Rockville, MD, 20850, USA
  5. Infections and Immunology Branch, DCEG, NCI, NIH, Rockville, MD, 20850, USA
  6. Department of Biostatistics, University of Michigan School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
  7. Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, 90095, USA
  8. Office of the Director, DCEG, NCI, NIH, Rockville, MD, 20850, USA

Project Details

Lung cancer is the leading cause of cancer death in the United States (US), and lung cancer mortality and smoking behavior both vary by sex and region. We apply geospatial statistical methods to describe patterns in lung cancer mortality rates (2005-2018) in relation to patterns in cigarette smoking prevalences (1997-2003), by sex, at the US county level. Our findings identify counties where lung carcinogens other than smoking may be driving lung cancer mortality and where further study is needed.

Project Timeframe

Time: Event

  • 1997-2003: NCI Model-based Small Area Estimates of Cancer-Related Measures smoking prevalences for persons aged 18+ years (see the Data Availability section below)
  • 2005-2018: Lung and bronchus cancer mortality rates among persons aged 20+ years from the National Vital Statistics System data from the National Center for Health Statistics (see the Data Availability section below)
  • July 2020: Project initiation
  • March 2022: Initial manuscript submission to Cancer Epidemiology, Biomarkers & Prevention for peer review
  • November 2022: Manuscript accepted by Cancer Epidemiology, Biomarkers & Prevention
  • February 2023: Manuscript published in Cancer Epidemiology, Biomarkers & Prevention
  • June 2023: Updated the False Discovery Rate (Benjamini & Hochberg, 1995) calculation used for multiple-testing correction so that it now sorts the p-values in ascending rather than descending order
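To illustrate why the sort order matters in the June 2023 update, the sketch below computes Benjamini-Hochberg adjusted p-values in base R by ranking the raw p-values in ascending order, then compares the result with base R's p.adjust(). The helper name bh_adjust and the example p-values are hypothetical; this is not the code in functions.R, only a minimal illustration of the procedure.

```r
# Hypothetical helper (not from functions.R): Benjamini-Hochberg adjusted
# p-values computed by ranking the raw p-values in ascending order.
bh_adjust <- function(p) {
  m <- length(p)
  ord <- order(p)                    # indices of p from smallest to largest
  scaled <- p[ord] * m / seq_len(m)  # p_(i) * m / i for ascending rank i
  # Enforce monotonicity starting from the largest p-value, and cap at 1
  adj_sorted <- pmin(1, rev(cummin(rev(scaled))))
  adj <- numeric(m)
  adj[ord] <- adj_sorted             # return to the original order
  adj
}

p_raw <- c(0.001, 0.04, 0.03, 0.2, 0.008)  # made-up p-values
p_bh  <- bh_adjust(p_raw)
p_ref <- p.adjust(p_raw, method = "BH")    # base R reference implementation
print(all.equal(p_bh, p_ref))
```

If the p-values were instead ranked in descending order, the smallest p-value would be multiplied by m rather than by m / m, which changes which counties are flagged as significant.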

R Scripts Included In This Repository

This repository includes R scripts used to calculate the Lee's L statistic and render the geographic visualizations found in the following peer-reviewed manuscript:

Shreves AH, Buller ID, Chase E, Creutzfeldt H, Fisher JA, Graubard BI, Hoover RN, Silverman DT, Devesa SS, Jones RR. (2023) Geographic Patterns in U.S. Lung Cancer Mortality and Cigarette Smoking. Cancer Epidemiology, Biomarkers & Prevention, 32(2):193-201. DOI:10.1158/1055-9965.EPI-22-0253 PMID:36413442.

R Script: Description

  • functions.R: Custom functions to calculate the local Lee's L statistic with correction for multiple testing
  • preparation.R: Calculate the local Lee's L statistics for the four comparisons. Requires a data set to run (not included; see notes within).
  • figure1.R: Generate Figure 1
  • figure2.R: Generate Figure 2
  • supplemental1.R: Generate Supplemental Figure 1
  • supplemental2.R: Generate Supplemental Figure 2

The repository also includes the code to create the project hexagon sticker.
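For readers unfamiliar with the Lee's L statistic named in the script descriptions, the following is a minimal base R sketch of the global and local statistics for two variables and a spatial weights matrix. It is not the repository's LeeL function: the helper name lee_l, the toy weights matrix, and the data values are all hypothetical, and the sketch omits the simulation-based significance testing and multiple-testing correction that functions.R is described as performing.

```r
# Hypothetical helper (not the repository's LeeL function): global and local
# Lee's L for two numeric vectors and an n x n spatial weights matrix W.
lee_l <- function(x, y, W) {
  n  <- length(x)
  zx <- x - mean(x)                # deviations from the mean
  zy <- y - mean(y)
  lag_x <- as.vector(W %*% zx)     # spatially lagged deviations of x
  lag_y <- as.vector(W %*% zy)     # spatially lagged deviations of y
  denom <- sqrt(sum(zx^2)) * sqrt(sum(zy^2))
  s2 <- sum(rowSums(W)^2)          # sum of squared row totals of W
  list(
    global = (n / s2) * sum(lag_x * lag_y) / denom,
    local  = n * lag_x * lag_y / denom
  )
}

# Toy example: four areas in a row, each adjacent to its immediate neighbors
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
W <- W / rowSums(W)                # row-standardize the weights
x <- c(10, 12, 20, 25)             # made-up smoking prevalences
y <- c(30, 33, 45, 50)             # made-up mortality rates
res <- lee_l(x, y, W)
print(res$global)
print(res$local)
```

With row-standardized weights, the global statistic equals the mean of the local statistics, which is a quick sanity check when adapting the sketch.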

Getting Started

  • Step 1: Download the data (see the Data Availability section).
  • Step 2: Save the data set to the data directory in this repository. The script currently expects a CSV file; modify the path on Line 58 of the preparation.R file to match your data location and file name.
  • Step 3: Run the R scripts for the figures. The preparation.R file will source the functions.R file.

Data Availability

County-level smoking prevalences are downloadable from the Model-based Small Area Estimates of Cancer-Related Measures from the Surveillance Research Program within the Division of Cancer Control and Population Sciences of the National Cancer Institute. County-level U.S. lung cancer mortality rates are downloadable from the National Vital Statistics System from the National Center for Health Statistics of the Centers for Disease Control and Prevention.

Questions?

For questions about the manuscript, please e-mail the corresponding author, Dr. Rena R. Jones.

geo_us_lung_cancer_and_smoking's Issues

Thrown Error

W <- as(lw, "symmetricMatrix")

Hi Ian! I was reading through your scripts and ran into a possible error. I thought I'd bring it to your attention in case there's a bug or a missing intermediate step in this line. The error message is below.

Amazing work here!

Error in as(lw, "symmetricMatrix") :
no method or default for coercing “listw” to “symmetricMatrix”

Potential bug?

LeeCSF <- LeeL(dat = alcove_proj, x = "CSF_Pct_1997_2003", y = "Female_Rate", label = "LCSF", numsim = 100000)

Hi Ian,

I think I may have come across a potential bug with the LeeL function. I'm getting an error that may relate to some discordance between the tmp data frame and the actual data set: a "replacement has X rows, data has Y" issue. I'm not sure whether it's on my end.

Error in `[[<-.data.frame`(`*tmp*`, name, value = logical(0)) :
replacement has 0 rows, data has 3108
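The thread does not say what caused this error. As one possibility only, the base R sketch below reproduces the same message when a column name passed as a string is not present in the data frame, so the extracted value is NULL and the comparison yields logical(0). The data frame, the column name "CSF_Pct", and the new column "flag" are made up for illustration and are not taken from the repository's code.

```r
# Made-up data frame standing in for the analysis data set
dat <- data.frame(Female_Rate = c(40.1, 52.3, 47.8))

# "CSF_Pct" is deliberately absent from dat, so dat[["CSF_Pct"]] is NULL
x <- "CSF_Pct"
msg <- tryCatch({
  dat[["flag"]] <- dat[[x]] > 0    # NULL > 0 evaluates to logical(0)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# A guard that reports missing columns before any assignment is attempted
missing_cols <- setdiff(c(x, "Female_Rate"), names(dat))
print(missing_cols)
```

Checking names() of the input against the x and y arguments before calling the function is a quick way to rule this cause in or out.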
