Predict RSV in USA using CDC data
Arun & Zhongyi
Respiratory Syncytial Virus (RSV) was discovered in the year 1956 and has been recognized as one of the most common causes of childhood illness.
RSV symptoms usually resemble those of a common cold, but the infection can become serious, leading to bronchiolitis (inflammation of the small airways in the lung) and pneumonia, especially in infants and older adults.
According to the CDC (Centers for Disease Control and Prevention), RSV results in around 58,000 hospitalizations annually and 100 to 300 deaths among children under 5.
With mask-wearing and physical distancing for COVID-19, there were fewer cases of RSV in 2020.
RSV cases began to increase in spring 2021 when safety measures relaxed with the arrival of COVID-19 vaccines.
This year, RSV activity in multiple U.S. regions is nearing seasonal peak levels.
The trend in respiratory syncytial virus (RSV) infections has drawn the attention of researchers worldwide, who are using different modeling approaches to predict it.
Thongpan, Ilada: applied multivariate time-series analysis to show that RSV activity in Thailand can potentially be predicted from climate.
Manuel, Britta: applied logistic regression to develop a prediction model and developed a web-based application to predict the individual probability of RSV infection.
Reis, Julia: tried to build a real-time RSV prediction system using a susceptible-infectious-recovered (SIR) model in conjunction with an ensemble adjustment Kalman filter (EAKF) and 10 years of CDC data [6].
Corberán-Vallet: presented a Bayesian stochastic susceptible-infected-recovered-susceptible (SIRS) model to understand RSV dynamics in the region of Valencia, Spain.
Leecaster, Molly: used simple linear regression to explore the relationship between three epidemic characteristics (final epidemic size, days to peak, and epidemic length).
The data set for this research comes from the RSV Hospitalization Surveillance Network (RSV-NET), one of the CDC's research and surveillance platforms.
RSV-NET has been collecting data on RSV-associated hospitalizations in adults and children since the 2018-2019 season from 58 counties in 12 states: California, Colorado, Connecticut, Georgia, Maryland, Michigan, Minnesota, New Mexico, New York, Oregon, Tennessee, and Utah.
It conducts population-based surveillance for laboratory-confirmed COVID-19-, RSV-, and influenza-associated hospitalizations in the US among children younger than 18 years of age and adults.
A case is defined as laboratory-confirmed RSV in a person who lives in a defined RSV-NET surveillance area and tests positive for RSV within 14 days before or during hospitalization.
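The case definition above can be read as a simple date check. The following is only a minimal sketch: the function name, its parameters, and the choice to treat the 14-day boundary as inclusive are assumptions for illustration, not part of RSV-NET's specification.

```python
from datetime import date, timedelta
from typing import Optional


def is_rsv_net_case(lives_in_area: bool,
                    positive_test: Optional[date],
                    admitted: date,
                    discharged: Optional[date] = None) -> bool:
    """Hypothetical check of the case definition stated in the text.

    A case requires residence in the surveillance area and a positive RSV
    test dated within 14 days before admission or during the stay.
    The inclusive 14-day boundary is an assumption.
    """
    if not lives_in_area or positive_test is None:
        return False
    window_start = admitted - timedelta(days=14)
    # If the patient is still hospitalized, leave the window open-ended.
    window_end = discharged if discharged is not None else date.max
    return window_start <= positive_test <= window_end
```

For example, a resident who tested positive 9 days before admission would count as a case under this reading, while a positive test 40 days before admission would not.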
Time frame: For the 2018-2019 and 2019-2020 seasons, data were collected from October 1 to April 30. For the 2020-2021, 2021-2022, and 2022-2023 seasons, data were collected from October 1 to October 1 of the following year.
Simple linear regression only works when the relationship in the data is linear. If the data are non-linear, linear regression cannot draw a good best-fit line and fails.
Consider the diagram below, which shows a non-linear relationship together with the linear regression fit; the fit performs poorly and does not come close to the data.
When the relationship between the dependent and independent variables is non-linear, we add polynomial terms to the linear regression to convert it into polynomial regression.
In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial in x.
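To make the difference concrete, here is a small sketch on synthetic data (not the RSV-NET data). The data-generating curve, the noise level, and the helper name `fit_rmse` are all assumptions chosen for illustration; the fitting uses NumPy's `polyfit` and `polyval`.

```python
import numpy as np

# Synthetic non-linear data (assumed for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 0.5 * x - 0.3 * x**2 + 0.02 * x**3 + rng.normal(0, 0.2, x.size)


def fit_rmse(x, y, degree):
    """Least-squares polynomial fit of the given degree; returns in-sample RMSE."""
    coeffs = np.polyfit(x, y, degree)   # highest power first
    pred = np.polyval(coeffs, x)
    return float(np.sqrt(np.mean((y - pred) ** 2)))


rmse_linear = fit_rmse(x, y, 1)   # straight line: under-fits the curve
rmse_cubic = fit_rmse(x, y, 3)    # cubic: matches the shape of the data
print(f"linear RMSE: {rmse_linear:.3f}, cubic RMSE: {rmse_cubic:.3f}")
```

The straight line leaves a larger error because it cannot follow the curvature, which is the under-fitting situation described above.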
Plot legend (season by color): orange 2018-2019, green 2019-2020, pink 2020-2021, purple 2021-2022, brown 2022-2023.
under-fitting
The two-year-to-date data distribution with a fitted curve is shown below.
under-fitting
One year (2021-2022): the degree-6 polynomial gives a high multiple R-squared (0.9606) and a low RMSE (0.16), so we select the model with degree 6.
Two years (2020-2022): degree 5, with a multiple R-squared of 0.92 and an error of 0.24.
Comparing the two datasets (two-year-to-date versus one-year-to-date) shows that the RSV hospitalization rate model built on the most recent one year of data gives the better prediction model.
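The comparison of polynomial degrees by R-squared and RMSE can be sketched as follows. This is a sketch under assumptions, not the authors' analysis: the weekly series is synthetic, the helper names are hypothetical, and RMSE is taken as the square root of the mean squared residual, which may differ from the error measure reported above.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r2_and_rmse(y, pred):
    """Return (R-squared, RMSE) for observed y and fitted values pred."""
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot, float(np.sqrt(ss_res / len(y)))


def compare_degrees(week, rate, degrees):
    """Fit one polynomial per degree and collect its in-sample fit statistics."""
    results = {}
    for d in degrees:
        # Polynomial.fit rescales the x-domain internally, which keeps
        # higher-degree fits numerically stable.
        p = Polynomial.fit(week, rate, d)
        results[d] = r2_and_rmse(rate, p(week))
    return results


# Synthetic one-year weekly series with a late-season peak (assumed).
week = np.arange(1, 53, dtype=float)
rate = 0.5 + 3.0 * np.exp(-((week - 40) / 6.0) ** 2)

results = compare_degrees(week, rate, range(1, 7))
for d, (r2, rmse) in results.items():
    print(f"degree {d}: R^2 = {r2:.4f}, RMSE = {rmse:.4f}")
```

In-sample R-squared cannot decrease as the degree grows, so a high value alone does not rule out over-fitting; checking the fitted curve against held-out weeks is a common complement.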
Model for RSV hospitalization rate from Nov 2021 to Nov 2022:
Comparing the actual hospitalization rates with the values predicted by our model gives the following numbers.
We can conclude that the model created is a good fit. It is shown as a graph below.
We obtained the following model equation for the RSV hospitalization rate using the most recent one year of data:
Y = 0.917 + 0.312 Week - 0.074 Week^2 + 0.0054 Week^3 - 0.00018 Week^4 + 0.0000027 Week^5 - 0.000000015 Week^6
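The equation can be evaluated directly from its printed coefficients. The sketch below assumes that Week is a numeric week index; the text does not state where the index starts, and the function name is hypothetical. The printed coefficients are rounded, so values at large week indices are sensitive to that rounding and may not reproduce the original fit.

```python
# Coefficients exactly as printed in the equation, lowest power first.
COEFFS = [0.917, 0.312, -0.074, 0.0054, -0.00018, 0.0000027, -0.000000015]


def predicted_rate(week: float) -> float:
    """Evaluate the degree-6 polynomial at a week index using Horner's rule."""
    result = 0.0
    for c in reversed(COEFFS):
        result = result * week + c
    return result


print(predicted_rate(0))   # the intercept, 0.917
print(predicted_rate(1))   # sum of all coefficients
```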
An RSV hospitalization rate of 9 could be reached at the beginning of next year (model fit: multiple R-squared = 0.9606 and RMSE = 0.16). Also, RSV hospitalization rates for the next 3 months (11/14/2022-2/5/2023) were calculated.
... might be a better solution to model the RSV hospitalization rates.