Trends and analysing the correlation of population density and percentage of population suffered from COVID-19: A linear regression model
Kamlesh Garg^{1}, Aarushi Mathur^{1}, Surinder Kumar^{2}, Ruchika Nandha^{3}
^{1} Department of Pharmacology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India ^{2} Department of Emergency and Accident Services, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India ^{3} Department of Pharmacology, Dr. Harvansh Singh Judge Institute of Dental Sciences and Hospital, Panjab University, Chandigarh, India
Correspondence Address: Dr. Kamlesh Garg, Room No. 607, 6^{th} Floor, Department of Pharmacology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi - 110 029, India
DOI: 10.4103/kleuhsj.kleuhsj_232_21
BACKGROUND: Since its emergence in December 2019, coronavirus disease (COVID-19) has spread to many countries and become a worldwide pandemic. Because the disease is highly contagious, transmission is assumed to be more likely in densely populated areas, where measures such as social distancing are difficult to follow. The objectives of this study were to compare the trends of confirmed, recovered, and deceased cases and the recovery and death rates of COVID-19 (severe acute respiratory syndrome coronavirus 2) infection in the top 5 worst-hit states of India with those in the National Capital Territory of Delhi, and to analyze the correlation of population density with the percentage of population suffered. MATERIALS AND METHODS: This descriptive population study retrieved data published in the daily health bulletins of the states and by the Press Information Bureau, Government of India. The correlation coefficient and linear regression analysis were used to analyze the relation between population density and the percentage of population suffered from COVID-19. RESULTS: Maharashtra remained the Indian state with the highest number of confirmed, recovered, and deceased cases and the highest death rate. Population density had a negligible-to-low positive correlation (correlation coefficient: 0.30) with the percentage of population suffered from COVID-19, and the linear regression model showed no significant relationship between the two parameters (P > 0.05). CONCLUSION: Population density does not have a strong correlation with the percentage of population suffered from COVID-19 in India.
Keywords: COVID cases, COVID-19, percentage of population, population density
Introduction   
India is considered the worst-hit country in Asia and ranks second globally among the countries worst hit by COVID-19.^{[1]} The first human cases of COVID-19, the disease caused by the virus subsequently named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), were reported by officials in Wuhan city, China, in December 2019.^{[2]} As on August 18, 2021, there were 32,321,395 confirmed cases of COVID-19 in India, and the top 5 worst-hit states in descending order of number of cases were Maharashtra, Kerala, Karnataka, Tamil Nadu, and Andhra Pradesh.^{[2]} In earlier months, Delhi was among the top three worst-hit states, but its position later fell in terms of COVID-19 caseload. In contrast, states such as Andhra Pradesh and Karnataka showed the steepest growth in COVID-19 cases in the last 1 month of the study period.
India was one of the earliest countries to declare a lockdown, which gave it some time and opportunity to slow the spread. India is perhaps the only major country that was locked down before the virus took hold. Measures that target public behavior, including mandatory face covering and quarantining, are known as nonpharmaceutical interventions. One of the successes of the Indian government's pandemic policy lies in public health messaging. Between March 25 and June 30, the Prime Minister addressed the nation six times, urging people to be disciplined about COVID-appropriate behavior. The government also carried out rigorous campaigns, sending the public frequent reminders about wearing masks and maintaining social distance and creating awareness about the vaccines. Such messages helped in creating the herd effect across pharma, economic, health, and public safety sectors that enabled the strict national lockdown.^{[3]}
India's high population density poses another challenge in managing this widespread pandemic. The two metropolitan cities, New Delhi and Mumbai, have population densities of 29,259.12 and 73,000 per square mile, respectively, and are among the most densely populated cities in the world. Most residents of Mumbai live in confined houses located far from their workplaces, leading to long commutes and prolonged exposure in the public transit system. Comparatively, New Delhi covers a larger area than Mumbai. The difficulty of social distancing in such densely populated cities, together with 10%–15% of these cities' populations being illiterate and cultural practices that facilitate gathering in groups, might contribute to the rising infection rate. For these dense communities in India, inadequate shelter and overcrowding are also high-risk factors aiding transmission of the virus. According to a recent report by the National Centers for Disease Control, unauthorized colonies and jhuggi-jhopri clusters pose a serious problem as a large number of people live in them. Residents of these inadequate housing facilities usually lack access to adequate sanitation facilities, and self-isolation is often impossible.^{[4]}
There is a need to explore the various factors, such as population, population density, and precautionary measures, that affect the population suffered from COVID-19 in some of the worst-hit Indian states. Hence, we analyzed whether there is any correlation between population density and the percentage of people suffered from COVID-19 with the help of a linear regression model. Thus, the objective of this study was to compare the trend of COVID-19 in the top 5 worst-hit states of India with the National Capital Territory (NCT) of Delhi and to analyze the correlation between population density and percentage of population suffered from COVID-19 in these states. The population, order of population density, and preventive measures of the 5 states may also be responsible for the percentage of population suffered.
Objectives
 To compare the trends of confirmed, recovered, and deceased cases and the death and recovery rates due to COVID-19 (SARS-CoV-2) infection in the top 5 worst-hit states of India with NCT of Delhi
 To analyze the relationship of population density with percentage of population suffered from COVID-19 in the top 5 worst-hit states of India and NCT of Delhi.
Materials and Methods   
This was an observational, descriptive, population-based retrospective study comparing the pattern of COVID-19 (SARS-CoV-2) in the top 5 worst-hit states of India, i.e., Maharashtra, Karnataka, Andhra Pradesh, Tamil Nadu, and Kerala, with that in NCT of Delhi. The study retrieved, organized, and analyzed data published in the daily health bulletins of the governments of these states and by the Press Information Bureau, Government of India, which are available in the public domain.^{[2],[5]} We compared the number of confirmed, recovered, and deceased cases and the recovery and death rates in these states with those in Delhi from January 30, 2020, to August 18, 2021, on a cumulative basis. Further, the relationship of population density with the percentage of population suffered from COVID-19 in different states of India and NCT of Delhi was statistically analyzed using correlation analysis and a linear regression model. Institutional ethics committee permission and informed consent were not required as individual participants were not involved in the research. Although NCT of Delhi was in the 8^{th} position as on August 18, 2021, its COVID-19 trend was compared with that of the top 5 worst-hit states in India because it is the capital of India.
Inclusion criterion
 Daily state-wise cumulative data containing the number of confirmed, recovered, and deceased cases of COVID-19 from January 30, 2020, to August 18, 2021
 State-wise population density per sq. km.
Exclusion criterion
 Data on COVID-19 from any other unauthorized sites/sources
 Data before January 30, 2020, and after August 18, 2021.
Statistical analysis
For the statistical modeling, we estimated the correlation coefficient and then fitted a linear regression model between the percentage of population suffered from COVID-19 and population density in the worst-hit states of India and NCT of Delhi, to determine whether any correlation exists between these two variables. The purpose of correlation analysis is to provide information on the strength and direction of the linear relationship between two variables, while a simple linear regression analysis estimates the parameters in a linear equation that can be used to predict the values of one variable based on the other variable.^{[6]}
Correlation coefficient
The computation of the correlation coefficient (r) is the most commonly used method for analyzing the statistical relationship between two variables; it essentially measures the degree of linear association. It is also called the product-moment correlation coefficient or Pearson correlation coefficient. The value of r lies between −1 and +1. A value close to +1 indicates a strong positive linear relationship (i.e., one variable increases as the other variable increases). A value close to −1 indicates a strong negative linear relationship (i.e., one variable decreases as the other increases). A value close to 0 indicates no linear relationship; however, there could still be a nonlinear relationship between the variables.^{[7]}
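The behavior of r described above can be illustrated with a minimal sketch. The function name `pearson_r` and the toy values are assumptions made for illustration only; they are not study data, and the study itself computed r in SPSS.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data (illustrative only, not taken from the study)
x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # increasing straight line
print(pearson_r(x, [-3 * v for v in x]))     # decreasing straight line
print(pearson_r(x, [v ** 2 for v in x]))     # y = x^2: related, but not linearly
```

The third call shows why a value of r near 0 rules out only a linear relationship: y is fully determined by x, yet the linear association is nil.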
Linear regression model
In this study, linear regression analysis was used to find out the effect of population density (independent variable) on percentage of population suffered from COVID-19 (dependent variable). IBM SPSS Statistics 23 software (IBM Corp., Armonk, N.Y., USA) was used to compute the above parameters from the data. We used the linear regression model, which describes the dependent variable with a straight line that is defined by the equation,
y = ax + b + ε (1)
where x and y are the independent and the dependent variables, respectively, a is the slope, b is the intercept on the y-axis, and ε is the error with zero mean value. In the present study, population density is the independent variable (x), while the percentage of population suffered is the dependent variable (y).^{[8]} It is important to examine whether the association is genuine, which can be done by testing the null hypothesis. The null hypothesis states that there is no effect or relationship between the variables.^{[9]} We estimated the F value, which compares the variance explained by the regression with the residual variance, and the P value, which is used to decide whether to reject the null hypothesis. The test evaluates the null hypothesis that the regression coefficient is zero: for a P value below 0.05, the null hypothesis is rejected; otherwise, it is retained.^{[10]}
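A minimal sketch of how the slope a, the intercept b, and the F value of such a straight-line model can be obtained by ordinary least squares is given below. The function name `fit_line` and the toy values are hypothetical and are not study data; the study itself used SPSS, so this only illustrates the underlying arithmetic.

```python
def fit_line(x, y):
    """Least-squares fit of y = a*x + b; returns (a, b, F) with F on (1, n - 2) df."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept on the y-axis
    predicted = [a * xi + b for xi in x]
    # Variation explained by the line vs. variation left in the residuals
    ss_regression = sum((pi - mean_y) ** 2 for pi in predicted)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    f_value = (ss_regression / 1) / (ss_residual / (n - 2))
    return a, b, f_value

# Toy data (illustrative only, not taken from the study)
a, b, f_value = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b, f_value)
```

A large F value means the line explains much more variation than it leaves unexplained; the P value is then read from the F distribution with 1 and n − 2 degrees of freedom.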
Results   
Confirmed cases
[Graph 1] shows the comparison of COVID-19-confirmed cases in NCT of Delhi with the top 5 worst-hit states of India from January 30, 2020, till August 18, 2021, on a cumulative basis. Since the beginning, Maharashtra held the top spot and showed a sharp spike from February 2021 onward, reaching the maximum number of confirmed COVID-19 cases (6,406,345) till August 18, 2021, whereas the trends in NCT of Delhi and the other 4 states were similar to one another, with 1,437,192 confirmed cases in Delhi on the same date.
Recovered cases and recovery rate
India witnessed 31,517,510 recoveries out of 32,321,395 confirmed cases, a recovery rate of 97.51%, as on August 18, 2021. [Graph 2] presents the comparison of COVID-19-recovered cases in NCT of Delhi with the top 5 worst-hit states of India from January 30, 2020, till August 18, 2021, on a cumulative basis. As observed, Maharashtra has shown the steepest rise in number of recoveries since March 2020, whereas other states have shown a steep rise in number of recovered cases. Among the top worst-hit states, Andhra Pradesh has shown the maximum recovery rate of 98.52%, followed by Delhi at 98.23%, with Kerala having the lowest recovery rate of 94.73%, as shown in [Graph 3]. As per the latest data available, India continues to occupy the top global position as the country with the maximum number of recoveries.^{[11]}
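The national recovery rate quoted above follows directly from the two cumulative counts. The short sketch below reproduces the arithmetic; the helper name `rate_percent` is an assumption made for illustration.

```python
def rate_percent(part, total):
    """Share of `part` in `total`, expressed as a percentage."""
    return 100 * part / total

# Cumulative national counts as on August 18, 2021, taken from the text above
confirmed = 32_321_395
recovered = 31_517_510

print(f"{rate_percent(recovered, confirmed):.2f}%")  # 97.51%
```

The state-level recovery and death rates can be computed in the same way from each state's cumulative confirmed, recovered, and deceased counts.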
Deceased cases and death rate
India overall presented a death rate of 1.34% due to COVID-19. The comparison of the trend of COVID-19-deceased cases in NCT of Delhi with the top 5 worst-hit states of India from January 30, 2020, to August 18, 2021, on a cumulative basis is presented in [Graph 4]. Maharashtra has shown a steep rise in number of deceased cases since May 2020. In contrast, Andhra Pradesh had the lowest number of deaths till August 18, 2021. Other states such as Karnataka, Tamil Nadu, Andhra Pradesh, and Delhi have shown a gradual increase in number of deaths from July 2020 onward, whereas Kerala presented an almost flat curve till April 2021 and showed a small rise after that till August 18, 2021. Maharashtra had the maximum death rate of 2.11%, followed by Delhi at 1.74%, whereas Kerala was the state with the lowest death rate of 0.51%, as presented in [Graph 5].
Relationship of population density and percentage of population suffered from COVID-19
[Graph 6] shows the relationship of population density and percentage of population suffered from COVID-19 in NCT of Delhi and the top 5 worst-hit states of India from January 30, 2020, to August 18, 2021, on a cumulative basis. Although Maharashtra, with a moderate population density, had the highest number of confirmed cases, Kerala, which has the maximum population density among the top 5 worst-hit states, presented the highest percentage of population suffered from COVID-19, i.e., 10.6%. Delhi, which has the maximum population density among all six regions, held the second position in percentage of population suffered from COVID-19, i.e., 7.12%, next to Kerala. Because these 2 regions have the highest values of the two parameters, a statistical analysis was carried out to determine the relation between population density and percentage of population suffered from COVID-19. In the statistical modeling, the correlation coefficient was calculated, followed by the linear regression analysis. The null hypothesis was that there is no relationship between population density and the percentage of population suffered from COVID-19. To test this hypothesis, the dependent variable, percentage of population suffered from COVID-19, was regressed on the independent variable, population density. In the regression analysis, F(1, 4) = 0.397 and P = 0.563 (> 0.05), as given in [Table 1]. The correlation coefficient value is 0.30, as given in [Table 2]. This value indicates a negligible-to-low positive correlation, which was further examined by linear regression analysis as depicted in equation (1). Since the P value of 0.563 exceeds 0.05, the result is nonsignificant; thus, we failed to reject the null hypothesis, and no statistically significant relationship was found between the two variables under consideration.
The scatter plot of percentage of population suffered from COVID-19 against population density, with the corresponding regression line and equation for the relationship between the variables for the stated states, is depicted in [Graph 7].
In this analysis, the results indicate that population density did not play a significant role in predicting the percentage of population suffering from COVID-19. Hence, there is a need to explore which factors beyond population density lead to a rise in COVID-19 cases at a particular place.
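As a consistency check on the reported statistics, the sketch below relates the correlation coefficient to the F and P values. It assumes n = 6 (the five states plus Delhi, matching the 4 residual degrees of freedom) and uses the standard identity F = r²(n − 2)/(1 − r²) for simple linear regression. The P value is obtained from the closed-form distribution function of Student's t with 4 degrees of freedom, which is valid for this case only. The function names are hypothetical, and this is not the procedure the authors ran in SPSS.

```python
import math

def f_from_r(r, n):
    """F statistic of a simple linear regression, derived from Pearson's r."""
    r2 = r * r
    return r2 / (1 - r2) * (n - 2)

def p_value_df4(f_value):
    """Two-sided P value for F(1, 4), using t = sqrt(F) with 4 degrees of freedom."""
    t = math.sqrt(f_value)
    u = 1 + t * t / 4
    # Closed-form CDF of Student's t for exactly 4 degrees of freedom
    cdf = 0.5 + 0.375 * (t / math.sqrt(u)) * (1 - (t * t / u) / 12)
    return 2 * (1 - cdf)

f_value = f_from_r(0.30, 6)  # r as reported, rounded to two decimals
print(round(f_value, 3), round(p_value_df4(f_value), 3))
print(round(p_value_df4(0.397), 3))  # starting from the reported F value
```

Both routes give values close to the reported F(1, 4) = .397 and P = .563; the small differences arise because r is reported rounded to 0.30.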
Discussion   
In the present study, we compared the trend of COVID-19 in the top 5 worst-hit states of India with NCT of Delhi and analyzed the relation between population density and percentage of population suffered from COVID-19 in these states, as presented in the various graphs. In the initial months of its spread (April–August), three states, i.e., Maharashtra, Kerala, and Tamil Nadu, followed by Delhi, had the highest number of COVID-19 infections, but the trend soon changed. Delhi has since slipped to the 8^{th} position from the 3^{rd} position in April-June 2020, whereas Maharashtra remained at the top with 6,406,345 confirmed COVID-19 cases till August 18, 2021. Internationally, this trend in Maharashtra (India) can be compared with California (United States of America, USA), which has had a consistent number of COVID-19 cases from the outset of the pandemic, with 238 cases in March 2020 and 4,190,358 till August 18, 2021.^{[12]} Andhra Pradesh had the maximum recovery rate of 98.52%, which can be compared internationally with Texas (USA), which had an 89.39% recovery rate. Maharashtra had the maximum death rate in India, i.e., 2.11%, which can be compared with New Jersey, New York, and Massachusetts (USA), which had a 2.4% death rate. Kerala presented the minimum death rate of 0.51%, which is similar to Alaska (USA) with a 0.5% death rate.^{[13]} Of these 6 Indian regions, Delhi has the maximum population density of 11,297 per km^{2}, followed by Kerala with a population density of 859 per km^{2}, but Kerala had the highest percentage of population suffered from COVID-19, i.e., 10.60%, followed by Delhi with 7.12%.
Because COVID-19 is highly contagious, the chances of transmission are assumed to be higher in densely populated areas, where measures like social distancing cannot be followed. In the present study, we estimated the correlation coefficient between percentage of population suffered from COVID-19 and population density, which was found to be 0.30, indicating a negligible-to-low positive correlation. The P value was computed using regression analysis to test the significance of the relationship between these two variables. The P value was found to be 0.56 (> 0.05), as shown in [Table 3], which is above the significance level; thus, we failed to reject the null hypothesis, and no significant linear relationship was found between the two variables.
The results of this study are supported by the study done by Hamidi et al. at Johns Hopkins University of Public Health, USA, in which the correlation between activity density and infection rate was found to be 0.280. The P value obtained by structural equation modeling analysis was 0.874 (> 0.05), which indicates that the infection rate increases with activity density, but the relationship is not statistically significant, possibly due to greater adherence to social distancing guidelines.^{[14]} The results of this study can also be compared with the study conducted by Bhadra et al., in which they found a moderate positive correlation (0.49) between the COVID-19 infection rate and the population density in 600 districts of 4 states (West Bengal, Maharashtra, Uttar Pradesh, and Tamil Nadu) of India. This was further confirmed by regression analysis, in which the P value was significant at the 0.000 level, because of which they rejected the null hypothesis.^{[15]} The results of this study are in contrast with the study conducted by Kadi and Khelfaoui, in which they found a positive correlation (Pearson correlation coefficient: 0.71) between population density and COVID-19 cases in Algeria, North Africa. They found that population density has a positive effect on the increase in number of COVID-19 cases.^{[16]}
In this study and the above-cited studies, the correlation of COVID-19 infection rate with population density was analyzed, but many other factors, such as testing rate, airport traffic, and older age groups, may affect the number of people getting infected in a particular area. Factors such as wind speed, total number of participants in major sports events, and GDP per capita were positively correlated with the numbers of COVID-19 cases and deaths adjusted by the total population of cities (Spearman's correlation test, P < 0.05).^{[17]} In conclusion, population density does not have a strong correlation with the percentage of population suffered from COVID-19 in India. Therefore, there is a need for scientific insight into the unexplored factors beyond population density that lead to a higher infection rate at a particular place.
Acknowledgments
The authors are thankful to Ms. Divya, Data Consultant, for the acquisition and retrieval of COVID-19 data.
