A bibliometric analysis of PubMed literature on coronavirus: All time period

In late December 2019, several pneumonia-like cases caused by a new coronavirus strain were reported in China. The World Health Organization later named the new disease COVID-19. The coronavirus was declared the agent responsible for the public health emergency of international concern (PHEIC) in early 2020. Further research on coronavirus is essential, considering its high threat to public health and the absence of any specific antiviral or vaccine. The growth and development of coronavirus-related research and its thematic trends are still unknown. This study aimed to depict the all-time bibliographic trend of coronavirus research and to describe its patterns and dynamics over the years. The objective was therefore to generate a comprehensive bibliometric analysis of coronavirus infection, the dynamics of research topics, and the development of Medical Subject Headings (MeSH). Data were retrieved from PubMed, chosen because it is the biggest freely available electronic database for health and medicine. R and Microsoft Excel were used for the data analysis. VOSviewer was used for data visualization, and the graphs it produced served as the source for social network analysis.


Introduction
Coronavirus was declared the agent responsible for the public health emergency of international concern (PHEIC) in early 2020. Coronaviruses, which are mostly detected in the respiratory and gastrointestinal tracts, form a family of viruses that was also responsible for the outbreaks of 2002 and 2012. They were not considered highly pathogenic to humans until the outbreak of severe acute respiratory syndrome (SARS), because the coronaviruses that circulated in humans before that time mostly caused mild infections in immunocompetent people [1,2].
In 2002, the first case caused by a new coronavirus species was reported from Guangdong, China. It caused an atypical respiratory disease known as severe acute respiratory syndrome (SARS), and the agent was named SARS coronavirus (SARS-CoV). SARS-CoV was an animal virus that had recently adapted to human-to-human transmission. The existence of this animal reservoir implies that the virus could cross into humans again and initiate disease outbreaks in the future [3].
A similar pneumonia case was reported in Saudi Arabia in 2012. It was also caused by a coronavirus strain and spread rapidly in the Middle East, so the disease was called Middle East respiratory syndrome (MERS) and the virus MERS coronavirus (MERS-CoV). MERS-CoV sequences have been found in bats and in many dromedary camels. In humans, MERS attacks the lower respiratory tract (LRT), causing fever, cough, breathing difficulties and pneumonia that may progress to acute respiratory distress syndrome, multiorgan failure, and death in 20 % to 40 % of those infected [4].
In late December 2019, several pneumonia-like cases caused by a new coronavirus strain were reported in China. The World Health Organization (WHO) later named the new disease coronavirus disease 2019 (COVID-19). It was linked to the seafood and livestock market in the city of Wuhan, China. The new disease spread more rapidly than SARS and MERS: within weeks, COVID-19 had infected thousands of people in mainland China. As of July 30, 2020, there were 17 540 901 confirmed cases and 677 924 deaths globally [5]. The cases found outside China were confirmed to be linked to travel from Wuhan, after which the disease spread worldwide through local transmission.
* Corresponding author: ssmputri@yahoo.co.id
In March 2020, the WHO declared the COVID-19 outbreak a global pandemic. The cases found outside China show that human-to-human transmission is possible and can threaten global health, with greater risk for countries with weaker health systems. Diseases caused by coronaviruses have repeatedly resulted in outbreaks and become public health threats. Further research on coronavirus is essential, considering its high threat to public health and the absence of any specific antiviral or vaccine. The growth and development of coronavirus-related research and its thematic trends are still unknown. This study aimed to depict the all-time bibliographic trend of coronavirus research and to describe its patterns and dynamics over the years. The objective was therefore to generate a comprehensive bibliometric analysis of coronavirus infection, the dynamics of research topics, and the development of Medical Subject Headings (MeSH).

Data source
Data were collected from PubMed. PubMed was chosen because it is the biggest freely available electronic database for health and medicine. It is devoted to the biomedical sciences and is affiliated with several other National Library of Medicine (NLM) tools that can help optimize the analysis of biomedical subjects. It also provides Medical Subject Headings (MeSH), a professional indexing tool: when a new article is added to the PubMed database, experts review it for the main topics it discusses and assign a list of MeSH terms to it [6]. The data were extracted from PubMed on July 20, 2020. The query entered was "coronavirus"[mesh] AND "Middle East Respiratory Syndrome"[tiab] OR "Severe Acute Respiratory Syndrome"[tiab] OR "COVID-19"[tiab] OR "coronavirus"[tiab].
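The search string can also be assembled programmatically, which makes the field tags explicit and the query easy to reproduce. The following is a minimal base-R sketch, not the authors' code; the helper name tag and the variable names are hypothetical. The query contains no parentheses, and to our understanding PubMed applies Boolean operators from left to right in that case, so the MeSH restriction may not apply to every title/abstract term.

```r
# Hypothetical helper names; the query text itself is copied from the paper.
tiab_terms <- c("Middle East Respiratory Syndrome",
                "Severe Acute Respiratory Syndrome",
                "COVID-19",
                "coronavirus")

# Quote a term and append a PubMed field tag such as [mesh] or [tiab].
tag <- function(term, field) paste0('"', term, '"[', field, ']')

# Join the MeSH term and the title/abstract terms exactly as reported.
query <- paste(tag("coronavirus", "mesh"),
               "AND",
               paste(tag(tiab_terms, "tiab"), collapse = " OR "))
print(query)
```

The resulting string can be pasted into the PubMed search box as it stands.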

Data analysis
MEDLINE-format records from PubMed were used as the main data in this study. R was used in the data cleaning process to retrieve the affiliation country of all authors and to assign each record to one of four publication periods (pre-SARS, SARS, MERS-CoV, and COVID-19). In the descriptive analysis, Microsoft Excel was used to explore the number of publications by year and journal. VOSviewer was used for data visualization, and its graphs served as the source for social network analysis (SNA), which shows the network and flow among entities. A subanalysis was conducted by dividing the dataset
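The assignment of records to the four publication periods can be sketched in base R as follows. This is an illustration under stated assumptions rather than the authors' code: the function name assign_period is hypothetical, the example years are made up, and the boundaries are taken from the phase headings of this paper (SARS 2003-2012, MERS 2013-2019, COVID-19 from 2020, with everything up to 2002 treated as pre-SARS).

```r
# Assign each publication year to one of the four study periods.
# Boundaries are assumed from the phase headings of the paper.
assign_period <- function(year) {
  cut(year,
      breaks = c(-Inf, 2002, 2012, 2019, Inf),
      labels = c("pre-SARS", "SARS", "MERS", "COVID-19"),
      right = TRUE)  # intervals are closed on the right, e.g. (2002, 2012]
}

years <- c(1949, 2002, 2003, 2012, 2013, 2019, 2020)  # illustrative values only
periods <- assign_period(years)
print(table(periods))
```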

Cleaning country method
The bibliographic data from PubMed have one affiliation variable, which consists of the affiliation information of each author separated by semicolons (;). We then applied a text mining method in R to scan each variable, using the R package "countrycode" (the R code is listed below).

    library(countrycode)  # maps free-text country names to standard names
    library(readxl)       # reads the Excel export
    medline_afils <- read_excel("medline_afils.xls", guess_max = 20000)
    # For each affiliation column, add a "<name>adc" column with the matched country name
    for (n in names(medline_afils)) {
      medline_afils[[paste0(n, "adc")]] <- countrycode(medline_afils[[n]], "country.name", "country.name")
    }
    # Merge the new country columns (109 to 212 here) into one semicolon-separated variable
    medline_afils$country <- apply(medline_afils[, 109:212], 1,
                                   function(x) paste(x[!is.na(x)], collapse = ";"))

The result of this process was the country name from each affiliation variable, stored as a new variable. In the final step, we merged all new variables into one variable, "country", with values separated by semicolons (;).
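The same idea can be illustrated on toy data without any packages. The sketch below is an assumption-laden stand-in, not the procedure used in the study: the affiliation strings are invented, known_countries is a tiny hypothetical dictionary that replaces the "countrycode" lookup, and match_country is a hypothetical helper.

```r
# Illustrative affiliation strings; one string per article, authors separated by ";".
affiliation <- c(
  "Dept. of Virology, Univ. X, Beijing, China; School of Medicine, Univ. Y, Boston, United States",
  "Institute Z, Jakarta, Indonesia"
)

# Tiny stand-in dictionary (assumption); the paper used the countrycode package instead.
known_countries <- c("China", "United States", "Indonesia")

# Return the first dictionary entry found in one affiliation, or NA if none matches.
match_country <- function(text) {
  hit <- known_countries[vapply(known_countries, grepl, logical(1), x = text, fixed = TRUE)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# Split each record by ";", match a country per author, and merge the matches again.
country <- vapply(strsplit(affiliation, ";", fixed = TRUE), function(parts) {
  found <- vapply(trimws(parts), match_country, character(1))
  paste(found[!is.na(found)], collapse = ";")
}, character(1))
print(country)
```

Plain substring matching of this kind misses spelling variants and abbreviations, which is one reason to rely on a dedicated package for real data.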

Social Network Analysis
The SNA maps were created with VOSviewer from the bibliographic data extracted from PubMed. Co-occurrence analysis was performed. The unit of analysis for the pre-SARS, SARS and MERS periods was 'MeSH keywords', whereas for the COVID-19 period it was 'All keywords', which comprise author keywords and MeSH keywords. We then chose a minimum number of occurrences, which sets the threshold a keyword must reach to appear as a node in the SNA map. The next step was verifying the selected keywords: we excluded irrelevant and general keywords such as female, male, child, old, and adult.
No IRB approval was required because no human subjects were involved in this bibliometric analysis.
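The co-occurrence counting that underlies such a map can be sketched in base R. This is a simplified illustration and not VOSviewer's implementation: the keyword lists are invented, the threshold min_occurrence is a hypothetical value, and only the list of generic terms is taken from the text.

```r
# Illustrative keyword lists, one element per article (not real data).
articles <- list(
  c("coronavirus infection", "virus replication", "female"),
  c("coronavirus infection", "virus replication", "swine"),
  c("coronavirus infection", "swine", "adult"),
  c("molecular sequence data", "male")
)

generic_terms <- c("female", "male", "child", "old", "adult")  # terms named in the text
min_occurrence <- 2  # hypothetical threshold

# Drop generic terms, then count in how many articles each keyword appears.
cleaned <- lapply(articles, function(k) unique(setdiff(k, generic_terms)))
counts <- table(unlist(cleaned))
kept <- names(counts)[counts >= min_occurrence]

# Article-by-keyword incidence matrix; crossprod gives keyword co-occurrence counts.
incidence <- t(vapply(cleaned, function(k) as.integer(kept %in% k), integer(length(kept))))
colnames(incidence) <- kept
cooc <- crossprod(incidence)
diag(cooc) <- 0  # ignore self-pairs
print(cooc)
```

Each nonzero cell corresponds to an edge in the map, and a larger count corresponds to a stronger link between two keywords.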

Phase 2: SARS (2003-2012)
The SARS period comprises 8 066 publications (17 %) within 10 years. About 806 articles were published annually, roughly 10 times more than in the preceding (pre-SARS) period, which is a substantial increase. The number of publications rose to a peak of 1 471 in 2003, right after the SARS outbreak, and then declined year by year until the end of the SARS period.

Phase 3: MERS (2013-2019)
In 2013, publication growth started to increase again, with a total of 623 publications. The MERS period comprises 5 568 publications (12 %), or about 795 articles per year. This shows that a coronavirus outbreak is always followed by a rising number of publications, even for some years afterwards. The average output in both the SARS and MERS periods is consistently above 700 publications per year, a very large number compared to the period when coronavirus was not a threat to humans.
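The annual averages quoted for the SARS and MERS periods follow directly from the period totals. A short R check, using only the totals and period lengths reported in the text (the data frame layout is our own choice):

```r
# Totals and period lengths as reported in the text.
periods <- data.frame(
  period = c("SARS", "MERS"),
  publications = c(8066, 5568),
  years = c(10, 7)  # 2003-2012 and 2013-2019, inclusive
)

# Average number of publications per year in each period.
periods$per_year <- periods$publications / periods$years
print(periods)
```

Both averages exceed 700 publications per year, in line with the statement above.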

Phase 4: COVID-19 (January 2020-July 2020)
The COVID-19 period was analysed by month because the number of publications was growing rapidly: a total of 28 752 publications appeared within 7 months. This fast growth in knowledge sharing reflects the technology now available. Many pre-print journals are available, which makes it easier for scientists to share their studies. This is also an advantage in an outbreak situation, when knowledge needs to be shared as fast as possible to help other scientists, medical practitioners and policy makers.

Distribution of publications by countries
The contribution of each country to the publications was determined from the country of origin of the first author (Table 1).
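Taking the first author's country from a semicolon-separated country variable can be sketched in base R as below. The values are invented for illustration, and the sketch assumes that the first entry of each string belongs to the first author, which holds only if no earlier affiliation was dropped during cleaning.

```r
# Illustrative 'country' values: one string per article, authors separated by ";".
country <- c("China;United States", "United States", "Italy;China",
             "United States;Italy", NA)

# The first author's country is the text before the first semicolon.
first_country <- sub(";.*$", "", country)

# Tally articles per first-author country, most productive first (NA is skipped).
tally <- sort(table(first_country), decreasing = TRUE)
print(tally)
```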

Distribution of publication by journals
The all-time most productive journal (Table 2) is the Journal of Virology, one of the American Society for Microbiology (ASM) journals. This journal covers current research on the nature of viruses; its scope includes structure and assembly, genome replication and regulation of viral gene expression, genetic diversity and evolution, virus-cell interactions, cellular response to infection, transformation and oncogenesis, gene delivery, vaccines and antiviral agents, and pathogenesis and immunity [8]. The United States tops the list both as the most productive country by author origin and as the most productive country by journal publisher origin.

Evolution of research topics by Medical Subject Heading
Coronavirus and coronavirus infection have both been indexed in PubMed since 1994. According to MeSH, coronavirus is a member of Coronaviridae that causes respiratory or gastrointestinal disease in a variety of vertebrates [7], and coronavirus infections are virus diseases caused by the coronavirus genus. Specific examples include transmissible enteritis of turkeys (enteritis, transmissible, of turkeys); feline infectious peritonitis; and transmissible gastroenteritis of swine (gastroenteritis, transmissible, of swine) [8].
The social network analysis in Figure 3 displays the co-occurrence of MeSH terms in journal articles from 1949 to 2002. Four clusters of co-occurring MeSH terms were identified. The red cluster is the biggest and contains coronaviridae, antibodies viral, antigen viral, swine, coronaviridae infection and others. In the second biggest, the blue cluster, the most used keywords are murine hepatitis virus, coronavirus infection, virus replication and others. The green cluster is dominated by molecular sequence data. In the smallest, the yellow cluster, 'viral envelope protein' is the most discussed topic. Overall, publications in the pre-SARS period were dominated by biomolecular research. A shorter distance between nodes, or a thicker edge connecting them, indicates a higher intensity of co-occurrence.
For the SARS era (Figure 4), the SNA shows a clear pattern of co-occurring MeSH terms in the biggest cluster, the green one. It is led by severe acute respiratory syndrome, followed by disease outbreaks, Hong Kong, China, global health, etc. In the yellow cluster, the second biggest, the main MeSH term is SARS virus. The red cluster is dominated by coronavirus infection, molecular sequence data, and coronavirus. The smallest cluster, the blue one, mostly consists of molecular aspects such as antibodies, spike glycoprotein, membrane glycoproteins, and viral envelope proteins.
In this period, publications started to change, with additional terms emerging such as Hong Kong, SARS and global health. Research was no longer focused only on biomolecular studies but began to address epidemiology and global health.
The SNA for 2013 to 2019 (Figure 5) shows the co-occurrence of MeSH terms divided into four big clusters. This period covers the MERS outbreak, so the dominant MeSH terms were expected to shift from SARS to MERS. The biggest cluster, the red one, is led by coronavirus infection, followed by middle east respiratory syndrome, respiratory tract infection, human, severe acute respiratory syndrome and others. The SARS node is still visible but smaller than the MERS node, which means that publications mentioned MERS more frequently. The green cluster consists of chlorocebus aethiops, cell line, antibodies, viral vaccines, etc. The blue cluster consists of infectious bronchitis virus, poultry diseases, phylogeny, genome and several others. The smallest cluster, the yellow one, consists of swine, swine diseases, diarrhea, and feces.
The social network analysis in Figure 6 shows the pattern of article keywords in 2020. Instead of MeSH terms, we used keywords to visualize the data, because recent articles might not yet be indexed with MeSH. The biggest cluster portrays the most used keyword among the published articles, which is 'coronavirus infection'.
The most mentioned MeSH (Figure 7)

Discussions
The all-time analysis shows that coronavirus publications are increasing globally, starting from the first year of publication to 2002. In that period coronavirus was not considered a deadly disease: it was mostly found in animals, and when found in humans it caused only mild symptoms. The first coronavirus outbreak was SARS in 2003, the second was MERS in 2012, and the most recent is COVID-19. Every outbreak was caused by a new type of coronavirus that had not been identified before. Furthermore, all of these outbreaks began with animal-to-human infection and continued to spread through human-to-human transmission. After each outbreak, the publication growth rate increased significantly.
Publications on coronavirus and coronavirus infection are dominated by the USA, followed by China. More generally, publications from 1995 to 2016 extracted from PubMed also showed that the USA and China are the two leading countries for health science publications. Research publications from the United States showed a steady rise and a doubling in the 20-year review period [11]. Aside from the dominance of those two countries, in each period the countries where the first cases emerged and that were most affected tended to have a higher number of publications.
In the MeSH term analysis, keywords related to coronavirus infection appeared mostly after the outbreak in 2002. Before the outbreak, the MeSH keywords that appeared most were related to the biomolecular aspects and virology of the coronavirus. After the first outbreak, the keyword trend shifted toward disease-related terms. Among the ten most mentioned MeSH keywords, those closely related to vaccines are immunology, virology and isolation & purification. The term 'vaccine' itself is not in the top ten list. However, this reflects only the literature before COVID-19, since recent articles may not yet be indexed with MeSH.

Conclusion
Coronavirus publications are growing significantly worldwide, especially in the most affected countries such as the United States, China and Italy. Research topics have evolved over time: after the first outbreak, the keyword trend shifted toward disease-related terms. The specific term 'vaccine' does not appear frequently, but other vaccine-related terms appear quite often. Its appearance in the COVID-19 period cannot yet be assessed because of the limitation of the MeSH analysis: many publications from 2020 have not yet been indexed with MeSH. We expect that vaccine-related publications will appear in bibliometric studies in the coming years. More studies on coronavirus vaccines and virus evolution are highly recommended.
This study found that the PubMed database can be used to demonstrate the growth of coronavirus-related research. The massive growth in 2020 is evident: 61 % of all coronavirus publications were published in this year alone. This fast growth in knowledge sharing reflects the technology now available. Many pre-print journals are available, which makes it easier for scientists to share their studies. However, this can raise questions about the quality of the articles, given the brevity of the research and review processes.

Limitation
This study used only bibliographic data retrieved from the PubMed database. PubMed currently has a new feature called LitCovid, a curated literature hub for tracking up-to-date scientific information about COVID-19, which we did not analyse in this study.