Mining natural products related to paclitaxel reveals the possible biosynthetic pathway of paclitaxel

Paclitaxel is a widely used antitumor drug. Currently, paclitaxel can only be obtained by extraction from plants or by chemical semi-synthesis, which causes environmental damage and cannot meet the growing demand. The complete biosynthetic pathway of paclitaxel is still unclear, which greatly limits the production of paclitaxel by approaches such as synthetic biology. Here, we deduced the paclitaxel biosynthetic pathway by searching natural product databases for all possible intermediates in the pathway. In addition, we performed transcriptome sequencing of Taxus brevifolia and co-expression analysis of the identified genes in the paclitaxel biosynthetic pathway. Together, these results lay a foundation for elucidating the paclitaxel biosynthetic pathway.


Introduction
Taxol (paclitaxel) is a highly effective antitumor drug that accumulates in the bark of Taxus brevifolia. The diterpenoid alkaloid Taxol is widely used in the clinical treatment of various cancers, such as ovarian cancer, breast cancer, and non-small cell lung cancer (Wani et al., 1971; McGuire et al., 1989; Rowinsky et al., 1990; Holmes et al., 1991). The gap between the market supply of and the demand for paclitaxel is pronounced. The paclitaxel content is highest in the bark of Taxus chinensis, yet it is only about 0.06%, which has largely limited the development of paclitaxel (Wani and Horwitz, 2014). Extraction of paclitaxel from plants has led to the devastating destruction of Taxus resources. Therefore, new ways to obtain paclitaxel are urgently needed.
At present, the main methods for obtaining paclitaxel are chemical synthesis, chemical semi-synthesis, large-scale plant cell line culture, and synthesis by endophytic fungi. The chemical synthesis route of paclitaxel is complicated, the reaction conditions are difficult to control, and the yield is very low. So far, the shortest total synthesis of paclitaxel requires 21 chemical steps, and the total yield is only 0.118% (Li et al., 2021). In the semi-synthesis of paclitaxel, taxane intermediates such as 10-deacetylbaccatin III and baccatin III are first extracted from the branches and leaves of yew and are then converted to paclitaxel by chemical synthesis (Li et al., 2015; Liu et al., 2016). This method is mature and gives paclitaxel of high purity at low cost. It is currently the main method for the industrial production of paclitaxel; however, the production of paclitaxel is still limited by plant resources. Some in vitro cultured cells of Taxus brevifolia can produce paclitaxel. It has been reported that the paclitaxel content of suspension-cultured cells of Taxus wallichiana in an airlift bioreactor can reach 20.84 mg/L in 24 d (Navia-Osorio et al., 2002). In 1993, the first taxol-producing endophyte was isolated from Taxus brevifolia (Stierle et al., 1993). So far, more than 20 kinds of endophytic fungi have been reported to produce paclitaxel, but their yields are very low (Ji et al., 2006). In recent years, many important medicinal natural products such as artemisinin and ginsenosides have been successfully synthesized with high yield in heterologous expression systems (Ro et al., 2006; Yan et al., 2014). Taxadiene, an important precursor of paclitaxel, was accumulated to a high level (1 g/L) in Escherichia coli by means of multifunctional module optimization (Ajikumar et al., 2014). Genetically modified Saccharomyces cerevisiae also produced taxadiene at 8.7 mg/L (Engels et al., 2008).
* lium@tsinghua.edu.cn
The biosynthesis of paclitaxel can be divided into three steps. First, isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) are used to produce the diterpene precursor geranylgeranyl diphosphate (GGPP) (Eisenreich et al., 1996). Subsequently, GGPP is cyclized to produce taxadiene, and taxadiene then undergoes a series of modifications on the skeleton to form baccatin III. Finally, the C13 position of baccatin III is acylated with the phenylisoserine side chain, and paclitaxel is then produced by further hydroxylation and benzoylation at the C2' and C3' positions of the side chain, respectively (Kaspera and Croteau, 2006; Onrubia et al., 2011). The paclitaxel biosynthetic pathway is shown in Figure 1. The whole biosynthetic pathway of paclitaxel is quite complex, requiring about 19 enzyme-catalyzed reactions from the substrate GGPP. To date, 13 enzymes in the paclitaxel pathway have been characterized, including a taxadiene synthase (TS), five P450s, five acyltransferases and two extra enzymes (Table 1). Several steps in the paclitaxel biosynthetic pathway remain to be identified, such as C1 oxidation, C9 oxidation and oxetane ring formation. In addition, the order of the reactions on the taxadiene skeleton is not clear.
Gene co-expression analysis is useful for mining genes in the paclitaxel metabolic pathway. However, the intermediates in the pathway are unclear, which greatly hinders its elucidation. To provide a clear paclitaxel metabolic pathway, we investigated the intermediates leading to paclitaxel. We also performed transcriptome analysis and identified candidate genes that are co-expressed with the identified genes in the paclitaxel pathway.

Results
With the cloning and identification of related hydroxylase genes in the synthetic pathway of paclitaxel, a breakthrough is expected in the production of paclitaxel by synthetic biology. However, the order of the modification reactions occurring on the taxadiene backbone is still unclear. This greatly limits the mining and identification of candidate catalytic enzymes in the paclitaxel synthesis pathway, because some enzymes in the pathway may have higher substrate specificity. Therefore, it is necessary to determine the catalytic reaction pathway of paclitaxel biosynthesis, including the specific intermediates. Based on the 28 natural products of the paclitaxel synthesis pathway, a paclitaxel synthesis pathway via taxa-4(20),11(12)-dien-5α,13α-diol and 10-deacetylbaccatin III was speculated (Figure 3). The results showed that many enzymes in the paclitaxel synthesis pathway are promiscuous: their substrate specificity is not very high, and they can convert multiple intermediate metabolites in the pathway to different products. Examples include T10βH, T2αH and TAT.
To mine candidate enzyme genes in the paclitaxel synthesis pathway, transcriptome analysis of the bark and leaves of Taxus brevifolia was performed, and the differentially expressed genes (DEGs) between bark and leaves were analyzed. About 97% and 92% of the clean reads had quality scores at the level of Q20 and Q30, respectively. A total of 50,919 genes were annotated in the Taxus brevifolia genome, and 8,770 DEGs were obtained between bark and leaves. Among them, 5,661 genes were up-regulated and 3,109 genes were down-regulated in leaves compared with bark (Figure 4). GO enrichment analysis of the DEGs showed that more genes in leaves than in bark were enriched in the oxidation-reduction process and catalytic activity terms (Figure 5). KEGG pathway analysis showed that the biosynthesis of secondary metabolites and metabolic pathways categories were enriched in leaves and bark (Figure 6).
Some studies have shown that genes in the same metabolic pathway in plants generally show a co-expression trend in their expression changes after stimulation by in vitro or in vivo signals. This feature can be used to screen transcriptome data for key target enzymes. In this study, co-expression analysis was performed between 649 annotated P450 genes and the identified genes in the paclitaxel biosynthesis pathway (from taxadiene to paclitaxel, including some identified P450s). The co-expression analysis showed that the top 50 co-expressed genes of TS1 contained T5αH1, T5αH2, T5αH3, T10βH2, T13αH1, T13αH2, T2αH and TAT3; the top 50 co-expressed genes of T5αH1 contained T5αH2, T10βH1, T10βH2, T13αH2, TAT3, BAPT1 and BAPT2; and the top 50 co-expressed genes of T13αH1 contained TS1, T5αH2, T13αH2, T2αH and TAT3 (Figure 7). All the co-expressed genes of the identified genes in the paclitaxel biosynthesis pathway are shown in Supplementary Table 1. These results suggest that the genes co-expressed with the identified genes could be candidate genes for paclitaxel biosynthesis.
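The top-50 ranking described above can be illustrated with a minimal sketch. It assumes that co-expression is scored by the Pearson correlation coefficient of FPKM values (as stated in Methods) and that genes are ranked by descending coefficient; the gene names geneA to geneC, the expression values and the helper names pearson and top_coexpressed are hypothetical and are not data or code from this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Assumption: a gene with constant expression gets a score of 0.
        return 0.0
    return cov / sqrt(vx * vy)

def top_coexpressed(bait, expression, k=50):
    """Return the k genes most positively correlated with the bait gene."""
    scores = [(gene, pearson(expression[bait], values))
              for gene, values in expression.items() if gene != bait]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]

# Hypothetical FPKM values across six samples (3 bark, 3 leaf).
expression = {
    "TS1":   [1.0, 2.0, 3.0, 10.0, 12.0, 11.0],
    "geneA": [2.0, 4.1, 6.0, 20.0, 23.0, 22.5],  # follows TS1
    "geneB": [9.0, 8.0, 9.5, 1.0, 0.5, 1.2],     # opposite trend
    "geneC": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],     # constant
}

ranked = top_coexpressed("TS1", expression, k=2)
for gene, r in ranked:
    print(gene, round(r, 3))
```

With six samples, a high coefficient can arise by chance, so a ranking of this kind is best read as a way to shortlist candidates rather than as evidence of function.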

Discussion
The content of some natural products such as paclitaxel is very low in plants and cannot meet the increasing demand. However, the biosynthetic pathway of paclitaxel has not yet been completely elucidated. Characterizing the hydroxylation reactions from taxadiene to paclitaxel is both a difficulty and a hot spot in research on paclitaxel biosynthesis. It is also the bottleneck for producing paclitaxel by synthetic biology. Speculation of the paclitaxel biosynthetic pathway is essential for predicting and verifying the key catalytic enzymes in the pathway. Here, the synthetic pathway of paclitaxel was speculated by searching for intermediate metabolites in natural product databases. Co-expression analysis based on transcriptome sequencing is the main method for mining candidate genes in the biosynthetic pathways of natural products. Combining the prediction of the paclitaxel synthetic pathway with the mining of candidate genes can elucidate the pathway more effectively. For example, in the speculated paclitaxel synthetic pathway, compounds C434 and C438 may be the substrates of T9H, so co-expression with the upstream enzymes T13αH and T10βH, which produce these two compounds, can be used to find candidate T9H genes. The candidates can then be verified by in vitro enzymatic assays, or in vivo by transforming the candidate genes into yeast chassis cells that produce the related substrates at high yield. In general, the prediction of the paclitaxel synthetic pathway and the co-expression analysis of the identified enzymes in the pathway provide a new idea for the analysis of the paclitaxel biosynthetic pathway.
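One way to act on the T9H example is to keep only genes that are co-expressed with both upstream enzymes. The sketch below assumes that taking the intersection of the two co-expression lists is an acceptable selection rule, which is an illustrative choice and not a procedure reported in this study; the gene identifiers P450_a to P450_d and the function name shared_candidates are hypothetical.

```python
def shared_candidates(coexpressed, baits):
    """Genes present in the co-expression list of every bait, baits excluded."""
    gene_sets = [set(coexpressed[bait]) for bait in baits]
    common = set.intersection(*gene_sets)
    return sorted(common - set(baits))

# Hypothetical co-expression lists for the two upstream enzymes.
coexpressed = {
    "T13aH": ["P450_a", "P450_b", "T10bH", "P450_c"],
    "T10bH": ["P450_b", "P450_c", "T13aH", "P450_d"],
}

candidates = shared_candidates(coexpressed, ["T13aH", "T10bH"])
print(candidates)  # ['P450_b', 'P450_c']
```

A union of the lists would be a more permissive alternative when the two upstream enzymes show different expression patterns.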

Methods
Speculation of the biosynthetic pathway of paclitaxel. A total of 2,700 possible intermediate metabolites for paclitaxel formation were constructed using ChemDraw and saved in SMILES format. These intermediate metabolites of the paclitaxel synthesis pathway were then searched in natural product databases such as MassBank, COCONUT, Spektraris NMR and Super Natural II.
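The database search can be pictured as a set-membership test. The sketch below assumes that every candidate structure and every database entry has already been reduced to one canonical identifier per molecule (for example a canonical SMILES or an InChIKey generated by a cheminformatics toolkit). This step matters because the same molecule can be written as several different SMILES strings, so raw strings cannot be compared directly. The identifiers, database contents and function name are hypothetical placeholders, not records from the databases named above.

```python
def find_known_intermediates(candidates, databases):
    """Map each candidate found in at least one database to the databases listing it."""
    hits = {}
    for name, identifier in candidates.items():
        sources = sorted(db for db, entries in databases.items()
                         if identifier in entries)
        if sources:
            hits[name] = sources
    return hits

# Hypothetical canonical identifiers for three theoretical intermediates.
candidates = {
    "intermediate_001": "KEY-0001",
    "intermediate_002": "KEY-0002",
    "intermediate_003": "KEY-0003",
}

# Hypothetical database contents, stored as sets for fast lookup.
databases = {
    "database_A": {"KEY-0001", "KEY-9999"},
    "database_B": {"KEY-0001", "KEY-0003"},
}

hits = find_known_intermediates(candidates, databases)
for name, sources in hits.items():
    print(name, sources)
```

Candidates absent from every database, such as intermediate_002 here, are simply omitted from the result; absence from a database does not show that a compound does not occur in nature.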
RNA extraction and transcriptome sequencing. Fresh Taxus brevifolia leaves and bark (3 replicates each) were stored in liquid nitrogen, and total RNA was extracted using TRIzol. The mRNA-seq library was constructed and sequenced in PE150 mode on the Illumina NovaSeq 6000 platform by Beijing Berry Genomics Corporation.
RNA sequencing data preprocessing and co-expression analysis. The raw sequencing data were filtered, and clean reads were obtained by removing reads with adapters, duplicated reads and low-quality reads. Clean reads were aligned using Bowtie2 to remove rRNA. The rRNA-removed reads were aligned to the Taxus reference genome (https://figshare.com/articles/dataset/contigs_of_taxus_genome/15000672) using HISAT2. Gene expression quantification (RPKM) analysis was performed using HTSeq 0.6.1. Gene expression levels from the Illumina read data were quantified as fragments per kilobase per million mapped fragments (FPKM). Co-expression analysis was performed using Pearson correlation coefficients of FPKM. GO enrichment analysis of the differentially expressed genes was performed using topGO, by counting the number of genes in each significantly enriched GO term and performing secondary classification statistics on them (the top 20 terms of each type were selected). KOBAS (v3.0) was used for KEGG enrichment analysis, and the top 20 significantly enriched pathways were selected for statistics.
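For reference, the FPKM normalization can be written as a short function. This is a sketch of the standard definition (fragments mapped to a gene, scaled by gene length in kilobases and by total mapped fragments in millions); the exact gene-length definition used by the pipeline in this study is not stated, and the numbers in the example are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

# Hypothetical gene: 500 fragments, 2,000 bp long, library of 20 million fragments.
value = fpkm(500, 2000, 20_000_000)
print(value)  # 12.5
```

Because the value depends on total library size, FPKM values are comparable across genes within a sample but should be compared across samples with some caution.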