CADD
STRUCTURAL BIOLOGY
A few more details
The structural biology team of Elettra Sincrotrone Trieste involved in the WP4, started to work in April 2020, anticipating the planned activities by about 5 months with respect to the original schedules. The initial effort has been dedicated to protein expression, purification and sample characterization with crystallization screening set-up.
The SARS-CoV-2 Mpro was successfully expressed in E.coli and purified at homogeneity (> 98% purity) starting from the expression vector kindly donated by L. Hilgenfeld (Lubeck University – DE). Initial crystallization screenings have been set up based on published data and using commercially available crystallization kits. The most promising conditions have been optimized and “flower-like” shape crystals were reproducibly obtained. These crystals were cryo-preserved and tested at the XRD2 beamline of the Elettra synchrotron, showing a diffraction pattern with a resolution in the range of 1.6-2.0 Å, with the best resolution achieved at 1.52Å. Datasets were processed and a quick MR solution was done using pdb 6W63 as a starting model showing either P21 or C2 space group, having a monomer or a dimer in the asymmetric unit, respectively, but with almost identical conformation. Further optimization of the crystallization trials has established also a seeding based protocol that grew crystals of space group P212121. For each space group we obtained the crystal structure of Mpro in APO form and the coordinates have been deposited in PDB with code 7ALI (https://www.rcsb.org/structure/7ALI, sp.gr. P21), 7ALH (www.rcsb.org/structure/7ALH, sp.gr. C2) and 7BBE (www.rcsb.org/structure/7BB2, sp.gr. P2(1)2(1)2(1)).
Xray diffraction studies on crystals of Mpro bound to specific inhibitors have been performed. A number of small molecules selected from a biochemical screening of 8700 repurposing drugs have been tested by co-crystallization and soaking techniques. So far 8 structures have been deposited in PDB describing the binding modes of myricetin, Ebselen, SU3327, MG-132 in absence and in presence of DTT. Binding mode of myricetin, a flavonoid, has been clearly identified in the crystal structure (Fig. 4) showing a covalent linkage with the catalytic cys145. The results have been published in two papers confirming the findings of the biochemical screening [Kuzikov et al. 2021; doi.org/10.1021/acsptsci.0c00216] and validating the outputs of virtual screening and molecular dinamics [Gossen et al. 2021; https://doi.org/10.1021/acsptsci.0c00215]. The coordinates of the crystal model are deposited in PDB with code 7B3E (www.rcsb.org/structure/7B3E). All structures in PDB have been released immediately and are openly available.
Deposited PDB:7ALH, 7AL, 7B3E, 7BB2, 7BE7, 7BFB, 7BGP, 7NBY, 7NF5, 7NG3, 7NG6
PRODUCTION AND TUNING ON HPC INFRASTRUCTURE
A few more details
To promote agile and portable software development needed to guarantee continuous releases of updated functionalities and improved elaborated throughput of the EXSCALATE software platform, we encapsulated all the geometrical docking, issued in a stand-alone library called LiGen GeoDock, with a simple, stable, and well-defined non-virtual interface.
Figure 1 shows an overview of the LiGen GeoDock library, which aims at docking a ligand in a target pocket using geometrical information only. To better encapsulate Geodock in the EXSCALATE platform, we designed a single public interface hiding implementation details. In this way, we can improve GeoDock without hindering the development of the other components. The E4C project targets a heterogeneous node composed of GPUs and CPUs. The previous C++ code version was unable to harness the full computational power of the new CINECA Marconi100 node based on CPUS only. To overcome this limitation, we implemented the docking algorithm using the OpenACC pragma-based language and the PGI OpenACC compiler to generate device-specific code while maximizing the code portability. However, given the transition to the Marconi100 supercomputing cluster at CINECA, we then implemented the docking algorithm using the CUDA C/C++ language to obtain the maximum performance from of the NVIDIA Tesla V100 GPUs.
We ran an initial experimental campaign to assess the ligand library performance with a wide range in the number of atoms and rotatable bonds. The code peak performance using only the 2xIBM-Power9 and 128 software threads reaches a throughput of 30K optimal ligand poses per second, while the exploitation of the full node including the 4xNvidiaV100 GPUs reaches a throughput of more than 250K optimal ligand poses per second.
These performance results obtained so far on the new CINECA Marconi100 partition are very promising, and they are a good starting point for further tuning of the LiGen GeoDock library. As next steps, we envision pushing further on a fine-grain refinement and adopting dynamic autotuning approaches, combined with the entire machine-level scaling.
CADD
A few more details
Reported below is the dataset containing viral proteins, selected for MD simulation studies, and that contains Active Interest Proteins and Low-Interest Proteins:
M-Protein | N-Protein | |
Nsp2 | Nsp3 | |
Nsp4 | Nsp5 - 3CL-PRO | |
Nsp6 | Nsp7-Nsp8 - HETEROMER | |
Nsp9 | Nsp12 - MONOMER | |
Nsp12-Nsp7-Nsp8 - HETEROMER | Nsp13 - HELICASE | |
Nsp14 - MONOMER | Nsp10-Nsp14 - HETEROMER | |
Nsp15 | Nsp16 - MONOMER | |
Nsp10-Nsp16 - HETEROMER | ORF3a | |
ORF6 | ORF7a | |
ORF8 | ORF10 | |
PL-PRO | Spike-ACE2 | |
Spike |
Among these 25 unique structures, our studies have used both Homology and Experimental models, by increasing the number of overall structures to 37. All the MD simulations, carried out on HPC5 and Galileo clusters, yielded by ENI and CINECA respectively, are ongoing, and 27 systems already reached at least 1 microsecond, and among these, some have reached or exceeded 2 microseconds. We are pushing the simulation times towards 10 micro-seconds. In particular, the 3CL-PRO was simulated in its DIMERIC and MONOMERIC forms, to better understand which are the most important structural differences. The analysis carried out, with the work-flow that will be explained in the next paragraphs, allowed acquisition of useful information that will be collected in a scientific work.
Due to the high interest of the scientific community on this target, we have already produced a manuscript entitled “Computational Studies of SARS-Covid2 3CLpro: Insights from MD Simulations”, that will be submitted to the International Journal of Molecular Sciences - Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis” in a few days. In this paper we discuss the main differences coming from the analysis of the whole protein structural behavior and those seen in the binding site. The web address of the repository will be communicated shortly. It will be possible to download the trajectories of the simulations discussed in the work.
Papain-like proteinase (PL-PRO): Responsible for the cleavages located at the N-terminus of the replicase polyprotein. In addition, PL-PRO possesses a deubiquitinating/deISGylating activity and processes both 'Lys-48'- and 'Lys-63'-linked polyubiquitin chains from cellular substrates. Participates, together with nsp4, in the assembly of virally induced cytoplasmic double-membrane vesicles necessary for viral replication. In the video, the PL-PRO is shown in green highlight.
Nsp12-7-8: A key component, RNA-dependent RNA polymerase [RdRp, also known as nsp12], catalyzes the synthesis of viral RNA, and therefore plays a central role in the replication and transcription cycle of the COVID-19 virus, possibly with the help of nsp7 and nsp8 as cofactors. In the video, the nsp12/nsp7/nsp8 hetero-oligomeric complex is shown in highlight. Nsp12, nsp7 and nsp8 are shown in blue, red and green respectively.
<Nsp13: Scientists suspect that nsp13 unwinds so that other proteins can read its sequence and make new copies. This protein, called Helicase, is a multi-functional protein with a zinc-binding domain in the N-terminus displaying RNA and DNA duplex-unwinding activities with 5' to 3' polarity. Activity of helicase is dependent on magnesium. Here, the protein is reported in blue highlight.
<Nsp15: This enzyme is a specific endoribonuclease with a C-terminal catalytic domain, belonging to the EndoU family. EndoU enzymes are present in all animal kingdoms, where they perform various biological functions associated with RNA processing. Researchers suspect that this protein cuts the residual virus RNA as a way of hiding from the antiviral defenses of the infected cell. The protein, in its hexameric form, is shown in the video shows in highlight style, and each monomer composing the hexamer has a different color.
Spike receptor-binding domain (RBD)/ACE2: Dynamic structure of the receptor-binding domain (RBD) of the spike protein of SARS-CoV-2 bound to the cell receptor ACE2. Coronaviruses use the spike glycoprotein on the envelope to bind to their cellular receptors. Such binding triggers a cascade of events that leads to the fusion between cell and viral membranes for cell entry. The video shows in highlight, the SARS-CoV-2 RBD core in slate and ACE2 in red.
CADD
A few more details
Such an approach is implemented by the Pockets2.0 plugin for the VEGA suite of programs. The VEGA suite comprises a graphical interface with a new version of the plug-in for FPocket (named Pockets2.0), a well-known software used to detect protein cavities, based on an optimized algorithm for Voronoi tessellation. For a better exploration of the protein cavities, this combines the already implemented cavity mapping, as performed by Fpocket, with docking calculations with probe molecule(s) using AutoDock/Vina or PLANTS docking programs. To optimize the ranking of the explored cavities, Pockets2.0 can utilize both Fpocket and docking scores by calculating customizable consensus scores. The combination of the FPocket and docking scores by calculating customizable consensus scores leads to a significant increase of the correctly identified binding sites compared to the FPocket and docking scores alone, and this enhancement appears to be truly relevant when analyzing complex proteins with rather narrow binding pockets, and in particular, for characterizing allosteric binding sites.
The scientific paper entitled “A systematic mapping of the druggable cavities within the SARS CoV-2 therapeutically relevant proteins by combining pocket and docking searches as implemented in Pockets2.0”has been submitted to International Journal of Molecular Sciences - Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis”. The web address of the repository will be communicated shortly. It will be possible to download all the structural data discussed in the work.
The images below show the process of the viral protein mapping (Figure 1) and the identification of the binding pocket (Figure 2). In particular, figures 1 and 2 represent the homologic 3D structures of the viral protein nsp13 and the heteromer nsp14-nsp10 respectively, generated in the deliverable D1.1. This process underlines the importance of collaboration among the activities carried out by each partner.
CADD
A few more details
The SARS CoV-2 3CL-Protease was analyzed and used to fine tune the virtual screening strategies. SARS CoV 3CL-Protease known inhibitors were collected from literature to define training sets for our studies, keeping in mind that the pocket of SARS-Cov and SARS-Cov-2 proteases are very similar. Different docking algorithms were combined to increase the quality of the virtual screening campaign.
Five consortium partners (Dompé, University of Milan, and the three supercomputing centers, CINECA, Barcelona Supercomputing Center (BSC) and Jeulich Forschungszentrum (FZJ)), made their hardware and software facilities available to run docking simulations, including LiGen, PLANTS, PELE, Glide and FRED. The virtual screening protocols were validated in terms of Enrichment Factor, setting the conditions that allow most of the training set compounds to be ranked in the top-scored positions. Subsequently, the molecules with the best docking scores, found in the top 1% of the screened library, were selected with each software, and a consensus approach of the different docking scores was used to rank the most promising safe in man drugs acting on 3CL-PRO viral protein (Figure 1). To further improve the quality of the results, more accurate methods were applied, such as induced-fit simulations on SARS-CoV known inhibitors with PELE software, and a regression analysis model to correlate predicted and in vitro activities was built. We obtained a clear correlation of higher IC50 with larger residence time in the active site, which was analyzed in terms of (1) the solvent-accessible surface area of the ligands, (2) the sub-pocket population analysis. Both analyses showed high potential for a more accurate prediction, opening the possibilities, for example, to a second refinement effort after an initial docking campaign.
We are now using the tuned and validated virtual screening protocols to screen the drug repurposing library against the best selected 3D structures of the other SARS-Cov-2 proteins, either already experimentally obtained (PDB) or derived by homology modelling techniques, and processed by molecular dynamics simulation (Figure 2).
We currently have validation compounds coming from the literature, and results coming from our docking approaches that identified different sets of molecules. We obtained a critical assessment of different approaches and results that we can get from combining the best of the techniques we have available and that we are putting in place, for the identification of the most promising safe in man drugs that are ready for immediate treatment of the infected population.
DISSEMINATION & EXPLOITATION OF THE FOREGROUND
DISSEMINATION & EXPLOITATION OF THE FOREGROUND
REGULATORY CONTRACTS
PROTEIN PRODUCTION AND TARGET BASED ASSAYS
PROTEIN PRODUCTION AND TARGET BASED ASSAYS
GENOMICS
A few more details
Here, we aim at analyzing the effect of evolutionary constraints in shaping the organization of the spike protein family across the β-CoV lineage. To this aim, we use a multivariate analysis protocol for unsupervised detection of both the protein family segregation and the residue positions that better explain the sources of variation of the family, i.e. Specificity Determining Positions (SDPs).
The analysis was performed both at the level of the full β-CoV genus and individual subgroups. While in the first case the results are consistent with the known phylogenetic classification of Betacoronavirus into five subgenera (Fig. 1C), the analysis performed on the individual β-CoV groups allowed a fine-grained separation into clusters that clearly reflect the functional diversification of the spike protein family, that is, the specificity to different host-cell receptors. Indeed, both the clustering and domain enrichment results of the identified SDPs consistently reproduce the known cell receptor specificities observed across the different β-CoV lineages. At the level of the Sarbecovirus group, for example, both SARS-CoV-2 and RaTG13 are clustered together with SARS-CoV and other SARS-like sequences from bats (Fig. 1A,D), reflecting the ability of these members of the Sarbecovirus group to bind the ACE2 cell receptor. Notably, the proximity of SARS-CoV-2 and RaTG13 and other SARS-CoV sequences based on the SDPs is different to what observed based on phylogenetic analysis, where they form distinct clades.
Collectively, our results point to a key role of the SDPs in mediating host cell receptor specificity across β-CoVs and provide, at the same time, a framework for monitoring the evolution of the SARS-CoV-2 specificity to hACE2, as well as the emergence of novel potential cross-species transmission events. As such, it is worth mentioning that from the analysis of amino acid variations across the circulating SARS-CoV-2 virus, SDPs found in the RBD tend to mutate with a very low frequency, similar to what is seen at ACE2 contacting sites. This is of relevance, as our results suggest that mutations in SDPs can significantly impact the receptor-binding ability of the spike. In line with this reasoning, it is important to notice that other members of the Sarbecovirus group might have the potential to acquire ACE2 binding ability, as they share substantial similarity in terms of SPDs. This is especially the case of members of the Sarbecovirus clade 3, which despite being phylogenetically distant to SARS-CoV-2, display identical residues in 10 out of 14 SDP positions within the RBD, making them potential candidates for new human infections.
In summary, our results show that the identification of evolutionary patterns based on the analysis of sequence information alone can provide meaningful insights on the molecular basis of host-pathogen interactions and adaptation. Both the methodology and results obtained in this work lead the basis for a follow-up study where, in collaboration with other members of the E4C consortium, we aim at identifying potential routes of mutations that could lead to new β-CoVs adaptation to human hosts, and ultimately contribute to better understanding and monitoring of events that are critical to public health concerns worldwide.
1Pontes C, Ruiz-Serra V, Lepore R, Valencia A. Unraveling the molecular basis of host cell receptor usage in SARS-CoV-2 and other human pathogenic β-CoVs. Comput Struct Biotechnol J. 2021;19:759-766. doi: 10.1016/j.csbj.2021.01.006
In parallel, continuous monitoring of the virus evolution is performed based on the analysis of the SARS-CoV-2 genomic sequences from public repositories. Variation analysis is performed based on the established SARS-CoV-2 NC_045512.17 as a reference genome, where amino acid variations are annotated for all major coding regions as well as alternative reading frames and mutation frequencies computed globally and by different countries at different time points. Data are continuously updated as new sequences become available and shared with the partners and the community via the ViralSeq website
CADD
CADD
A few more details
Reported below is the dataset containing viral proteins, selected for MD simulation studies, and that contains Active Interest Proteins and Low-Interest Proteins:
M-Protein | N-Protein | |
Nsp2 | Nsp3 | |
Nsp4 | Nsp5 - 3CL-PRO | |
Nsp6 | Nsp7-Nsp8 - HETEROMER | |
Nsp9 | Nsp12 - MONOMER | |
Nsp12-Nsp7-Nsp8 - HETEROMER | Nsp13 - HELICASE | |
Nsp14 - MONOMER | Nsp10-Nsp14 - HETEROMER | |
Nsp15 | Nsp16 - MONOMER | |
Nsp10-Nsp16 - HETEROMER | ORF3a | |
ORF6 | ORF7a | |
ORF8 | ORF10 | |
PL-PRO | Spike-ACE2 | |
Spike |
Among these 25 unique structures, our studies have used both Homology and Experimental models, by increasing the number of overall structures to 37. All the MD simulations, carried out on HPC5 and Galileo clusters, yielded by ENI and CINECA respectively, are ongoing, and 27 systems already reached at least 1 microsecond, and among these, some have reached or exceeded 2 microseconds. We are pushing the simulation times towards 10 micro-seconds. In particular, the 3CL-PRO was simulated in its DIMERIC and MONOMERIC forms, to better understand which are the most important structural differences. The analysis carried out, with the work-flow that will be explained in the next paragraphs, allowed acquisition of useful information that will be collected in a scientific work.
Due to the high interest of the scientific community on this target, we have already produced a manuscript entitled “Computational Studies of SARS-Covid2 3CLpro: Insights from MD Simulations”, that will be submitted to the International Journal of Molecular Sciences - Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis” in a few days. In this paper we discuss the main differences coming from the analysis of the whole protein structural behavior and those seen in the binding site. The web address of the repository will be communicated shortly. It will be possible to download the trajectories of the simulations discussed in the work.
Papain-like proteinase (PL-PRO): Responsible for the cleavages located at the N-terminus of the replicase polyprotein. In addition, PL-PRO possesses a deubiquitinating/deISGylating activity and processes both 'Lys-48'- and 'Lys-63'-linked polyubiquitin chains from cellular substrates. Participates, together with nsp4, in the assembly of virally induced cytoplasmic double-membrane vesicles necessary for viral replication. In the video, the PL-PRO is shown in green highlight.
Nsp12-7-8: A key component, RNA-dependent RNA polymerase [RdRp, also known as nsp12], catalyzes the synthesis of viral RNA, and therefore plays a central role in the replication and transcription cycle of the COVID-19 virus, possibly with the help of nsp7 and nsp8 as cofactors. In the video, the nsp12/nsp7/nsp8 hetero-oligomeric complex is shown in highlight. Nsp12, nsp7 and nsp8 are shown in blue, red and green respectively.
<Nsp13: Scientists suspect that nsp13 unwinds so that other proteins can read its sequence and make new copies. This protein, called Helicase, is a multi-functional protein with a zinc-binding domain in the N-terminus displaying RNA and DNA duplex-unwinding activities with 5' to 3' polarity. Activity of helicase is dependent on magnesium. Here, the protein is reported in blue highlight.
<Nsp15: This enzyme is a specific endoribonuclease with a C-terminal catalytic domain, belonging to the EndoU family. EndoU enzymes are present in all animal kingdoms, where they perform various biological functions associated with RNA processing. Researchers suspect that this protein cuts the residual virus RNA as a way of hiding from the antiviral defenses of the infected cell. The protein, in its hexameric form, is shown in the video shows in highlight style, and each monomer composing the hexamer has a different color.
Spike receptor-binding domain (RBD)/ACE2: Dynamic structure of the receptor-binding domain (RBD) of the spike protein of SARS-CoV-2 bound to the cell receptor ACE2. Coronaviruses use the spike glycoprotein on the envelope to bind to their cellular receptors. Such binding triggers a cascade of events that leads to the fusion between cell and viral membranes for cell entry. The video shows in highlight, the SARS-CoV-2 RBD core in slate and ACE2 in red.
CADD
A few more details
Such an approach is implemented by the Pockets2.0 plugin for the VEGA suite of programs. The VEGA suite comprises a graphical interface with a new version of the plug-in for FPocket (named Pockets2.0), a well-known software used to detect protein cavities, based on an optimized algorithm for Voronoi tessellation. For a better exploration of the protein cavities, this combines the already implemented cavity mapping, as performed by Fpocket, with docking calculations with probe molecule(s) using AutoDock/Vina or PLANTS docking programs. To optimize the ranking of the explored cavities, Pockets2.0 can utilize both Fpocket and docking scores by calculating customizable consensus scores. The combination of the FPocket and docking scores by calculating customizable consensus scores leads to a significant increase of the correctly identified binding sites compared to the FPocket and docking scores alone, and this enhancement appears to be truly relevant when analyzing complex proteins with rather narrow binding pockets, and in particular, for characterizing allosteric binding sites.
The scientific paper entitled “A systematic mapping of the druggable cavities within the SARS CoV-2 therapeutically relevant proteins by combining pocket and docking searches as implemented in Pockets2.0”has been submitted to International Journal of Molecular Sciences - Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis”. The web address of the repository will be communicated shortly. It will be possible to download all the structural data discussed in the work.
The images below show the process of the viral protein mapping (Figure 1) and the identification of the binding pocket (Figure 2). In particular, figures 1 and 2 represent the homologic 3D structures of the viral protein nsp13 and the heteromer nsp14-nsp10 respectively, generated in the deliverable D1.1. This process underlines the importance of collaboration among the activities carried out by each partner.
CADD
A few more details
The SARS CoV-2 3CL-Protease was analyzed and used to fine tune the virtual screening strategies. SARS CoV 3CL-Protease known inhibitors were collected from literature to define training sets for our studies, keeping in mind that the pocket of SARS-Cov and SARS-Cov-2 proteases are very similar. Different docking algorithms were combined to increase the quality of the virtual screening campaign.
Five consortium partners (Dompé, University of Milan, and the three supercomputing centers, CINECA, Barcelona Supercomputing Center (BSC) and Jeulich Forschungszentrum (FZJ)), made their hardware and software facilities available to run docking simulations, including LiGen, PLANTS, PELE, Glide and FRED. The virtual screening protocols were validated in terms of Enrichment Factor, setting the conditions that allow most of the training set compounds to be ranked in the top-scored positions. Subsequently, the molecules with the best docking scores, found in the top 1% of the screened library, were selected with each software, and a consensus approach of the different docking scores was used to rank the most promising safe in man drugs acting on 3CL-PRO viral protein (Figure 1). To further improve the quality of the results, more accurate methods were applied, such as induced-fit simulations on SARS-CoV known inhibitors with PELE software, and a regression analysis model to correlate predicted and in vitro activities was built. We obtained a clear correlation of higher IC50 with larger residence time in the active site, which was analyzed in terms of (1) the solvent-accessible surface area of the ligands, (2) the sub-pocket population analysis. Both analyses showed high potential for a more accurate prediction, opening the possibilities, for example, to a second refinement effort after an initial docking campaign.
We are now using the tuned and validated virtual screening protocols to screen the drug repurposing library against the best selected 3D structures of the other SARS-Cov-2 proteins, either already experimentally obtained (PDB) or derived by homology modelling techniques, and processed by molecular dynamics simulation (Figure 2).
We currently have validation compounds coming from the literature, and results coming from our docking approaches that identified different sets of molecules. We obtained a critical assessment of different approaches and results that we can get from combining the best of the techniques we have available and that we are putting in place, for the identification of the most promising safe in man drugs that are ready for immediate treatment of the infected population.
PROTEIN PRODUCTION AND TARGET BASED ASSAYS
PROTEIN PRODUCTION AND TARGET BASED ASSAYS
PHENOTYPIC SCREEN
STRUCTURAL BIOLOGY
A few more details
The structural biology team of Elettra Sincrotrone Trieste involved in the WP4, started to work in April 2020, anticipating the planned activities by about 5 months with respect to the original schedules. The initial effort has been dedicated to protein expression, purification and sample characterization with crystallization screening set-up.
The SARS-CoV-2 Mpro was successfully expressed in E.coli and purified at homogeneity (> 98% purity) starting from the expression vector kindly donated by L. Hilgenfeld (Lubeck University – DE). Initial crystallization screenings have been set up based on published data and using commercially available crystallization kits. The most promising conditions have been optimized and “flower-like” shape crystals were reproducibly obtained. These crystals were cryo-preserved and tested at the XRD2 beamline of the Elettra synchrotron, showing a diffraction pattern with a resolution in the range of 1.6-2.0 Å, with the best resolution achieved at 1.52Å. Datasets were processed and a quick MR solution was done using pdb 6W63 as a starting model showing either P21 or C2 space group, having a monomer or a dimer in the asymmetric unit, respectively, but with almost identical conformation. Further optimization of the crystallization trials has established also a seeding based protocol that grew crystals of space group P212121. For each space group we obtained the crystal structure of Mpro in APO form and the coordinates have been deposited in PDB with code 7ALI (https://www.rcsb.org/structure/7ALI, sp.gr. P21), 7ALH (www.rcsb.org/structure/7ALH, sp.gr. C2) and 7BBE (www.rcsb.org/structure/7BB2, sp.gr. P2(1)2(1)2(1)).
Xray diffraction studies on crystals of Mpro bound to specific inhibitors have been performed. A number of small molecules selected from a biochemical screening of 8700 repurposing drugs have been tested by co-crystallization and soaking techniques. So far 8 structures have been deposited in PDB describing the binding modes of myricetin, Ebselen, SU3327, MG-132 in absence and in presence of DTT. Binding mode of myricetin, a flavonoid, has been clearly identified in the crystal structure (Fig. 4) showing a covalent linkage with the catalytic cys145. The results have been published in two papers confirming the findings of the biochemical screening [Kuzikov et al. 2021; doi.org/10.1021/acsptsci.0c00216] and validating the outputs of virtual screening and molecular dinamics [Gossen et al. 2021; https://doi.org/10.1021/acsptsci.0c00215]. The coordinates of the crystal model are deposited in PDB with code 7B3E (www.rcsb.org/structure/7B3E). All structures in PDB have been released immediately and are openly available.
Deposited PDB:7ALH, 7AL, 7B3E, 7BB2, 7BE7, 7BFB, 7BGP, 7NBY, 7NF5, 7NG3, 7NG6
GENOMICS
A few more details
Here, we aim at analyzing the effect of evolutionary constraints in shaping the organization of the spike protein family across the β-CoV lineage. To this aim, we use a multivariate analysis protocol for unsupervised detection of both the protein family segregation and the residue positions that better explain the sources of variation of the family, i.e. Specificity Determining Positions (SDPs).
The analysis was performed both at the level of the full β-CoV genus and individual subgroups. While in the first case the results are consistent with the known phylogenetic classification of Betacoronavirus into five subgenera (Fig. 1C), the analysis performed on the individual β-CoV groups allowed a fine-grained separation into clusters that clearly reflect the functional diversification of the spike protein family, that is, the specificity to different host-cell receptors. Indeed, both the clustering and domain enrichment results of the identified SDPs consistently reproduce the known cell receptor specificities observed across the different β-CoV lineages. At the level of the Sarbecovirus group, for example, both SARS-CoV-2 and RaTG13 are clustered together with SARS-CoV and other SARS-like sequences from bats (Fig. 1A,D), reflecting the ability of these members of the Sarbecovirus group to bind the ACE2 cell receptor. Notably, the proximity of SARS-CoV-2 and RaTG13 and other SARS-CoV sequences based on the SDPs is different to what observed based on phylogenetic analysis, where they form distinct clades.
Collectively, our results point to a key role of the SDPs in mediating host cell receptor specificity across β-CoVs and provide, at the same time, a framework for monitoring the evolution of the SARS-CoV-2 specificity to hACE2, as well as the emergence of novel potential cross-species transmission events. As such, it is worth mentioning that from the analysis of amino acid variations across the circulating SARS-CoV-2 virus, SDPs found in the RBD tend to mutate with a very low frequency, similar to what is seen at ACE2 contacting sites. This is of relevance, as our results suggest that mutations in SDPs can significantly impact the receptor-binding ability of the spike. In line with this reasoning, it is important to notice that other members of the Sarbecovirus group might have the potential to acquire ACE2 binding ability, as they share substantial similarity in terms of SPDs. This is especially the case of members of the Sarbecovirus clade 3, which despite being phylogenetically distant to SARS-CoV-2, display identical residues in 10 out of 14 SDP positions within the RBD, making them potential candidates for new human infections.
In summary, our results show that the identification of evolutionary patterns based on the analysis of sequence information alone can provide meaningful insights on the molecular basis of host-pathogen interactions and adaptation. Both the methodology and results obtained in this work lead the basis for a follow-up study where, in collaboration with other members of the E4C consortium, we aim at identifying potential routes of mutations that could lead to new β-CoVs adaptation to human hosts, and ultimately contribute to better understanding and monitoring of events that are critical to public health concerns worldwide.
1Pontes C, Ruiz-Serra V, Lepore R, Valencia A. Unraveling the molecular basis of host cell receptor usage in SARS-CoV-2 and other human pathogenic β-CoVs. Comput Struct Biotechnol J. 2021;19:759-766. doi: 10.1016/j.csbj.2021.01.006
In parallel, continuous monitoring of the virus evolution is performed based on the analysis of the SARS-CoV-2 genomic sequences from public repositories. Variation analysis is performed based on the established SARS-CoV-2 NC_045512.17 as a reference genome, where amino acid variations are annotated for all major coding regions as well as alternative reading frames and mutation frequencies computed globally and by different countries at different time points. Data are continuously updated as new sequences become available and shared with the partners and the community via the ViralSeq website
MECHANISM OF ACTION
AUTOMATED HOMOLOGY MODELING WORKFLOW FOR DRUG TARGET
PRODUCTION AND TUNING ON HPC INFRASTRUCTURE
A few more details
To promote agile and portable software development needed to guarantee continuous releases of updated functionalities and improved elaborated throughput of the EXSCALATE software platform, we encapsulated all the geometrical docking, issued in a stand-alone library called LiGen GeoDock, with a simple, stable, and well-defined non-virtual interface.
Figure 1 shows an overview of the LiGen GeoDock library, which aims at docking a ligand in a target pocket using geometrical information only. To better encapsulate Geodock in the EXSCALATE platform, we designed a single public interface hiding implementation details. In this way, we can improve GeoDock without hindering the development of the other components. The E4C project targets a heterogeneous node composed of GPUs and CPUs. The previous C++ code version was unable to harness the full computational power of the new CINECA Marconi100 node based on CPUS only. To overcome this limitation, we implemented the docking algorithm using the OpenACC pragma-based language and the PGI OpenACC compiler to generate device-specific code while maximizing the code portability. However, given the transition to the Marconi100 supercomputing cluster at CINECA, we then implemented the docking algorithm using the CUDA C/C++ language to obtain the maximum performance from of the NVIDIA Tesla V100 GPUs.
We ran an initial experimental campaign to assess the ligand library performance with a wide range in the number of atoms and rotatable bonds. The code peak performance using only the 2xIBM-Power9 and 128 software threads reaches a throughput of 30K optimal ligand poses per second, while the exploitation of the full node including the 4xNvidiaV100 GPUs reaches a throughput of more than 250K optimal ligand poses per second.
These performance results obtained so far on the new CINECA Marconi100 partition are very promising, and they are a good starting point for further tuning of the LiGen GeoDock library. As next steps, we envision pushing further on a fine-grain refinement and adopting dynamic autotuning approaches, combined with the entire machine-level scaling.
REGULATORY CONTRACTS
DISSEMINATION & EXPLOITATION OF THE FOREGROUND
DISSEMINATION & EXPLOITATION OF THE FOREGROUND