CADD

SIB Swiss Institute of Bioinformatics

Summary

Generation of homology models based on the mapped functional proteins of the entire sequenced SARS-CoV-2 viral genome

The sequences of mature proteins were determined from the genome and annotations from UniProt.
The SWISS-MODEL platform was used to generate homology models. Possible heteromeric complexes were predicted and modeled as well. The resulting models, as well as experimentally determined structures deposited in the PDB for the SARS-CoV-2 proteins, are available on a dedicated page of the SWISS-MODEL server. The page is updated on a weekly basis with the latest structures from the PDB and improved models.

STRUCTURAL BIOLOGY

ELETTRA SINCROTRONE TRIESTE

Summary

The structural biology team of Elettra Sincrotrone Trieste has obtained its first results on the SARS-CoV-2 Mpro viral protein by setting up a reproducible expression and purification protocol and by defining biophysical parameters for protein quality control and for comparison among protein batches. Several crystals of the apo form have been obtained, and optimization is ongoing; the best crystals diffract to a resolution of 1.6-2.0 Å. Datasets have been collected and processed, showing either a single monomer or a dimer in the asymmetric unit, with almost identical conformations. Co-crystallization experiments with compounds are progressing, and we aim to obtain the first data on the protein bound to inhibitors within a few months.
Figure 1 shows examples of Mpro crystals and diffraction patterns; Figures 2 and 3 show graphic representations of the Mpro protein.

A few more details



Figure 1: Mpro Crystals and Diffraction Patterns

Figure 2: Graphic Representations of the Mpro Protein

Figure 3: Graphic Representations of the Mpro Protein



The structural biology team of Elettra Sincrotrone Trieste started work on the WP4 activities about 5 months ahead of the original schedule. In this initial period, the focus was on protein sample preparation and crystallization set-up.
The SARS-CoV-2 Mpro was successfully expressed in E. coli and purified to homogeneity (> 98% purity), starting from the expression vector kindly donated by L. Hilgenfeld (University of Lübeck, Germany).
This protein batch was compared with a second batch obtained from an external partner. Both proteins were analyzed in a thermal stability assay using different buffers and showed identical behaviour.
Both proteins were used in the subsequent crystallization protocols. Initial crystallization screens were set up based on published data, using commercially available crystallization kits. The most promising conditions were optimized, and “flower-like” crystals were reproducibly obtained. These crystals were cryo-preserved and tested at the XRD2 beamline of the Elettra synchrotron, where they diffracted to a resolution typically in the range of 1.6-2.0 Å, with the best resolution achieved at 1.52 Å. The datasets were processed, and a quick molecular replacement (MR) solution was obtained using PDB entry 6W63 as the starting model, showing either a monomer or a dimer in the asymmetric unit, with almost identical conformations. Optimization of the co-crystallization trials with selected compounds is ongoing.

PRODUCTION AND TUNING ON HPC INFRASTRUCTURE

POLIMI, CINECA

Summary

In WP8, “Production and tuning on HPC infrastructure”, the POLIMI team is responsible for the continuous tuning and code adaptation of the EXSCALATE platform, and contributes to porting it to the new Marconi100 partition at CINECA.
The first month of the E4C project coincided with the initial production period of the new CINECA machine. POLIMI ported a first version of the docking library to exploit the computing capability of the heterogeneous compute node, which is composed of two IBM POWER9 sockets and four NVIDIA V100 GPUs.
During the experimental campaign, we reached a throughput of more than 250K optimal ligand poses per second on a single node of the Marconi100 machine. Porting and tuning on multiple nodes is ongoing.

A few more details

To promote the agile and portable software development needed to guarantee continuous releases of updated functionality and improved throughput for the EXSCALATE software platform, we encapsulated all the geometrical docking code in a stand-alone library called LiGen GeoDock, which exposes a simple, stable, and well-defined non-virtual interface. Figure 1 shows an overview of the LiGen GeoDock library, which aims at docking a ligand in a target pocket using geometrical information only. To better encapsulate GeoDock in the EXSCALATE platform, we designed a single public interface that hides the implementation details. In this way, we can improve GeoDock without hindering the development of the other components.

The E4C project targets a heterogeneous node composed of GPUs and CPUs. The previous C++ version of the code, which ran on CPUs only, was unable to harness the full computational power of the new CINECA Marconi100 node. To overcome this limitation, we first implemented the docking algorithm using the pragma-based OpenACC programming model and the PGI OpenACC compiler, to generate device-specific code while maximizing code portability. Given the transition to the Marconi100 supercomputing cluster at CINECA, we then implemented the docking algorithm in CUDA C/C++ to obtain the maximum performance from the NVIDIA Tesla V100 GPUs.

We ran an initial experimental campaign to assess performance on a ligand library spanning a wide range of atom counts and rotatable bonds. Using only the two IBM POWER9 sockets with 128 software threads, the code reaches a peak throughput of 30K optimal ligand poses per second, while exploiting the full node, including the four NVIDIA V100 GPUs, reaches a throughput of more than 250K optimal ligand poses per second.
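To put the two throughput figures side by side, the following minimal sketch computes the ratio between them. Only the two throughput values come from the text; the function and variable names are illustrative, and since the full-node figure is reported as "more than 250K", the result is a lower bound.

```python
def speedup(accelerated: float, baseline: float) -> float:
    """Ratio of two throughputs expressed in the same unit."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated / baseline


# Values reported above, in optimal ligand poses per second.
cpu_only_poses_per_s = 30_000     # two POWER9 sockets, 128 software threads
full_node_poses_per_s = 250_000   # lower bound: "more than 250K"

node_speedup = speedup(full_node_poses_per_s, cpu_only_poses_per_s)
print(f"full node vs CPU only: at least {node_speedup:.1f}x")
```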

The performance results obtained so far on the new CINECA Marconi100 partition are very promising, and they are a good starting point for further tuning of the LiGen GeoDock library. As next steps, we plan to push further on fine-grained refinement and to adopt dynamic autotuning approaches, combined with scaling to the entire machine.

Figure 1: Overview of the LiGen geometrical docking library deployed on a Marconi100 compute node at the CINECA supercomputing center.

CADD

DOMPÉ, CINECA, KTH, ENI

Summary

MD simulations of the generated homology models (HM) and of the 3D experimental structures deposited in the Protein Data Bank (D1.2).

We ran MD simulations of the generated homology models and of the 3D experimental structures deposited in the Protein Data Bank. Each production run was designed to generate a trajectory of at least 1 μs (1 microsecond), with a total of 20,000 collected structures for each simulated system. The viral protein dataset selected for the MD simulation studies contains Active Interest Proteins and Low-Interest Proteins in their apo form. To select the most useful protein conformations from the MD trajectories, a post-HPC-run analysis was performed using different clustering methods.
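The text does not name the clustering methods that were used. As an illustration only, the sketch below shows one common option: a GROMOS-style neighbour-count clustering on pairwise RMSD, which repeatedly picks the frame with the most neighbours within a cutoff as a cluster centre. The function names, the cutoff, and the synthetic coordinates are assumptions, and the frames are assumed to be superposed already.

```python
import numpy as np


def pairwise_rmsd(frames: np.ndarray) -> np.ndarray:
    """RMSD between all pairs of frames.

    frames has shape (n_frames, n_atoms, 3) and is assumed to be superposed.
    """
    diff = frames[:, None, :, :] - frames[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))


def neighbour_count_clusters(rmsd: np.ndarray, cutoff: float):
    """GROMOS-style clustering: returns a list of (centre, members) pairs."""
    remaining = np.ones(len(rmsd), dtype=bool)
    clusters = []
    while remaining.any():
        # Neighbour relation restricted to frames not yet assigned.
        neighbours = (rmsd <= cutoff) & remaining[None, :] & remaining[:, None]
        centre = int(np.argmax(neighbours.sum(axis=1)))
        members = np.flatnonzero(neighbours[centre])
        clusters.append((centre, members.tolist()))
        remaining[members] = False
    return clusters


# Synthetic example: 6 noisy copies of one conformation, 4 of another.
rng = np.random.default_rng(0)
state_a = rng.normal(size=(10, 3))
state_b = 3.0 * rng.normal(size=(10, 3))
frames = np.stack(
    [state_a + 0.05 * rng.normal(size=(10, 3)) for _ in range(6)]
    + [state_b + 0.05 * rng.normal(size=(10, 3)) for _ in range(4)]
)

clusters = neighbour_count_clusters(pairwise_rmsd(frames), cutoff=0.5)
for centre, members in clusters:
    print(f"representative frame {centre}: {len(members)} members")
```

The centre of each cluster can serve as its representative conformation, for example as input to a later docking step.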

A few more details

The dataset of viral proteins selected for the MD simulation studies, comprising Active Interest Proteins and Low-Interest Proteins, is reported below:

M-Protein
N-Protein
Nsp2
Nsp3
Nsp4
Nsp5 - 3CL-PRO
Nsp6
Nsp7-Nsp8 - HETEROMER
Nsp9
Nsp12 - MONOMER
Nsp12-Nsp7-Nsp8 - HETEROMER
Nsp13 - HELICASE
Nsp14 - MONOMER
Nsp10-Nsp14 - HETEROMER
Nsp15
Nsp16 - MONOMER
Nsp10-Nsp16 - HETEROMER
ORF3a
ORF6
ORF7a
ORF8
ORF10
PL-PRO
Spike-ACE2
Spike

For these 25 unique structures, our studies used both homology and experimental models, bringing the overall number of structures to 37. All the MD simulations, carried out on the HPC5 and Galileo clusters provided by ENI and CINECA respectively, are ongoing. So far, 27 systems have reached at least 1 microsecond, and some of these have reached or exceeded 2 microseconds. We are pushing the simulation times towards 10 microseconds. In particular, 3CL-PRO was simulated in both its dimeric and monomeric forms, to better understand the most important structural differences between them. The analysis carried out, following the workflow explained in the next paragraphs, yielded useful information that will be collected in a scientific publication.

Due to the high interest of the scientific community in this target, we have already produced a manuscript entitled “Computational Studies of SARS-Covid2 3CLpro: Insights from MD Simulations”, which will be submitted in a few days to the International Journal of Molecular Sciences, Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis”. In this paper we discuss the main differences that emerge from the analysis of the structural behavior of the whole protein and those seen in the binding site. The web address of the repository will be communicated shortly. It will be possible to download the trajectories of the simulations discussed in the work.

Papain-like proteinase (PL-PRO): Responsible for the cleavages located at the N-terminus of the replicase polyprotein. In addition, PL-PRO possesses a deubiquitinating/deISGylating activity and processes both 'Lys-48'- and 'Lys-63'-linked polyubiquitin chains from cellular substrates. Participates, together with nsp4, in the assembly of virally induced cytoplasmic double-membrane vesicles necessary for viral replication. In the video, the PL-PRO is shown in green highlight.




Nsp12-7-8: A key component, RNA-dependent RNA polymerase [RdRp, also known as nsp12], catalyzes the synthesis of viral RNA, and therefore plays a central role in the replication and transcription cycle of the COVID-19 virus, possibly with the help of nsp7 and nsp8 as cofactors. In the video, the nsp12/nsp7/nsp8 hetero-oligomeric complex is shown in highlight. Nsp12, nsp7 and nsp8 are shown in blue, red and green respectively.



Nsp13: Scientists suspect that nsp13 unwinds the viral RNA so that other proteins can read its sequence and make new copies. This protein, called Helicase, is a multi-functional protein with a zinc-binding domain at the N-terminus, displaying RNA and DNA duplex-unwinding activities with 5' to 3' polarity. The activity of the helicase is dependent on magnesium. Here, the protein is shown in blue highlight.



Nsp15: This enzyme is a specific endoribonuclease with a C-terminal catalytic domain, belonging to the EndoU family. EndoU enzymes are present in all animal kingdoms, where they perform various biological functions associated with RNA processing. Researchers suspect that this protein cuts the residual virus RNA as a way of hiding from the antiviral defenses of the infected cell. In the video, the protein is shown in its hexameric form in highlight style, and each monomer composing the hexamer has a different color.




Spike receptor-binding domain (RBD)/ACE2: Dynamic structure of the receptor-binding domain (RBD) of the spike protein of SARS-CoV-2 bound to the cell receptor ACE2. Coronaviruses use the spike glycoprotein on the envelope to bind to their cellular receptors. Such binding triggers a cascade of events that leads to the fusion between cell and viral membranes for cell entry. The video shows, in highlight style, the SARS-CoV-2 RBD core in slate and ACE2 in red.


CADD

UNIVERSITY OF MILAN, DOMPÉ, SIB

Summary

A systematic mapping of the druggable cavities within the SARS-CoV-2 therapeutically relevant proteins.

This study provided a novel strategy for pocket mapping based on the combination of pocket searches (as performed by the well-known FPocket tool) and docking searches (as performed by the PLANTS or AutoDock/Vina engines). Such a mapping identifies the most relevant binding sites, on which virtual screening simulations or de novo rational design should allow promising hits to be found.

A few more details

This approach is implemented by the Pockets2.0 plug-in for the VEGA suite of programs. The VEGA suite comprises a graphical interface with a new version of the plug-in (named Pockets2.0) for FPocket, a well-known program that detects protein cavities using an optimized algorithm for Voronoi tessellation. For a better exploration of the protein cavities, the plug-in combines the already implemented cavity mapping, as performed by FPocket, with docking calculations on probe molecule(s) using the AutoDock/Vina or PLANTS docking programs. To optimize the ranking of the explored cavities, Pockets2.0 can combine the FPocket and docking scores into customizable consensus scores. This combination leads to a significant increase in the number of correctly identified binding sites compared with the FPocket or docking scores alone. The enhancement appears to be particularly relevant when analyzing complex proteins with rather narrow binding pockets and, in particular, when characterizing allosteric binding sites.
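The text does not give the formula behind the consensus scores. The sketch below shows one simple way such a score could be built, and it is not the Pockets2.0 implementation: each score is min-max normalised so that 1 is best, and the two are combined with user-chosen weights. All names, weights, and example values are hypothetical; the docking score is assumed to follow the usual convention that lower (more negative) is better.

```python
def min_max(values, higher_is_better=True):
    """Scale values to [0, 1] so that 1 always means 'best'."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def consensus_scores(pocket_scores, docking_scores, w_pocket=0.5, w_docking=0.5):
    """Weighted mean of the normalised pocket and docking scores."""
    if len(pocket_scores) != len(docking_scores):
        raise ValueError("one pocket score and one docking score per cavity")
    pocket = min_max(pocket_scores, higher_is_better=True)
    docking = min_max(docking_scores, higher_is_better=False)
    total = w_pocket + w_docking
    return [(w_pocket * p + w_docking * d) / total for p, d in zip(pocket, docking)]


# Hypothetical cavities with made-up scores.
cavities = ["cavity_A", "cavity_B", "cavity_C"]
pocket_scores = [0.9, 0.4, 0.7]      # higher is better
docking_scores = [-6.0, -9.0, -8.4]  # lower is better

scores = consensus_scores(pocket_scores, docking_scores)
ranking = sorted(zip(cavities, scores), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this made-up example, the cavity that is best by neither score alone but good by both comes out on top, which is the behaviour a consensus ranking is meant to capture.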

The scientific paper entitled “A systematic mapping of the druggable cavities within the SARS CoV-2 therapeutically relevant proteins by combining pocket and docking searches as implemented in Pockets2.0” has been submitted to the International Journal of Molecular Sciences, Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis”. The web address of the repository will be communicated shortly. It will be possible to download all the structural data discussed in the work.

The images below show the process of viral protein mapping (Figure 1) and the identification of the binding pocket (Figure 2). In particular, Figures 1 and 2 show the homology-modelled 3D structures of the viral protein nsp13 and of the heteromer nsp14-nsp10 respectively, both generated in deliverable D1.1. This process underlines the importance of collaboration among the activities carried out by each partner.

Figure 1: Viral Protein Mapping

Figure 2: Identification of the Binding Pocket

CADD

DOMPÉ, UNIVERSITY OF MILAN, BSC, FZJ, CINECA

Summary

Machine Learning and Virtual Screening (VS) protocol optimization and Drug Repositioning (D1.3, D1.4, D1.5).

Several docking simulations were performed to define and optimize the machine learning and virtual screening protocols to be used on SARS-CoV-2 proteins. Given the lack of known actives on SARS-CoV-2 proteins, the performance of the virtual screening strategies was assessed by evaluating their capacity to correctly rank molecules with antiviral activity and, in particular, with a known effect against SARS-CoV proteins.
The tuned and validated virtual screening protocols were used to screen a repurposing library (> 12000 drugs) containing the set of safe-in-man drugs, either commercialized or under active development in clinical phases, and a set of known bioactives, in particular preclinical compounds identified as “CoV Inhibitors”. The most promising drugs, identified from the docking studies as potentially active against SARS-CoV-2 proteins, were selected for testing in the biochemical and phenotypic assays performed in WP2 and WP3 respectively.

A few more details

The SARS-CoV-2 3CL-Protease was analyzed and used to fine-tune the virtual screening strategies. Known inhibitors of the SARS-CoV 3CL-Protease were collected from the literature to define training sets for our studies, keeping in mind that the binding pockets of the SARS-CoV and SARS-CoV-2 proteases are very similar. Different docking algorithms were combined to increase the quality of the virtual screening campaign.

Five consortium partners (Dompé, the University of Milan, and the three supercomputing centers CINECA, Barcelona Supercomputing Center (BSC) and Forschungszentrum Jülich (FZJ)) made their hardware and software facilities available to run the docking simulations, which used LiGen, PLANTS, PELE, Glide and FRED. The virtual screening protocols were validated in terms of Enrichment Factor, setting the conditions that allow most of the training set compounds to be ranked in the top-scored positions. Subsequently, the molecules with the best docking scores, found in the top 1% of the screened library, were selected for each program, and a consensus approach over the different docking scores was used to rank the most promising safe-in-man drugs acting on the 3CL-PRO viral protein (Figure 1). To further improve the quality of the results, more accurate methods were applied, such as induced-fit simulations of known SARS-CoV inhibitors with the PELE software, and a regression analysis model was built to correlate predicted and in vitro activities. We obtained a clear correlation of higher IC50 with larger residence time in the active site, which was analyzed in terms of (1) the solvent-accessible surface area of the ligands and (2) the sub-pocket population. Both analyses showed high potential for more accurate prediction, opening the possibility, for example, of a second refinement effort after an initial docking campaign.
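The Enrichment Factor used for the validation above is commonly defined as the fraction of actives found in the top-scored x% of a ranked library divided by the fraction of actives in the whole library. The sketch below implements that standard definition; the function name and the example numbers are illustrative and do not come from the project data.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor at a given top fraction of a ranked library.

    ranked_is_active: booleans ordered from best to worst docking score,
    True where the molecule is a known active.
    fraction: top fraction to inspect, e.g. 0.01 for the top 1%.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_top = max(1, round(fraction * n_total))
    hits_in_top = sum(ranked_is_active[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


# Made-up ranking: 1000 molecules, 10 known actives, 4 of them in the top 10.
ranked = [True] * 4 + [False] * 6 + [True] * 6 + [False] * 984

ef_top_1_percent = enrichment_factor(ranked, 0.01)
print(f"EF at 1%: {ef_top_1_percent:.1f}")
```

A value of 1 corresponds to random ranking, so values well above 1 indicate that the protocol concentrates the training set compounds in the top-scored positions.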

We are now using the tuned and validated virtual screening protocols to screen the drug repurposing library against the best selected 3D structures of the other SARS-CoV-2 proteins, either already experimentally obtained (PDB) or derived by homology modelling techniques, and processed by molecular dynamics simulation (Figure 2).

We currently have validation compounds from the literature, together with results from our docking approaches, which identified different sets of molecules. This gave us a critical assessment of the different approaches and of the results we can obtain by combining the best of the techniques that we have available and that we are putting in place, in order to identify the most promising safe-in-man drugs that are ready for immediate treatment of the infected population.

Figure 1: Virtual screening protocol applied to 3CL-PRO main protease.

Figure 2: Validated virtual screening protocol deployed on other viral proteins.

DISSEMINATION & EXPLOITATION OF THE FOREGROUND

CHELONIA

Summary

Setup and Maintenance of Website & Social Media (D10.1).

Chelonia leads the coordination of dissemination and communication activities. The project webpage www.exscalate4cov.eu was produced within the first month of the project and is now regularly updated and maintained. Communication activities ensure that the project outcomes reach beyond the Consortium, so that the methodologies developed and the results obtained can be widely disseminated in the scientific community.
The communication objectives are pursued through multiple strategies, including the use of social networks such as LinkedIn, Facebook and Twitter to spread knowledge of the project to a larger audience.
Dedicated exscalate4cov pages have been created, and communications are mainly tagged #exscalate4cov. Given the wide social impact of this research, the general press is reached through press releases, newsletters and local press.

DISSEMINATION & EXPLOITATION OF THE FOREGROUND

CHELONIA

Summary

Initial Dissemination Activities and Plans (D10.2).

This task defines the strategy for exploitation of the project results. The activity, led by CHELONIA in tight collaboration with DOMPÉ, analyzes the strategies and implements the actions needed to maximize the use of the project results beyond the project partners and time frame. A dissemination plan was issued to all partners 3 months ahead of the expected deadline, in order to facilitate dissemination and to align partners in communicating project results to the target audience. A first webinar was organized on April 28th, 2020, thanks to the support of CECAM (Centre Européen de Calcul Atomique et Moléculaire, www.cecam.org), an organization devoted to the promotion of fundamental research on advanced computational methods and to their application to important problems in frontier areas of science and technology. The recorded webinar is available on YouTube.

CADD

SIB Swiss Institute of Bioinformatics

Summary

Generation of homology models based on the mapped functional proteins of the entire sequenced SARS-CoV-2 viral genome

The sequences of mature proteins were determined from the genome and annotations from UniProt.
The SWISS-MODEL platform was used to generate homology models. Possible heteromeric complexes were predicted and modeled as well. The resulting models, as well as experimentally determined structures deposited in the PDB for the SARS-CoV-2 proteins, are available on a dedicated page of the SWISS-MODEL server. The page is updated on a weekly basis with the latest structures from the PDB and improved models.

CADD

DOMPÉ, CINECA, KTH, ENI

Summary

MD simulations of the HM generated, and on the 3D experimental structure deposited in the Protein Data Bank (D1.2).

We proceeded to simulate MD simulations of the homology models generated, and on the 3D experimental structure deposited in the Protein Data Bank. The production run was performed to generate at least 1 μs (1 microsecond) trajectory with a total of 20,000 collected structures for each simulated system. The viral protein dataset, selected for MD simulation studies, contains Active Interest Proteins and Low-Interest Proteins in their in apo form. To select the most useful protein conformation form MD, a post HPC-run analysis was performed by using different clustering methods.

A few more details

Reported below is the dataset containing viral proteins, selected for MD simulation studies, and that contains Active Interest Proteins and Low-Interest Proteins:

M-Protein N-Protein
Nsp2 Nsp3
Nsp4 Nsp5 - 3CL-PRO
Nsp6 Nsp7-Nsp8 - HETEROMER
Nsp9 Nsp12 - MONOMER
Nsp12-Nsp7-Nsp8 - HETEROMER Nsp13 - HELICASE
Nsp14 - MONOMER Nsp10-Nsp14 - HETEROMER
Nsp15 Nsp16 - MONOMER
Nsp10-Nsp16 - HETEROMER ORF3a
ORF6 ORF7a
ORF8 ORF10
PL-PRO Spike-ACE2
Spike  

Among these 25 unique structures, our studies have used both Homology and Experimental models, by increasing the number of overall structures to 37. All the MD simulations, carried out on HPC5 and Galileo clusters, yielded by ENI and CINECA respectively, are ongoing, and 27 systems already reached at least 1 microsecond, and among these, some have reached or exceeded 2 microseconds. We are pushing the simulation times towards 10 micro-seconds. In particular, the 3CL-PRO was simulated in its DIMERIC and MONOMERIC forms, to better understand which are the most important structural differences. The analysis carried out, with the work-flow that will be explained in the next paragraphs, allowed acquisition of useful information that will be collected in a scientific work.

Due to the high interest of the scientific community on this target, we have already produced a manuscript entitled “Computational Studies of SARS-Covid2 3CLpro: Insights from MD Simulations”, that will be submitted to the International Journal of Molecular Sciences - Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis” in a few days. In this paper we discuss the main differences coming from the analysis of the whole protein structural behavior and those seen in the binding site. The web address of the repository will be communicated shortly. It will be possible to download the trajectories of the simulations discussed in the work.

Papain-like proteinase (PL-PRO): Responsible for the cleavages located at the N-terminus of the replicase polyprotein. In addition, PL-PRO possesses a deubiquitinating/deISGylating activity and processes both 'Lys-48'- and 'Lys-63'-linked polyubiquitin chains from cellular substrates. Participates, together with nsp4, in the assembly of virally induced cytoplasmic double-membrane vesicles necessary for viral replication. In the video, the PL-PRO is shown in green highlight.




Nsp12-7-8: A key component, RNA-dependent RNA polymerase [RdRp, also known as nsp12], catalyzes the synthesis of viral RNA, and therefore plays a central role in the replication and transcription cycle of the COVID-19 virus, possibly with the help of nsp7 and nsp8 as cofactors. In the video, the nsp12/nsp7/nsp8 hetero-oligomeric complex is shown in highlight. Nsp12, nsp7 and nsp8 are shown in blue, red and green respectively.

<


Nsp13: Scientists suspect that nsp13 unwinds so that other proteins can read its sequence and make new copies. This protein, called Helicase, is a multi-functional protein with a zinc-binding domain in the N-terminus displaying RNA and DNA duplex-unwinding activities with 5' to 3' polarity. Activity of helicase is dependent on magnesium. Here, the protein is reported in blue highlight.

<


Nsp15: This enzyme is a specific endoribonuclease with a C-terminal catalytic domain, belonging to the EndoU family. EndoU enzymes are present in all animal kingdoms, where they perform various biological functions associated with RNA processing. Researchers suspect that this protein cuts the residual virus RNA as a way of hiding from the antiviral defenses of the infected cell. The protein, in its hexameric form, is shown in the video shows in highlight style, and each monomer composing the hexamer has a different color.




Spike receptor-binding domain (RBD)/ACE2: Dynamic structure of the receptor-binding domain (RBD) of the spike protein of SARS-CoV-2 bound to the cell receptor ACE2. Coronaviruses use the spike glycoprotein on the envelope to bind to their cellular receptors. Such binding triggers a cascade of events that leads to the fusion between cell and viral membranes for cell entry. The video shows in highlight, the SARS-CoV-2 RBD core in slate and ACE2 in red.


CADD

UNIVERSITY OF MILAN, DOMPÉ, SIB

Summary

A systematic mapping of the druggable cavities within the SARS CoV-2 therapeutically relevant proteins.

This study provided a novel strategy for pocket-mapping based on the combination of pocket (as performed by the well-known FPocket tool) and docking searches (as performed by PLANTS or AutoDock/Vina engines). Such a mapping enables the identification of the most relevant binding sites for which virtual screening simulations or de novo rational design should allow the identification of promising hits.

A few more details

Such an approach is implemented by the Pockets2.0 plugin for the VEGA suite of programs. The VEGA suite comprises a graphical interface with a new version of the plug-in for FPocket (named Pockets2.0), a well-known software used to detect protein cavities, based on an optimized algorithm for Voronoi tessellation. For a better exploration of the protein cavities, this combines the already implemented cavity mapping, as performed by Fpocket, with docking calculations with probe molecule(s) using AutoDock/Vina or PLANTS docking programs. To optimize the ranking of the explored cavities, Pockets2.0 can utilize both Fpocket and docking scores by calculating customizable consensus scores. The combination of the FPocket and docking scores by calculating customizable consensus scores leads to a significant increase of the correctly identified binding sites compared to the FPocket and docking scores alone, and this enhancement appears to be truly relevant when analyzing complex proteins with rather narrow binding pockets, and in particular, for characterizing allosteric binding sites.

The scientific paper entitled “A systematic mapping of the druggable cavities within the SARS CoV-2 therapeutically relevant proteins by combining pocket and docking searches as implemented in Pockets2.0”has been submitted to International Journal of Molecular Sciences - Special Issue “Exscalate4CoV: Innovative High Performing Computing (HPC) Strategies to Tackle Pandemic Crisis”. The web address of the repository will be communicated shortly. It will be possible to download all the structural data discussed in the work.

The images below show the process of the viral protein mapping (Figure 1) and the identification of the binding pocket (Figure 2). In particular, figures 1 and 2 represent the homologic 3D structures of the viral protein nsp13 and the heteromer nsp14-nsp10 respectively, generated in the deliverable D1.1. This process underlines the importance of collaboration among the activities carried out by each partner.

Viral Protein Mapping
Figure 1: Viral Protein Mapping
Identification of the Binding Pocket
Figure 2: Identification of the Binding Pocket

CADD

DOMPÉ, UNIVERSITY OF MILAN, BSC, FZJ, CINECA

Summary

Machine Learning and Virtual Screening (VS) protocol optimization and Drug Repositioning (D1.3, D1.4, D1.5).

Several docking simulations were performed to define and optimize the machine learning and virtual screening protocols to use on SARS CoV-2 proteins. The performances of the virtual screening strategies were assessed by evaluating their capacity to correctly rank molecules, which are endowed with antiviral activity, and in particular, with a known effect against SARS CoV proteins, considering the lack of known actives on SARS CoV-2 proteins.
The tuned and validated virtual screening protocols were used to screen a repurposing library, containing the set of safe in man drugs, commercialized or under active development in clinical phases, and a set of known bioactives in particular preclinical compounds identified as “CoV Inhibitors” (> 12000 drugs). The most promising drugs, identified from docking studies as potentially active against SARS-Cov-2 proteins, were selected to be tested in biochemical and phenotypic assays performed respectively in WP2 and WP3.

A few more details

The SARS CoV-2 3CL-Protease was analyzed and used to fine tune the virtual screening strategies. SARS CoV 3CL-Protease known inhibitors were collected from literature to define training sets for our studies, keeping in mind that the pocket of SARS-Cov and SARS-Cov-2 proteases are very similar. Different docking algorithms were combined to increase the quality of the virtual screening campaign.

Five consortium partners (Dompé, University of Milan, and the three supercomputing centers, CINECA, Barcelona Supercomputing Center (BSC) and Jeulich Forschungszentrum (FZJ)), made their hardware and software facilities available to run docking simulations, including LiGen, PLANTS, PELE, Glide and FRED. The virtual screening protocols were validated in terms of Enrichment Factor, setting the conditions that allow most of the training set compounds to be ranked in the top-scored positions. Subsequently, the molecules with the best docking scores, found in the top 1% of the screened library, were selected with each software, and a consensus approach of the different docking scores was used to rank the most promising safe in man drugs acting on 3CL-PRO viral protein (Figure 1). To further improve the quality of the results, more accurate methods were applied, such as induced-fit simulations on SARS-CoV known inhibitors with PELE software, and a regression analysis model to correlate predicted and in vitro activities was built. We obtained a clear correlation of higher IC50 with larger residence time in the active site, which was analyzed in terms of (1) the solvent-accessible surface area of the ligands, (2) the sub-pocket population analysis. Both analyses showed high potential for a more accurate prediction, opening the possibilities, for example, to a second refinement effort after an initial docking campaign.

We are now using the tuned and validated virtual screening protocols to screen the drug repurposing library against the best selected 3D structures of the other SARS-Cov-2 proteins, either already experimentally obtained (PDB) or derived by homology modelling techniques, and processed by molecular dynamics simulation (Figure 2).

We currently have validation compounds from the literature, and results from our docking approaches, which identified different sets of molecules. This gives us a critical assessment of the different approaches and of the results obtainable by combining the best of the techniques that we have available or are putting in place, in order to identify the most promising safe-in-man drugs that are ready for immediate treatment of the infected population.

Figure 1: Virtual screening protocol applied to 3CL-PRO main protease.


Figure 2: Validated virtual screening protocol deployed on other viral proteins.

PROTEIN PRODUCTION AND TARGET BASED ASSAYS

No results yet

PHENOTYPIC SCREEN

No results yet

STRUCTURAL BIOLOGY

ELETTRA SINCROTRONE TRIESTE

Summary

The structural biology team of Elettra Sincrotrone Trieste has achieved first results on the SARS-CoV-2 Mpro viral protein by setting up a reproducible expression and purification protocol and defining biophysical parameters for protein quality control and comparison among different protein batches. Several crystals of the apo form of the protein have been obtained, and the optimization process is ongoing, with the best crystals diffracting at resolutions in the range of 1.6-2.0 Å. Data sets have been collected and processed, showing either a single monomer or a dimer in the asymmetric unit, with almost identical conformations. Co-crystallization experiments with compounds are progressing, and we aim to obtain the first data on the protein bound to inhibitors in a few months.
Figure 1 shows examples of Mpro crystals and diffraction patterns; Figures 2 and 3 show graphic representations of the Mpro protein.

A few more details

Figure 1: Mpro Crystals and Diffraction Patterns

Figure 2: Graphic Representations of the Mpro Protein

Figure 3: Graphic Representations of the Mpro Protein



The structural biology team of Elettra Sincrotrone Trieste started work on the WP4 activities about 5 months ahead of the originally scheduled timeline. In this initial period, the focus was on protein sample preparation and crystallization set-up.
The SARS-CoV-2 Mpro was successfully expressed in E. coli and purified to homogeneity (>98% purity), starting from the expression vector kindly donated by L. Hilgenfeld (Lübeck University – DE).
This protein batch was compared with another batch obtained from an external partner. Both proteins were analyzed in a thermal stability assay using different buffers, and showed identical behaviour.
Both proteins were used in subsequent crystallization protocols. Initial crystallization screenings were set up based on published data and using commercially available crystallization kits. The most promising conditions were optimized, and "flower-like" crystals were reproducibly obtained. These crystals were cryo-preserved and tested at the XRD2 beamline of the Elettra synchrotron, showing diffraction patterns with resolutions in the range of 1.6-2.0 Å, with the best resolution achieved at 1.52 Å. Data sets were processed, and a quick molecular replacement (MR) solution was obtained using PDB entry 6W63 as the starting model, showing a monomer or a dimer in the asymmetric unit with almost identical conformations. The optimization of co-crystallization trials with selected compounds is ongoing.

GENOMICS

No results yet

MECHANISM OF ACTION

No results yet

AUTOMATED HOMOLOGY MODELING WORKFLOW FOR DRUG TARGET

No results yet

PRODUCTION AND TUNING ON HPC INFRASTRUCTURE

POLIMI, CINECA

Summary

In WP8, "Production and tuning on HPC infrastructure", the POLIMI team is responsible for the continuous tuning and code adaptation of the EXSCALATE platform, and contributes to porting it to the new Marconi100 partition at CINECA.
The first month of the E4C project coincided with the initial production period of the new CINECA machine. POLIMI ported a first version of the docking library to exploit the computing capability of the heterogeneous computing node, which is composed of two IBM POWER9 sockets and four NVIDIA V100 GPUs.
During the experimental campaign, we reached a throughput of more than 250K optimal ligand poses per second on a single node of the Marconi100 machine. Porting and tuning on multiple nodes is currently ongoing.

A few more details

To promote the agile and portable software development needed to guarantee continuous releases of updated functionality and improved throughput of the EXSCALATE software platform, we encapsulated all the geometrical docking in a stand-alone library called LiGen GeoDock, with a simple, stable, and well-defined non-virtual interface. Figure 1 shows an overview of the LiGen GeoDock library, which aims at docking a ligand in a target pocket using geometrical information only. To better encapsulate GeoDock in the EXSCALATE platform, we designed a single public interface that hides implementation details. In this way, we can improve GeoDock without hindering the development of the other components. The E4C project targets a heterogeneous node composed of GPUs and CPUs. The previous C++ version of the code, which ran on CPUs only, was unable to harness the full computational power of the new CINECA Marconi100 node. To overcome this limitation, we first implemented the docking algorithm using the OpenACC pragma-based language and the PGI OpenACC compiler, which generates device-specific code while maximizing code portability. However, given the transition to the Marconi100 supercomputing cluster at CINECA, we then implemented the docking algorithm in CUDA C/C++ to obtain the maximum performance from the NVIDIA Tesla V100 GPUs.

We ran an initial experimental campaign to assess performance on a ligand library spanning a wide range of atom counts and rotatable bonds. Using only the two IBM POWER9 sockets with 128 software threads, the code reaches a peak throughput of 30K optimal ligand poses per second, while exploiting the full node, including the four NVIDIA V100 GPUs, reaches a throughput of more than 250K optimal ligand poses per second.
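
To put these two figures in perspective, the short Python sketch below computes the implied full-node speedup and the time needed to produce a given number of poses on one node. The throughput values are the ones reported above (250K is a lower bound), whereas the workload of one billion poses is a purely hypothetical number chosen for illustration.

```python
# Throughputs reported in the text, in optimal ligand poses per second.
CPU_ONLY_POSES_PER_S = 30_000    # two POWER9 sockets, 128 software threads
FULL_NODE_POSES_PER_S = 250_000  # full node with four V100 GPUs (lower bound)


def speedup(full_node, cpu_only):
    """Ratio between full-node and CPU-only throughput."""
    return full_node / cpu_only


def seconds_to_dock(n_poses, poses_per_s):
    """Time to produce n_poses at a constant throughput."""
    return n_poses / poses_per_s


if __name__ == "__main__":
    hypothetical_poses = 1_000_000_000  # assumption, not a project figure
    print(f"speedup: at least {speedup(FULL_NODE_POSES_PER_S, CPU_ONLY_POSES_PER_S):.1f}x")
    print(f"CPU only:  {seconds_to_dock(hypothetical_poses, CPU_ONLY_POSES_PER_S) / 3600:.1f} h")
    print(f"full node: {seconds_to_dock(hypothetical_poses, FULL_NODE_POSES_PER_S) / 3600:.1f} h")
```

Because the full-node figure is reported as "more than 250K", the computed speedup of roughly 8x should be read as a lower bound for a single node.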

The performance results obtained so far on the new CINECA Marconi100 partition are very promising and are a good starting point for further tuning of the LiGen GeoDock library. As next steps, we plan to push further on fine-grained refinement and to adopt dynamic autotuning approaches, combined with scaling to the entire machine.

Figure 1: Overview of the LiGen geometrical docking library deployed on a Marconi M100 computation node at CINECA supercomputing center.

REGULATORY CONTRACTS

No results yet

DISSEMINATION & EXPLOITATION OF THE FOREGROUND

CHELONIA

Summary

Setup and Maintenance of Website & Social Media (D10.1).

Chelonia leads the coordination of dissemination and communication activities. The project website www.exscalate4cov.eu was set up within the first month of the project and is now regularly updated and maintained. Communication activities ensure that the project outcomes reach beyond the Consortium, so that the methodologies developed and the results obtained can be widely disseminated in the scientific community.
Communication objectives are achieved through multiple strategies, including the use of social networks such as LinkedIn, Facebook and Twitter to spread knowledge of the project to a larger audience.
Dedicated exscalate4cov pages have been created, and communications are tagged #exscalate4cov. Given the wide social impact of this research, the general press is reached through press releases, newsletters and the local press.

DISSEMINATION & EXPLOITATION OF THE FOREGROUND

CHELONIA

Summary

Initial Dissemination Activities and Plans (D10.2).

This task defines the strategy for exploitation of the project results. The activity, led by CHELONIA in close collaboration with DOMPE, analyzes the strategies and implements the actions needed to maximize the use of the project results beyond the project partners and time-frame. A dissemination plan was issued to all partners 3 months ahead of the expected deadline in order to facilitate dissemination and to align partners in communicating project results to the target audience. A first webinar was organized on April 28th, 2020, with the support of CECAM (Centre Européen de Calcul Atomique et Moléculaire – www.cecam.org), an organization devoted to the promotion of fundamental research on advanced computational methods and to their application to important problems in frontier areas of science and technology. To access the recorded webinar, please visit YouTube.

Address

Exscalate4cov
c/o Dompé Farmaceutici
Via Pietro Castellino, 111
80131 Napoli, Italy