From genes to pathogenesis
In 1986, the dystrophin gene was identified as the first gene implicated in Duchenne dystrophy, the most common form of muscular dystrophy. Since then, thanks to the power of genetic analysis, more than 200 genes that cause neuromuscular disease have been identified. However, the functions and interactions of the proteins encoded by these genes are still largely unknown. While enhancing our knowledge, the growing number of identified genes has also revealed the overall complexity of the pathogenesis of these diseases.
All cells of an organism contain the same genetic information in the form of DNA, which makes up the genome. The expression of the genes contained in the genome (more than 20,000 have been identified in humans) leads to the production of proteins (more than 45,000 have been described, with each gene coding for 1.6 proteins on average). These proteins ensure the functioning of cells, with roles specific to each organ, such as muscle. Some of these proteins are enzymes; some are signaling molecules; others are receptors that bind with high affinity and specificity to other molecules called ligands (like a key fitting a lock); and others are structural proteins. The conformation of a protein in space (its 3D-structure) determines its interactions and its function.
The functioning of muscles depends on numerous proteins. These proteins are localized and act at different levels: in the muscle cell itself, in the cell body or axon of the motor nerve that excites the muscle, or at the junction between the nerve and the muscle.
Most neuromuscular diseases are due to genetic modifications (mutations in certain genes leading to alterations in the corresponding proteins) that result in non-functional or partly functional proteins, or even no protein at all. The resulting disease differs according to which protein, and which site within that protein, is affected by the mutation.
To understand how neuromuscular diseases prevent proteins from performing the functions needed to maintain muscle and nerve health, researchers need a better and more detailed understanding of the function and interactions of the proteins involved in the pathogenesis of each muscular dystrophy.
Knowing the function and interactions of proteins is also critical for helping researchers to design therapeutic strategies. Without this vital knowledge, scientists will not be able to develop innovative therapies that will lead to treatments for the vast majority of neuromuscular diseases.
World Community Grid and the Decrypthon Molecular Docking Project
Help Cure Muscular Dystrophy will apply the power of World Community Grid to determine the protein-protein interactions for more than 2,200 non-redundant proteins whose structures are known and stored in the Protein Data Bank (www.rcsb.org/pdb). The analyzed proteins will include those found to be mutated in neuromuscular disorders.
This will result in a new database of information on functionally interacting proteins. Further extensions will include studies of protein binding sites involved in interactions with DNA and other ligands (such as drugs). This will be of significant medical interest: while it is now feasible to design a small molecule that inhibits or enhances the binding of a given molecule to a given partner, it is much more difficult to understand how that same small molecule could directly or indirectly influence other existing interactions.
The approach proposed in this project combines evolutionary information (how evolution modified proteins to enhance their function) and molecular modeling (computational determination of the relative position of two interacting protein partners) to identify potential interactions.
Molecular modeling refers to theoretical methods and computational techniques used to model or mimic the behavior of molecules. These methods are used to investigate the structure and behavior of biological systems, for example in protein folding or in the molecular recognition involved in protein-ligand binding. They apply to systems ranging from small chemical systems to large biological molecules and molecular assemblies (such as protein complexes).
Protein-ligand docking is a molecular modeling technique that predicts the position and orientation (the 3D-structure) of a protein in relation to a ligand (another protein, DNA, a drug, etc.). Because docking methods are based on purely physical principles, even proteins of unknown function (or which have been studied relatively little) can be docked. The only prerequisite is that their 3D-structure has either been determined experimentally or can be estimated by some theoretical technique.
The docking approach generally starts with a database of known molecules and attempts to find pairs of molecules which have an affinity to bind to one another. The affinity is estimated using a so-called scoring function. In the end, a list of the best-binding molecules for a targeted protein is returned. The quality of fit has a geometric and a chemical component. The geometric component measures how well the surface shapes (the 3D-structures) complement each other – like a hand in a glove. The chemical component measures the quality of the atomic interactions between the partner molecules (i.e. are the interactions strong or weak?).
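To make the idea of a scoring function concrete, here is a minimal Python sketch that combines a geometric term and a chemical term and returns the best-ranked partners. It is an illustration only, not the scoring function used by the project: the `Candidate` fields, the weights, the value scales, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A potential binding partner for a target protein (hypothetical data model)."""
    name: str
    shape_complementarity: float  # geometric fit, assumed scale 0 (poor) to 1 (perfect)
    interaction_energy: float     # chemical term, arbitrary units; more negative = stronger


def score(candidate, w_geom=1.0, w_chem=1.0):
    """Toy scoring function: higher means a better predicted fit.

    Rewards good shape complementarity and favorable (negative) interaction energy.
    The weights are assumptions chosen for illustration.
    """
    return (w_geom * candidate.shape_complementarity
            - w_chem * candidate.interaction_energy)


def rank(candidates, top=3):
    """Return the `top` best-scoring candidates, best first."""
    return sorted(candidates, key=score, reverse=True)[:top]


# Made-up candidates, not real docking results.
candidates = [
    Candidate("partner_A", 0.82, -1.5),
    Candidate("partner_B", 0.40, -0.2),
    Candidate("partner_C", 0.91, -0.1),
    Candidate("partner_D", 0.65, -2.0),
]

best = rank(candidates)
for c in best:
    print(c.name, round(score(c), 2))
```

In this toy data, a partner with only moderate shape fit can still rank first if its atomic interactions are strong, which is why the two components are considered together.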
For complex structures like proteins (the smallest are composed of hundreds of atoms), it takes considerable computer time to determine the correct fit for protein-protein interactions. Without World Community Grid, the computations required to conduct the docking would be prohibitively time-consuming. For the first 168 selected proteins docked in Phase 1, the CPU time on World Community Grid was about 8,000 years. With the 2,246 proteins of Phase 2, the estimated time is 11.46 × 8,000 = 91,680 years on World Community Grid.
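The estimate above can be reproduced with a few lines of Python. The Phase 1 figure and the 11.46 scaling factor come from the text; the number of devices is a purely hypothetical assumption, used only to show how total CPU time translates into elapsed time when the work is spread over many volunteers and every device is assumed to compute full-time.

```python
PHASE1_CPU_YEARS = 8_000   # from the text: about 8,000 CPU-years for 168 proteins
PHASE2_SCALE = 11.46       # scaling factor quoted in the text for 2,246 proteins
ASSUMED_DEVICES = 50_000   # hypothetical number of devices computing full-time


def phase2_cpu_years(phase1_years=PHASE1_CPU_YEARS, scale=PHASE2_SCALE):
    """Total CPU time estimated for Phase 2, in CPU-years."""
    return phase1_years * scale


def wall_clock_years(cpu_years, devices):
    """Elapsed time if the work is split evenly across `devices` (idealized)."""
    return cpu_years / devices


total = phase2_cpu_years()
elapsed = wall_clock_years(total, ASSUMED_DEVICES)
print(f"Phase 2 estimate: {total:,.0f} CPU-years")
print(f"Elapsed time on {ASSUMED_DEVICES:,} devices: {elapsed:.2f} years")
```

Under these idealized assumptions, a workload far beyond any single machine becomes a matter of a couple of years, which is the point of distributing it across a grid.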
A solution to this computational barrier is to use evolutionary information to predict potential binding sites and to perform localized docking on the surfaces that are most likely to interact. This preliminary analysis based on protein evolution reduces computational time by a factor of 100 and therefore allows us to extend the analysis to a large scale with the crucial help of World Community Grid. Even so, without World Community Grid, the computations required to conduct the (localized) docking at large scale would be prohibitively time-consuming.
Volunteers donating their computer time to World Community Grid will be searching for the best protein-protein partners. Their participation will significantly augment other work being conducted by AFM, CNRS, INSERM and other scientific players, and will contribute to the development of scientific tools that increase the knowledge vital for developing new therapies for rare diseases.
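The principle of localized docking can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the project's method: each surface point of a protein is given a synthetic "conservation" value standing in for evolutionary information, and docking is restricted to the most conserved points. The 1% fraction is chosen only so that the toy numbers match the factor of 100 quoted above; the text does not specify how that reduction is obtained in detail.

```python
import random


def predicted_interface(conservation, fraction=0.01):
    """Return indices of the most conserved surface points.

    `conservation` holds one value per surface point (higher = more conserved).
    `fraction` is the share of the surface kept for docking (assumed value).
    """
    keep = max(1, round(len(conservation) * fraction))
    order = sorted(range(len(conservation)),
                   key=lambda i: conservation[i], reverse=True)
    return order[:keep]


# Synthetic conservation values for 10,000 surface points (not real data).
random.seed(0)
scores = [random.random() for _ in range(10_000)]

patch = predicted_interface(scores)

# Docking starting positions: whole surface versus predicted patch only.
full_search = len(scores)
local_search = len(patch)
speedup = full_search / local_search
print(f"Start positions: {full_search} (full) vs {local_search} (localized)")
print(f"Reduction factor: {speedup:.0f}")
```

The trade-off in this kind of approach is that a true binding site outside the predicted patch would be missed, so the quality of the evolutionary prediction matters as much as the time saved.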