Neurotechniques

What your friends say about you

Neuroscience Gateway (April 2007) | doi:10.1038/aba1732

Researchers identify disease-associated genes by searching a protein interactome for related proteins that are associated with similar disease phenotypes.

You can learn a lot about someone by the company he keeps. This is the basic logic behind protein interaction networks, which indicate protein function by association. Disruption of several different proteins involved in the same cellular function can result in similar phenotypes. Based on this assumption, Lage et al., in a recent article in Nature Biotechnology, combined clinical phenotype data with a protein interaction network and used the resulting 'phenome–interactome' to identify genes important in disease.

The Online Mendelian Inheritance in Man (OMIM) database contains genetic linkage intervals associated with human disorders and descriptions of disease phenotypes. For each disease, the authors generated a vector that represented the medical terms used to describe the disease phenotype. To compare two disease phenotypes, they calculated the cosine of the angle between the corresponding vector pair. Because the cosine of a zero-degree angle is one, the phenotype similarity score for similar diseases approaches one. The authors found phenotype similarity scores of nearly one for a set of similar disease pairs provided by OMIM curators.
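The cosine comparison described above can be illustrated with a minimal Python sketch. It assumes each disease is reduced to a simple count of medical terms; the function names and the example terms are hypothetical, and the authors' actual term extraction and weighting are not described here.

```python
import math
from collections import Counter


def phenotype_vector(terms):
    """Represent a disease phenotype as a term-count vector (an assumed, simplified encoding)."""
    return Counter(terms)


def cosine_similarity(a, b):
    """Return the cosine of the angle between two term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description shares nothing with any other
    return dot / (norm_a * norm_b)


# Hypothetical phenotype descriptions, for illustration only.
disease_a = phenotype_vector(["muscle weakness", "dementia"])
disease_b = phenotype_vector(["muscle weakness", "ataxia"])
similarity = cosine_similarity(disease_a, disease_b)
```

Two diseases described by the same terms score close to one, and two diseases with no terms in common score zero.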

The authors generated their protein interaction network by pooling existing protein interaction data. The resulting interactome contained roughly 343,000 protein interactions.

Connecting the phenome and interactome data took several steps. For the disease of interest, the authors first identified the genes in the disease-associated linkage interval. Using the proteins encoded by these candidate genes, they then performed 'virtual pull-down' experiments, querying the interactome for proteins with which the candidate proteins interact. For each interacting protein, they generated phenotype similarity scores between the diseases associated with that protein and the disease of interest. A Bayesian model then incorporated these scores into a probability score predicting the likelihood that each candidate gene is associated with the disease of interest.
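The flow of these steps can be sketched in Python under strong simplifying assumptions. The interactome is modeled as a dictionary from each protein to its interaction partners, and the phenotype similarity function is passed in by the caller. All names are hypothetical. The article does not specify the Bayesian model, so this sketch substitutes a much cruder stand-in (the best similarity score among a candidate's partners) purely to show how the data are joined.

```python
def virtual_pull_down(candidate, interactome):
    """Return the proteins that interact with a candidate protein."""
    return interactome.get(candidate, set())


def rank_candidates(candidates, interactome, protein_diseases, similarity, disease):
    """Rank candidate proteins from a linkage interval for one disease of interest.

    NOTE: the score is the maximum phenotype similarity among partner-associated
    diseases. This is an illustrative placeholder, not the authors' Bayesian model.
    """
    ranked = []
    for protein in candidates:
        partners = virtual_pull_down(protein, interactome)
        # Compare the disease of interest with every disease linked to a partner.
        scores = [
            similarity(disease, partner_disease)
            for partner in partners
            for partner_disease in protein_diseases.get(partner, [])
        ]
        ranked.append((protein, max(scores, default=0.0)))
    # Highest-scoring candidates first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy data with made-up identifiers and similarity values.
toy_interactome = {"P1": {"X"}, "P2": {"Y"}, "P3": set()}
toy_protein_diseases = {"X": ["disease_1"], "Y": ["disease_2"]}
toy_similarity_table = {("query", "disease_1"): 0.9, ("query", "disease_2"): 0.2}


def toy_similarity(a, b):
    return toy_similarity_table.get((a, b), 0.0)


toy_ranking = rank_candidates(
    ["P2", "P3", "P1"], toy_interactome, toy_protein_diseases, toy_similarity, "query"
)
```

In this toy example, the candidate whose partner is linked to the most similar disease ranks first, and a candidate with no interaction partners scores zero.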

The authors tested their method with 1,404 proteins known to associate with disease. At a threshold candidate probability score of 0.1, 45% of the proteins they identified were relevant to the disease of interest. They identified known disease-causing proteins with candidate scores of 0.9 or greater 65% of the time.
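One way to read the threshold figure is as a precision: of all candidates scoring at or above a cutoff, the fraction that are truly relevant to the disease. The sketch below shows that calculation on made-up scores. The function name and the data are hypothetical, and this interpretation of the reported percentage is an assumption rather than the authors' stated evaluation procedure.

```python
def precision_at_threshold(predictions, threshold):
    """Fraction of candidates scoring >= threshold that are truly disease-relevant.

    `predictions` is a list of (score, is_relevant) pairs. Returns None when no
    candidate reaches the threshold, because precision is undefined in that case.
    """
    selected = [is_relevant for score, is_relevant in predictions if score >= threshold]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Made-up scores and labels, for illustration only.
toy_predictions = [(0.95, True), (0.50, False), (0.15, True), (0.05, False)]
loose_precision = precision_at_threshold(toy_predictions, 0.1)
strict_precision = precision_at_threshold(toy_predictions, 0.9)
```

Raising the cutoff typically keeps fewer candidates but makes the ones that remain more likely to be correct.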

Researchers have yet to identify specific genes for 870 disease-associated linkage intervals in OMIM. For 91 of these intervals, the authors' technique identified candidate genes with scores greater than 0.2. For example, the authors identified two candidate genes involved in amyotrophic lateral sclerosis (ALS) with frontotemporal dementia, which maps to chromosomal region 9q21–q22: bicaudal D homolog 2 (BICD2) and cytoplasmic isoleucyl-tRNA synthetase (IARS). BICD2 interacts with dynactin, which is associated with a type of ALS that does not cause dementia, and IARS interacts with proteins associated with ALS and proteins associated with dementia.

These data and the interactome containing 506 protein complexes associated with disease are freely available. The authors' method is suited to identifying not only disease-causing genes to be targeted by therapies, but perhaps also new disease-associated markers important in diagnosis.

Debra Speert

  1. Lage, K. et al. A human phenome-interactome network of protein complexes implicated in genetic disorders. Nature Biotechnology 25, 309–315 (2007).