January 15, 2007
NSF project award number: 0644424
NSF directorate: CISE
NSF Division: IIS
PI: Natasa Przulj
Project start date: January 15, 2007
Expected project end date: December 31, 2011
The award abstract:
The project proposes improvements in tools for analyzing and modeling of Protein-Protein Interaction (PPI) networks. A PPI network is a mathematical representation of the physical interactions amongst proteins in a cell: it is a graph in which nodes represent proteins and edges between the nodes represent possible physical interactions between the corresponding proteins. Currently available PPI networks are partial and mostly resulting from various high throughput proteomics experiments that attempt to discover as many PPIs as possible in a number of organisms. The resulting PPI data sets are large including tens of thousands of interactions amongst thousands of proteins even for a unicellular eukaryotic organism such as baker’s yeast.
The project introduces new, more sensitive measures of network local structure that are based on enumeration of “graphlets”, small induced subgraphs of large graphs, as well as on distinguishing during the enumeration between the nodes in a graphlet that cannot be mapped to one another by an automorphism (an isomorphism of a graph with itself). In this way, the degree distribution is generalized into the spectrum of “graphlet degree distributions”. These new measures produce graphlet-based network “feature vectors” that are further used to heuristically compare synthetic (model based) networks to experimentally determined PPI networks. The project’s preliminary results indicate that geometric random graphs provide a superior fit to the currently available PPI networks than do other network models including scale-free networks. The project proposes to build upon these preliminary results towards an efficient PPI network alignment heuristic that would incorporate biological information of the proteins in the networks being aligned, such as protein functional, structural, and amino-acid sequence similarity. Furthermore, the project proposes to use geometric random graph models of PPI networks as the basis for algorithmic strategies that would optimize the cost of experimentally exploring the entire space of protein-protein interactions in the cell. The use of geometric random graph models and associated graphlet-based network feature vectors can be extended to other network modeling applications, such as epithelial planar cell polarization, chemical informatics, social network analyses, etc.
It is expected that the project will lead to better algorithms for various network comparison problems on PPI and other networks. Exploitation of a network model may significantly reduce the cost of characterizing the interactomes of various organisms.
The PI is involved in training high school, undergraduate and graduate students, including two female students. This work is being included in various UC Irvine courses. The software will be provided as a free open-source toolkit for other researchers. Additional information will be available at http://www.ics.uci.edu/~natasha/.
The project was funded under the NSF Faculty Early Career Development (CAREER) Program.