BioMANTA

The Modelling and Analysis of Biological Network Activity (BioMANTA) project develops novel biological network analysis methods and infrastructure for querying biological data in a semantically enabled format, with the aim of creating a semantic interactome model. Research within the project will focus on computational modelling and analysis of large-scale protein-protein interaction and compound activity networks across a wide variety of species, primarily using Semantic Web technologies and Machine Learning methods. The resulting data model will include information such as kinetic activity, tissue expression, subcellular localisation, and disease state attributes.

This project is a two-year scientific research collaboration between:

The Computational Sciences Center of Emphasis, Pfizer Global Research and Development, Pfizer Inc., Cambridge, Massachusetts, USA,
The Institute for Molecular Bioscience (IMB), and
The School of Information Technology and Electrical Engineering (ITEE), The University of Queensland, Australia

Protein interactions are a fundamental component of biological processes. Many proteins are functional only in multimeric complexes, or require interaction partners to achieve their correct localisation or function. For this reason, the study of protein-protein interaction (PPI) networks has become an area of growing interest in computational biology.

Through the use of Semantic Web technologies such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL), interaction data is modelled to create a knowledge representation in which meaning is vested in the ontology rather than in individual instances of data. Stochastic and computational intelligence methods are applied to this data to infer high-coverage networks. Semantic inferencing is then used to infer previously unknown and meaningful pathways.
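As a minimal sketch of what "meaning vested in the ontology" can look like in practice, the Python example below stores interactions as subject-predicate-object triples and derives new triples from a single ontology statement. The `ex:` names, the `ex:interactsWith` property, and the `infer_symmetric` helper are all hypothetical and are not taken from the BioMANTA ontology. Only the idea of a symmetric property comes from OWL.

```python
# Illustrative only: every "ex:" identifier below is a made-up placeholder,
# not a term from the BioMANTA ontology.
RDF_TYPE = "rdf:type"
SYMMETRIC = "owl:SymmetricProperty"

# Ontology-level statement: the interaction property is symmetric.
ontology = {
    ("ex:interactsWith", RDF_TYPE, SYMMETRIC),
}

# Instance-level data: each interaction is recorded in one direction only.
data = {
    ("ex:ProteinA", "ex:interactsWith", "ex:ProteinB"),
    ("ex:ProteinB", "ex:interactsWith", "ex:ProteinC"),
}


def infer_symmetric(ontology, data):
    """Return data plus the reverse of every triple whose predicate
    is declared symmetric in the ontology."""
    symmetric_props = {
        s for (s, p, o) in ontology if p == RDF_TYPE and o == SYMMETRIC
    }
    inferred = {(o, p, s) for (s, p, o) in data if p in symmetric_props}
    return data | inferred


closed = infer_symmetric(ontology, data)
for triple in sorted(closed):
    print(triple)
```

The reverse interactions are never stored in the data. They follow from the ontology statement alone, so changing the ontology changes what can be inferred without touching any instance.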

Major project components

The BioMANTA Ontology
Data conversion & semantic protein integration
An RDF triple store based on RDF Molecules and the MapReduce architecture

Hadoop

SPARQL

A quantitative framework to integrate networks extracted from independent data sources (gene expression, subcellular localisation, and ortholog mapping)
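
The last component, integrating networks from independent data sources, can be illustrated with a small sketch. The project description does not state which scoring method is used, so the noisy-OR rule below is an assumption chosen for illustration, not the BioMANTA method. The `integrate` function, the protein identifiers, and the confidence values are all hypothetical.

```python
from collections import defaultdict


def integrate(sources):
    """Combine per-source edge confidences with a noisy-OR rule.

    Assumption (not from the project description): each source gives an
    independent probability that an interaction is real, so the combined
    confidence is 1 minus the product of the per-source miss probabilities.
    """
    miss = defaultdict(lambda: 1.0)
    for edges in sources.values():
        for (a, b), p in edges.items():
            # Treat interactions as undirected: (P1, P2) equals (P2, P1).
            key = tuple(sorted((a, b)))
            miss[key] *= 1.0 - p
    return {edge: 1.0 - m for edge, m in miss.items()}


# Hypothetical confidence scores, keyed by the three source types
# named in the component list above.
sources = {
    "gene_expression": {("P1", "P2"): 0.5, ("P2", "P3"): 0.2},
    "subcellular_localisation": {("P2", "P1"): 0.5},
    "ortholog_mapping": {("P2", "P3"): 0.5},
}

combined = integrate(sources)
for edge, score in sorted(combined.items()):
    print(edge, round(score, 3))
```

Under this rule an edge supported by several weak sources can outrank an edge supported by a single source of moderate strength. That is one reason to combine sources rather than analyse each network separately.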