Publications and Presentations
- Publications
- 2008
- Andrew Newman, Yuan-Fang Li, and Jane Hunter, "Scalable Semantics – the Silver Lining of Cloud Computing", In 4th IEEE International Conference on e-Science (e-Science 2008), Indianapolis, Indiana, USA, December 7-12, 2008.
(abstract)
Semantic inferencing and querying across large-scale
RDF triple stores is notoriously slow. Our objective
is to expedite this process by employing Google’s
MapReduce framework to implement scale-out distributed
querying and reasoning. This approach requires
RDF graphs to be decomposed into smaller units that
are distributed across computational nodes. RDF Molecules
appear to offer an ideal approach – providing
an intermediate level of granularity between RDF
graphs and triples. However, the original RDF molecule
definition has inherent limitations that will adversely
affect performance. In this paper, we propose a
number of extensions to RDF molecules (hierarchy and
ordering) to overcome these limitations. We then
present some implementation details for our MapReduce-based
RDF molecule store. Finally we evaluate
the benefits of our approach in the context of the BioMANTA project –
an application that requires integration
and querying across large-scale protein-protein
interaction datasets.
- Andrew Newman, Yuan-Fang Li, and Jane Hunter, "A Scale-Out
RDF Molecule Store for Improved Co-Identification, Querying and Inferencing",
In the 4th International Workshop on
Scalable Semantic Web Knowledge Base Systems (SSWS2008) at the 7th International
Semantic Web Conference (ISWC2008), Karlsruhe, Germany, October 26-30, 2008.
(abstract)
Semantic inferencing and querying across large scale RDF triple stores is notoriously slow. Our objective is to expedite this process by employing Google’s MapReduce framework to implement scale-out distributed querying and reasoning. This approach requires RDF graphs to be decomposed into smaller units that are distributed across computational nodes. RDF Molecules appear to offer an ideal approach – providing an intermediate level of granularity between RDF graphs and triples. However, the original RDF molecule definition has inherent limitations that will adversely affect performance. In this paper, we propose a number of extensions to RDF molecules (hierarchy and ordering) to overcome these limitations. We then present implementation details for our MapReduce-based RDF molecule store describing: (a) graph decomposition into molecules; (b) SPARQL querying across molecules; and (c) molecule merging to retrieve the search results. Finally we evaluate the benefits of our approach in the context of the BioMANTA project – an application that requires integration and querying across large-scale protein-protein interaction datasets. The results of performance evaluations based on this case study are presented and discussed.
- Andrew Newman, Jane Hunter, Yuan-Fang Li, Chris Bouton, and Melissa Davis,
"A Scale-Out RDF Molecule Store for Distributed
Processing of Biomedical Data", Semantic Web for Health Care and Life Sciences
Workshop (HCLS'08) at 17th
International World Wide Web Conference (WWW2008),
Beijing, China, April 22, 2008.
(abstract)
The computational analysis of protein-protein interaction and
biomolecular pathway data paves the way to efficient in silico
drug discovery and therapeutic target identification. However,
relevant data sources are currently distributed across a wide range
of disparate, large-scale, publicly-available databases and repositories
and are described using a wide range of taxonomies and
ontologies. Sophisticated integration, manipulation, processing
and analysis of these datasets are required in order to reveal previously
undiscovered interactions and pathways that will lead to
the discovery of new drugs. The BioMANTA project focuses on
utilizing Semantic Web technologies together with a scale-out
architecture to tackle the above challenges and to provide efficient
analysis, querying, and reasoning about protein-protein interaction
data. This paper describes the initial results of the BioMANTA
project. The fully-developed system will allow knowledge representation
and processing that are not currently available in typical
scale-out or Semantic Web databases. We present the design of
the architecture, basic ontology and some implementation details
that aim to provide efficient, scalable RDF storage and inferencing.
The results of initial performance evaluation are also provided.
- Andrew Newman, Jane Hunter, Yuan-Fang Li, Chris Bouton, and Melissa Davis,
"BioMANTA Ontology: The Integration of
Protein-Protein Interaction Data", Interdisciplinary Ontology
Conference (InterOntology08 Tokyo), Tokyo, Feb 26-27, 2008.
(abstract)
Protein-protein interaction (PPI) and biomolecular pathway data hold
tremendous potential for drug discovery and development. However, relevant data
sources are currently distributed across a wide range of disparate, large-scale,
publicly-available databases, web sites and repositories and are described using a
wide range of taxonomies and ontologies. Sophisticated integration, manipulation,
processing and analysis of these data sets are required in order to reveal previously
undiscovered interactions and pathways that will lead to the discovery of new
drugs. The Semantic Web has been investigated as a solution to this problem by a
number of projects that use RDF and OWL to integrate, represent and analyze
protein interaction data. However, existing applications have suffered from certain
limitations that hinder their usefulness. In this paper, we describe work being undertaken
within the BioMANTA project that aims to identify and overcome the
limitations associated with the application of Semantic Web technologies to protein
interaction network analysis. In particular we describe the BioMANTA OWL
ontology that has been designed to enable multiple data sources to be integrated
within a single RDF triple store through a common PPI model. The primary aim
of the BioMANTA ontology is to provide a practical means of facilitating the integration
of semantically disparate data sets – it does not aim to provide a precise
biological, chemical or physical model of how proteins interact. We also describe
how this ontology was developed through the refinement, harmonization and extension
of existing ontologies. Finally, we describe the mapping, integration and
querying of a range of protein-protein interaction data sets, based on the ontology.
- M. Davis, A. Newman, I. Khan, J. Hunter, and M. A. Ragan,
"Integrating Hierarchical Controlled Vocabularies with OWL Ontology: A Case Study from the
Domain of Molecular Interactions", accepted to 6th Asia Pacific Bioinformatics
Conference (APBC08), Kyoto, Jan 14-17, 2008.
- Presentations and Posters
- 2008
- Andrew Newman, "A Scale-Out RDF Molecule Store for
Distributed Processing of Biomedical Data", Semantic Web for Health Care and
Life Sciences Workshop (HCLS'08), 17th International World Wide Web Conference
(WWW2008), Beijing, China, April 22, 2008.
- Andrew Newman, "Integrating Protein-Protein
Interaction Data", Interdisciplinary Ontology Conference (InterOntology08 Tokyo),
Tokyo, Feb 26-27, 2008.
- Melissa Davis, "Integrating Hierarchical Controlled
Vocabularies with OWL Ontology", 6th Asia Pacific Bioinformatics Conference
(APBC08), Kyoto, Jan 14-17, 2008.
- 2007
- Melissa Davis and Andrew Newman, "The
Architecture and Design of the BioMANTA Ontology and Software", ARC Centre of
Excellence in Bioinformatics (ACB) All Hands Meeting, Nov 2007
- Muhammad Shoaib Sehgal,
"BioMANTA: Modeling and Network Analysis of Biological Networks II -
Knowledge Discovery and Network Analysis", ARC Centre of
Excellence in Bioinformatics (ACB) All Hands Meeting, Nov 2007
- Muhammad Shoaib Sehgal, Melissa J. Davis, Kevin Burrage, Victor Farutin,
Jane Hunter, Fred Jerva, Imran Khan, Yuan-Fang Li, Andrew Newman, Michael
Schaffer, Christopher Bouton, and Mark A. Ragan,
"BioMANTA" (Poster), ARC Centre of
Excellence in Bioinformatics (ACB) All Hands Meeting, Nov 2007
- Mike Schaffer and Victor Farutin,
"COBALT, Connection of
Biological Lists", Nov 2007
- Muhammad Shoaib Sehgal, Melissa J. Davis, Kevin Burrage, Victor Farutin,
Jane Hunter, Fred Jerva, Imran Khan, Yuan-Fang Li, Andrew Newman, Michael
Schaffer, Christopher Bouton, and Mark A. Ragan,
"Representing the Interactome:
Transforming data from the Semantic Web for network meta-analysis"
(Poster), Bioinformatics Australia 2007 Conference, Oct 23-24, 2007.
- Andrew Newman, "Overview of BioMANTA
software, RDF triple storage and RDF molecules", ITEE eResearch Meeting,
Oct 2007