BRIDGES is developing and exploring database integration across six geographically distributed research sites within the framework of Cardiovascular Functional Genomics (CFG), a large Wellcome Trust biomedical research project. Three classes of integration are being developed for a bioinformatics infrastructure that supports data sources (both public and project-generated), bioinformatics analysis and visualisation tools, and research activities that combine shared and private data. Because the data include patient records and results of animal experiments, privacy and access control are particular concerns. Both OGSA-DAI and IBM Information Integrator are being used, and a report will assess how each performs in this context.
Project Goals
The project will deliver the following results:
Project Partners
Current Status/Results
BRIDGES Web Portal
Delivery of BRIDGES services to end users needs to be robust but simple, since end users may be inexperienced in grid technology and in IT generally. BRIDGES therefore uses a web portal as its user interface. Several portal technologies were tested, and IBM WebSphere was chosen, largely for its versatility and robustness. Users can configure their individual workspaces, and their settings are stored between sessions. Applications are delivered either through purpose-written portlets or, where required, through Java Web Start, which allows easy, centralised delivery of legacy Java applications.

Data Integration - Public Domain Data
One surprising outcome of the project so far has been the lack of programmatic access to live biological databases. This has introduced an additional requirement for a data warehouse (implemented as a DB2 database), populated with data derived from flat-file dumps of the public domain databases. Data federation operates across the warehouse and the few databases that do offer programmatic access. To integrate data from these heterogeneous public sources, BRIDGES is using two different technologies so that they can be evaluated against each other. IBM's Information Integrator is a commercial package that binds data sources through standardised wrappers, each specific to a type of data source. The OGSA-DAI package achieves the same end by placing grid services as a layer between client applications and the data source. The current public data federation system uses IBM's Information Integrator; an alternative version using OGSA-DAI is under construction. Users can browse the data with the GeneVista visualisation tool, which is available through the BRIDGES portal.
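To illustrate the wrapper idea, here is a minimal sketch of federation through source-specific wrappers. It is not the Information Integrator or OGSA-DAI API: the `SourceWrapper` interface, the `fetch_gene` method, the tab-delimited dump format and the in-memory SQLite table are all hypothetical stand-ins. `WarehouseWrapper` plays the role of the DB2 warehouse loaded from flat-file dumps, and `LiveSourceWrapper` stands in for one of the few databases with programmatic access.

```python
# Hypothetical sketch: SQLite stands in for the DB2 warehouse and all
# class/method names are illustrative, not a real product API.
import csv
import io
import sqlite3
from typing import Dict, List, Protocol

Record = Dict[str, str]


class SourceWrapper(Protocol):
    """Uniform interface that every (hypothetical) source wrapper exposes."""
    name: str

    def fetch_gene(self, symbol: str) -> List[Record]: ...


class WarehouseWrapper:
    """Wraps a local warehouse populated from a tab-delimited flat-file dump."""

    def __init__(self, name: str, dump_text: str) -> None:
        self.name = name
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE gene (symbol TEXT, species TEXT, description TEXT)")
        # Skip blank lines; each remaining row must have exactly three fields.
        rows = (row for row in csv.reader(io.StringIO(dump_text), delimiter="\t") if row)
        self.conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", rows)

    def fetch_gene(self, symbol: str) -> List[Record]:
        cur = self.conn.execute(
            "SELECT symbol, species, description FROM gene WHERE symbol = ?", (symbol,)
        )
        return [dict(zip(("symbol", "species", "description"), row)) for row in cur]


class LiveSourceWrapper:
    """Stands in for a remote database that offers programmatic access."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self.records = records

    def fetch_gene(self, symbol: str) -> List[Record]:
        return [r for r in self.records if r.get("symbol") == symbol]


def federated_lookup(symbol: str, sources: List[SourceWrapper]) -> Dict[str, List[Record]]:
    """Send the same query through every wrapper and collect results by source name."""
    return {src.name: src.fetch_gene(symbol) for src in sources}


if __name__ == "__main__":
    dump = "GeneA\trat\tplaceholder description\nGeneB\tmouse\tplaceholder description\n"
    warehouse = WarehouseWrapper("warehouse", dump)
    live = LiveSourceWrapper("live_db", [{"symbol": "GeneA", "species": "human", "description": "placeholder"}])
    print(federated_lookup("GeneA", [warehouse, live]))
```

Because callers only see `fetch_gene`, adding a source of a new type means writing one more wrapper rather than changing client code; a gene-centric view of the kind GeneVista offers is then a single federated lookup.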

Data Sharing - CFG Project Data
Two candidate resources for sharing raw microarray data, MIAMEExpress and MaxLoad2, have been set up on the BRIDGES portal and are awaiting comparative evaluation by users.
Computational Resources: GridBLAST
Like many other biomedical research projects, the CFG project needs large-scale biological sequence comparisons. These are mostly carried out with BLAST, a widely used search algorithm that computes alignments of nucleic acid or protein sequences in order to find the n closest matches in a target data set. This is computationally costly, which makes it a good candidate for adaptation to a compute grid.
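Selecting "the n closest matches" can be sketched as ranking hits by significance. This is a minimal illustration, not BLAST's own code: the `Hit` record and its fields are assumptions, and ranking by smallest E-value with ties broken by higher bit score is one common convention rather than the project's documented rule.

```python
# Illustrative sketch: the Hit fields are hypothetical, not a parser for BLAST output.
import heapq
from typing import List, NamedTuple


class Hit(NamedTuple):
    """One alignment reported for a query sequence."""
    subject_id: str
    evalue: float     # expected number of chance hits; lower is better
    bit_score: float  # normalised alignment score; higher is better


def closest_matches(hits: List[Hit], n: int) -> List[Hit]:
    """Return the n best hits: smallest E-value first, ties broken by higher bit score."""
    return heapq.nsmallest(n, hits, key=lambda h: (h.evalue, -h.bit_score))


if __name__ == "__main__":
    hits = [Hit("s1", 1e-5, 50.0), Hit("s2", 1e-30, 120.0), Hit("s3", 1e-5, 60.0)]
    print(closest_matches(hits, 2))
```

`heapq.nsmallest` avoids fully sorting long hit lists when only a few matches per query are wanted.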
We have developed a GT3-based grid service that provides users with a parallelised BLAST service. Multiple query sequences are partitioned into sub-jobs according to the number of idle compute nodes available, and the sub-jobs are then processed on those nodes in batches. To achieve this, we have written our own Java-based scheduler, which distributes sub-jobs across an array of resources. The resources currently available include a Condor pool at the National e-Science Centre Hub in Glasgow (available to all users), the ScotGRID compute cluster, and the recently added compute clusters of the National Grid Service (NGS). Our scheduler farms BLAST jobs out to these resources and then combines the results of the sub-jobs.
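The partition-and-merge step can be sketched as follows. This is not the BRIDGES Java scheduler: a thread pool stands in for the remote Condor and cluster resources, `run_sub_job` is a hypothetical callable that runs BLAST on one chunk and returns results keyed by query identifier, and query identifiers are assumed to be unique.

```python
# Hypothetical sketch of partitioning queries by idle nodes and merging sub-job results.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def partition(queries: Sequence[str], idle_nodes: int) -> List[List[str]]:
    """Split queries into at most `idle_nodes` contiguous sub-jobs of near-equal size."""
    if idle_nodes < 1:
        raise ValueError("need at least one idle node")
    k = min(idle_nodes, len(queries))
    if k == 0:
        return []
    size, extra = divmod(len(queries), k)
    chunks, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(list(queries[start:end]))
        start = end
    return chunks


def run_grid_blast(queries: Sequence[str], idle_nodes: int,
                   run_sub_job: Callable[[List[str]], Dict[str, list]]) -> Dict[str, list]:
    """Farm sub-jobs out concurrently, then merge their per-query results."""
    sub_jobs = partition(queries, idle_nodes)
    merged: Dict[str, list] = {}
    if not sub_jobs:
        return merged
    # A local thread pool stands in for the remote compute resources.
    with ThreadPoolExecutor(max_workers=len(sub_jobs)) as pool:
        for result in pool.map(run_sub_job, sub_jobs):
            merged.update(result)  # sub-jobs hold disjoint queries, so keys do not collide
    return merged
```

Because each sub-job holds a disjoint set of queries, combining results reduces to a union of per-query outputs; no cross-job reconciliation is needed.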
Depending on the local policies at individual resources, it may be necessary to control which users can access which resources. We have therefore implemented a role-based access control system that uses the PERMIS grid authorisation software. For each job submitted, our service queries a PERMIS authorisation service for the user's roles and allocates resources accordingly.
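The role-based allocation step can be sketched as below. PERMIS's actual interfaces are not shown: `lookup_roles` is a placeholder for the call to the PERMIS authorisation service, and the role names and the `ROLE_POLICY` table are hypothetical examples; only the resource names come from the text above.

```python
# Hypothetical RBAC sketch: role names and policy are illustrative, not PERMIS's API.
from typing import Callable, Dict, FrozenSet, Iterable, Set

# Hypothetical policy: which compute resources each role may use.
ROLE_POLICY: Dict[str, FrozenSet[str]] = {
    "bridges_user": frozenset({"NeSC Condor pool"}),
    "cfg_researcher": frozenset({"NeSC Condor pool", "ScotGRID"}),
    "ngs_registered": frozenset({"NGS"}),
}


def allowed_resources(user: str,
                      lookup_roles: Callable[[str], Iterable[str]],
                      policy: Dict[str, FrozenSet[str]] = ROLE_POLICY) -> Set[str]:
    """Union of the resources granted by every role the authorisation service reports."""
    granted: Set[str] = set()
    for role in lookup_roles(user):
        granted |= policy.get(role, frozenset())  # unknown roles grant nothing
    return granted


if __name__ == "__main__":
    roles = {"alice": ["cfg_researcher", "ngs_registered"]}
    print(allowed_resources("alice", lambda u: roles.get(u, [])))
```

Unknown roles and unknown users map to an empty set, so the sketch denies by default; the scheduler would then consider only the returned resources when farming out sub-jobs.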
Jobs are submitted to the National Grid Service as GSI-enabled Globus 2 jobs, using Java CoG Kit client-side code and a host proxy for authentication. Our users therefore do not need to acquire and manage digital certificates; instead, they authenticate at the BRIDGES portal with a standard username and password. PERMIS then lets us control what users are allowed to do once authenticated.
The client-side code we use for job submission to the NGS is publicly available on our code page.

Visualisation Clients
BRIDGES currently offers two visualisation clients for viewing and analysing data from both public and project sources. SyntenyVista is a tool for the graphical display of genes on chromosomes and their homologues in other species. It can also display Quantitative Trait Loci (QTLs): chromosome regions containing genes that influence quantitatively varying traits such as hypertension.
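As an illustration of what relating genes to a QTL involves, the sketch below selects the genes whose chromosomal span overlaps a QTL interval. It is not SyntenyVista's code; the `Gene` and `QTL` records, the inclusive base-pair coordinates and the example data are assumptions.

```python
# Illustrative sketch: record types and coordinates are hypothetical.
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int  # base-pair coordinates, inclusive
    end: int


class QTL(NamedTuple):
    trait: str
    chromosome: str
    start: int  # base-pair coordinates, inclusive
    end: int


def genes_in_qtl(qtl: QTL, genes: List[Gene]) -> List[Gene]:
    """Return genes on the QTL's chromosome whose span overlaps the QTL interval."""
    return [
        g for g in genes
        if g.chromosome == qtl.chromosome and g.start <= qtl.end and g.end >= qtl.start
    ]


if __name__ == "__main__":
    genes = [Gene("GeneA", "2", 100, 200), Gene("GeneB", "3", 100, 200)]
    print(genes_in_qtl(QTL("hypertension", "2", 150, 300), genes))
```

Two closed intervals overlap exactly when each one starts no later than the other ends, which is the single comparison pair used in the filter.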
GeneVista is a tool for the integrated textual display of data from the federated data sources (see above). GeneVista provides a convenient way of viewing all the available data relating to a single gene. For details of these tools please refer to the papers listed in the publications section below.
[Figure: GeneVista visualisation tool running in the BRIDGES portal]
[Figure: SyntenyVista visualisation tool]
Publications, Presentations and Demos
Publications:
Presentations: (available on request)
Demos:
Forthcoming events: