Bio-Informatics Information Integration

Georgia State University: Vijay Vaishnavi, Art Vandenberg, Susmita Datta, Roop Gaurav Singh

Abstract

As more advanced DNA sequencing technologies are developed, genome research projects generate enormous amounts of data, growing exponentially and doubling every 12-18 months. Biomedical research must increasingly contend with large datasets, from multi-terabyte to petabyte (1000 TB) scale, that present challenges to researchers. Such datasets can be major assets to the biomedical community, but their value depends on how efficiently the community manages not only transmission, sharing, and access among collaborating researchers but also the metadata semantics that allow interoperability. Currently each scientist accesses and uses only a small portion of this data, mainly because it is physically impossible to see all relevant information, given the heterogeneity of the metadata and the practical difficulty of resolving metadata disparities across geographically distributed data sources. The current state of data management issues and requirements for Bio-informatics is well documented by [Jagadish and Olken, 2003].

We have been working on a novel approach to mitigate the "essential" heterogeneity problem, with the goal of providing a uniform, seamless interface that facilitates a scientist's access to the growing number of heterogeneous, independently compiled bioinformatics data sources, while using available tools to address accidental heterogeneity. This goal corresponds to the "idealized" system envisioned by Jagadish and Olken "that actively identifies data sources of interest, automatically overcomes syntactic and semantic heterogeneities wherever it discovers them, and provides transparent declarative, optimized query access over all sources."

We have done preliminary work in the domain of directory metadata: developing an architecture model, demonstrating the feasibility of our approach through experimental validation of clustering algorithms, and implementing a prototype (Semantic Facilitator™℠). We have also articulated the problems of integrating semantically heterogeneous entities, the research approaches devoted to those problems, and the challenges facing web-enabled virtual communities. This poster paper summarizes that work and our intended research on information integration of Bio-informatics sources.
Last Updated: March 2, 2006