Scaling Biomedical Topic Maps to Billions of Associations: How to Cope With Terabytes of Data?

Poster, published by Benedikt Wachinger and Volker Stümpflen on 2010-09-30

This poster deals with the issues of large-scale systems and the use of Topic Maps.

To understand biological systems in general and multifactorial diseases in particular, it is necessary to create large-scale systems biological models as quickly as possible from the huge amounts of knowledge stored in multiple relational databases and published research articles. To achieve this, we had to solve two problems. First, how do we solve the data integration problem if we want to store all that knowledge in one easily accessible place where it can be combined efficiently? Second, how do we efficiently store and manage the ever-increasing amount of data, currently in the range of hundreds of terabytes? The first problem can be tackled with Topic Maps™, for which a simple conversion schema from a relational database has to be developed. The growth in data, however, means that the underlying storage solutions have to scale accordingly. Since traditional approaches such as relational databases do not scale well, or scale only with a large amount of administrative work, newer technologies had to be found that can distribute the data across clusters with arbitrary numbers of nodes and limit I/O bottlenecks. Such technologies exist in cloud-like cluster architectures, where storage and computation are done on the same machine. For this kind of architecture, Google developed a column-oriented database concept called BigTable, which is essentially a very large key-value store. Hadoop HBase is an open source implementation of this concept. We have developed a method for the efficient storage and retrieval of Topic Maps (TMs) in HBase: a column-oriented schema that can represent TMs efficiently in such key-value stores. At our institute we use this schema to integrate multiple biological databases from different sources in one central repository. Additionally, we have a semantic text mining system that extracts biologically relevant relations from the mass of available biomedical texts. Currently, our largest TM consists of over 4 billion associations.
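
The abstract does not spell out the schema itself, so the following is only a minimal sketch of the general idea: relational rows are converted into associations, and each association is written into a sorted key-value table so that lookups become prefix scans over row keys. Everything here is an assumption made for illustration: the key layout (`a/...` and `p/...`), the column names, the `SortedKV` class (a toy in-memory stand-in, not the HBase API), and the sample rows are hypothetical and are not the authors' actual design.

```python
from bisect import bisect_left, insort


class SortedKV:
    """Toy in-memory stand-in for a sorted key-value table (NOT the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def scan_prefix(self, prefix):
        """Yield (row key, columns) for all rows whose key starts with prefix."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


def row_to_association(row):
    """Hypothetical conversion of one relational row into an association."""
    roles = [
        ("interactor_a", f"protein:{row['protein_a']}"),
        ("interactor_b", f"protein:{row['protein_b']}"),
    ]
    return f"a{row['id']}", "interaction", roles


def store_association(table, assoc_id, assoc_type, roles):
    """Write one association row plus one index row per role player."""
    table.put(f"a/{assoc_id}", "type", assoc_type)
    for i, (role_type, player) in enumerate(roles):
        # Association row: one column per role, numbered to avoid collisions.
        table.put(f"a/{assoc_id}", f"role:{i}:{role_type}", player)
        # Index row (assumed layout): makes "all associations of a topic"
        # a single prefix scan instead of a full table scan.
        table.put(f"p/{player}/{assoc_type}/{assoc_id}", "role", role_type)


def associations_of(table, topic_id):
    """Return the ids of all associations in which topic_id plays a role."""
    prefix = f"p/{topic_id}/"
    return [key.split("/")[3] for key, _ in table.scan_prefix(prefix)]


# Made-up sample rows, as they might come from a relational interaction table.
rows = [
    {"id": 1, "protein_a": "A", "protein_b": "B"},
    {"id": 2, "protein_a": "A", "protein_b": "C"},
]

table = SortedKV()
for row in rows:
    store_association(table, *row_to_association(row))

print(associations_of(table, "protein:A"))
```

The design choice illustrated here is to trade extra writes (one index row per role player) for cheap reads, which is a common pattern in sorted key-value stores; whether the authors' schema works this way is not stated in the abstract.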

Presented at

TMRA 2010

Conference in Leipzig

