193-5 Microsoft Open Source Tools and Libraries for Genomics.



Tuesday, October 18, 2011: 3:35 PM
Henry Gonzalez Convention Center, Room 209, Concourse Level

Robert I. Davidson, Simon Mercer and Michael Zyskowski, Microsoft Research, Redmond, WA
Microsoft Research Connections, One Microsoft Way, Redmond, WA 98052, USA
(bobd@microsoft.com)

Main project webpage: http://research.microsoft.com/bio
Source code available at: http://mbf.codeplex.com

Open Source license: Apache 2.0

Many people are surprised to learn that Microsoft Research (MSR) has been collaborating with life science researchers for many years.  Yet software that can effectively model biological systems, and so yield new insights into their relationships and workings, is computationally challenging: it consumes large amounts of computing resources and is difficult to write quickly, correctly, and with high performance.  Software problems of this kind are of considerable interest to MSR.

Late in 2008, building on concepts from these earlier collaborations in biology, a small group within MSR began developing a library of basic bioinformatics functionality to be made available to outside developers and researchers under an open source license.  Written in C# and built on the .NET Framework, the first version of the Microsoft Biology Foundation (MBF) was launched in July 2010 and released under the OSI-approved MS-PL license.  MBF 1.0 supports access to common file formats and provides objects and methods for operating on and manipulating genomic sequences (DNA, RNA, and protein) as well as other data types.  Version 1.0 has attracted interest and established a small community of users who play a role in development by contributing source code, bug reports, and patches.

A second version of the library is in development and will culminate in the launch of version 2.0 in the summer of 2011.  Performance and capacity enhancements were an early focus for version 2.0, which will also provide a range of new features and applications demonstrating access to advanced math functions and comparative DNA sequence assembly.  It also includes a range of command-line tools and a tool showing how the DeepZoom visualization technology can be used to browse a genome quickly and intuitively.

Ongoing efforts to promote the growth of an open source community around MBF include the establishment of community forums at http://mbf.codeplex.com and the creation of a Technical Advisory Board consisting of academic and commercial groups that use MBF in their work.  To clarify community ownership, we have moved to the better-known Apache 2.0 license, and we plan to transfer the project from Microsoft to the OuterCurve Foundation (http://www.outercurve.org/), making Microsoft one contributor among many to a common open source project.

This presentation will report on some of the tools produced using the Microsoft Biology Foundation, the growth of the Microsoft Biology Foundation community, and plans for the next phase of development.

Division: ASA Section: Biometry and Statistical Computing
Session: Symposium--Bioinformatics for Crop Improvement: Assay Design and Applications