<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>RSS feed</title><link>/soeren/category/research/</link><description>RSS feed for Research</description><language>en</language><lastBuildDate>Tue, 24 Nov 2009 18:13:14 -0000</lastBuildDate><item><title>Large Scale Learning</title><link>http://sonnenburgs.de/soeren/item/large-scale-learning/</link><description>&lt;img class="ls" src="/soeren/media/images/ls.png" alt="Large Scale" /&gt; 
Much of my current research focuses on kernel methods such as Support Vector Machines (SVMs) for sequence analysis problems appearing in bioinformatics. For example, I co-organized the &lt;a href="http://largescale.first.fraunhofer.de"&gt;PASCAL Large Scale Learning Challenge&lt;/a&gt;. Currently, we are preparing a JMLR special topic on Large Scale Learning (soon to be published).
In addition, I am working on the design of new &lt;em&gt;efficient&lt;/em&gt; string kernels as well
as faster algorithms to train and evaluate SVMs on sequences. Before I began to work on this topic, it was almost unthinkable to train SVMs using sophisticated string kernels on more than a few hundred thousand examples. Using the newly developed methods, we can now solve learning tasks involving up to 50 million training examples. With a reasonable amount of computing time, we can also apply the resulting classifier to the whole human genome, with as many as 6 billion examples.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Soeren Sonnenburg</dc:creator><pubDate>Tue, 24 Nov 2009 08:37:43 -0000</pubDate><guid>http://sonnenburgs.de/soeren/item/large-scale-learning/</guid></item><item><title>Genomic Sequence Analysis</title><link>http://sonnenburgs.de/soeren/item/genomic-sequence-analysis/</link><description>I have been working to apply these methods to splice site recognition
				in several organisms &lt;a href="http://www.fml.tuebingen.mpg.de/raetsch/projects/splice"&gt;(link)&lt;/a&gt;. Together with my collaborators, I was able to show
				that our methods drastically outperform all other methods, which is pivotal for the high accuracy of a novel
				splice form prediction tool, &lt;a href="http://www.msplicer.org"&gt;mSplicer&lt;/a&gt;, and the
				success of a related gene finding system, &lt;a href="http://www.mgene.org"&gt;mGene&lt;/a&gt;, in the
				&lt;img class="splice" src="/soeren/media/images/splice.png" alt="Splicing" /&gt; 
				&lt;a href="http://www.wormbase.org/wiki/index.php/Gene_Prediction"&gt;nGASP competition&lt;/a&gt;. Additionally, we have developed a promoter detection system
				&lt;a href="http://www.fml.tuebingen.mpg.de/raetsch/projects/arts/"&gt;"ARTS"&lt;/a&gt;, which detects
				transcription start sites on the whole human genome. Our approach works
				with much higher accuracy than previous state-of-the-art methods, and by
				using the developed large scale learning techniques, the SVMs could be
				trained in only a few hours and applied genome-wide.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Soeren Sonnenburg</dc:creator><pubDate>Tue, 24 Nov 2009 18:12:35 -0000</pubDate><guid>http://sonnenburgs.de/soeren/item/genomic-sequence-analysis/</guid></item><item><title>Interpretability</title><link>http://sonnenburgs.de/soeren/item/interpretability/</link><description>SVMs discriminate in a high-dimensional kernel feature space
				and as such often have to be treated as a black box. This means that
				&lt;img class="poim" src="/soeren/media/images/poim.png" alt="POIM" /&gt;analysis or visualization of the learning result is inherently
				difficult. This poses a problem for applications in bioinformatics, as it
				is often very important to understand which features are used for
				learning and why the accuracy is high. I have developed a novel approach
				based on &lt;a href="http://www.fml.tuebingen.mpg.de/raetsch/projects/lsmkl"&gt;Multiple
					Kernel Learning&lt;/a&gt; that
				can be used for discovering discriminative features of the
				underlying biological problem. An extended approach, the
				so-called Positional Oligomer Importance Matrices (&lt;a href="http://www.fml.tuebingen.mpg.de/raetsch/projects/POIM"&gt;POIMs&lt;/a&gt;),
				allows us to pinpoint motifs, is very efficient, and can be
				directly applied to the learned SVM classifier.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Soeren Sonnenburg</dc:creator><pubDate>Tue, 24 Nov 2009 18:13:14 -0000</pubDate><guid>http://sonnenburgs.de/soeren/item/interpretability/</guid></item></channel></rss>