Supplementary Material for the RECOMB 2005 and BMC Bioinformatics submissions

Learning Interpretable SVMs for Biological Sequence Classification

This page contains additional material for the above-mentioned paper. We have tried to document exactly
  1. which data sets were used and
  2. what results were achieved.

In Section 1 we provide the toy data set for the different noise levels and the C. elegans and Drosophila melanogaster acceptor splice data sets. Section 2 contains larger versions of the result images for the toy data set that also appear in the paper. We extended the experiments for C. elegans and Drosophila melanogaster to 100 bootstrap trials and also show the ROC score achieved in each trial. These additional figures can also be found in Section 2.

A downloadable version of the software will be made available soon.
  1. Download Datasets

    The datasets are in the format
    -1 TTCTGAAGAAGACGATGACGAAGACGAAGGAGAAGCCGTTGCAGAACTTGTCACAAAGTG
    -1 CCAACCTAATCGTTATACATATGTATTTACAGTCGCAAATGACAATTGAACAAATAAATG
        ....
    +1 AATGTTTCAATTATAAAAATTGTTAATTACAGGGGGACACCTGTATCAGTGTGACATTTC
        ....
    
    where the label -1 marks a randomly generated site (in the splice data: not a splice site) and +1 marks a site with the custom motif (in the splice data: a splice site). The label is followed by a space and then the sequence.
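
A data set in this format can be read with a few lines of Python. The following is a minimal sketch, not part of the released software; the function name `parse_dataset` is hypothetical, and it assumes that the `....` lines in the excerpt above are elisions that do not occur in the actual files.

```python
from typing import Iterable, Iterator, Tuple


def parse_dataset(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (label, sequence) pairs from lines of the form '<label> <sequence>'."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace: label first, sequence second.
        label_text, sequence = line.split(None, 1)
        label = int(label_text)  # int() accepts both "+1" and "-1"
        if label not in (-1, 1):
            raise ValueError(f"unexpected label: {label_text!r}")
        yield label, sequence.strip()


# Two lines copied from the excerpt above.
example = [
    "-1 TTCTGAAGAAGACGATGACGAAGACGAAGGAGAAGCCGTTGCAGAACTTGTCACAAAGTG",
    "+1 AATGTTTCAATTATAAAAATTGTTAATTACAGGGGGACACCTGTATCAGTGTGACATTTC",
]
records = list(parse_dataset(example))
labels = [label for label, _ in records]
```

For a file on disk, an open file object can be passed directly, since iterating over it yields lines.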
  2. Results for Weighted Degree Kernel

    Each result file starts with a line giving the validation and test error, followed by the classifier outputs, one value per line.

    validation error=0.014181   test error=0.01214
    
    -12.143139
    -10.286769
    ...
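
A result file of this shape could be parsed as sketched below. This is an illustration only: `parse_result` is a hypothetical helper, and turning outputs into class predictions by thresholding at zero is the usual SVM convention, assumed here rather than taken from this page.

```python
import re
from typing import List, Tuple

# Header line, e.g. "validation error=0.014181   test error=0.01214"
HEADER = re.compile(r"validation error=(\S+)\s+test error=(\S+)")


def parse_result(text: str) -> Tuple[float, float, List[float]]:
    """Return (validation_error, test_error, outputs) from a result file's text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("first non-blank line is not an error header")
    validation_error = float(match.group(1))
    test_error = float(match.group(2))
    # Every remaining non-blank line holds one classifier output.
    outputs = [float(line) for line in lines[1:]]
    return validation_error, test_error, outputs


sample = """validation error=0.014181   test error=0.01214

-12.143139
-10.286769
"""
val_err, test_err, outputs = parse_result(sample)
# Assumption: positive output means class +1, otherwise class -1.
predictions = [1 if value > 0 else -1 for value in outputs]
```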
    

    SVMs including kernel weights are saved in the following format:

    b=-3.577909
    alphas=[
    		 2 -1.000000
    		13 +0.373805
    		57 +1.000000
    		68 -0.332549
    		85 -1.000000
    			...
    ]
    betas=[ 
    		[+0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805];
    		[+0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805];
    		[+0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805 +0.373805];
    			...
    ]
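
The SVM format above can be read section by section. The sketch below is a hedged illustration with a hypothetical function name, `parse_svm`; it assumes each `alphas` line is an integer index followed by a signed coefficient (what the index refers to is not stated here) and that elision lines (`...`) are absent from real files. Numbers in the `betas` rows are extracted with a regular expression, so they are recognised whether or not whitespace separates them.

```python
import re
from typing import Dict, List, Tuple

# A signed decimal number such as "+0.373805" or "-1.000000".
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def parse_svm(text: str) -> Tuple[float, Dict[int, float], List[List[float]]]:
    """Parse the 'b=', 'alphas=[...]' and 'betas=[...]' parts of an SVM file."""
    bias = None
    alphas: Dict[int, float] = {}
    betas: List[List[float]] = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("b="):
            bias = float(line[2:])
        elif line.startswith("alphas=["):
            section = "alphas"
        elif line.startswith("betas=["):
            section = "betas"
        elif line == "]":
            section = None  # a lone bracket closes the current section
        elif section == "alphas" and line:
            index_text, coefficient_text = line.split()
            alphas[int(index_text)] = float(coefficient_text)
        elif section == "betas" and line:
            betas.append([float(token) for token in NUMBER.findall(line)])
    if bias is None:
        raise ValueError("no 'b=' line found")
    return bias, alphas, betas


# Abbreviated sample in the format shown above (rows shortened to 3 values).
sample = """b=-3.577909
alphas=[
     2 -1.000000
    13 +0.373805
]
betas=[
    [+0.373805 +0.373805 +0.373805];
    [+0.373805 +0.373805 +0.373805];
]
"""
bias, alphas, betas = parse_svm(sample)
```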
    

    Weights obtained in bootstrap trials are saved as shown here:

    betas of trial 001 = [
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 
    	+0.000000 +0.000000 +0.000000 +1.000000 +0.000000 +0.000000 +0.000000 +0.000000 
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 
    ]
    betas of trial 002 = [
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +1.000000 +0.000000 +0.000000
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000
    	+0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000 +0.000000
    ]
    			...
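
The per-trial weight blocks can be collected into a dictionary keyed by trial number, as in the sketch below. The names `parse_bootstrap_betas` and `mean_weights` are hypothetical, and averaging the weights over trials is shown only as one possible way to summarise them, not as the analysis used in the paper.

```python
import re
from typing import Dict, List

# Block header, e.g. "betas of trial 001 = ["
TRIAL_HEADER = re.compile(r"betas of trial (\d+) = \[")


def parse_bootstrap_betas(text: str) -> Dict[int, List[List[float]]]:
    """Map each trial number to its rows of weights."""
    trials: Dict[int, List[List[float]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = TRIAL_HEADER.match(line)
        if header:
            current = int(header.group(1))  # "001" -> 1
            trials[current] = []
        elif line == "]":
            current = None  # end of this trial's block
        elif current is not None and line:
            trials[current].append([float(token) for token in line.split()])
    return trials


def mean_weights(trials: Dict[int, List[List[float]]]) -> List[float]:
    """Average the flattened weight vectors over all trials."""
    flat = [[w for row in rows for w in row] for rows in trials.values()]
    return [sum(column) / len(flat) for column in zip(*flat)]


# Abbreviated sample: two trials with 2 rows of 4 weights each.
sample = """betas of trial 001 = [
    +0.000000 +0.000000 +0.000000 +1.000000
    +0.000000 +0.000000 +0.000000 +0.000000
]
betas of trial 002 = [
    +0.000000 +0.000000 +0.000000 +0.000000
    +0.000000 +1.000000 +0.000000 +0.000000
]
"""
trials = parse_bootstrap_betas(sample)
averaged = mean_weights(trials)
```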
    

$Id: index.html 2131 2005-08-30 15:44:24Z neuro_www $