Challenge Motif Search Problem

Challenge Problem: Find a signal in a sample of DNA sequences, each 600 nucleotides long and each containing an unknown signal (pattern) of length 15 with 4 mismatches.
For the description of the Challenge Motif Search Problem see http://www-ab.informatik.uni-tuebingen.de/teaching/ws04/seqana/downloads/pevzner00combinatorial.pdf

This implementation performs an exhaustive search over all possible patterns in the input set. We restrict ourselves to the more interesting situations where the signal is weak, so that more than one pattern scores high. In these cases the patterns found are interrelated, and the pattern with the highest score is selected. The Challenge Motif Search Problem (15,4) appears to be quite easily solved this way. Even the (15,5) and larger problems can be solved by this algorithm, although with a significant increase in CPU time.
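As an illustration only, the following Python sketch shows one way an exhaustive pattern search of this kind could be organised. It is not the implementation described here. It assumes that candidate patterns are all strings within d mismatches of some window of the first sequence, and that a pattern's score is the number of sequences containing at least one window within d mismatches of it. The names hamming, neighbors, score and best_patterns are hypothetical.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(kmer, d):
    """Yield every string within Hamming distance d of kmer, once each."""
    seen = {kmer}
    yield kmer
    for k in range(1, d + 1):
        for positions in combinations(range(len(kmer)), k):
            for letters in product(ALPHABET, repeat=k):
                chars = list(kmer)
                for p, c in zip(positions, letters):
                    chars[p] = c
                cand = "".join(chars)
                if cand not in seen:
                    seen.add(cand)
                    yield cand


def score(pattern, sequences, d):
    """Assumed score: number of sequences with a window within d mismatches."""
    l = len(pattern)
    return sum(
        any(hamming(pattern, s[i:i + l]) <= d for i in range(len(s) - l + 1))
        for s in sequences
    )


def best_patterns(sequences, l, d):
    """Return the top score and all candidate patterns that reach it."""
    first = sequences[0]
    # Assumption: every sequence holds an instance of the signal, so the
    # signal lies within d mismatches of some window of the first sequence.
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates.update(neighbors(first[i:i + l], d))
    scored = {p: score(p, sequences, d) for p in candidates}
    top = max(scored.values())
    return top, sorted(p for p, s in scored.items() if s == top)
```

A call such as best_patterns(sequences, 15, 4) corresponds to the (15,4) problem, but this unoptimised sketch is only practical for much smaller pattern lengths and mismatch counts. When the signal is weak, several patterns may tie for the top score, which is why the function returns all of them.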
