5.7 years ago
Sergio Martínez Cuesta
Dear all,
Most libraries and software for drawing DNA sequence logos (e.g. ggseqlogo) or discovering sequence motifs (e.g. the MEME tools) take as input a FASTA file containing a list of sequences:
>seq1
AGATCATCATCTCAT
>seq2
GTCTAGCTACGTACT
>seq3
TGCATGCATGCATCC
(in the case of motif finding, a list of negative sequences is often used as well)
However, my list of sequences includes an individual score for each input sequence:
>seq1 53.4
AGATCATCATCTCAT
>seq2 21.5
GTCTAGCTACGTACT
>seq3 11.8
TGCATGCATGCATCC
I was wondering if anyone is aware of any tools that would take into account the sequence scores (53.4, 21.5, 11.8) to guide the creation of sequence logos or discovery of motifs.
Any hints would be quite useful.
Maybe duplicate each sequence in proportion to its weight and use the expanded list as the input?
That could work! To add copies, though, I would have to round the decimal scores to integers, and the expanded file could end up containing a huge number of sequences. That may not be a problem here.
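The duplication idea can be sketched in a few lines of Python. This is only an illustration, under the assumption that each header has the form `>name score`; the helper names `read_scored_fasta` and `replicate_by_score` and the `scale` parameter are made up for this example and do not come from any of the tools mentioned. Multiplying the scores by `scale` before rounding is one way to trade off rounding error against file size.

```python
from io import StringIO


def read_scored_fasta(handle):
    """Yield (name, score, sequence) from FASTA text with headers like '>seq1 53.4'.

    Assumption: the score is the second whitespace-separated field of the header.
    """
    name, score, chunks = None, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, score, "".join(chunks)
            fields = line[1:].split()
            name, score, chunks = fields[0], float(fields[1]), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, score, "".join(chunks)


def replicate_by_score(records, scale=1.0):
    """Return FASTA text where each sequence appears round(score * scale) times.

    Every sequence is kept at least once, so low scores are not dropped.
    """
    out = []
    for name, score, seq in records:
        copies = max(1, round(score * scale))
        for i in range(1, copies + 1):
            out.append(f">{name}_copy{i}\n{seq}")
    return "\n".join(out) + "\n"


example = """>seq1 53.4
AGATCATCATCTCAT
>seq2 21.5
GTCTAGCTACGTACT
>seq3 11.8
TGCATGCATGCATCC
"""

records = list(read_scored_fasta(StringIO(example)))
# scale=0.1 keeps the output small: 5, 2 and 1 copies respectively
expanded = replicate_by_score(records, scale=0.1)
print(expanded)
```

The expanded text is an ordinary FASTA file, so it can be passed to any tool that expects plain sequences. Whether replication is a statistically sound stand-in for weighting depends on the tool, since duplicated sequences are treated as independent observations.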
Have you tried this? http://fraenkel-nsf.csbi.mit.edu/webmotifs-tryit.html https://academic.oup.com/nar/article/35/suppl_2/W217/2923614
Neither of the linked tools does this, so I have moved your answer to a comment. It is appreciated that you aim to help, but simply linking content that matches the topic of the top-level question, rather than answering what the OP asked for, does not help. Please stop doing that.
Thank you, I had a read through the docs. Even though you can input what they call seeds, I could not find a way to incorporate sequence scores into the motif discovery.
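For the sequence-logo half of the question, one possibility that avoids rounding altogether is to compute a weighted position frequency matrix directly, using each score as the weight of its sequence. The sketch below is an assumption about what "taking the scores into account" could mean, not the method of any tool named in this thread; the function names are hypothetical, and it assumes aligned sequences of equal length over the alphabet ACGT. The information content shown is the plain 2 minus entropy value in bits, without a small-sample correction.

```python
import math


def weighted_frequency_matrix(seqs, weights, alphabet="ACGT"):
    """Return one dict per position mapping each base to its weighted frequency.

    Each sequence contributes weight / sum(weights) to the base it carries
    at every position. Bases outside `alphabet` (e.g. N) raise a KeyError.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = sum(weights)
    matrix = [{base: 0.0 for base in alphabet} for _ in range(length)]
    for seq, weight in zip(seqs, weights):
        for pos, base in enumerate(seq):
            matrix[pos][base] += weight / total
    return matrix


def information_content(column, alphabet_size=4):
    """Information content of one column in bits: log2(alphabet size) - entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


seqs = ["AGATCATCATCTCAT", "GTCTAGCTACGTACT", "TGCATGCATGCATCC"]
weights = [53.4, 21.5, 11.8]

matrix = weighted_frequency_matrix(seqs, weights)
for pos, column in enumerate(matrix, start=1):
    freqs = " ".join(f"{base}={p:.3f}" for base, p in column.items())
    print(f"{pos:2d} {freqs} IC={information_content(column):.3f}")
```

Some logo libraries can draw from a matrix instead of from raw sequences, so a matrix like this could be a starting point, but that should be checked against the documentation of the library in question.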