9.7 years ago
Biomonika (Noolean)
I have a FASTA file that has been soft-masked to lowercase. Now I would like to get a GFF (or BED) file with the coordinates of all soft-masked sequences. Is a tool for this already available? Thank you.
Thanks for the answer! I haven't used flex before; could you please explain in more detail how to install and run this code? (flex: can't open jeter.l)
In this context, GNU flex works like an awk for C.
For example,
is a pattern for a FASTA header.
Perfect! Thank you very much. I will just add a note for others that the coordinates the code reports are 0-based.
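To make the 0-based point concrete for anyone who needs GFF rather than BED: BED intervals are 0-based and half-open, GFF intervals are 1-based and closed, so only the start has to shift by one. Here is a small sketch in Python. It assumes the reported end is half-open (I have not checked that against the flex code), and the function name and the "softmask" and "region" labels are placeholders of mine.

```python
def bed_to_gff3(name, start, end, source="softmask", feature="region"):
    """Convert one 0-based, half-open interval to a GFF3 line (1-based, closed)."""
    # Only the start shifts; a half-open end equals the 1-based closed end.
    fields = [name, source, feature, str(start + 1), str(end), ".", ".", ".", "."]
    return "\t".join(fields)


# Hypothetical interval: bases 4..7 (0-based) of seq1 are lowercase.
print(bed_to_gff3("seq1", 4, 8))
```

If the end coordinate turns out to be inclusive instead, it would need a +1 as well.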
Sorry, I renamed the file when copying and pasting. That should be
flex code.l
not
flex jeter.l
Hi,
I'm trying to do the same, but while testing the above script I found that I'm not getting the coordinates of a soft-masked region at the end of a sequence.
For example, testing the script on the following three cases,
I get back,
But how would I change the flex script to give back the coordinates of the sequence "adad"?
Thanks,
Cheers,
Ah yes, I forgot the trailing lowercase letters. I've added a
<<EOF>>
condition.
This is very useful, but how do I run it with a locally stored .fasta file? I tried the following, but I got an empty coordinate file and it never finishes running. Also, is there a way to use multiple threads to run this?
flex code.l && \
gcc lex.yy.c && \
./a.out genome_softmasked.fasta > co-ords.gff
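One thing that may be worth checking: a flex scanner reads from standard input unless the code opens a file itself, so if that is the case here the command above would sit waiting for input, and redirecting the file with < would be the thing to try. For anyone who would rather avoid flex, the same idea can also be sketched in plain Python. This is not the flex code from the answer, just a minimal alternative that assumes a masked region is a maximal run of lowercase letters and that 0-based, half-open (BED-style) coordinates are wanted. The name softmasked_intervals is made up for illustration. It takes the FASTA path as an argument and also reports a lowercase run that ends a sequence.

```python
import sys


def softmasked_intervals(lines):
    """Yield (name, start, end) for each run of lowercase letters.

    Coordinates are 0-based and half-open, as in BED.
    """
    name = None    # current sequence name
    pos = 0        # bases read so far in the current sequence
    start = None   # start of the open lowercase run, if any
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Close a run that reaches the end of the previous sequence.
            if start is not None:
                yield name, start, pos
            header = line[1:].split()
            name = header[0] if header else ""
            pos, start = 0, None
            continue
        for base in line:
            if base.islower():
                if start is None:
                    start = pos
            elif start is not None:
                yield name, start, pos
                start = None
            pos += 1
    # Close a run that reaches the end of the last sequence.
    if start is not None:
        yield name, start, pos


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as fasta:
        for name, start, end in softmasked_intervals(fasta):
            print(name, start, end, sep="\t")
```

Saved under a name of your choice (say softmasked.py), it would be run as python softmasked.py genome_softmasked.fasta > co-ords.bed. It is a single pass over the file and single-threaded.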