6.5 years ago
tomaz-biostars
Hi,
I have 100,000 16S rRNA sequences (about 1,500 bp each) in one FASTA file. About half are written on the forward strand and the rest on the reverse strand. What is the fastest way to find the reverse-strand sequences and reverse-complement them, so that every sequence in the single FASTA file is written in the forward orientation?
Thank you for all the suggestions. Tomaž
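The core operation here is the reverse complement: complement each base, then reverse the string. Below is a minimal Python sketch of that step for a single sequence. It assumes DNA written in the IUPAC nucleotide alphabet (upper or lower case). Characters outside that alphabet, such as gap symbols, are passed through unchanged.

```python
# Minimal sketch: reverse complement of one DNA sequence.
# Assumption: IUPAC DNA alphabet (no U); unknown characters are left as-is
# because str.translate only replaces characters present in the table.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCGTN"))
```

Ambiguity codes are complemented as well, for example R (A/G) becomes Y (C/T). A 16S dataset can contain such codes, so mapping only ACGT would silently leave them unchanged.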
BLAST them all against a reference sequence that is in 5'-3' orientation, use
Hello Tomaž,
Is there any information in the header of each sequence about whether it is a reverse or a forward sequence? Is each sequence on a single line, or is it a multi-line FASTA? Are the forward and reverse sequences paired, or are they completely independent?
It would help if you could show us an extract of your file.
fin swimmer
Hi, the header does not give any information. Each sequence is independent, and these are not reverse pairs. The FASTA is single-line. Here is the extract:
Hi,
Thanks for answering my questions. It is better to format file contents using the code button (the one labelled 101 010). I did it for you this time.
So, if there is no information in the header about the strand and the reads are all independent, how do you know that half of them are reverse? I guess some kind of mapping is needed. But this leads us to the most important question: what are you trying to do?
fin swimmer
BLASTing against a 16S database shows me that I have sequences in both orientations. I want to reverse-complement the reverse-oriented half so that all of them are in the same forward orientation, which I need for downstream processing.
Well, if you already have the BLAST result, select the IDs for which end < start (i.e. the hits on the complementary strand), extract those from your original FASTA, and then pass them through a reverse-complement tool (EMBOSS? linux/bash? ....)
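Here is a minimal Python sketch of that approach. It is not a tested tool and rests on several assumptions:

- The BLAST run used the 16S sequences as queries and produced tabular output (`-outfmt 6`) with the default 12 columns. In that layout, columns 9 and 10 are `sstart` and `send`, and `send < sstart` marks a minus-strand hit.
- Only the first hit listed for each query decides its orientation.
- The FASTA is single-line, as in the question, and its IDs are the first word after `>`, matching `qseqid`.

The script name `orient.py` and the file names in the usage line are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Set

# Complement table for IUPAC DNA (assumption: DNA, not RNA).
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def minus_strand_ids(blast_lines: Iterable[str]) -> Set[str]:
    """Return query IDs whose first listed hit lies on the minus strand.

    Assumes -outfmt 6 with default columns: index 8 = sstart, 9 = send.
    """
    seen: Set[str] = set()
    minus: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        if qseqid in seen:
            continue  # only the first hit per query is considered
        seen.add(qseqid)
        sstart, send = int(fields[8]), int(fields[9])
        if send < sstart:
            minus.add(qseqid)
    return minus


def orient_fasta(fasta_lines: Iterable[str], flip: Set[str]) -> Iterator[str]:
    """Yield single-line FASTA lines, reverse-complementing IDs in `flip`."""
    current_id = ""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            yield line
        elif line:
            yield reverse_complement(line) if current_id in flip else line


# Only run the command-line part when both paths are supplied.
if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical names): python orient.py hits.tsv seqs.fasta > oriented.fasta
    blast_path, fasta_path = sys.argv[1], sys.argv[2]
    with open(blast_path) as fh:
        flip = minus_strand_ids(fh)
    with open(fasta_path) as fh:
        for out in orient_fasta(fh, flip):
            print(out)
```

The script streams the FASTA one line at a time, so 100,000 sequences of about 1,500 bp never need to be held in memory at once. Only the set of IDs to flip is kept.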
May I also suggest renaming your post?
The title is a bit misleading, and you might miss out on people who could contribute if it were more accurate.