4.7 years ago
zebratown
Hello,
I have aligned some reads to a genome assembly. I'd like to find out which contigs of the genome assembly have alignments. How can I get the names of the contigs that have alignments?
Thank you!
Does your alignment program not show the name of the reference (in this case the contig)?
I aligned them with BWA. I know the reads only aligned to a couple of the contigs; I just want to know which ones so I can remove them.
If you have a SAM alignment file then look in column 3 to get the name of the contigs you are looking for. You could cut the 3rd column and then sort|uniq to get the names you need.
I think I was not clear enough in my description, maybe this will be better.
I want to remove organelle sequences from my assembly, I downloaded fasta sequences of chloroplast and mitochondria and aligned these with BWA MEM to the assembly that I have. I want to identify the scaffolds containing organelle sequences so I can remove them from the assembly. Are there any strategies you can suggest?
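The column-3 approach suggested earlier in the thread can be sketched as a small shell function. This is only a sketch, assuming a plain-text SAM file produced by aligning the organelle sequences against the assembly, so that column 3 (RNAME) holds the scaffold name. The function name and the file names in the usage comment are placeholders, not anything from the thread.

```shell
# List reference (contig/scaffold) names with at least one alignment in a SAM file.
# Assumes plain-text SAM input; a BAM file would need converting to SAM first.
aligned_contigs() {
    # $1: path to a SAM file
    # Skip header lines (they start with '@'), skip unmapped records
    # (column 3 is '*'), then print each remaining contig name once.
    awk -F '\t' '!/^@/ && $3 != "*" { print $3 }' "$1" | sort -u
}

# Usage (file names are placeholders):
# aligned_contigs organelle_vs_assembly.sam > contigs_to_remove.txt
```

The resulting list may include scaffolds with only short or partial organelle hits, so it is probably worth reviewing before removing anything.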
Once you identify the names you want to remove, use the faSomeRecords utility (linux version) from Jim Kent at UCSC to remove those contigs from your original file. Add execute permissions after you download: chmod +x faSomeRecords. You will want to use the --exclude option.
Thank you very much for this information. I'll definitely use this while I'm removing the sequences.
However, I am having trouble identifying the ones I want to remove. I'm not able to find where my sequences aligned, so I can't proceed with removing them.
Can you see the contig names you are interested in if you do this:
Yes I can see the contigs. Thank you so much, I appreciate your time and patience with me.
Have a great day.
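For the removal step discussed above, here is a rough awk-based sketch of dropping the listed scaffolds from a FASTA file. It is not faSomeRecords and does not reproduce its interface; it is a plain-shell alternative. It assumes the list file has one name per line and that each name matches the first word after ">" in the FASTA header. The function and file names are placeholders.

```shell
# Print a FASTA file without the records whose names appear in a list file.
exclude_fasta_records() {
    # $1: text file with one sequence name per line
    # $2: FASTA file to filter
    awk '
        # While reading the list file, remember every name to drop.
        FILENAME == ARGV[1] { drop[$1] = 1; next }
        # On each FASTA header, take the first word without ">" as the name
        # and decide whether this record is kept.
        /^>/ { name = substr($1, 2); keep = !(name in drop) }
        # Print header and sequence lines of kept records only.
        keep
    ' "$1" "$2"
}

# Usage (file names are placeholders):
# exclude_fasta_records contigs_to_remove.txt assembly.fa > assembly.no_organelle.fa
```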