Finding contigs with alignment
4.7 years ago
zebratown • 0

Hello,

I have aligned some reads to a genome assembly. I'd like to find out which contigs of the genome assembly have alignments. How can I get the names of the contigs that have alignments?

Thank you!

alignment • 1.2k views
ADD COMMENT

Does your alignment program not show the name of the reference (in this case the contig)?

ADD REPLY

I aligned them with BWA. I know the reads only aligned to a couple of the contigs; I just want to know which ones so I can remove them.

ADD REPLY

If you have a SAM alignment file, look in column 3 (RNAME) to get the names of the contigs you are looking for. You could skip the header lines (those starting with @), cut the 3rd column, and then sort | uniq to get the names you need.
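
As a rough sketch of that suggestion, the pipeline below wraps the steps in a small function. The function name `contig_names` is made up for illustration, and the extra filter for `*` assumes you do not want the placeholder that SAM uses in column 3 for unmapped reads.

```shell
# contig_names SAM_FILE
# Print each reference (contig) name that has at least one alignment.
contig_names() {
    grep -v '^@' "$1" |   # drop SAM header lines
        cut -f 3 |        # keep column 3 (RNAME)
        grep -vxF '*' |   # drop unmapped records, whose RNAME is "*"
        sort | uniq       # collapse to unique names
}
```

Calling `contig_names your_file.sam` should print one contig name per line.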

ADD REPLY

I think I was not clear enough in my description; maybe this will be better.

I want to remove organelle sequences from my assembly. I downloaded FASTA sequences of the chloroplast and mitochondria and aligned these with BWA MEM to the assembly that I have. I want to identify the scaffolds containing organelle sequences so I can remove them from the assembly. Are there any strategies you can suggest?

ADD REPLY

Once you have identified the names you want to remove, use the faSomeRecords utility (Linux version) from Jim Kent at UCSC to remove those contigs from your original file. After you download it, add execute permissions with chmod +x faSomeRecords. You will want to use the -exclude option below.

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
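
Putting the two steps together might look like the sketch below. The function name `write_remove_list` and all file names are placeholders, not anything from your data, and the faSomeRecords call is left commented out because it assumes the binary is downloaded and on your PATH.

```shell
# write_remove_list SAM_FILE LIST_FILE
# Write the names of all contigs with at least one alignment to LIST_FILE,
# one name per line, which is the list format faSomeRecords reads.
write_remove_list() {
    awk -F '\t' '!/^@/ && $3 != "*" {print $3}' "$1" | sort -u > "$2"
}

# Hypothetical usage (file names are assumptions):
# write_remove_list organelle_vs_assembly.sam organelle_scaffolds.txt
# faSomeRecords -exclude assembly.fa organelle_scaffolds.txt assembly.clean.fa
```

It is probably worth looking over the list file by hand before running the exclude step, since every scaffold named in it is dropped from the output.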
ADD REPLY

Thank you very much for this information. I'll definitely use this while I'm removing the sequences.

However, I am having trouble identifying the ones I want to remove. I'm not able to find where my sequences aligned, so I can't proceed with removing those scaffolds.

ADD REPLY

Can you see the contig names you are interested in if you do this:

awk -F '\t' '!/^@/ && $3 != "*" {print $3}' your_file.sam | sort | uniq
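
If it helps to see how many alignments each contig received, a variation on the same idea is sketched here. The function name `alignments_per_contig` is invented for this example; it counts alignment records per contig, not aligned bases, so treat the numbers as a rough guide only.

```shell
# alignments_per_contig SAM_FILE
# Print "contig<TAB>count" for every contig with at least one alignment,
# highest count first, ties ordered by contig name.
alignments_per_contig() {
    awk -F '\t' '
        !/^@/ && $3 != "*" { n[$3]++ }              # skip headers and unmapped records
        END { for (c in n) print c "\t" n[c] }      # one line per contig
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```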
ADD REPLY

Yes, I can see the contigs. Thank you so much. I appreciate your time and patience with me.

Have a great day.

ADD REPLY
