Regarding filter primary alignment
1
0
Entering edit mode
7.4 years ago
DL ▴ 50

Hi, Can anyone tell me that how to filter primary alignment from the bam file. i have paired end data and mapped the data on genome using bwa. Now, i want to filter reads that have primary alignment from bam file because i want to identify SNPs and its important that i should remove secondary align reads from the bam file. i searched for the answer but do not get satisfactory answer. So please anyone can tell me that how to get reads from bam.

Thanks & Regards Deepika

genome next-gen sequencing SNP • 15k views
ADD COMMENT
0
Entering edit mode

Now, i want to filter reads that have primary alignment from bam file because i want to identify SNPs and its important that i should remove secondary align reads from the bam file

As far as I know variant callers will take this into account so you don't have to worry about this.

ADD REPLY
0
Entering edit mode

Thank you for your response. Would you please tell me which software do you use for variant calling. ??

ADD REPLY
0
Entering edit mode

We use GATK and samtools.

ADD REPLY
0
Entering edit mode

Thank you !!!

In samtools, people used this command : samtools faidx NC_012967.1.fasta

samtools view -b -S -o SRR030257.bam SRR030257.sam

samtools sort SRR030257.bam -o SRR030257.sorted.bam

samtools index SRR030257.sorted.bam

samtools mpileup -u -f NC_012967.1.fasta SRR030257.sorted.bam > SRR030257.bcf

bcftools call -v -c SRR030257.bcf > SRR030257.vcf

but i have question related to this, you said variant caller will take this into account. So, should i use sam file as its as that i got from mapping. i should not filter those reads that map multiple positions on genome. i mean to say that pick the best aligned reads from sam/bam file. i am so confused because i read so many papers and search for answer in forum but still do not get clear vision on that. If you give me advise something then i will be grateful to you.

Thanks

ADD REPLY
0
Entering edit mode

I suggest you to follow the best practices from gatk.

ADD REPLY
1
Entering edit mode
7.4 years ago

use samtools view and the flag 256 (not primary alignment) to extract primary/secondary alignment :

samtools view -F 256 input.bam # only primary alignments
samtools view -f 256 input.bam  # only secondary alignments
ADD COMMENT
1
Entering edit mode

Thank you so much for your quick response. I ran the both command. In the the second command, i did not get any output is it possible ??

I want to ask a question related to first command. it extract best reads from bam files on the basis of position. Suppose several reads mapped to one location so it picked best one or one read mapped to several location so it picked the best position. I am new one in this kind of work so hope you do not mind if i ask some questions to you. i pasted some part of output and you can see first read mapped to different location of same chromosome.

@PG     ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -t 20 rose-genome.fasta /work/jclotault/cutadapt/DE-R1-filter.fastq.gz /work/jclotault/cutadapt/DE-R2-filter.fastq.gz
H1:1:H5GCTBCXY:1:1107:1178:2120 83      Chr05   1282000 40      151M    =       1281229 -922    GAGTAACTTCATCAAATCTTACATCCCTGGAGACAATGAGTTTCCTAGTAGCAAGATTGTAGCATTTATACCCTTTTTGAGTAGATGAAT
ATCCTAGGAAAACACATTTATCAGCTCTAGGATCAAGTTTATCACGTTGATGAGCTTGAAT   FGFEHIHHIHIIIHIFHHHHH@@1EFIIHHGHIIHHHIIIHHHEHHHGC<EIHEF?HGFFCEEIIIIHHFIIIHGFHHHHIIIHHIHIHIIIIHEHFIIHGCD1EHEHHIGHHIIHGIIHGH
FHECEHDIHHHEIIIHIIIHHEEH@@BAD   NM:i:1  MD:Z:58T92      AS:i:146        XS:i:146        XA:Z:Chr00,-20998106,151M,1;Chr05,-82297399,151M,1;Chr02,-68189779,151M,2;Chr07,+53332990,151M,2;
H1:1:H5GCTBCXY:1:1107:1178:2120 163     Chr05   1281229 60      151M    =       1282000 922     GGTAAAGAGGCCATGCATGATTAACTGCAACTGATAACAAAACTCTAACTGTGTTCATTTTAGCAACAGGAGCAAAAGTTTCCTTGTAAT
CTACCCCATAAGTTTGAGTAAAACCACGAGCCACTAAACGAGCTTTATGTCTCTCCACAGT   <<DD0F<11CEHHICHHHH@HHHIGHH1GEGHIIIIIIIGIIHF@HHIHHHIIHIIHHIIIHIGFHHIIGHHIHHGHHHHHIIHHGHHHHEHIHEHHGHIID<FH?HHCFHHHHCHEHCHHI
HHIIHIEDCCDHGEEH@FE?CFHHHHCHG   NM:i:1  MD:Z:68T82      AS:i:146        XS:i:128        XA:Z:Chr02,+68189035,151M,5;Chr05,+82296655,151M,5;Chr00,+20997367,5S143M3S,5;
H1:1:H5GCTBCXY:1:1107:1377:2167 83      Chr07   5387480 13      151M    =       5386363 -1268   GGGTTGTGGGTGGCGGGGCTTGAGTGACAGGTCGGTGAGGCGAGAGAGAGTGAAAGCCCGAGCTGCGGCCGCCATGGAACAACCACGAGA
GACTTGTGGCCTCGTGTGGTGCCTCTCGAATCTTGGGGGCTCGATTCTTGAGCTTCGGCTT   .HEA.E=EHHIHHDEEGHE@FC0FHHGCCIHCIHHEH=IHIHGCHFIHGGHHHH<HCCHHCCHHIHHHIIIHHIIHHHHIIHEHHHE@IHGIIIGIIIIIHIIIHHHHHIIIHIIIIIIHII
HIIIHIHIIIIHGHIIIIIHIHGIDDADD   NM:i:2  MD:Z:0T140A9    AS:i:145        XS:i:145
H1:1:H5GCTBCXY:1:1107:1377:2167 163     Chr07   5386363 13      151M    =       5387480 1268    ACTCAATTAGAAGCTACTGGATAAAGTTCATAACTTTGAAACTGGGGATGGACATATCACAAGTTCGGCACCAATACTAAATTATACAAG
TGCAATGTGTGGTAGTCCAAGATCAGGGGTTTACATAAGGATGCCACTAAAAAGTAGCAGT   D?D@DHHI@H@HE?GHHIHFHIIFF1D1CGHHIIIIFE@HHICGHHGHHDCECHIIHIGHHI?HECH/CCCCE@1F?@F@C1<CGEHIIEG@1CCHHHCHED@D@DH1CHHG@G@ED1C0/0
<<1<CF@@G@?E1D1D@GEHI0DC11F1<   NM:i:1  MD:Z:149C1      AS:i:149        XS:i:146
H1:1:H5GCTBCXY:1:1107:1504:2099 83      Chr00   13715595        0       151M    =       13714992        -754    CTTTCCTTGTTTAGCTCGGTTTCTCCTAGCCGAAGTGAGAAAACGTGCTAAAGTTGACTTTTCATTTCCATGCT
TCCATCATTTCCTTTTCTTACAGCTCGCATAATTCTCCATAGTAGGCTTTATTTAGCCTCTAAATAATATATTTCGA   EIHHIHEHIIIHHDIHIHHHIIIIHIIIIIIIIHIIIIIIHIHIIHGHHDHIIIHIHIIIIIIIIIIHIIHIIHIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIII
IIIHIIIIIIIIIIIIIIIIIHIIHHHHIIIIIIIIIIIIDBDDD   NM:i:0  MD:Z:151        AS:i:151        XS:i:151
H1:1:H5GCTBCXY:1:1107:1504:2099 163     Chr00   13714992        0       151M    =       13715595        754     AGGTGTTTATATAGGGAAGAAAAAGAAGAGTGAAATGATGAGTGGAAGAAAAATAATGAAAGTAGATCTAAGTT
CACTTGTAAAATATGGAAAAGATAGAGAAAAGATGAAATGAAAGCAAAGCATGAAGGTGCAGCAACATGGAAGTGGT   DB@<<<C<F@G<CE1GHH<DCH@DC1D<D1DCF1DH@GG@HHCGH@GCGECGHHHHIIIIHEHHC@<CFCCHGHFHEH@G@FGEHGHEE@GFHGHHHE?HI@F1@F
GHIIEECF?@CHCGHDGFHIIHHIIIIIIIIHIHHHHH?GHH?H<   NM:i:0  MD:Z:151        AS:i:151        XS:i:151
H1:1:H5GCTBCXY:1:1107:1662:2155 83      Chr07   12664169        0       151M    =       12663271        -1049   TTAAATCAGATGGATAGTTTTTGAGCTAATGCTTTACATAAAGCGGTTTTTCTAGTCCCTGGAGTTCCATGTAA
AAGAACAATGCTGCAAAGTCATAAAACAAATCACATAGGTAATTAGAATTTCTTAGCAAAAAAAGATGAGAAATGAA   IHIIIIIIHHCHHECEHHEEHHHIHGHIHHFF1HEHF@CECHCC<0GC<CHHG@E<0C1IIHEHF@C1HC@HHEHHGFCHEHF@@HEHHHC?EHIHIIHIHCEHHI
IHHGG11HF@CEFC@HHEHEEHCF/HHHHFHHIHHFCEHF@0@D0   NM:i:0  MD:Z:151        AS:i:151        XS:i:151
H1:1:H5GCTBCXY:1:1107:1662:2155 163     Chr07   12663271        0       151M    =       12664169        1049    CATCTGTCGGCCTCATTCACCACTTTATACAATTTGGTGAATGTAACCAAAGGTGAGGGACAACACAGTATTGA
TGGAGGTCTTCCAAAGCAAATAAAGGATAAGAAATAACATGAATTATAGCTTGACAATGCCATCCTCAACTCTCAGT   0DD0@C<C@CCC@CCFHHIHC@1C1F?E<1<<<FHHHIIHH1FCGHE1CC?<<<1<?<E@@EHH1EHC<CC1<D1D1C1<01DG1D1<<<11CEE?CGE1CCGF1D
CC1<D1<111C1<D1<DGG?1C1<11111<<1<1<FHCF1DG@H<   NM:i:4  MD:Z:65T23C5G47A7       AS:i:131        XS:i:131
ADD REPLY
1
Entering edit mode

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY
0
Entering edit mode

Thank you so much for your quick response. I ran the both command. In the the second command, i did not get any output is it possible ??

Yes it is possible to not have any secondary alignment. You can check the number of secondary alignments with :

samtools flagstat input.bam

one read mapped to several location so it picked the best position.

this one is correct

ADD REPLY
0
Entering edit mode

Hi,

I think the samtools view -F 256 input.bam command only excludes the secondary alignments for some reason, so the output would be primary+supplementary (the number is different from the samtools flagstat result).

And after checking the samtools flagstat documentation, I noticed that we need to do something like this samtools view -b -F 0x800 -F 0x100 output.bam > primary_alignment.bam

Namely, we need to exclude the secondary and the supplementary alignments at the same time!

Hope this helps!

ADD REPLY
0
Entering edit mode

nice. for the purposes of snp-calling, does it make sense to exclude not-properly-paired reads as well (e.g. add -f 0x02)?

ADD REPLY

Login before adding your answer.

Traffic: 2710 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6