get identical number of read1 and read2 aligned
1
0
Entering edit mode
6.5 years ago
oghzzang ▴ 50

Dear Biostar users.
This is my samtools flagstat output.

39219750 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
39219750 + 0 mapped (100.00% : N/A)
39219750 + 0 paired in sequencing
19658242 + 0 read1
19561508 + 0 read2
39219750 + 0 properly paired (100.00% : N/A)
39219750 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
1 + 0 with mate mapped to a different chr
1 + 0 with mate mapped to a different chr (mapQ>=5)

In this situation, why are not read1 and read2 read identical?

ADD) I want to get paired read and properly mapped read. so I run this command in linux

samtools view -b -f 3 -F 780 input.bam > properly_ex_qual_unmap_1_mapped.bam and flagstat result (above) is from this bam.

I can't understand this result. Because in flagstat result all reads were properly mapped and all reads were mapped.

Thanks for your help.

dna fastq • 3.8k views
ADD COMMENT
0
Entering edit mode

Thanks for Franco, Yague, Ryan.

1st. According to Yague and Franco advice, I run samtools view and flagstat

1) samtools view -b -f 2 -F 12 input.bam > f2F12.bam

39219750 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 secondary

0 + 0 supplementary

0 + 0 duplicates

39219750 + 0 mapped (100.00% : N/A)

39219750 + 0 paired in sequencing

19658242 + 0 read1

19561508 + 0 read2

39219750 + 0 properly paired (100.00% : N/A)

39219750 + 0 with itself and mate mapped

0 + 0 singletons (0.00% : N/A)

1 + 0 with mate mapped to a different chr

1 + 0 with mate mapped to a different chr (mapQ>=5)

2) samtools view -b -F 12 input.bam > F12.bam

40294316 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 secondary

0 + 0 supplementary

0 + 0 duplicates

40294316 + 0 mapped (100.00% : N/A)

39486133 + 0 paired in sequencing

19793464 + 0 read1

19692669 + 0 read2

39219750 + 0 properly paired (99.33% : N/A)

39486133 + 0 with itself and mate mapped

0 + 0 singletons (0.00% : N/A)

219437 + 0 with mate mapped to a different chr

219437 + 0 with mate mapped to a different chr (mapQ>=5)

this result and my 1st question's output are same. The reason why my 1st question's output is from bam (samtools view -b -f 3 -F 780 input.bam > properly_ex_qual_unmap_1_mapped.bam ).

2nd. According to Ryan's advice, I checked my raw fastq files. But this bam file is too old, so raw fastq file was deleted.

3rd. In Ryan's post (link: https://www.biostars.org/p/233441/#233455), different number of # read1 and # read2 is related to singletons. But my flagstat result seems like "no singleton".

I need only paired reads ( identical number of read1 and read2 ). But I can't find method.

Your all answer is helped me. Thanks.

ADD REPLY
0
Entering edit mode

Hello.

I think I found a problem.

In f2F12.bam file ( this bam file was made of "samtools view -f 2 -F 12 input.bam > f2F12.bam") [samtools -f 2 -F 2 input.bam > f2F2.bam --> mistake],

a read's flag value is 99.

According to https://broadinstitute.github.io/picard/explain-flags.html, 99 mean follwing 4things.

read paired (0x1)
read mapped in proper pair (0x2)
mate reverse strand (0x20)
first in pair (0x40)

But in f2F12.bam file, the read have no pair reads......

I think other reads are same reason.

Can I get only exactly and truly paired reads using program or command? delete unmatched read?

I think my bam file is too bad.

Thanks

ADD REPLY
1
Entering edit mode

Did you use bwa mem? If so, the difference is due to supplemental alignments.

ADD REPLY
0
Entering edit mode

Thank you Ryan.

I checked my bamfile Header. This bam file is maybe made of bwa aln and samse.

I can't understand this situation...

Because in the flagstat output, they are all properly paired and all mapped and all paired in sequencing . :(

Then, how can I extract exactly identical read1 and read2?

Thank you for your reply.

ADD REPLY
1
Entering edit mode

what do you mean by identical reads?

ADD REPLY
0
Entering edit mode

I'm sorry for my confused word.

I mean

I want to get only paired reads that have identical header name.

If so, the number of read1 and read2 will be identical.

Isn't it?

Thanks

ADD REPLY
0
Entering edit mode

Go to this page Meaning of sam flags Every mapped read containing the 2 value in their flag, is a read mapped in a proper pair. So by using

samtools view -f 2 -b your.bam > proper_pairs.bam

you theoretically can get your properly mapped reads

ADD REPLY
2
Entering edit mode

alternatively, to extract the mapped reads whose mates are also mapped, u can use

samtools view -F 12
ADD REPLY
1
Entering edit mode

Check the number of lines in the two original fastq files you had. Perhaps one has a few extra reads.

ADD REPLY
0
Entering edit mode

Did you actually do -f 2 -F 2 or did you mean -f 2 -F 12? -f 2 and -F 2 are opposites, so it's unclear what the results of that would be.

ADD REPLY
0
Entering edit mode

oh I'm sorry for my mistake.

I do "samtools view -f 2 -F 12".

ADD REPLY
0
Entering edit mode

You might need to just post the BAM file somewhere. Then we can have a look at what's going on. My guess is that this file was filtered at some point, which is why there's a discrepancy in the mates.

ADD REPLY
2
Entering edit mode

You can also try to use samtools fixmate to update mate-related flags before running flagstats.

ADD REPLY
1
Entering edit mode
6.5 years ago

Simply because some reads did not mapped under the conditions indicated by the program

ADD COMMENT
0
Entering edit mode

Thank you Franco.

If so, can I extract identical read1(forward) and read2(reverse) from this bam?

Thank you for your reply.

ADD REPLY

Login before adding your answer.

Traffic: 1883 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6