samtools -F 0x2 corrupts the file
1
0
Entering edit mode
8.6 years ago
aleka ▴ 110

Hello,

I have used the following command to extract the properly paired ends from a bam file: samtools view -F 0x2 file.bam > proper_paired.bam

but then when I used validateSamFile with picard to validate the bam file with the following command: java -jar /apps/picard-tools/2.1.0/picard.jar ValidateSamFile I=proper_paired.bam MODE=SUMMARY

I get the following errors: ERROR:MISSING_READ_GROUP 1 ERROR:MISSING_SEQUENCE_DICTIONARY 9671811 ERROR:READ_GROUP_NOT_FOUND 9878009 WARNING:RECORD_MISSING_READ_GROUP 9878009

The dictionary for the fasta file exists in my folder It seems that samtools view -F removes the sam header

I tried to add the header from file.bam with samtools reheader header.sam proper_paired.bam but I get the following error: Segmentation fault (core dumped) samtools reheader header.sam proper_paired.bam

The initial file.bam is just fine, double checked. I just need only the properly paired reads in the bam file.

Thank you.

next-gen alignment genome sequence • 3.7k views
ADD COMMENT
0
Entering edit mode

Thanks both. It have now the correct format. However, I was wondering you might be able to help me clarify something. the results of the samtools flagstat command on the original bam file is:

748518727 + 0 in total (QC-passed reads + QC-failed reads)
47793497 + 0 secondary
0 + 0 supplementary
163410882 + 0 duplicates
745241565 + 0 mapped (99.56% : N/A)
700725230 + 0 paired in sequencing
350362615 + 0 read1
350362615 + 0 read2
**673521832 + 0 properly paired (96.12% : N/A)**
695574366 + 0 with itself and mate mapped
1873702 + 0 singletons (0.27% : N/A)
13092690 + 0 with mate mapped to a different chr
10005816 + 0 with mate mapped to a different chr (mapQ>=5)

and when I run the samtools view -h -b -F 0x2 original_file.bam > proper_paired.bam Then the results of the samtools flagstat on the proper paired bam file is:

36070927 + 0 in total (QC-passed reads + QC-failed reads)
8867529 + 0 secondary
0 + 0 supplementary
4099574 + 0 duplicates
32793765 + 0 mapped (90.91% : N/A)
27203398 + 0 paired in sequencing
13601699 + 0 read1
13601699 + 0 read2
**0 + 0 properly paired (0.00% : N/A)**
22052534 + 0 with itself and mate mapped
1873702 + 0 singletons (6.89% : N/A)
13092690 + 0 with mate mapped to a different chr
10005816 + 0 with mate mapped to a different chr (mapQ>=5)

How is it possible after the filter 0x2 to have 0 properly paired reads? 0x2 is to keep only the properly paired reads in the bam file. Do I miss something?

ADD REPLY
2
Entering edit mode

little f is to keep, big F is to remove.

ADD REPLY
0
Entering edit mode

I also tried

samtools view -h -b -f 0x2 file.bam > new_file.bam

samtools view -h -b -F 0x4 -F 0x8 -F 0x400 -F 0x200 file.bam > new_file.bam
but the result that I get from flagstat is:
parse error at line 1
[bam_flagstat_core] Truncated file? Continue anyway.
0 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (N/A : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

so the -f or the multiple -F option corrupts the file. Let me know how I can get the only properly paired reads from a bam file if you know. Thank you.

ADD REPLY
0
Entering edit mode

Use the ADD COMMENT/ADD REPLY buttons on previous posts to add additional information like this. Don't add "New answers".

ADD REPLY
0
Entering edit mode

Hehehe, corrupts the file. I think "it corrupts the file!" will be my new go-to phrase for when people ask me a difficult question which has a long, complicated and boring answer.

You can't chain -F flags up like that. You need to add the 0x4 and x8 and 0x400 and 0x200 up, which as we all know is 0x60c. Obviously. You could also do -F 1548, but you can't use 11000001100, because that would be too easy.

You can have an -f and an -F at the same time though.

ADD REPLY
1
Entering edit mode

I can't think/calculate in hexadecimal, so no, I did not know it would be 0x60c. I am sure a lot of people on this forum may not either :-)

ADD REPLY
1
Entering edit mode

<sarcasm /> :P hehehe, sorry. I think making people use hex or even base 10 to talk/think about flags was a huge user interface mistake for samtools. Actually i've gone on record saying it a lot stronger than that, and in all of my tools I use letters instead of flags because honestly, summing numbers is what computers should do, not humans.

ADD REPLY
0
Entering edit mode

Many (including me) would totally agree with you.
Is that an irreversible decision? It would be a big help to have the ability of being able to specify letter flags.
Is samtools flags command only meant for translation of flags? If it does that then surely letter/word options can be easily enabled for -f and -F.

ADD REPLY
1
Entering edit mode
8.6 years ago

You forgot -h or -b. If you make a SAM file, you need to include the header (-h). Presumably you wanted a BAM file, though (-b).

ADD COMMENT

Login before adding your answer.

Traffic: 1796 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6