Trouble with samtools view not filtering specified regions of bam file
10 months ago

Hi there, I am trying to run an analysis on a group of BAM files, but first I want to remove the non-chromosomal regions (including the MT/KN/KZ sequences) and keep only chromosomes 1-25 (zebrafish model). After looking through many suggestions I settled on samtools view. However, when I check the header of the filtered output BAM after running this command, it still contains the regions I am not interested in.

samtools view -@ 50 -b -o Ctrl-01_MD_filtered.bam Ctrl-01_MD.bam 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 3 4 5 6 7 8 9

Note: The reference I am using does not contain "chr" prefix. Also, I double checked that my bam files are sorted so I've ruled that out as a possible issue. Using Samtools v. 1.18

Has anyone encountered this issue before, or been able to use this command successfully?

bam samtools • 739 views

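Do a count of the alignments to verify that a region has been removed (use the sequence name exactly as it appears in the header; for a reference without the "chr" prefix that would be 5 rather than chr5):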

samtools view -c input.bam chr5

or run idxstats to see the counts for all chromosomes:

samtools idxstats input.bam
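
To scan that output quickly, something like the following might help. It is only a sketch: it assumes the usual four tab-separated idxstats columns (name, length, mapped, unmapped) and chromosome names 1-25 without a "chr" prefix, as in the question. The function name leftover_contigs is made up for this example.

```shell
# List sequences outside chromosomes 1-25 that still have mapped reads.
# Reads `samtools idxstats` output on stdin and prints "name<TAB>mapped".
# The final "*" row (unplaced reads) is ignored.
leftover_contigs() {
    awk -F'\t' '
        $1 != "*" && $1 !~ /^([1-9]|1[0-9]|2[0-5])$/ && $3 > 0 {
            print $1 "\t" $3
        }'
}

# Usage (file name taken from the question):
# samtools idxstats Ctrl-01_MD_filtered.bam | leftover_contigs
```

If this prints nothing, the filtering worked even though the names are still listed in the header.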
10 months ago
ATpoint 85k

The header is derived from the reference genome and does not change after filtering. Chromosomes with no alignments are still listed in the header. This is normal and expected.
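
One way to see this for yourself is to compare how many sequences the header lists with how many actually carry reads. A rough sketch, assuming idxstats prints one row per header sequence plus a final "*" row; header_vs_reads is just an illustrative name:

```shell
# Count header sequences with and without mapped reads.
# Reads `samtools idxstats` output on stdin; the final "*" row is skipped.
header_vs_reads() {
    awk -F'\t' '
        $1 != "*" { total++; if ($3 > 0) used++ }
        END { printf "%d sequences in header, %d with mapped reads\n", total, used }'
}

# Usage (file name taken from the question):
# samtools idxstats Ctrl-01_MD_filtered.bam | header_vs_reads
```

After filtering, the first number should stay the same while the second drops.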


ATpoint Istvan Albert Thanks for your responses! Is it possible to remove certain regions from the BAM headers as well, and/or from the reference FASTA? The analysis software I am trying to run keeps stalling on these regions, I assume because I have removed the reads from them, and I am hoping that if I remove them entirely the program will finish running.


I kind of doubt that not having reads in a region causes the problem - but of course, everything is possible - it is bioinformatics after all.

Removing the header lines can be done by turning the BAM file into SAM, removing the lines with some sort of text editing process - then turning the file back to BAM.

for example to remove the headers for chromosomes 4 and 6 you could do:

# -E enables the (4|6) alternation; -w keeps SN:4 from matching names such as SN:40
samtools view -H input.bam | grep -v -w -E 'SN:(4|6)' > header.txt
(cat header.txt && samtools view input.bam) | samtools view -b -o newfile.bam -
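
For the original goal of keeping only chromosomes 1-25, it may be easier to whitelist the header lines instead of listing everything to drop. A minimal sketch, assuming the names have no "chr" prefix as stated in the question; keep_chromosome_header and the output name Ctrl-01_MD_clean.bam are hypothetical:

```shell
# Keep every non-@SQ header line, and only those @SQ lines whose SN field
# is a chromosome number from 1 to 25. Reads SAM header text on stdin.
keep_chromosome_header() {
    awk -F'\t' '
        $1 != "@SQ" { print; next }   # pass through @HD, @RG, @PG, @CO
        {
            # The SN tag is usually the second field, but check all of them.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:([1-9]|1[0-9]|2[0-5])$/) { print; next }
        }'
}

# Usage (input name from the question, output name made up):
# samtools view -H Ctrl-01_MD_filtered.bam | keep_chromosome_header > header.txt
# (cat header.txt && samtools view Ctrl-01_MD_filtered.bam) | samtools view -b -o Ctrl-01_MD_clean.bam -
```

I would expect samtools to complain about any record that still refers to a removed sequence, either as its own reference or as the mate reference, so check the warnings from the last step.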