Hi, How to adjust BAM header after extracting only mapped reads from BAM files to have only the reference ids which are present inside the BAM files?
Thank you in advance.
Michal
Hi, How to adjust BAM header after extracting only mapped reads from BAM files to have only the reference ids which are present inside the BAM files?
Thank you in advance.
Michal
Hello Michal,
There is no samtools command to do what you want to achieve.
So, first of all, "adjusting" the headers to only mapped reads is not something required if you want to process the BAM file. I mean, if you have reads mapped on chr1 and chr2, even if chr3 and chr4 still in the headers they will not bother you.
Nevermind, if you still wanting "adjusting" the headers, you will need to write a little script for this. To do so, several options:
samtools view -F 4 in.bam | cut -f 3 | sort | uniq
<- gives you the list of headers with at least 1 read mapped.
samtools view -H in.bam > adj.sam
<- writes headers in the sam format, you just have to removed bad refs.
samtools view -F 4 in.bam >> adj.sam
<- writes all the mapped reads in the sam format.
samtools view -h -b -S adj.sam > adj.bam
<- Get a bam.
This "hand-fix" solution is good if you only have 1 bam to process. Otherwise, I recommend writing a script / little program to do this in an automatized way. ;)
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
What do you mean by adjusting header? Do you mean adding a missing header?
removing the @SQ lines ? that's usually a bad idea
Yes, removing @SQ's which are not contained in the BAM file.
ok let's take a step back, why do you want to do this?
just curious, why would you want to do this ?
I need to find out which reference was hit i.e. reads have mapped.
I still don't get it. If you need to extract the reads , say on chr22, why don't you just use
I need only reference Ids from only mapped reads.