Question

Contig labels in BAM off by 1, how do I fix it?

0

Entering edit mode

16 months ago

sarahgzb ▴ 40

After alot of hairpulling over why strelka wouldn't run on my bam files, I found that for 18 contig labels were wrong, and they're off by one (picture below). I've figured out which labels should be changed and to what, but I have no idea how to get there. I tried using sed, first changing the labels in question to a unique name (since if i just use sed normally if would keep replacing the label i just changed), but i get the error: failed to read the header for '-'. and the resulting inter.bam is empty.

 samtools view -H ${names[${SLURM_ARRAY_TASK_ID}]}_final.bam | sed -e 's/SN:chrUn_KI270741v1/SN:bebop1/' | sed -e 's/SN:chrUn_KI270742v1/SN:bebop2/' | \ sed -e 's/SN:chrUn_KI270743v1/SN:bebop3/' | sed -e 's/SN:chrUn_KI270744v1/SN:bebop4/' | \ sed -e 's/SN:chrUn_KI270745v1/SN:bebop5/' | sed -e 's/SN:chrUn_KI270746v1/SN:bebop6/' | \ sed -e 's/SN:chrUn_KI270747v1/SN:bebop7/' | sed -e 's/SN:chrUn_KI270748v1/SN:bebop8/' |  \sed -e 's/SN:chrUn_KI270749v1/SN:bebop9/' | sed -e 's/SN:chrUn_KI270750v1/SN:bebop10/' | \ sed -e 's/SN:chrUn_KI270751v1/SN:bebop11/' | sed -e 's/SN:chrUn_KI270752v1/SN:bebop12/' | \ sed -e 's/SN:chrUn_KI270753v1/SN:bebop13/' | sed -e 's/SN:chrUn_KI270754v1/SN:bebop14/' | \ sed -e 's/SN:chrUn_KI270755v1/SN:bebop15/' | sed -e 's/SN:chrUn_KI270756v1/SN:bebop16/' | \ sed -e 's/SN:chrUn_KI270757v1/SN:bebop17/' | sed -e 's/SN:chrY_KI270740v1_random/SN:bebop18/'| \ samtools reheader - ${names[${SLURM_ARRAY_TASK_ID}]}_final.bam > ${names[${SLURM_ARRAY_TASK_ID}]}_inter.bam

I've also just tried plugging in a single file (instead of the slurm array), and i get [mii] aborted by user!

Any ideas on how to solve this? also apologies for the long line of code, for some reason I can't get it to be as a block!

enter image description here

samtools bam • 1.3k views

ADD COMMENT • link 16 months ago by sarahgzb ▴ 40

2

Entering edit mode

I have no idea what this is all about but my recommendation is to rerun alignment if BAMs are odd. Don't do custom manipulation. bwa mem, duplicate marking, sort, then strelka. No custom things should be needed.

ADD REPLY • link 16 months ago by ATpoint 85k

1

Entering edit mode

I would point out that it seems you are trying to get around a much bigger problem...

why do you have a fasta with incorrect chromosome names? and how did it happen?

In general, little to no modification should be done to them after being downloaded from UCSC/NCBI etc...

You are now taking your bam file and applying incorrect chromosome names (i.e. they differ from the TRUE reference - they seem to be correct in your BAM and wrong in your FASTA). Although this is specifically in non-standard chromosomes (and likely outside of what your study is focused on) if you published results with these swaps this could be grounds for a future correction/retraction of the paper(s)...

ADD REPLY • link 16 months ago by benformatics 4.0k

1

Entering edit mode

that's a very good question, I was just given these BAMs (ie I didn't do upstream processing). The fasta has the correct chromosome names, I redownloaded it just to be sure, but I will look into seeing why the contig labels are off

ADD REPLY • link 16 months ago by sarahgzb ▴ 40

0

Entering edit mode

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chrUn_KI270741v1

shows a length of 157,432

and https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chrUN_ki270757v1

shows a length of 71,251

-

so it looks like the fasta is incorrect. if i'm reading your table correctly and the first column is the fasta chr name and the second column the fasta chr length

EDIT: also confirmed on a copy of the UCSC genome i have locally

ADD REPLY • link 16 months ago by benformatics 4.0k

0

Entering edit mode

oh my goodness thank you so much for catching that, you're right, i must've done something to mess up the original fasta! I've corrected it and now everything is running smoothly, thank you again for your help, it's very much appreciated and saved me alot of time! :)

ADD REPLY • link 16 months ago by sarahgzb ▴ 40

score 2 · Answer 1 · 2023-07-07

Try doing it in two steps instead of a single pipe. Samtools reheader may not read the header file from stdin (samtools stdin/stdout is confusing).

In addition, I would have edited this in a text editor instead of this sed command. If you need to do this processing in programmatic way (e.g., as part of a pipeline) I would strongly advise you fix this problem at its root.

samtools view -H ${names[${SLURM_ARRAY_TASK_ID}]}_final.bam | 
sed -e 's/SN:chrUn_KI270741v1/SN:bebop1/' |
sed -e 's/SN:chrUn_KI270742v1/SN:bebop2/' |
sed -e 's/SN:chrUn_KI270743v1/SN:bebop3/' |
sed -e 's/SN:chrUn_KI270744v1/SN:bebop4/' |
sed -e 's/SN:chrUn_KI270745v1/SN:bebop5/' |
sed -e 's/SN:chrUn_KI270746v1/SN:bebop6/' |
sed -e 's/SN:chrUn_KI270747v1/SN:bebop7/' |
sed -e 's/SN:chrUn_KI270748v1/SN:bebop8/' | 
sed -e 's/SN:chrUn_KI270749v1/SN:bebop9/' |
sed -e 's/SN:chrUn_KI270750v1/SN:bebop10/' |
sed -e 's/SN:chrUn_KI270751v1/SN:bebop11/' |
sed -e 's/SN:chrUn_KI270752v1/SN:bebop12/' |
sed -e 's/SN:chrUn_KI270753v1/SN:bebop13/' |
sed -e 's/SN:chrUn_KI270754v1/SN:bebop14/' |
sed -e 's/SN:chrUn_KI270755v1/SN:bebop15/' |
sed -e 's/SN:chrUn_KI270756v1/SN:bebop16/' |
sed -e 's/SN:chrUn_KI270757v1/SN:bebop17/' |
sed -e 's/SN:chrY_KI270740v1_random/SN:bebop18/' > in.header.bam

samtools reheader in.header.bam ${names[${SLURM_ARRAY_TASK_ID}]}_final.bam > ${names[${SLURM_ARRAY_TASK_ID}]}_inter.bam