Hello,
I need to reformat a BAM file, specifically to change the reference (chromosome) names from gene__chr[1-9XYM] to chr[1-9XYM], both in the header and in the alignment records. What I have is the following:
@HD VN:1.6 SO:coordinate
@SQ SN:A1BG-AS1__chr19 LN:2134
and
NB500901:267:HH2TMBGXG:1:21311:5296:17191:CAAGGTGT:TGCGCC:146 16 A1BG-AS1__chr19 3 9 22S31M7S * 0 0 TTTTGTTTTGTTTTGTTTTGTTTTTTTAGTAGAGACGGGGTTTCGTCATGTTGCTCAGGC EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAAEEEEEEEEEEEEAAAAAA NM:i:0 MD:Z:31 AS:i:31 XS:i:28 NH:i:1 CB:Z:CAAGGTGT UB:Z:TGCGCC
My desired output is something like:
@HD VN:1.6 SO:coordinate
@SQ SN:chr19 LN:2134
and
NB500901:267:HH2TMBGXG:1:21311:5296:17191:CAAGGTGT:TGCGCC:146 16 chr19 3 9 22S31M7S * 0 0 TTTTGTTTTGTTTTGTTTTGTTTTTTTAGTAGAGACGGGGTTTCGTCATGTTGCTCAGGC EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEAAEEEEEEEEEEEEAAAAAA NM:i:0 MD:Z:31 AS:i:31 XS:i:28 NH:i:1 CB:Z:CAAGGTGT UB:Z:TGCGCC
I've tried to do it with a combination of samtools and awk, like this:
samtools view -h input.bam | awk '{split($3,tmp,"__");if($0 ~ /^@/){$2="\tSN:"tmp[2]; print $0}else{$3="\t"tmp[2]}; print $0}}' | samtools view -Shb - > output.bam
However, I continue to get errors. I would highly appreciate any comment/suggestion!
Thanks a lot :)
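For what it's worth, the one-liner above seems to have a few separate problems: the braces are unbalanced (there is one extra closing brace), awk splits on any whitespace by default and rejoins fields with a single space, so the SAM tabs are lost, and split($3, ...) is also applied to header lines, where field 3 is LN: rather than the reference name. A minimal sketch of one possible fix is below. It assumes standard tab-delimited SAM text on stdin, and the function name strip_gene_prefix is made up for illustration.

```shell
# Hypothetical helper: rewrite "gene__chrN" reference names to "chrN"
# in SAM text read from stdin (header and alignment lines).
strip_gene_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }   # SAM is tab-delimited; keep tabs on output
    # @SQ header lines hold the reference name in the SN: tag.
    /^@SQ/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) sub(/^SN:.*__/, "SN:", $i)
      print
      next
    }
    # All other header lines pass through unchanged.
    /^@/ { print; next }
    # Alignment lines: column 3 is RNAME, column 7 is RNEXT.
    # Values without "__" (such as "*" or "=") are left as they are.
    {
      sub(/^.*__/, "", $3)
      sub(/^.*__/, "", $7)
      print
    }'
}

# Intended use (assumes samtools is installed and input.bam exists):
#   samtools view -h input.bam | strip_gene_prefix | samtools view -b -o output.bam -
```

Two caveats. If several genes map to the same chromosome, the rewritten header will contain repeated @SQ SN: names, which the SAM specification does not allow, so samtools will most likely refuse it; those @SQ lines would have to be merged into one per chromosome first. Also, renaming does not change the positions: a read at position 3 of A1BG-AS1__chr19 would still be at position 3 after the rename, which is presumably not its chr19 coordinate.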
Bam File: Change Chromosome Notation