Hello,
I know there are already multiple threads on this issue and I have gone through most of them, but I am new to bioinformatics and need some extra help.
I have a paired-end BAM file in which the chromosome names are in the format X, Y, 1, 2, 3 ... and I need to convert them to the format chr1, chr2, etc. Based on some previous answers I tried the following code, but the BAM file it produces gives me multiple errors in downstream analysis, so there is probably a mistake in it:
samtools view -h file.bam |\
sed -e '/^@SQ/s/SN\:/SN\:chr/' -e '/^[^@]/s/\t/\tchr/2'|\
awk -F ' ' '$7=($7=="=" || $7=="*"?$7:sprintf("chr%s",$7))' |\
tr " " "\t"
I have also read that Picard tools or samtools can be used to change the header, but I am not sure exactly which commands to use for that.
If anyone can guide me on how to do this, with some explanation of what the code is actually doing, I would really appreciate it.
Thanks
Edit: I also tried the code given here: https://josephcckuo.wordpress.com/2016/11/17/modify-chromosome-notation-in-bam-file/ but it keeps giving me the error: [W::sam_parse1] urecognized mate reference name; treated as unmapped
Thanks got it. Will try that.
I have edited the header file to include the "chr" prefix. When I use the reheader command to change the bam file I am getting this error:
Malformed key:value pair at line 1: "@HD VN:1.5 SO:coordinate
This is the first line of the header which I didn't change at all. Any ideas what might be going on?
Thanks
Did you edit the file on an OS other than Unix and then move it back to Unix? If so, you may need to fix the line endings by running
dos2unix header.sam
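If the error persists after that, it may help to look at what the editor actually did to the file. Below is a minimal sketch, assuming the edited header is saved as header.sam. check_header is a hypothetical helper name, not part of samtools, and it only looks for two common kinds of damage: leftover carriage returns, and header lines whose tabs were replaced by spaces (SAM header fields must be tab-separated).

```shell
# Hypothetical helper (not a samtools command): report two common kinds of
# damage in an edited SAM header file. Returns non-zero if a problem is found.
check_header() {
    header=$1
    status=0
    # Windows editors usually save CRLF line endings; look for the CR byte.
    if grep -q "$(printf '\r')" "$header"; then
        echo "carriage returns found (Windows line endings)"
        status=1
    fi
    # Every @HD/@SQ/@RG/@PG line should contain at least one tab.
    if awk '/^@(HD|SQ|RG|PG)/ && !/\t/ { found = 1 } END { exit !found }' "$header"; then
        echo "header line without tabs (fields must be tab-separated)"
        status=1
    fi
    return $status
}
```

Running check_header header.sam prints nothing and returns 0 when neither problem is detected. Other causes are possible and are not covered by this sketch.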
Yes, I had edited it in Windows, but dos2unix still gives the same error. I will try editing it in Unix.
I wasn't able to reply earlier because of posting limits for new users, but editing the header manually in Unix worked perfectly. Thanks!
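For anyone landing here later, the header edit can also be done on the command line instead of by hand. The following is a minimal sketch, assuming samtools is on the PATH. file.bam is the name used in the question; add_chr_prefix, rename_bam_chromosomes, and the output name are placeholders chosen for illustration. BAM records refer to reference sequences by index rather than by name, so rewriting the @SQ lines of the header should be enough and the alignment lines do not need to be touched.

```shell
# Placeholder helper: read a SAM header on stdin and add "chr" to every
# sequence name in the @SQ lines, skipping names that already start with chr.
# Assumption: a plain prefix is wanted, so MT becomes chrMT (UCSC uses chrM).
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^@SQ/ {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^SN:/ && $i !~ /^SN:chr/)
                     $i = "SN:chr" substr($i, 4)
         }
         { print }'
}

# Placeholder wrapper: extract the header, rewrite it, and swap it in.
rename_bam_chromosomes() {
    in_bam=$1
    out_bam=$2
    new_header=$(mktemp) || return 1
    # view -H prints only the header; reheader replaces it without
    # re-encoding the reads; index assumes the BAM is coordinate-sorted.
    samtools view -H "$in_bam" | add_chr_prefix > "$new_header" &&
        samtools reheader "$new_header" "$in_bam" > "$out_bam" &&
        samtools index "$out_bam"
    status=$?
    rm -f "$new_header"
    return $status
}
```

It would be called as rename_bam_chromosomes file.bam file.chr.bam. Doing the edit in a pipeline on Unix also avoids the line-ending and tab problems that came up when the header was edited in Windows.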