5.9 years ago
pg_canada
Hi everyone,
I'm new to this community and new to this type of analysis, so I apologize if this question seems simple.
I'm reformatting some BAM files so I can run them through EXCAVATOR for CNV analysis. For a couple of the BAM files, I get the following error:
[E::sam_parse1] missing SAM header
[W::sam_read1] parse error at line 7
[main_samview] truncated file.
When I check the BAM (samtools view -h mybam.bam), the file does seem to have a header, as below:
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chr1 LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128 UR:/mnt/
@SQ SN:chr2 LN:243199373 M5:a0d9851da00400dec1098a9255ac712e UR:/mnt/
@SQ SN:chr3 LN:198022430 M5:641e4338fa8d52a5b781bd2a2c08d3c3 UR:/mnt/
@SQ SN:chr4 LN:191154276 M5:23dccd106897542ad87d2765d28a19a1 UR:/mnt/
@SQ SN:chr5 LN:180915260 M5:0740173db9ffd264d728f32784845cd7 UR:/mnt/
@SQ SN:chr6 LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6b UR:/mnt/
@SQ SN:chr7 LN:159138663 M5:618366e953d6aaad97dbe4777c29375e UR:/mnt/
@SQ SN:chrX LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540dd UR:/mnt/
@SQ SN:chr8 LN:146364022 M5:96f514a9929e410c6651697bded59aec UR:/mnt/
@SQ SN:chr9 LN:141213431 M5:3e273117f15e0a400f01055d9f393768 UR:/mnt/
@SQ SN:chr10 LN:135534747 M5:988c28e000e84c26d552359af1ea2e1d
@SQ SN:chr11 LN:135006516 M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ SN:chr12 LN:133851895 M5:51851ac0e1a115847ad36449b0015864
@SQ SN:chr13 LN:115169878 M5:283f8d7892baa81b510a015719ca7b0b
@SQ SN:chr14 LN:107349540 M5:98f3cae32b2a2e9524bc19813927542e
etc.
Has anyone encountered this before? Any pointers as to how I can fix this?
This is the command I'm using to reformat:
samtools view -h mybam.bam |
  awk 'BEGIN{FS=OFS="\t"} (/^@/ && !/@SQ/){print $0} $2~/^SN:[1-9]|^SN:X|^SN:Y|^SN:MT/{print $0} $3~/^[1-9]|X|Y|MT/{$3="chr"$3; print $0} ' |
  sed 's/SN:/SN:chr/g' |
  sed 's/chrMT/chrM/g' |
  samtools view -bS - > mybam_merge_reformat.bam
Thank you
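One way to see what reaches the final samtools view is to feed the awk and sed stages a tiny hand-made SAM. The helpers `toy_sam` and `before_last_view` are hypothetical names, and the toy input is invented. It only mimics the header shown above, where the names already start with "chr". The awk program is copied unchanged from the question.

```shell
#!/bin/sh
# Every stage of the question's pipeline except the final samtools view.
before_last_view() {
  awk 'BEGIN{FS=OFS="\t"} (/^@/ && !/@SQ/){print $0} $2~/^SN:[1-9]|^SN:X|^SN:Y|^SN:MT/{print $0} $3~/^[1-9]|X|Y|MT/{$3="chr"$3; print $0} ' |
    sed 's/SN:/SN:chr/g' |
    sed 's/chrMT/chrM/g'
}

# Hypothetical toy SAM text (tab-separated) whose names already carry "chr".
toy_sam() {
  printf '@HD\tVN:1.4\tSO:coordinate\n'
  printf '@SQ\tSN:chr1\tLN:249250621\n'
  printf '@SQ\tSN:chrX\tLN:155270560\n'
  printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
  printf 'r2\t0\tchrX\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
}

# Only the @HD line and r2 (RNAME rewritten to chrchrX) come out;
# both @SQ lines and r1 are dropped.
toy_sam | before_last_view
```

With chr-prefixed input, no `SN:chr...` field matches `^SN:[1-9]|^SN:X|^SN:Y|^SN:MT`, so every @SQ line is removed. Reads on chr1 are dropped as well. Reads on chrX or chrY still match, because `X`, `Y` and `MT` are not anchored in the RNAME regex, so they become `chrchrX`/`chrchrY`. samtools then receives alignment lines with no @SQ lines at all. If the header above is from the input file, that would be consistent with the "missing SAM header" error.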
What is the output of your pipeline BEFORE the last samtools view?
PS: this awk might not work. It will add "chr" to some unmapped reads, and it ignores the mate's reference name (RNEXT, column 7) and the 'SA' tag used for supplementary alignments.
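A minimal sketch of a rename step that updates the mate reference and the SA tag together with RNAME and the @SQ lines. `add_chr` is a hypothetical helper, not part of samtools. It assumes the input is SAM text from `samtools view -h`, and that only 1-22, X, Y and MT need renaming (MT becomes chrM). Names that already start with "chr", other contigs, `*` and `=` are passed through unchanged rather than filtered out. Other tags that might carry reference names are not handled.

```shell
#!/bin/sh
# add_chr: a hypothetical helper (sketch only). Reads SAM text on stdin and
# writes SAM text on stdout, adding "chr" to reference names in the @SQ SN:
# field, RNAME (column 3), RNEXT (column 7) and the SA:Z: tag.
add_chr() {
  awk '
  function fixname(name) {
    if (name == "MT") return "chrM"
    # Assumption: only 1-22, X and Y need a prefix.
    if (name ~ /^([1-9]|1[0-9]|2[0-2]|X|Y)$/) return "chr" name
    # "*", "=", names already starting with chr, and other contigs are kept.
    return name
  }
  BEGIN { FS = OFS = "\t" }
  /^@SQ/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SN:/) $i = "SN:" fixname(substr($i, 4))
    print
    next
  }
  /^@/ { print; next }   # other header lines pass through unchanged
  {
    $3 = fixname($3)     # RNAME
    $7 = fixname($7)     # RNEXT (mate reference)
    for (i = 12; i <= NF; i++) {
      if ($i !~ /^SA:Z:/) continue
      # SA:Z: holds entries of the form rname,pos,strand,CIGAR,mapQ,NM;
      n = split(substr($i, 6), parts, ";")
      out = ""
      for (j = 1; j <= n; j++) {
        if (parts[j] == "") continue
        k = index(parts[j], ",")
        if (k == 0) { out = out parts[j] ";"; continue }
        out = out fixname(substr(parts[j], 1, k - 1)) substr(parts[j], k) ";"
      }
      $i = "SA:Z:" out
    }
    print
  }'
}
```

It would slot into the same place as the awk/sed stages, for example `samtools view -h mybam.bam | add_chr | samtools view -b -o mybam_merge_reformat.bam -`. Leaving unknown contigs in place keeps every read's reference present in the header, which the original filter does not guarantee.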