7.0 years ago
Jautis
Hi, I have a SAM file from BSMAP that I have modified to match the format of a SAM file produced by Bismark.
However, after doing so, I can no longer convert the SAM file to a BAM file using samtools. What am I missing? More generally: when can samtools convert SAM to BAM, and which SAM formats are acceptable?
Thanks in advance!
Code
#reorder columns
awk '{print $1 "\t" $13 "\t" $3 "\t" $4 "\t" $5 "\t" $6 "\t" $7 "\t" $8 "\t" $9 "\t" $10 "\t" $11 "\t" $12}' SAM > SAM2
#reattach header (not-modified)
cat sam_head SAM2 > temp; mv temp SAM2
#attempt to convert file
samtools view -bS SAM2 > BAM
Error message (line 25 is the first read after the header):
[E::sam_parse1] unrecognized type
[W::sam_read1] parse error at line 25
[main_samview] truncated file.
SAM file, first 25 lines:
@HD VN:1.0
@SQ SN:chr4 LN:165299245
@SQ SN:chrX LN:143131424
@SQ SN:chr2 LN:187378091
@SQ SN:chr6 LN:174439528
@SQ SN:chr8 LN:139646187
@SQ SN:chr12 LN:104110932
@SQ SN:chr10 LN:90941950
@SQ SN:chr14 LN:123829720
@SQ SN:chr16 LN:74645514
@SQ SN:chr18 LN:72186199
@SQ SN:chr20 LN:71807805
@SQ SN:chr1 LN:220367699
@SQ SN:chr3 LN:180432695
@SQ SN:chr5 LN:178775436
@SQ SN:chr7 LN:162156779
@SQ SN:chr9 LN:125196307
@SQ SN:chr11 LN:132286798
@SQ SN:chr13 LN:128036923
@SQ SN:chr15 LN:107442819
@SQ SN:chr17 LN:90913898
@SQ SN:chr19 LN:51301725
@SQ SN:mtDNA LN:16566
@PG ID:BSMAP VN:2.90 CL:"bsmap -3 -n 1 -v 0.1 -r 0 -a ./tomap.fq.gz -d /file.sam"
1_7163:15-114 16 chr18 70657804 255 100M * 0 0 ATAAATTATTATATTAATGTAAAAGTAGTAAATATTTTTGTGGTGTAGTTTGCGTGTTTGGTTTTTTTTATTATTTATTTGTGAGACGTTGATTTTCGTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:0 MD:Z:0G1G1G4A9G13G3G0G1A2G2G2G39G2 XM:Z:h.h.h..............x....H........x...xh....x.Zx..x........Z...............................x.. XR:Z:CT XG:Z:GA
For this problem in particular, we'd need to see line 25 to understand what's wrong with it.
More generally speaking:
A SAM file (to be called as such) requires a formatted header and a series of records with the columns defined in the file format definition (link). To convert a SAM file to BAM, your file needs both of these elements. It doesn't matter if the header lists more scaffolds than appear in the records; what matters is that the opposite doesn't happen, i.e. that no record points at a chromosome or scaffold that is missing from the header. You'll find everything in chapter 1.3 of the linked PDF.
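To make those requirements concrete, here is a rough sketch of a pre-check you could run before `samtools view`. It assumes a tab-delimited file; `check_sam` is a made-up helper name, not a samtools command. It only tests a few rules from the specification (11 mandatory columns, integer FLAG/POS/MAPQ, RNAME declared in an `@SQ` line, optional fields shaped like `TAG:TYPE:VALUE` with TYPE one of `A`, `i`, `f`, `Z`, `H`, `B`), so a clean result does not guarantee that samtools will accept the file.

```shell
# check_sam FILE: report basic format problems in a tab-delimited SAM file.
# (check_sam is a hypothetical helper, not part of samtools.)
check_sam() {
    awk -F'\t' '
        # Header: remember every reference name declared in an @SQ line.
        /^@/ {
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if (substr($i, 1, 3) == "SN:") ref[substr($i, 4)] = 1
            next
        }
        # Records: all 11 mandatory columns must be present.
        NF < 11 {
            printf "line %d: only %d tab-separated fields\n", NR, NF
            bad = 1
            next
        }
        {
            # FLAG, POS and MAPQ must be non-negative integers.
            if ($2 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/) {
                printf "line %d: FLAG, POS or MAPQ is not an integer\n", NR
                bad = 1
            }
            # RNAME must be "*" or a sequence declared in the header.
            if ($3 != "*" && !($3 in ref)) {
                printf "line %d: RNAME %s is not in the header\n", NR, $3
                bad = 1
            }
            # Optional fields must look like TAG:TYPE:VALUE.
            for (i = 12; i <= NF; i++)
                if ($i !~ /^[A-Za-z][A-Za-z0-9]:[AifZHB]:/) {
                    printf "line %d: malformed optional field \"%s\"\n", NR, $i
                    bad = 1
                }
        }
        # Exit status 1 if anything was reported, 0 otherwise.
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `check_sam SAM2` prints one message per problem, with line numbers counted from the top of the file (header included), so they should line up with the line number in the samtools parse error.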
In your case I see you're attaching the header, so: are you keeping the whole header when you generate it? Do all of your record lines contain the same number of fields?
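A quick way to answer the second question is to tabulate how many tab-separated fields each record has. This is only a sketch: `count_fields` is a hypothetical helper name, and it assumes header lines are the only lines that start with `@`.

```shell
# count_fields FILE: for the non-header lines, print one row per distinct
# field count in the form "<fields> <records with that many fields>".
# (count_fields is a hypothetical helper name.)
count_fields() {
    awk -F'\t' '
        !/^@/ { n[NF]++ }                       # tally records by field count
        END   { for (k in n) print k, n[k] }    # one row per distinct count
    ' "$1" | sort -n
}
```

With `count_fields SAM2`, a single row means every record has the same number of fields. A row with a count below 11 means some records lack mandatory columns, or are not tab-delimited at all.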
Thanks! I added the first 25 lines to the initial question. Yes, I am keeping the same header that I initially generated. I am adding additional fields (NM, MD, XM, XR, and XG tags with dummy values).
Does it print the same error if you exclude the XM tag field at the end? And if you change the read name? Maybe there are some meta-characters... Also, check for stray whitespace!
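For the whitespace check, something along these lines might help. It is a sketch rather than a full validator: `find_odd_whitespace` is a made-up helper name, and it only looks for two things in the record lines, namely spaces inside the mandatory columns (none of which may contain a space according to the specification) and Windows-style line endings.

```shell
# find_odd_whitespace FILE: flag record lines with suspicious whitespace.
# (find_odd_whitespace is a hypothetical helper name.)
find_odd_whitespace() {
    awk -F'\t' '
        /^@/  { next }    # skip header lines
        /\r$/ { printf "line %d: ends with a carriage return\n", NR }
        {
            # A space in any of the first 11 columns usually means the
            # columns were joined with spaces instead of tabs.
            for (i = 1; i <= 11 && i <= NF; i++)
                if ($i ~ / /)
                    printf "line %d: field %d contains a space\n", NR, i
        }
    ' "$1"
}
```

If `find_odd_whitespace SAM2` reports a space in field 1 of every record, the columns are most likely separated by spaces rather than tabs.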
Please post the first few and last few lines of your SAM file so we can get a clear idea of what has gone wrong.