I am trying to convert a SAM file to BAM using: samtools view -bS IN.sam > OUT.bam
I get the following error:
[W::sam_read1] parse error at line 36
[main_samview] truncated file.
Line 36 is this:
=====> Processing read 'simulated.2618103'/1 <=====
BWA reports no errors while it is running. I have read the SAM format specification, and it says nothing about lines starting with =====>
Here is the file from the beginning to line 40:
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:gi|9627186|ref|NC_001539.1| LN:5323
@SQ SN:gi|9629818|ref|NC_001847.1| LN:135301
@SQ SN:gi|315192962|ref|NC_002306.3| LN:29355
@SQ SN:gi|21728357|ref|NC_004067.1| LN:6450
@SQ SN:gi|38018060|ref|NC_005148.1| LN:1768
@SQ SN:gi|352950882|ref|NC_011507.2| LN:3302
@SQ SN:gi|303291528|ref|NC_014406.1| LN:4926
@SQ SN:gi|311977355|ref|NC_014649.1| LN:1181549
@SQ SN:gi|448259945|ref|NC_019925.1| LN:152427
@PG ID:bwa PN:bwa VN:0.7.12-r1039 CL:bwa mem -v4 combine_reference.fa.gz seqtk_1/subsample_1/sub_NC_001539_1.fq.gz seqtk_1/subsample_1/sub_NC_001539_2.fq.gz
=====> Processing read 'simulated.2618103'/1 <=====
* fraction of repetitive seeds: 0.000
* Found CHAIN(0): n=3; weight=81 20;20;0,3095694383(gi|9627186|ref|NC_001539.1|:+401) 29;29;0,3095694383(gi|9627186|ref|NC_001539.1|:+401) 52;52;30,3095694413(gi|9627186|ref|NC_001539.1|:+431)
* Found CHAIN(1): n=1; weight=19 19;19;76,1952705833(chr12:+1774857)
* Found CHAIN(2): n=1; weight=19 19;19;40,770774837(chr4:+80285843)
Lines 1-35 seem to fit the SAM format specification.
Any help would be appreciated.
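A quick way to list every line like this, not just the first one samtools trips over, is to look for lines that are neither headers nor alignment records. This is only a sketch: the function name find_non_sam_lines and the tiny sample file are made up for illustration, and it relies on the SAM rules that header lines start with @ and alignment lines have at least 11 tab-separated fields.

```shell
# Hypothetical helper: print "LINE_NUMBER: content" for every line that is
# neither a header (starts with '@') nor an alignment record (SAM requires
# at least 11 tab-separated mandatory fields).
find_non_sam_lines() {
    awk -F'\t' '!/^@/ && NF < 11 { print NR ": " $0 }' "$1"
}

# Tiny made-up sample: two header lines, one stray log line, one record.
sample=$(mktemp)
printf '@SQ\tSN:chrM\tLN:16571\n' > "$sample"
printf '@PG\tID:bwa\tPN:bwa\n' >> "$sample"
printf "=====> Processing read 'simulated.2618103'/1 <=====\n" >> "$sample"
printf 'r1\t0\tchrM\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$sample"

find_non_sam_lines "$sample"
```

On a real file, run find_non_sam_lines IN.sam | head to see how much stray output is mixed in.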
How did you run BWA? It seems stderr is making its way into your SAM file.
I agree, it looks as though you're redirecting both stdout and stderr to the same file.
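To illustrate what that would look like, here is a minimal sketch using a stand-in function instead of BWA. The name fake_aligner and the file names are invented for the demonstration; the only point is the difference between redirecting stdout alone and merging stderr into it with 2>&1.

```shell
# Hypothetical stand-in for an aligner: records go to stdout,
# log messages go to stderr.
fake_aligner() {
    printf 'read1\t0\tchr1\t100\n'                    # "SAM record" on stdout
    printf '=====> Processing read <=====\n' >&2      # log message on stderr
}

tmpdir=$(mktemp -d)

# Only stdout goes to the SAM file; the log is kept in a separate file.
fake_aligner > "$tmpdir/clean.sam" 2> "$tmpdir/run.log"

# stdout and stderr merged: the log line ends up inside the SAM file.
fake_aligner > "$tmpdir/mixed.sam" 2>&1

clean_lines=$(wc -l < "$tmpdir/clean.sam" | tr -d ' ')
mixed_lines=$(wc -l < "$tmpdir/mixed.sam" | tr -d ' ')
echo "clean: $clean_lines line(s), mixed: $mixed_lines line(s)"
```

If the real command contains 2>&1 or &> anywhere before the output file, or a wrapper script adds one, that would explain log lines appearing in the SAM.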
That's what I was thinking, but I can't seem to spot how.
This is the code I used for BWA....
I may have made a really stupid mistake and am just not seeing it.
I don't see anything wrong with your command line, but I don't think -v4 is needed. Are you running BWA directly, or inside a script or pipeline? Is stderr being redirected somewhere? Is BWA running as a background process? How did you create the index? What is the size of your reference?
I guess you are running this command via nohup and thought this was not relevant?
You could try aligning each read file separately (sub_NC_001539_1.fq.gz or sub_NC_001539_2.fq.gz, not both at once) and then merging the two BAM files using Picard.
BWA handles paired-end data well. Splitting is unnecessary and would lose the insert size information, which would then have to be added back in a second step after the merge. If you are handling big files, that will take ages.
When running bwa mem, a value of 4 (or, in my case, 4 or higher) for the -v argument somehow mixes stderr output into the output SAM file. Removing the -v argument, or setting it to 1, 2 or 3, worked for me. In my case I had single-end FASTQ files (no paired-end reads). The output is written directly as a sorted BAM file, using 10 threads.
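For reference, a command along these lines could look like the sketch below. It is not the exact command from this thread: the function name align_to_sorted_bam and all file names are placeholders, and it assumes a samtools version (1.3 or later) whose sort accepts -o and reads from stdin via "-".

```shell
# Hypothetical helper: align single-end reads and write a sorted BAM directly.
# Usage: align_to_sorted_bam REF.fa READS.fq.gz OUT.sorted.bam [THREADS]
align_to_sorted_bam() {
    ref=$1
    reads=$2
    out=$3
    threads=${4:-10}   # defaults to 10 threads, as in the answer above
    # -v 3 is bwa's default verbosity; 4 and above turn on debugging output.
    bwa mem -t "$threads" -v 3 "$ref" "$reads" \
        | samtools sort -@ "$threads" -o "$out" -
}
```

Leaving -v out entirely has the same effect as -v 3, since 3 is the default.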
What is the point of this answer? It is identical to mine.
It simply pinpoints which argument was causing the error. Plus, in my case it worked fine even without the -M argument.