Truncated sam file - Parse error
7.2 years ago

I am trying to convert SAM to BAM using samtools view -bS IN.sam > OUT.bam

I get the following error:

[W::sam_read1] parse error at line 36
[main_samview] truncated file.

Line 36 is this:

=====> Processing read 'simulated.2618103'/1 <=====

BWA reports no errors while running. I have read the SAM format specification, and it says nothing about lines starting with =====>

Here is the file from the beginning to line 40:

@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:gi|9627186|ref|NC_001539.1|  LN:5323
@SQ SN:gi|9629818|ref|NC_001847.1|  LN:135301
@SQ SN:gi|315192962|ref|NC_002306.3|    LN:29355
@SQ SN:gi|21728357|ref|NC_004067.1| LN:6450
@SQ SN:gi|38018060|ref|NC_005148.1| LN:1768
@SQ SN:gi|352950882|ref|NC_011507.2|    LN:3302
@SQ SN:gi|303291528|ref|NC_014406.1|    LN:4926
@SQ SN:gi|311977355|ref|NC_014649.1|    LN:1181549
@SQ SN:gi|448259945|ref|NC_019925.1|    LN:152427
@PG ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa mem -v4 combine_reference.fa.gz seqtk_1/subsample_1/sub_NC_001539_1.fq.gz seqtk_1/subsample_1/sub_NC_001539_2.fq.gz
=====> Processing read 'simulated.2618103'/1 <=====
* fraction of repetitive seeds: 0.000
* Found CHAIN(0): n=3; weight=81    20;20;0,3095694383(gi|9627186|ref|NC_001539.1|:+401)    29;29;0,3095694383(gi|9627186|ref|NC_001539.1|:+401)    52;52;30,3095694413(gi|9627186|ref|NC_001539.1|:+431)
* Found CHAIN(1): n=1; weight=19    19;19;76,1952705833(chr12:+1774857)
* Found CHAIN(2): n=1; weight=19    19;19;40,770774837(chr4:+80285843)

Lines 1-35 appear to conform to the SAM format specification.
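The SAM specification allows only two kinds of lines: header lines starting with "@" and alignment lines with at least 11 mandatory tab-separated fields. A quick way to list every line that breaks this rule is a small awk filter. The function name find_non_sam_lines is hypothetical; this is a rough sketch that checks only the field count, not the content of each field.

```shell
# Hypothetical helper: print every line that is neither a header line
# (starting with "@") nor an alignment line with at least the 11
# mandatory tab-separated SAM fields.
find_non_sam_lines() {
  awk -F'\t' '!/^@/ && NF < 11 { printf "line %d: %s\n", NR, $0 }' "$1"
}
```

Usage: find_non_sam_lines IN.sam | head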

Any help would be appreciated.

How did you run BWA? It seems stderr is making its way into your SAM file.

I agree, it looks as though you're redirecting both stdout and stderr to the same file.

That's what I was thinking, but I can't spot how.

This is the command I used for BWA:

  bwa mem -v4 combine_reference.fa.gz seqtk_1/subsample_1/sub_NC_001847_1.fq.gz seqtk_1/subsample_1/sub_NC_001847_2.fq.gz  > ./BWA/seqtk_1/subsample_1/sub_NC_001847_BWA.sam

I may have made a really stupid mistake and am just not seeing it.

I don't see anything wrong with your command line, but I don't think -v4 is needed. Are you running BWA directly, or inside some script or pipeline? Is stderr being redirected somewhere? Is BWA running as a background process?

How did you create the index? What is the size of your reference?

I guess you are running this command via nohup and thought this was not relevant?

You could try aligning each read file separately (sub_NC_001539_1.fq.gz and sub_NC_001539_2.fq.gz, not both together) and then merging the two BAM files using Picard.

BWA handles paired-end data well. Splitting is not necessary and would lose the insert-size information, which would then have to be added back in a second step after the merge. With big files, that will take ages.

When running bwa mem, setting -v to 4 or higher (the debugging verbosity level) somehow puts the debug messages into the output SAM file. Removing the -v argument, or using -v 1, 2 or 3, worked for me. In my case I had single-end FASTQ files (no paired-end reads). The command below writes a sorted BAM file directly, using 10 threads:

bwa mem -t 10 ref_genome_index reads.fastq | samtools view -bS | samtools sort -o aln.bam

What is the point of this answer? It is identical to mine.

It simply pinpoints which argument was causing the error. Also, in my case it worked fine even without the -M argument.

7.2 years ago
ATpoint

Do yourself a favor and avoid writing SAM files. There is no advantage to saving them; they only waste disk space. Pipe the aligner output directly into samtools view to get the binary file. It seems that you have to re-align anyway, so do:

bwa mem -M ref 1.fq 2.fq | samtools view -b -o out.bam

In case you need sorted files, which is almost always the case, you can also pipe into sort right away:

bwa mem -M ref 1.fq 2.fq | samtools sort -o out_sorted.bam
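If re-aligning is not an option, one rough fallback is to keep only header lines and lines with at least the 11 mandatory SAM fields, dropping the interleaved debug lines before conversion. This is a sketch under an assumption: that the debug output only ever appears as whole lines between records. If it was written into the middle of a record, that record is dropped or kept in damaged form, so re-aligning without -v4 is the safer fix. The function name strip_non_sam_lines is hypothetical.

```shell
# Hypothetical helper: keep header lines ("@...") and lines with at least
# the 11 mandatory tab-separated SAM fields; everything else is dropped.
# Caution: a record split by interleaved debug output is not repaired.
strip_non_sam_lines() {
  awk -F'\t' '/^@/ || NF >= 11' "$1"
}
```

Usage: strip_non_sam_lines IN.sam | samtools view -b -o OUT.bam -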