Error in SAM to BAM conversion
5.5 years ago
vivekr ▴ 10

I used Bowtie2 to produce alignments as SAM files. When I convert the SAM files to BAM with samtools, I get the following errors:

[W::sam_read1] Parse error at line 459
[main_samview] truncated file.

I used the following command to produce the SAM file:

./bowtie2 -p 5 -x /ref/grch38 --no-unal -U fastq/SRR7541164.fastq >  /fastq/test.sam

And for the samtools conversion:

samtools view -S -b /fastq/test.sam > /fastq/test.bam

To find the cause of the error, I inspected both the FASTQ file and the SAM file. A few lines of each are shown below. Command:

head -n 10 /home/vivekr/Mirpipe/moRNA_data/trimmed_fastq/SRR7541168_trimmed.fq

Output:

@SRR7541168.1 1 length=51
NCAGTGCACTACAGAACTTTGT
+
#0<FFFFFFFFFFIIIIIIIFI
@SRR7541168.2 2 length=51
NCCCTGTGGTCTAGTGGTTAGGATT
+
#0<BFFFFBFFFFIFFBFFFF<BFF
@SRR7541168.3 3 length=51
NATTGCACTTGTCCCGGCCTGT

And for the SAM file:

Command:

head -n 460 /home/vivekr/Mirpipe/moRNA_data/sorted_bam/SRR7541164_trimmed.sam | tail -n 20

Response:

@SQ SN:chrUn_KI270749v1 LN:158759
@SQ SN:chrUn_KI270750v1 LN:148850
@SQ SN:chrUn_KI270751v1 LN:150742
@SQ SN:chrUn_KI270752v1 LN:27745
@SQ SN:chrUn_KI270753v1 LN:62944
@SQ SN:chrUn_KI270754v1 LN:40191
@SQ SN:chrUn_KI270755v1 LN:36723
@SQ SN:chrUn_KI270756v1 LN:79590
@SQ SN:chrUn_KI270757v1 LN:71251
@SQ SN:chrUn_GL000214v1 LN:137718
@SQ SN:chrUn_KI270742v1 LN:186739
@SQ SN:chrUn_GL000216v2 LN:176608
@SQ SN:chrUn_GL000218v1 LN:161147
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@SQ SN:chrY_KI270740v1_random   LN:37240
@PG ID:bowtie2  PN:bowtie2  VN:2.3.5.1  CL:"/home/vivekr/Mirpipe/Packages1/bowtie2-2.3.5.1-sra-linux-x86_64/bowtie2-align-s --wrapper basic-0 -p 8 -x /home/vivekr/Mirpipe/ref/grch38/hg38 /home/vivekr/Mirpipe/moRNA_data/sorted_bam/SRR7541164_trimmed.sam --passthrough -U /home/vivekr/Mirpipe/moRNA_data/trimmed_fastq/SRR7541164_trimmed.fq"
SRR7541164.1    16  chr7    25949921    42  23M *   0   0   GACAAAGTTCTGTAGTGCACTGN IIIIIIIIIIFFFFFFFFFF<0# AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:22A0   YT:Z:UU
@SRR7541164.1 1 length=51%0ANCAGTGCACTACAGAACTTTGTC%0A+%0A#0<FFFFFFFFFFIIIIIIIIII%0A
SRR7541164.2    16  chr7    25949923    42  21M *   0   0   CAAAGTTCTGTAGTGCACTGN   IIIIIIIFFFFFFFFFFF<0#   AS:i:-1 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:20A0   YT:Z:UU

Version details are as follows:

samtools 1.7

Using htslib 1.7-2

Copyright (C) 2018 Genome Research Ltd.

I cannot figure out what is corrupting the SAM file or causing the error during the SAM-to-BAM conversion. Any suggestions, please?

Thanks.

alignment • 2.0k views

Hello,

./bowtie2 -p 5 -x /ref/grch38 --no-unal -U fastq/SRR7541164.fastq > -S /fastq/test.sam

Is this just a typo, or do you really redirect the bowtie2 output to -S?

What's the output of tail /fastq/test.sam?

fin swimmer


No, I use -S to write the output as a SAM file. The output of tail /fastq/test.sam is as follows:

@SRR7541164.20452273 20452273 length=51%0ACAAAGAATTCTCCTTTTGGGCTGGAATTCTCGGGTGCCAAGGAACTCCAGT%0A+SRR7541164.20452273 20452273 length=51%0ABBBFFFFFFFFFFFFIIIIIIIIIBFFIIIIIIIIIIIIIIIIIIIIIFII%0A
SRR7541164.20452274 4 * 0 0 * * 0 0 TCAGTGCACTACAGAACTTTGTCTGGAATTCTCGGGTGCCAAGGAACTCCA BBBFFFFFFFFFFIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIFII YT:Z:UU
@SRR7541164.20452274 20452274 length=51%0ATCAGTGCACTACAGAACTTTGTCTGGAATTCTCGGGTGCCAAGGAACTCCA%0A+SRR7541164.20452274 20452274 length=51%0ABBBFFFFFFFFFFIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIFII%0A
SRR7541164.20452275 4 * 0 0 * * 0 0 GTTTCCGTAGTGTAGTGGTTATCACGTTCGCCTTGGAATTCTCGGGTGCCA BBBFFFFFFFFFFFIFIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIFFBFF YT:Z:UU
@SRR7541164.20452275 20452275 length=51%0AGTTTCCGTAGTGTAGTGGTTATCACGTTCGCCTTGGAATTCTCGGGTGCCA%0A+SRR7541164.20452275 20452275 length=51%0ABBBFFFFFFFFFFFIFIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIFFBFF%0A
SRR7541164.20452276 4 * 0 0 * * 0 0 CTGTCTGAGCGTCGCTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACAT BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFFFIIFFFIIIIIIIIIIBFFI YT:Z:UU
@SRR7541164.20452276 20452276 length=51%0ACTGTCTGAGCGTCGCTTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACAT%0A+SRR7541164.20452276 20452276 length=51%0ABBBFFFFFFFFFFIIIIIIIIIIIIIIIIFFFIIFFFIIIIIIIIIIBFFI%0A
SRR7541164.20452277 4 * 0 0 * * 0 0 AGCTACATTGTCTGCTGGGTTTTGGAATTCTCGGGTGCCAAGGAACTCCAG BBBFFFFFFFFFFIIIIIIFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIII YT:Z:UU
@SRR7541164.20452277 20452277 length=51%0AAGCTACATTGTCTGCTGGGTTTTGGAATTCTCGGGTGCCAAGGAACTCCAG%0A+SRR7541164.20452277 20452277 length=51%0ABBBFFFFFFFFFFIIIIIIFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIII%0A


That last line is truncated, suggesting that your alignment got aborted prematurely.

Note that you can do the SAM-to-BAM conversion on the fly, without an intermediate SAM file:

bowtie2 -x genome -U reads.fastq.gz | samtools sort -o alignment.bam
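
To locate the kind of damage discussed above before handing a file to samtools, a rough pre-check can be sketched with awk. This is only a sketch under stated assumptions: `check_sam` is a hypothetical helper name, and the rule it applies (header lines look like `@XX<TAB>...`, alignment records have at least 11 tab-separated fields and a first field that does not start with `@`) is a simplification of the SAM format, not a full validator.

```shell
# Hypothetical helper: flag lines that are neither a SAM header line
# nor an alignment record with the 11 mandatory tab-separated fields.
check_sam() {
    awk -F'\t' '
        # Header lines are only accepted before the first alignment record.
        /^@[A-Z][A-Z]\t/ && !seen_record { next }
        # A record needs at least 11 fields and a name that does not start with "@".
        NF >= 11 && $1 !~ /^@/ { seen_record = 1; next }
        # Anything else is reported with its line number and first 60 characters.
        { printf "line %d looks malformed: %.60s\n", NR, $0; bad++ }
        END { exit (bad > 0) }
    ' "$1"
}
```

Calling `check_sam /fastq/test.sam` returns a non-zero status if any line fails the rule. Lines such as the `@SRR7541164.1 1 length=51%0A...` entries shown in the question would be expected to be reported, since they have no tab-separated fields and appear after the alignments have started.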

You cannot use -S and > at the same time, so either you made a mistake or you gave us the wrong command.
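
A small demonstration of why the combination goes wrong, with `echo` standing in for bowtie2 (the scratch directory and file names are illustrative only). The shell takes the word right after `>` as the redirect target, so stdout lands in a file literally named `-S`, and the intended SAM path is passed to the command as an ordinary argument. That matches the @PG line in the question, where the .sam path shows up among the bowtie2 arguments.

```shell
# Work in a scratch directory so nothing real is overwritten.
demo_dir=$(mktemp -d)

# Same shape as "bowtie2 ... > -S /fastq/test.sam", with echo as a stand-in.
(cd "$demo_dir" && echo alignments > -S test.sam)

# stdout went to a file named "-S"; "test.sam" was just another argument to echo.
ls -1 "$demo_dir"
cat "$demo_dir/-S"
```

With the real command, either `-S out.sam` or `> out.sam` alone avoids the problem.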


I made a mistake. -S and > definitely can't be used together. Thanks for pointing it out. I have also checked and confirmed this in the bowtie2 manual, and I have edited the question.


It can also help to check the md5sum to confirm that the FASTQ files were downloaded correctly.
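
A minimal sketch of such a check, assuming GNU coreutils `md5sum` is available and that the expected digest comes from wherever the data was downloaded. `verify_md5` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
# Hypothetical helper: succeed only if the MD5 digest of a file matches
# the expected value.
#   $1: file to check
#   $2: expected MD5 hex digest published alongside the download
verify_md5() {
    actual_md5=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual_md5" = "$2" ]
}
```

Usage would look like `verify_md5 fastq/SRR7541164.fastq <expected_digest> || echo "checksum mismatch"`, with the digest filled in from the download source.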

5.5 years ago
ATpoint 85k

As this is a common problem, here is a quick summary:

  • Check that the output path is correct: /fastq/test.sam means writing to a directory at the filesystem root.
  • Do not store SAM files; pipe directly into samtools view to get BAM: bowtie2 (...) | samtools view -o out.bam
  • The output seems corrupted, so rerun the entire alignment with the given suggestions.
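
For the first point, a pre-flight check on the output location can catch a wrong path before a long alignment starts. The following is only a sketch; `check_outdir` is a hypothetical helper name.

```shell
# Hypothetical helper: fail early if the directory of the intended
# output file is missing or not writable.
check_outdir() {
    out_dir=$(dirname -- "$1")
    if [ ! -d "$out_dir" ]; then
        echo "missing directory: $out_dir" >&2
        return 1
    fi
    if [ ! -w "$out_dir" ]; then
        echo "directory not writable: $out_dir" >&2
        return 1
    fi
}
```

For example, `check_outdir /fastq/test.sam` fails with a message if /fastq does not exist, which would point to an absolute path having been used where a relative one (fastq/test.sam) was intended.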
5.5 years ago
vivekr ▴ 10

It seems my issue is solved. Let me explain how I produced the FASTQ file and why I was getting the error:

I needed to apply a custom filter to the original FASTQ file, and for that I used the HTSeq package available in bioconda. After that, I needed to generate a BAM file for each sample. So there are two possible reasons for the corrupted SAM file:

  1. The regenerated FASTQ file may not have been written in the proper format, which led to a corrupted SAM file and caused the error in the SAM-to-BAM conversion.

  2. Thanks to @WouterDeCoster for pointing out the error in my command. I don't know why, but I have used -S and > in the same command many times without any error. This time I did get an error, which is how I found out about it.

How I solved this:

Instead of the HTSeq package from bioconda, I used bbduk.sh to get a correct FASTQ file, and with the right command I am able to get BAM files for all samples. Thanks to all for the support.

However, I need one more custom filter: remove reads having at most 2 bases under quality score 20, and remove reads whose unique sequence has a read count of less than 10. I have not found any tool for these tasks so far, so please give any suggestions if I missed anything. I am looking for a proper way to do this. For the time being, I am moving forward with the FASTQ files obtained from bbduk.sh.
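
One possible way to sketch the filter described above uses plain awk. Several things here are assumptions, not something confirmed in this thread: the quality criterion is read as "keep reads with at most 2 bases below Q20", qualities are Phred+33, records are strict 4-line FASTQ, and `filter_fastq` is a hypothetical helper name. The `+` line is also rewritten as a bare `+`.

```shell
# Hypothetical helper. Usage: filter_fastq reads.fq [min_count] [max_lowq]
# Assumptions: Phred+33 qualities, 4-line FASTQ records, and
# "keep reads with at most max_lowq bases below Q20".
filter_fastq() {
    fq_in=$1
    fq_min_count=${2:-10}
    fq_max_lowq=${3:-2}
    fq_tmp=$(mktemp)

    # Pass 1: quality filter. In Phred+33 the characters '!' to '4'
    # encode Q0 to Q19, so gsub() on that range counts bases below Q20.
    LC_ALL=C awk -v max="$fq_max_lowq" '
        NR % 4 == 1 { h = $0 }
        NR % 4 == 2 { s = $0 }
        NR % 4 == 0 {
            q = $0
            n = gsub(/[!-4]/, "", q)
            if (n <= max) print h "\n" s "\n+\n" $0
        }
    ' "$fq_in" > "$fq_tmp"

    # Pass 2: read the filtered file twice. The first read counts each
    # sequence; the second prints reads whose sequence occurs often enough.
    LC_ALL=C awk -v min="$fq_min_count" '
        NR == FNR { if (FNR % 4 == 2) c[$0]++; next }
        FNR % 4 == 1 { h = $0 }
        FNR % 4 == 2 { s = $0 }
        FNR % 4 == 0 { if (c[s] >= min) print h "\n" s "\n+\n" $0 }
    ' "$fq_tmp" "$fq_tmp"

    rm -f "$fq_tmp"
}
```

For example, `filter_fastq trimmed.fq 10 2 > filtered.fq` would apply both criteria with the thresholds from the post. Note that the sequence counts are taken after the quality filter; if counts should be based on all reads, the two passes would need to be swapped.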
