Hi,
I am using samtools to convert the sam file outputs from SOAP aligner to bam file.
However, the samfiles do not have any sort of headers. Whenever I try to convert sam -> bam, an error message appears because the file is missing the headers
[W::sam_read1] parse error at line 1
[main_samview] truncated file.
This is the SOAP aligner script:
soap -D combine_reference.fa.index -a Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz -b Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_2.fq.gz -o Mock_Output/NC_001539_mapped.sam -2 Mock_Output/NC_001539_unpair.sam -p 3
This is the samtools script:
samtools view -bS NC_001539_mapped.sam > NC_001539_mapped.bam
This is what the head of the sam file looks like:
simulated.A.A.1.1217859/1
ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI 1 a 100 - gi|9627186|ref|NC_001539.1| 5036 0 100M 100
simulated.A.A.1.1217859/2 GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I 1 b 100 + gi|9627186|ref|NC_001539.1| 4713 1 G->58T7 100M 58G41
simulated.A.A.1.5536771/1 AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH 1 a 100 + gi|9627186|ref|NC_001539.1| 1768 0 100M 100
simulated.A.A.1.5536771/2 CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA 2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH 1 b 100 - gi|9627186|ref|NC_001539.1| 2069 0 100M 100
simulated.A.A.1.7964812/1 TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I 1 a 100 + gi|9627186|ref|NC_001539.1| 4702 1 G->69T7 100M 69G30
simulated.A.A.1.7964812/2 CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI 1 b 100 - gi|9627186|ref|NC_001539.1| 5012 1 C->1A7 100M 1C98
It is completely missing the headers that are normally found in samfiles.
Is this a problem with the aligner? How can I get the samfile with the correct headers?
Any help would be appreciated.
If you don't use
-h
in yoursamtools
command, it hardly will keep the header.Or is this problem beyond this?
Soap is outputting the file in ASCII text format so using -h with samtools wouldn't bring a header because there isn't a header to begin with (Found all this out after posting the question).
The problem is with the soap output file
They suggest to use the
soap2sam.pl
script that is downloadable at their own homepage: http://soap.genomics.org.cn/down/soap2sam.tar.gzDid you use it?
I have attempted to use this. For some reason my output file is different from what they say it should be. This is the format they say it should be:
http://soap.genomics.org.cn/soap1/#Formatofoutput
This is the format of my file:
If I use soap2sam.pl, the program produces a file that is ASCII text (not a sam file).
I have trawled through their website and multiple forums looking for the answer and can't find it anywhere.