SOAP aligner removes samfile headers
1
0
Entering edit mode
6.9 years ago

Hi,

I am using samtools to convert the sam file outputs from SOAP aligner to bam file.

However, the samfiles do not have any sort of headers. Whenever I try to convert sam -> bam, an error message appears because the file is missing the headers

[W::sam_read1] parse error at line 1
[main_samview] truncated file.

This is the SOAP aligner script:

soap -D combine_reference.fa.index -a Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_1.fq.gz -b Mock_Run/seqtk_1/subsample_1/new_sub_NC_001539_2.fq.gz -o Mock_Output/NC_001539_mapped.sam -2 Mock_Output/NC_001539_unpair.sam -p 3

This is the samtools script:

samtools view -bS NC_001539_mapped.sam > NC_001539_mapped.bam

This is what the head of the sam file looks like:

simulated.A.A.1.1217859/1   
ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA    IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI    1   a   100 -   gi|9627186|ref|NC_001539.1| 5036    0   100M    100
    simulated.A.A.1.1217859/2   GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT    HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I    1   b   100 +   gi|9627186|ref|NC_001539.1| 4713    1   G->58T7 100M    58G41
    simulated.A.A.1.5536771/1   AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG    HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH    1   a   100 +   gi|9627186|ref|NC_001539.1| 1768    0   100M    100
    simulated.A.A.1.5536771/2   CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA    2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH    1   b   100 -   gi|9627186|ref|NC_001539.1| 2069    0   100M    100
    simulated.A.A.1.7964812/1   TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG    HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I    1   a   100 +   gi|9627186|ref|NC_001539.1| 4702    1   G->69T7 100M    69G30
    simulated.A.A.1.7964812/2   CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT    G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI    1   b   100 -   gi|9627186|ref|NC_001539.1| 5012    1   C->1A7  100M    1C98

It is completely missing the headers that are normally found in samfiles.

Is this a problem with the aligner? How can I get the samfile with the correct headers?

Any help would be appreciated.

SOAP Samfile • 2.7k views
ADD COMMENT
0
Entering edit mode
samtools view -bS

If you don't use -h in your samtools command, it hardly will keep the header.

  -h       include header in SAM output
  -H       print SAM header only (no alignments)

Or is this problem beyond this?

ADD REPLY
0
Entering edit mode

Soap is outputting the file in ASCII text format so using -h with samtools wouldn't bring a header because there isn't a header to begin with (Found all this out after posting the question).

The problem is with the soap output file

ADD REPLY
0
Entering edit mode

They suggest to use the soap2sam.pl script that is downloadable at their own homepage: http://soap.genomics.org.cn/down/soap2sam.tar.gz

Did you use it?

ADD REPLY
0
Entering edit mode

I have attempted to use this. For some reason my output file is different from what they say it should be. This is the format they say it should be:

http://soap.genomics.org.cn/soap1/#Formatofoutput

This is the format of my file:

simulated.A.A.1.1217859/1   ACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATATGTCAAGAATTGTAACTTACTCAGA    IIGBIG?I9I:AI;IIIFIHFEI@IIBII@FIFH=II@AG?IIEEC>HIHHIICGIEIGIBHIEGIIAIGIGEGFGEIGHFCIIEHIIIHIIHHIHHHHI    1   a   100 -   gi|9627186|ref|NC_001539.1| 5036    0   100M    100
    simulated.A.A.1.1217859/2   GTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGAT    HIHHIHHIHIHHHHIIIFGGIIHIIGGIIIEGIFEHHAHICHI@C@IIGICHHCIGFIGIIDGIGIIABIFDIIEDCEGIII@IC<BI=7IEII;II9?I    1   b   100 +   gi|9627186|ref|NC_001539.1| 4713    1   G->58T7 100M    58G41
    simulated.A.A.1.5536771/1   AAGACCTGAACATACACAACCAATAAGAGACAGAATGTTGAACATTAAGTTAGTATGTAAGCTTCCAGGAGACTTTGGTTTGGTTGATAAAGAAGAATGG    HHIHIHHHIIIIHIHGIIIIIGIIEIIGHFIFICIIDBDIIIGIGIIFHIIBIHIIBGEIIIFIIGIIDIIBII?IEGICDGIDA<BIGIDCHIIIDDIH    1   a   100 +   gi|9627186|ref|NC_001539.1| 1768    0   100M    100
    simulated.A.A.1.5536771/2   CTGACTCCGGACGTAGTGGACCTTGCACTGGAACCGTGGAGTACTCCAGATACGCCTATTGCAGAAACTGCAAATCAACAATCAAACCAACTTGGCGTTA    2DIICID4IFIIIH;BI1FG=I=>IICIIIIAIIIIHI@IH@IIFEIGIICIIGHEDIEFIIEFIGEIGIIIIIHIFIIIFIHIHIIGHGIHIIIHIIHH    1   b   100 -   gi|9627186|ref|NC_001539.1| 2069    0   100M    100
    simulated.A.A.1.7964812/1   TTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTATATGGTATACAATAACTGTAAGAAATAGAAG    HHHIHHGHHGIIHGFHIEEIIIHIFGHIHIHBIIIIIEDDIIIHIIFIIG7AIFIII;HIHIIIFI@IAGIIIIIIE>ID8IIIFII@IGBI;=>9II6I    1   a   100 +   gi|9627186|ref|NC_001539.1| 4702    1   G->69T7 100M    69G30
    simulated.A.A.1.7964812/2   CATCGTATACTGTCTATAAGGTGAACTAACCTTACCATAAGTATCAATCGTTGCGCCCTAATTTAACAAATGAATATGATCCTGATGCATCTGCTAATAT    G3EI<IIII@0AIIIIIIAII@IIGHI>I@=IH@HBI@II>IIF@IGIGFIIHFIBFDEDIDDGEFIIIHIIIIGIIHHIHHHIIHIIHHIIIIIIIHHI    1   b   100 -   gi|9627186|ref|NC_001539.1| 5012    1   C->1A7  100M    1C98
    simulated.A.A.1.3374391/1   GATAACCAATTTAACTATGTACCAAGTAATATTGGAGGTATGAAAATTGTATATGAAAAATCTCAACTAGCACCTAGAAAATTATATTAACATACTTACT    IHIIIHHIHHHHHGHGIIGHHIIIHHFDIEHEFFI@IHEHIIICIBIE<AHIGFIDFGGCIIIID@<IGIIBD:FIIIIIIHIH=IIA7IIICIB6IEII    1   a   100 +   gi|9627186|ref|NC_001539.1| 4451    0   100M    100
    simulated.A.A.1.3374391/2   ACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGT    HI7:?CIHHIH=II;I<@BDII7@<IFCG?DIIFBIIHFGIEGIFDIIIIICI@FCIEFGIIFDFDEHIFGGHGHIEIIDHIHHGIIHHIIHIGIIHHII    1   b   100 -   gi|9627186|ref|NC_001539.1| 4785    1   T->48G9 100M    48T51
    simulated.A.A.1.4550806/1   CCTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGTAGGTTTGTTAGATGGTATACAATAACTGTAAGAAATAGAAGAACATTTAGATCATAGTTAGT    (I1IIIIHIIIEIIIGGI@<IIICACCBICICDHIFCIIIFIIFEGGCII@IIIIFIIGEFCFBHIGEIFIIIGHIIIHHFIIHIGIHIIIHIHHIHHII    1   a   100 -   gi|9627186|ref|NC_001539.1| 4785    2   A->0C-24    T->48G9 100M    0A47T51
    simulated.A.A.1.4550806/2   AGTAATATTGGAGGTATGAAAATTGTATATGAAAAATCTCAACTAGCACCTAGAAAATTATATTAACATACTTACTATGGTTTTTATGTTTATTACATAT    IHHIIGHHIHIHIHIIHHFHIFGHDIFIIIIBAIFEGBHIFFIDDEGIICDBIHII@IHIDIICIIIIIIII;GIGDFIIDII@IIIIII@7IAII8?EI    1   b   100 +   gi|9627186|ref|NC_001539.1| 4475    0   100M    100

If I use soap2sam.pl, the program produces a file that is ASCII text (not a sam file).

I have trawled through their website and multiple forums looking for the answer and can't find it anywhere.

ADD REPLY
2
Entering edit mode
6.9 years ago

Found this: https://github.com/BGI-shenzhen/soap2bam

If you use -Dict and specify a reference, it will produce the header.

ADD COMMENT
0
Entering edit mode

Thank you but this is using soap2bam, which I don't have.

I don't have admin rights to add this onto my terminal

ADD REPLY
0
Entering edit mode

Download it on a folder that you own, run chmod 755 on that file, and you will have those permissions.

ADD REPLY
1
Entering edit mode

Awesome, thank you for all the help!!

ADD REPLY
0
Entering edit mode

Mark it as answer, and close the thread, then! ;) Matteo

ADD REPLY
0
Entering edit mode

@fiona can't but I have moved your comment to an answer. It can now be accepted (green-check mark).

ADD REPLY

Login before adding your answer.

Traffic: 1686 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6