Convert blastn output file to SAM/BAM
9.4 years ago

I am trying to convert BLAST output into a .sam or .bam file. I tried the blast2sam tool, but it produces many warnings and the output file is incomplete.

Is there another tool for the conversion, or another alignment tool that can write .sam or .bam output directly?

Thanks


Maybe there is something wrong with your input rather than with the tool you use? You should post the exact command you ran and some or all of the warnings you see, and give details of the input file, such as its format and a few example lines.

There is a blast2bam converter, is it the one you used? (You mention blast2sam).


The reads were aligned with this command:

blastn -query 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta \
    -db blast_ref -word_size 15 \
    -outfmt "6 qseqid sseqid pident nident length mismatch positive gapopen gaps ppos qframe sframe sstrand qcovs qstart qend qseq sstart send sseq evalue bitscore score" \
    -out blast_tab

This is the first line of the output blast_tab:

UNC11-SN627:280:C1NEKACXX:4:1101:11031:1976     sequenzadifusione       93.62   44      3       44      0       0       93.62   1       1       plus    98      2       48      TGAACCCGGGAGGTGGAGGTTGCAGTGAGCCGAGATTGCGCCACTGC 24710   24756   TGAACCCGGGAGGTGGAGGCTGCAGTGAGCTGAGATAGCGCCACTGC 6e-16   71.3    38

The conversion was then done with blast2sam (not blast2bam):

blast2sam.pl blast_tab > blast.sam

For the conversion we used the tabular BLAST output rather than the default format.

The conversion reports no errors, but the output file blast.sam is empty.

Where could the error be?


I think blast2sam.pl has not been updated for some time as Heng Li said:

BLAST support will be dropped unless someone want to maintain it. I realize that it would be better to have fewer functionality to avoid letting others blame me for having too many bugs. I just thought this script may be useful to someone occasionally, but it is now causing more troubles than good. Sorry.

That is part of why I wrote Blast2Bam.

If you want to use it, blast output will have to be in XML format (-outfmt 5).
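
To illustrate what the XML output carries for each hit, here is a minimal Python sketch that reads a BLAST XML fragment and derives a SAM-style position and CIGAR string from the gapped alignment. It is not how Blast2Bam works internally. The XML below is a hand-written, heavily trimmed fragment using the element names of `-outfmt 5` (real output contains many more fields), and `read1`, `ref1`, `cigar_from_alignment` and `hsps_from_xml` are hypothetical names. Only plus-strand hits are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment laid out like `blastn -outfmt 5` output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>read1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>ref1</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>2</Hsp_query-from>
              <Hsp_query-to>11</Hsp_query-to>
              <Hsp_hit-from>101</Hsp_hit-from>
              <Hsp_hit-to>110</Hsp_hit-to>
              <Hsp_qseq>ACGT-ACGTAC</Hsp_qseq>
              <Hsp_hseq>ACGTTACG-AC</Hsp_hseq>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def cigar_from_alignment(qseq, hseq):
    """Collapse a gapped pairwise alignment into a CIGAR string (M/I/D only)."""
    runs = []
    for q, h in zip(qseq, hseq):
        # Gap in the query = deletion from the reference; gap in the hit = insertion.
        op = "D" if q == "-" else "I" if h == "-" else "M"
        if runs and runs[-1][1] == op:
            runs[-1][0] += 1
        else:
            runs.append([1, op])
    return "".join(f"{n}{op}" for n, op in runs)


def hsps_from_xml(xml_text):
    """Return one record per HSP with query name, reference name, position and CIGAR."""
    root = ET.fromstring(xml_text)
    records = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            ref = hit.findtext("Hit_def")
            for hsp in hit.iter("Hsp"):
                records.append({
                    "query": query,
                    "ref": ref,
                    # Assumes a plus-strand hit, where hit-from is the leftmost position.
                    "pos": int(hsp.findtext("Hsp_hit-from")),
                    "cigar": cigar_from_alignment(
                        hsp.findtext("Hsp_qseq"), hsp.findtext("Hsp_hseq")
                    ),
                })
    return records


records = hsps_from_xml(SAMPLE_XML)
print(records)
```

A full SAM record would also need the clipped query ends, the flag, and the reverse-complemented sequence for minus-strand hits, which is the part a dedicated converter takes care of.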


I downloaded the code, but I'm not able to create the ref.dict file. How can I do that?
Also, the "src" folder contains two files (blastSam.c and blastSam.h). Which one should I use?
Thanks


The .dict file is created by picard-tools (CreateSequenceDictionary) from the reference FASTA.

In your case:

picard-tools CreateSequenceDictionary R=blast_ref O=blast_ref.dict
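
For context, a sequence dictionary is essentially a SAM header listing each reference sequence with its name and length. The Python sketch below builds such lines from FASTA text to show what the file holds; it is not a replacement for Picard. The function names and the sample FASTA are hypothetical, the `@HD` version value is illustrative, and the `UR` field that Picard normally adds is left out.

```python
import hashlib

# Hypothetical two-record reference, used only to exercise the sketch.
FASTA = ">ref1 some description\nACGT\nacgt\n>ref2\nAC\n"


def read_fasta(text):
    """Yield (name, sequence) pairs; the name is the header up to the first space."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sequence_dictionary(fasta_text):
    """Return SAM header lines: one @HD line, then one @SQ line per sequence."""
    lines = ["@HD\tVN:1.6\tSO:unsorted"]  # version value is illustrative
    for name, seq in read_fasta(fasta_text):
        # M5 is the MD5 of the upper-cased sequence without whitespace.
        md5 = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        lines.append(f"@SQ\tSN:{name}\tLN:{len(seq)}\tM5:{md5}")
    return lines


lines = sequence_dictionary(FASTA)
print("\n".join(lines))
```

The `SN` names have to match the subject names that BLAST reports, otherwise the converter cannot look up the reference lengths.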

The src folder contains the source code, not the program.

You need to compile the code first by typing "make" in your command line in the main folder or in the src folder.

The program will then be in the bin folder.

You can then pipe the output of blastn into the program:

blastn -query 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta -db blast_ref -word_size 15 -outfmt 5 | blast2bam - blast_ref.dict 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta > out.sam

Typing make in the src folder gives an error:

xsltproc --output parseXML.c --stringparam fileType c schema2c.xsl schema.xml
make: xsltproc: Command not found
make: *** [parseXML.c] Error 127

As Pierre told you on seqanswers.com, you got this error because xsltproc wasn't installed on your computer.

In order to compile Blast2Bam, you will also need libxml2, zlib and of course gcc.
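
A quick way to see which command-line build tools are missing before running make, as a minimal Python sketch. The tool list is taken from this thread and the helper name `missing_tools` is hypothetical. libxml2 and zlib are libraries rather than executables, so they are not checked here.

```python
import shutil

# Executables mentioned in this thread as needed for the build.
REQUIRED = ["make", "gcc", "xsltproc"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Anything reported as missing has to be installed through your system's package manager before make can succeed.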

If your XML is big, you should pipe the blast output into Blast2Bam, like I've shown you in my previous comment.

Again, if you're afraid the SAM file will be too big, you should pipe the output of Blast2Bam into SAMtools to make a BAM file:

blastn -query 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta -db blast_ref -word_size 15 -outfmt 5 | blast2bam - blast_ref.dict 130205_UNC11-SN627_0280_AC1NEKACXX_TTAGGC_L004_1.fasta | samtools view -Sb -F 0xF00 - > out.bam

-F 0xF00 excludes secondary and supplementary alignments (as well as reads flagged as QC-failed or duplicate), so only the primary alignments are kept. You may or may not want this option, depending on what you plan to do with the results.
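
To make the 0xF00 mask less opaque, here is a small Python sketch that breaks it into the individual SAM flag bits and mimics the exclusion test that `-F` applies. The bit meanings come from the SAM specification; the names `decode` and `kept_by_filter` are hypothetical.

```python
# The four SAM flag bits covered by the mask 0xF00.
SAM_FLAGS = {
    0x100: "secondary alignment",
    0x200: "not passing quality controls",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode(mask):
    """List the flag meanings whose bits are set in the mask."""
    return [name for bit, name in SAM_FLAGS.items() if mask & bit]


def kept_by_filter(flag, exclude=0xF00):
    """Mimic `samtools view -F`: drop a record if any excluded bit is set."""
    return (flag & exclude) == 0


print(decode(0xF00))
print(kept_by_filter(16))   # reverse-strand primary alignment
print(kept_by_filter(256))  # secondary alignment
```

Dropping `-F 0xF00` from the pipeline keeps every hit that BLAST reported for a read, not just the one marked as primary.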


I got it working. Thank you.
