How to run the BLAST program with the un-mapped reads in bam format from tophat
1
5
Entering edit mode
9.2 years ago
Naresh D J ▴ 110

Hi,

I am pretty new to the analysis of RNA-seq data. My data has 35 % of un-mapped reads (human genome). So I would like to check those un-mapped reads with blast program to check for contamination of other genomes in the data.

I have the un-mapped reads in bam format. How can I run the blast program?

Thank you,

Best Regards,

Naresh DJ

RNA-Seq blast reads • 11k views
ADD COMMENT
8
Entering edit mode
9.2 years ago
samtools view -f 8  input.bam |\
awk '{printf(">%s/%s\n%s\n",$1,(and(int($2),0x40)?1:2),$10);}' | \
blastn -db contamination
ADD COMMENT
0
Entering edit mode

Hi Pierre,

Thank you for the reply. Can you briefly explain the details of the code. What each line does? How to save the output from blast?

ADD REPLY
0
Entering edit mode

first line extract the unmapped read, output is a tab delimited file

second line is awk, it prints the following components : '>' , then the name of the read, then a name extension of the paired reads (/1 or /2), then a Carriage return, then the sequence and a final CR.

3rd line : pipe in blast

How to save the output from blast? blastn -db contamination -out out.blast

ADD REPLY
0
Entering edit mode

Thank you Pierre.

Actually my reads are single end, do I need to make changes to the awk? Is "contamination" the database I need to download as it is not available on the blast installed at my university server?

ADD REPLY
2
Entering edit mode

Single end: you can just use

awk '{printf(">%s\n%s\n",$1,$10);}'

Is "contamination" the database I need to download as it is not available on the blast installed at my university server?

Yes:

ADD REPLY
0
Entering edit mode

Hi Pierre,

I first generated my sam file and then tried your code like this:

awk '{printf(">%s/%s\n%s\n",$1,(and(int($2),0x40)?1:2),$10);}' sample2_dup.sam | blastn -out blast.dup.txt

I get this error message:

BLAST query/options error: Either a BLAST database or subject sequence(s) must be specified

Do you have any idea why?

I get the same error message when I just save the awk output and then run blastn separately.

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2850 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6