Extract sequence of BLAST hits
8.2 years ago
ThePresident ▴ 180

I know this question has been asked and also answered but it didn't work for me, so...

Basically, I BLAST a nucleotide sequence against the nr database on the NCBI website (I also specify the organism). Now I would like to extract the hit (subject) sequences in FASTA format so I can align them later.

I tried parsing the XML file with Pierre's blast2fasta stylesheet but got numerous errors.

Could someone point me to an alternative way to parse the XML alignment file, or another way of extracting the subject hits after a BLAST search?

Thanks, TP

BLAST XML • 3.4k views

but got numerous errors.

Which errors?


For example:

49: parser error : Specification mandate value for attribute data-pjax-transient
a name="request-id" content="82400435:2472:679E968:57D70D27" data-pjax-transient

93: parser error : Opening and ending tag mismatch: link line 92 and head
  </head>

168: parser error : Specification mandate value for attribute itemscope
        <div itemscope itemtype="http://schema.org/SoftwareSourceCode">

254: parser error : attributes construct error
  <span itemscope itemtype="http://schema.org/ListItem" itemprop="itemListElemen

709: parser error : Entity 'copy' not defined
      <li>&copy; 2016 <span title="0.07286s from github-fe136-cp1-prd.iad.github

759: parser error : Premature end of data in tag meta line 81

And so on. I downloaded the XML file after running blastn and then ran:

xsltproc --novalid blast2fasta.xsl alignment.xml

8.2 years ago

You have downloaded a web page (the GitHub HTML view of the file), not the 'raw' stylesheet itself: https://raw.githubusercontent.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/blast2fasta.xsl

...
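
A quick way to tell the two cases apart before running xsltproc is to check whether the saved file parses as XML and has an XSLT root element. The following is a minimal sketch using only the Python standard library; the function name `looks_like_stylesheet` is hypothetical and not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace that every XSLT stylesheet declares on its root element.
XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def looks_like_stylesheet(path):
    """Return True if the file is well-formed XML with an XSLT root element."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        # An HTML page saved by mistake usually fails here
        # (valueless attributes, undefined entities such as &copy;).
        return False
    return root.tag in (f"{{{XSL_NS}}}stylesheet", f"{{{XSL_NS}}}transform")
```

For example, `looks_like_stylesheet("blast2fasta.xsl")` should return False for a saved GitHub HTML page and True for the raw stylesheet.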


It works now... Thanks.

8.2 years ago
BioinfGuru ★ 2.1k

How are your Perl skills? I have done this using the Bio:: modules of BioPerl:

https://metacpan.org/pod/distribution/BioPerl/BioPerl.pm

To create the query sequence object --> use Bio::Seq;

To run a remote blast search --> use Bio::Tools::Run::RemoteBlast;

To parse the blast report --> use Bio::SearchIO;

To retrieve the hit sequences --> use Bio::DB::GenBank;

Of course, if you know how to download the BLAST report in the correct format directly from the website, then just use the last two (Bio::SearchIO and Bio::DB::GenBank).
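
For readers without Perl, the report-parsing step can also be sketched in Python with the standard library alone. This is only an illustration, not the BioPerl approach described above: it assumes the report was downloaded in the classic BLAST XML format (elements such as `Hit_accession`, `Hit_def` and `Hsp_hseq`), and the function name `blast_xml_to_fasta` is hypothetical. It writes only the aligned part of each subject, with alignment gaps removed; full-length hit sequences would still have to be fetched by accession.

```python
import xml.etree.ElementTree as ET


def blast_xml_to_fasta(xml_source, fasta_path, width=60):
    """Write the aligned subject segment of every HSP as a FASTA record.

    xml_source may be a file name or an open file object.
    Returns the number of records written.
    """
    count = 0
    tree = ET.parse(xml_source)
    with open(fasta_path, "w") as out:
        for hit in tree.iter("Hit"):
            accession = hit.findtext("Hit_accession", default="unknown")
            definition = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                start = hsp.findtext("Hsp_hit-from")
                end = hsp.findtext("Hsp_hit-to")
                # Hsp_hseq is the subject as aligned; drop the gap characters.
                seq = (hsp.findtext("Hsp_hseq") or "").replace("-", "")
                if not seq:
                    continue
                out.write(f">{accession}:{start}-{end} {definition}\n")
                # Wrap the sequence at a fixed line width.
                for i in range(0, len(seq), width):
                    out.write(seq[i:i + width] + "\n")
                count += 1
    return count
```

Calling `blast_xml_to_fasta("alignment.xml", "hits.fasta")` would then produce one record per HSP, so a subject with several HSPs appears several times under the same accession with different coordinates.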


My Perl skills are non-existent, but thanks anyway. I can still try to run it.
