About a blastx-generated output file
10.0 years ago
Kurban ▴ 230

Hello,

I used blastx to search my query file against a protein database.

My query file contains more than 140,000 sequences, and I only want to see the queries that aligned. However, the output lists every query together with its result: either the alignments or "No hits found". That makes picking the aligned queries out of the blastx result file a tremendous amount of work. If I could extract only the aligned query sequences and their alignment information (E-value, score, and aligned sequence), it would simplify my job a lot.

This is an excerpt from the blastx output file:

Query= comp936_c0_seq10 len=156 path=[335:0-24 360:25-155]

Length=156

***** No hits found *****

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 21583458

Query= comp1863_c0_seq1 len=2184 path=[0:0-1278 1279:1279-1279
1280:1280-1303 1304:1304-2183]

Length=2184
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  FBpp0075807 FBgn0000404 symbol:CycA family:Transcription Cofact...   287    8e-89

> FBpp0075807 FBgn0000404 symbol:CycA family:Transcription Cofactors
species:Drosophila melanogaster
Length=491

 Score =  287 bits (735),  Expect = 8e-89, Method: Compositional matrix adjust.
Identities = 184/468 (39%), Positives = 261/468 (56%), Gaps = 50/468 (11%)
Frame = -3

Query  1810  MATINIHPDQENRV-PELRqkqannamaaqqKRTGLGLIDHN----KANKAVPKGKQ--P  1652
             MA+  IH D  N+  P ++               G G  + N    +AN AV  G    P
Sbjct  1     MASFQIHQDMSNKENPGIKIPAGVKNTKQPLAVIG-GKAEKNALAPRANFAVLNGNNNVP  59

Query  1651  LKESNLSNAR-VENIHVKEN------RKNVVVPVAQFEAFTVYED--DEQRARIDQKL-R  1502
                  +   R V N++V EN      + NVV  V QF+ F+VYED  D Q A   + L 
Sbjct  60    RPAGKVQVFRDVRNLNVDENVEYGAKKSNVVPVVEQFKTFSVYEDNNDTQVAPSGKSLAS  119

Query  1501  LISKSN--VYKGTAEDRFITKTELAEIERkkqlqklKELAEIPAVIEPKCENDPCTPMSI  1328
             L+ K N  V  G  +                     KEL +      P    D  +PMS+
Sbjct  120   LVDKENHDVKFGAGQ---------------------KELVDYDLDSTPMSVTDVQSPMSV  158

Query  1327  EK-LNDENAENDSSQLAEEVIRKNSNVKDL--------FFEMEEYRDDIYAYLREHELRH  1175
             ++ +      +D S   E  +     VK+L        F E+ +Y+ DI  Y RE E +H
Sbjct  159   DRSILGVIQSSDISVGTETGVSPTGRVKELPPRNDRQRFLEVVQYQMDILEYFRESEKKH  218

Query  1174  RPKPGYIVKQPDVTENMRAVLVDWLVEVTEEYKMQTETLYLAVNFIDRFLSYMSVVRAKL  995
             RPKP Y+ +Q D++ NMR++L+DWLVEV+EEYK+ TETLYL+V ++DRFLS M+VVR+KL
Sbjct  219   RPKPLYMRRQKDISHNMRSILIDWLVEVSEEYKLDTETLYLSVFYLDRFLSQMAVVRSKL  278

Query  994   QLVGTAAMFIASKYEEIFPPDVSEFVYITDDTYDKHQVIRMEHLILRVLGFDLSVPTPLT  815
             QLVGTAAM+IA+KYEEI+PP+V EFV++TDD+Y K QV+RME +IL++L FDL  PT  
Sbjct  279   QLVGTAAMYIAAKYEEIYPPEVGEFVFLTDDSYTKAQVLRMEQVILKILSFDLCTPTAYV  338

Query  814   FINATCISAGLTEKTMYLAMYLSEIALLEVEPYLQFLPSVIASSAIALARHTLGEEAWND  635
             FIN   +   + EK  Y+ +Y+SE++L+E E YLQ+LPS+++S+++ALARH LG E W 
Sbjct  339   FINTYAVLCDMPEKLKYMTLYISELSLMEGETYLQYLPSLMSSASVALARHILGMEMWTP  398

Query  634   SLYKHTGYTLKQLQLCICFLYDMFVKAPNHPQHAIQDKYRSRKYMQVS  491
              L + T Y L+ L+  +  L      A      A+++KY    Y +V+
Sbjct  399   RLEEITTYKLEDLKTVVLHLCHTHKTAKELNTQAMREKYNRDTYKKVA  446

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 472565421

Query= comp1199_c0_seq1 len=1877 path=[19533:0-169 21522:170-173
19704:174-982 21495:983-986 20513:987-1876]

Length=1877

***** No hits found *****

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 397904649
10.0 years ago
Ram 44k

You can use a simple Biopython script to convert your plain-text result to tabular format, which will be easier to filter and process. Take a look here: http://biopython.org/DIST/docs/api/Bio.SearchIO.BlastIO-module.html

Or, you can edit my script (as written, it only filters out the no-hit queries) so that it reads plain text and writes tabular output. It uses BioPerl, though: https://github.com/RamRS/myPerlScripts/blob/master/filterBlastReport.pl
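
For illustration, here is a minimal sketch of the same idea in plain Python (standard library only; it is not the Biopython API and not the Perl script linked above). It assumes that every per-query section starts with a line beginning with "Query=", that no-hit sections contain the literal line "***** No hits found *****", and that each alignment has a "> subject" header followed by a "Score = ... Expect = ..." line, as in the excerpt in the question. The function names and the small sample string are made up for the example.

```python
import re

NO_HITS = "***** No hits found *****"
# Matches e.g. ' Score =  287 bits (735),  Expect = 8e-89, Method: ...'
HSP_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s+bits.*?Expect(?:\(\d+\))?\s*=\s*([\d.eE+-]+)"
)


def split_query_blocks(text):
    """Split a plain-text BLAST report into one chunk per 'Query=' section.

    Anything before the first 'Query=' line (the report header) is dropped.
    """
    starts = [m.start() for m in re.finditer(r"^Query=", text, flags=re.M)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def blocks_with_hits(text):
    """Keep only the query sections that do not contain the no-hit marker."""
    return [b for b in split_query_blocks(text) if NO_HITS not in b]


def summarize_hits(text):
    """Return (query_id, subject_id, bit_score, e_value) tuples, one per HSP."""
    rows = []
    for block in blocks_with_hits(text):
        query_id = re.match(r"Query=\s*(\S+)", block).group(1)
        subject_id = None
        for line in block.splitlines():
            if line.startswith(">"):
                # Subject header, e.g. '> FBpp0075807 FBgn0000404 ...'
                fields = line[1:].split()
                subject_id = fields[0] if fields else None
                continue
            match = HSP_RE.search(line)
            if match and subject_id is not None:
                # E-value is kept as text so it is reported exactly as BLAST wrote it
                rows.append(
                    (query_id, subject_id, float(match.group(1)), match.group(2))
                )
    return rows


# Tiny made-up sample shaped like the excerpt in the question
sample = """Query= comp936_c0_seq10 len=156

Length=156

***** No hits found *****

Query= comp1863_c0_seq1 len=2184

Length=2184

> FBpp0075807 FBgn0000404 symbol:CycA
Length=491

 Score =  287 bits (735),  Expect = 8e-89, Method: Compositional matrix adjust.
Identities = 184/468 (39%), Positives = 261/468 (56%), Gaps = 50/468 (11%)
"""

for row in summarize_hits(sample):
    print("\t".join(str(field) for field in row))
```

To keep the full alignments rather than a summary, write out "".join(blocks_with_hits(text)) instead. If re-running the search is an option, BLAST+ can also write tabular output directly with -outfmt 6, which lists only the queries that have hits.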


Hello Mr. RamRS,

I tried your script. Before any changes, it gave this result:

kurban@kurban-X550VC:~/Desktop/tf$ perl filterBlastReport.pl tf.blast
Getopt::ArgParse: Option in is required
kurban@kurban-X550VC:~/Desktop/tf$

kurban@kurban-X550VC:~/Desktop/tf$ perl filterBlastReport.pl --in tf.blast
filterBlastReport.pl: remove entries with no hits from BLAST output file
usage: filterBlastReport.pl [--help|-h] --in|-i

This script reads a BLAST results file as input\ and filters out query
sequences with no hits to the database. \ The results are written in plain text
format to output file.

optional arguments:
    --help, -h     ? show this help message and exit
    --in, -i IN      input BLAST results file

Then my friend changed the script a little bit (I believe he changed line 21, if(scalar(@ARGV) != 2)), and then it gave this:

kurban@kurban-X550VC:~/Desktop/tf$ perl changed.pl --in tf.blast
2Getopt::ArgParse::Namespace=HASH(0x2adfe78)
unknown option: fasta at changed.pl line 30.

I could not find where the problem is.


Check the usage line, it needs the -i flag before the input file name :)

Run the script (the version before your friend changed it) like so:

perl filterBlastReport.pl -i tf.blast

You'd have to change the argparse code if you wanna use input files without the flag.
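
As a rough illustration of what accepting the file name both with and without the flag could look like, here is a sketch using Python's argparse. It is a Python analogue only, not the Perl Getopt::ArgParse code in filterBlastReport.pl, and the names build_parser and resolve_input are made up for the example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Remove entries with no hits from a BLAST output file."
    )
    # Flag form: -i FILE or --in FILE ('in' is a Python keyword, so set dest)
    parser.add_argument("-i", "--in", dest="infile",
                        help="input BLAST results file")
    # Positional form: FILE given without any flag (optional because of nargs='?')
    parser.add_argument("positional", nargs="?", metavar="FILE",
                        help="input BLAST results file, given without -i")
    return parser


def resolve_input(argv):
    """Return the input path from either the flag or the positional argument."""
    parser = build_parser()
    args = parser.parse_args(argv)
    path = args.infile or args.positional
    if path is None:
        # Prints the usage line and exits with status 2
        parser.error("an input file is required (use -i FILE or pass FILE)")
    return path


print(resolve_input(["-i", "tf.blast"]))
print(resolve_input(["tf.blast"]))
```

Both calls resolve to the same file name, so either invocation style would work with a parser set up this way.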

EDIT: The change your friend made just bypasses the line of code trying to warn you of an imminent failure - it does nothing to address the cause whatsoever :)


I had also tried that command line several times before making any changes to the script, and got the same result:

kurban@kurban-X550VC:~/Desktop/tf$ perl filterBlastReport.pl -i tf.blast
filterBlastReport.pl: remove entries with no hits from BLAST output file
usage: filterBlastReport.pl [--help|-h] --in|-i

This script reads a BLAST results file as input\ and filters out query
sequences with no hits to the database. \ The results are written in plain text
format to output file.

optional arguments:
    --help, -h     ? show this help message and exit
    --in, -i IN      input BLAST results file
kurban@kurban-X550VC:~/Desktop/tf$

I just fixed it - it should work fine now. Sorry for the inconvenience.


Yes sir, it runs perfectly now.

No, no, there has not been any inconvenience at all. Your suggestions and scripts have been a great help. Thank you for your time and patience.


You're very welcome! Glad I could be of help, and thank you for finding the bug in my code.


That's weird. I guess the script is a bit buggy. I'll work on it and let you know once it is tweaked. It should not take me more than a couple of hours.
