Entering edit mode
10.8 years ago
HG
★
1.2k
Hi everyone, I performed a local blastp with a query protein vs 5 genome (ORF data ). Now i want to extract all the top 5 sequence automatically and make a final top hit file through awk/perl/shell. Can any one please help me out how to do it.
# BLASTP 2.2.25 [Feb-01-2011]
# Query: gnl|ECOLI|EG10158-MONOMER ClpP 455901..456524 Escherichia coli K-12 substr. MG1655
# Database: mybd
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
gnl|ECOLI|EG10158-MONOMER V161_10_length_174349_cov_20.0403_ID_19_63 100.00 207 0 0 1 207 1 207 9e-157 431
gnl|ECOLI|EG10158-MONOMER RS374_6_length_260428_cov_19.5118_ID_11_179 100.00 207 0 0 1 207 1 207 9e-157 431
gnl|ECOLI|EG10158-MONOMER H127_1_length_711606_cov_31.5057_ID_1_431 100.00 207 0 0 1 207 1 207 9e-157 431
gnl|ECOLI|EG10158-MONOMER sff_11_length_151555_cov_28.2432_ID_21_27 100.00 207 0 0 1 207 1 207 9e-157 431
gnl|ECOLI|EG10158-MONOMER H75_1_length_711552_cov_31.2637_ID_1_431 100.00 207 0 0 1 207 1 207 9e-157 431
gnl|ECOLI|EG10158-MONOMER H75_26_length_34692_cov_38.2655_ID_51_32 28.24 131 90 2 73 202 63 190 3e-11 60.8
gnl|ECOLI|EG10158-MONOMER RS374_42_length_11164_cov_52.9994_ID_83_6 28.24 131 90 2 73 202 63 190 4e-11 60.8
gnl|ECOLI|EG10158-MONOMER H127_2_length_512186_cov_33.056_ID_3_271 26.05 119 67 5 22 121 302 418 5.6 26.9
gnl|ECOLI|EG10158-MONOMER H75_3_length_319828_cov_30.2145_ID_5_224 26.05 119 67 5 22 121 302 418 5.6 26.9
gnl|ECOLI|EG10158-MONOMER V161_3_length_303745_cov_18.9698_ID_5_105 26.05 119 67 5 22 121 302 418 6.3 26.9
gnl|ECOLI|EG10158-MONOMER RS374_2_length_370043_cov_19.1337_ID_3_206 26.27 118 66 5 23 121 303 418 6.4 26.9
gnl|ECOLI|EG10158-MONOMER sff_1_length_375148_cov_28.8962_ID_1_190 26.27 118 66 5 23 121 303 418 6.4 26.9
gnl|ECOLI|EG10158-MONOMER RS374_4_length_285072_cov_19.5222_ID_7_219 30.30 33 23 0 128 160 176 208 7.2 26.6
Have you tried solving this yourself?
How can I sort blast output based on e-value?
I would use linux
sort
andhead
.I appreciate sort and head are useful command. but for my purpose i dont know how much it will be useful. In context of my question if you got properly i need top 5 max score sequence in a separate file so i can pass those sequences in next phase of my pipe.
for example i need : star position to end position of each top 5 sequences.
Getting top 10 sequences of BLAST results Bio Python have a look here. Also, you may use python script taking only top 5 scores