Identify protein sequences from L. infantum with more than 60% conservation with other Leishmania species
7.0 years ago
dzisis1986 ▴ 70

I downloaded the available proteomes (L. braziliensis, L. major, and L. infantum) from TriTrypDB. I want to keep only those L. infantum protein sequences that show more than 60% conservation with the other Leishmania species, as verified by BLAST protein alignment. I am running blastp on the command line, and so far I have only managed to retrieve the top two hits per query with this command:

blastp -query Linfantum.prot.fasta -db /home/dimitris/blastdb/db/blast/Lbraziliensis.prot.fasta -evalue 1e-3 -outfmt '6 qseqid sseqid qlen length evalue' -max_target_seqs 2 -comp_based_stats F -out Linfantum_vs_Lbraziliensis_seq_newest.txt

Which parameter should I use to require 60% conservation between L. infantum and L. braziliensis with blastp?

blast blastp proteins leismania commandline • 1.5k views

You will need to post-process the results to get this kind of information. There is no single blastp parameter that does it.


This is a sample of my blastp output (columns: qseqid, sseqid, qlen, length, evalue):

LinJ_31_3340_mRNA   LbrM_31_2880_mRNA   824 793 0.0
LinJ_31_3340_mRNA   LbrM_31_2870_mRNA   824 438 0.0
LinJ_31_3350_mRNA   LbrM_31_2870_mRNA   468 468 0.0
LinJ_31_3350_mRNA   LbrM_31_2880_mRNA   468 469 0.0
LinJ_34_4350_mRNA   LbrM_08_1110_mRNA   193 169 3e-57
LinJ_34_4350_mRNA   LbrM_08_1140_mRNA   193 168 4e-57
LinJ_34_4360_mRNA   LbrM_14_1300_mRNA   336 337 0.0
LinJ_34_4360_mRNA   LbrM_20_1430_mRNA   336 167 2e-82
LinJ_34_4370_mRNA   LbrM_08_1100_mRNA   185 182 4e-64
LinJ_34_4370_mRNA   LbrM_08_1090_mRNA   185 174 2e-63
LinJ_34_4380_mRNA   LbrM_20_3960_mRNA   113 113 7e-83
LinJ_34_4380_mRNA   LbrM_20_3940_mRNA   113 113 7e-83
LinJ_34_4390_mRNA   LbrM_20_3950_mRNA   221 221 1e-144
LinJ_34_4390_mRNA   LbrM_20_3980_mRNA   221 221 4e-144
LinJ_35_5440_mRNA   LbrM_34_0010_mRNA   167 114 4e-66
LinJ_35_5440_mRNA   LbrM_34_0040_mRNA   167 114 4e-65
LinJ_35_5450_mRNA   LbrM_34_0010_mRNA   508 455 0.0
LinJ_35_5450_mRNA   LbrM_34_0040_mRNA   508 424 0.0
LinJ_35_5460_mRNA   LbrM_34_1480_mRNA   296 293 0.0
LinJ_35_5460_mRNA   LbrM_16_0970_mRNA   296 205 3e-25
LinJ_03_0970_mRNA   LbrM_35_6200_mRNA   620 220 2e-23
LinJ_03_0970_mRNA   LbrM_18_0900_mRNA   620 149 2e-05
LinJ_08_1320_mRNA   LbrM_10_1520_mRNA   202 199 3e-62
LinJ_08_1320_mRNA   LbrM_08_0990_mRNA   202 181 3e-61
LinJ_08_1330_mRNA   LbrM_08_0990_mRNA   201 176 5e-60
LinJ_08_1330_mRNA   LbrM_08_1140_mRNA   201 199 2e-58
LinJ_31_3360_mRNA   LbrM_03_0750_mRNA   799 819 0.0
LinJ_31_3360_mRNA   LbrM_34_0520_mRNA   799 226 6e-04
LinJ_31_3370_mRNA   LbrM_03_0740_mRNA   396 342 0.0
LinJ_31_3370_mRNA   LbrM_34_4320_mRNA   396 170 7e-04
LinJ_14_1600_mRNA   LbrM_14_1120_mRNA   859 939 0.0
LinJ_14_1600_mRNA   LbrM_14_1120_mRNA   859 939 0.0
LinJ_14_1600_mRNA   LbrM_14_1120_mRNA   859 946 0.0
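As a rough illustration of the post-processing mentioned above, here is a minimal Python sketch that reads rows in this five-column layout and reports alignment length divided by query length. This ratio is only a crude proxy and an assumption on my part, not a real conservation measure: `length` counts gaps, so it can exceed `qlen`, and it says nothing about identity. The function name `alignment_fraction` is made up for the example, and the three rows are copied from the sample output with tabs assumed as the separator.

```python
import csv
import io

# Column order matches the -outfmt string used in the question.
COLUMNS = ["qseqid", "sseqid", "qlen", "length", "evalue"]


def alignment_fraction(rows):
    """Yield (qseqid, sseqid, length / qlen) for each tabular BLAST row."""
    for row in rows:
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        yield rec["qseqid"], rec["sseqid"], int(rec["length"]) / int(rec["qlen"])


# Three rows taken from the sample output above (tab-separated).
sample = (
    "LinJ_31_3340_mRNA\tLbrM_31_2880_mRNA\t824\t793\t0.0\n"
    "LinJ_31_3340_mRNA\tLbrM_31_2870_mRNA\t824\t438\t0.0\n"
    "LinJ_03_0970_mRNA\tLbrM_35_6200_mRNA\t620\t220\t2e-23\n"
)

reader = csv.reader(io.StringIO(sample), delimiter="\t")
fractions = list(alignment_fraction(reader))
for qseqid, sseqid, frac in fractions:
    print(f"{qseqid}\t{sseqid}\t{frac:.2f}")
```

To run it on a real file, replace the `io.StringIO(sample)` handle with `open("Linfantum_vs_Lbraziliensis_seq_newest.txt")`.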
7.0 years ago
ori ▴ 50

First, you need to define what you mean by 60% conservation
(e.g. 60% similarity across 90% of the protein sequence).

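Next, run BLASTP with two additional -outfmt fields, pident and qcovs.

The output will then include a pident column (percentage of identical matches) and a qcovs column (query coverage per subject). Both are reported as percentages on a 0-100 scale. Note that pident measures identity; if you really mean similarity, the ppos field (percentage of positive-scoring matches) is the closer measure.
With -max_target_seqs 2 the output stays small enough to open in Excel, where you can filter on those columns (e.g. pident >= 60, qcovs >= 90).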
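If you would rather script the filter than use a spreadsheet, something along these lines could work. It is only a sketch under assumptions: it presumes the blastp run used `-outfmt '6 qseqid sseqid qlen length evalue pident qcovs'`, that pident and qcovs are on a 0-100 scale, and that a query counts as conserved if at least one hit passes both cutoffs. The function name `conserved_queries` and the demo rows (`queryA`, `subj1`, ...) are invented for illustration and are not real BLAST results.

```python
import csv
import io

# Assumed column order: the question's fields plus pident and qcovs.
COLUMNS = ["qseqid", "sseqid", "qlen", "length", "evalue", "pident", "qcovs"]


def conserved_queries(handle, min_pident=60.0, min_qcovs=90.0):
    """Return the set of query IDs with at least one hit passing both cutoffs."""
    keep = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["pident"]) >= min_pident and float(rec["qcovs"]) >= min_qcovs:
            keep.add(rec["qseqid"])
    return keep


# Made-up rows, for illustration only.
demo = (
    "queryA\tsubj1\t300\t298\t1e-120\t82.5\t99\n"
    "queryA\tsubj2\t300\t150\t1e-30\t45.0\t50\n"
    "queryB\tsubj3\t200\t90\t1e-10\t70.0\t45\n"
    "queryC\tsubj4\t150\t150\t1e-80\t59.9\t100\n"
)

result = conserved_queries(io.StringIO(demo))
print(sorted(result))
```

The returned IDs can then be used to pull the matching records out of the L. infantum FASTA file. The cutoffs are keyword arguments, so a different definition of conservation only needs different values.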
