I am trying to extract specific hits from BLAST output. These are the steps I performed on the BLAST web interface:
- Paste the amino acid input sequence into BLAST and run it with default parameters against the nr database
- Switch to the new BLAST results page so that I can set some filters
- Set the percent identity filter to 50%-90%
- Select sequences with a query coverage of 70%
- Download all filtered hits in FASTA/CSV format.
So now I have all BLAST hits for the input sequence in FASTA format. However, I have more than 100 different sequences and would like to perform the same steps on each of them.
I could use BLAST+, but I am not sure whether I can set the filters mentioned above on the command line for blastp. I could not find these filters in the output of blastp -help. I also checked whether Biopython can do this for me, but I could not find any useful links, nor could I find anything useful on Biostars or biology.stackexchange.com. The most closely related one that I found was here.
I am thinking that, after I get the output, I would need a script that applies these filters for me. I would really appreciate it if someone could point me in the right direction. Please forgive me if the question is not clear; I can explain further if required. Thank you.
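To make the idea concrete, here is a minimal sketch of the kind of post-filtering script I have in mind. It assumes the hits for all queries were saved as a tab-separated table with the columns qseqid, sseqid, pident, qcovs and evalue, in that order (for example from a custom tabular output format in BLAST+). The column order, the function name filter_hits, and the example rows are assumptions made up for illustration, not real output.

```python
import csv
import io

# Assumed column order of the tab-separated hit table (an assumption,
# e.g. a run with -outfmt "6 qseqid sseqid pident qcovs evalue").
COLUMNS = ["qseqid", "sseqid", "pident", "qcovs", "evalue"]


def filter_hits(handle, min_ident=50.0, max_ident=90.0, min_qcov=70.0):
    """Yield hit rows (as dicts) that pass the identity and coverage cutoffs."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        pident = float(row["pident"])  # percent identity of the hit
        qcov = float(row["qcovs"])     # query coverage per subject
        if min_ident <= pident <= max_ident and qcov >= min_qcov:
            yield row


# Made-up rows standing in for a real results file.
example = io.StringIO(
    "Q03155\thitA\t95.2\t100\t0.0\n"     # identity too high
    "Q03155\thitB\t72.4\t88\t1e-120\n"   # passes both filters
    "Q03155\thitC\t61.0\t45\t3e-20\n"    # coverage too low
    "P86223\thitD\t50.0\t70\t2e-15\n"    # passes (cutoffs are inclusive)
)

kept = list(filter_hits(example))
for row in kept:
    print(row["qseqid"], row["sseqid"], row["pident"], row["qcovs"])
```

With a real file, the io.StringIO object would be replaced by an open file handle, and the subject IDs that pass could then be used to pull the matching sequences into a FASTA file per query.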
The following is a sample of the sequences that I have:
>sp|Q03155|AIDA_ECOLX AIDA-I autotransporter OS=Escherichia coli OX=562 GN=aidA-I PE=1 SV=1
MNKAYSIIWSHSRQAWIVASELARGHGFVLAKNTLLVLAVVSTIGNAFAVNISGTVSSGG
TVSSGETQIVYSGRGNSNATVNSGGTQIVNNGGKTTATTVNSSGSQNVGTSGATISTIVN
SGGIQRVSSGGVASATNLSGGAQNIYNLGHASNTVIFSGGNQTIFSGGITDSTNISSGGQ
QRVSSGGVASNTTINSSGAQNILSEEGAISTHISSGGNQYISAGANATETIVNSGGFQRV
NSGAVATGTVLSGGTQNVSSGGSAISTSVYNSGVQTVFAGATVTDTTVNSGGNQNISSGG
IVSETTVNVSGTQNIYSGGSALSANIKGSQIVNSEGTAINTLVSDGGYQHIRNGGIASGT
IVNQSGYVNISSGGYAESTIINSGGTLRVLSDGYARGTILNNSGRENVSNGGVSYNAMIN
TGGNQYIYSDGEATAAIVNTSGFQRINSGGTAPVQNSVVVTRTVSSAAKPFDAEVYSGGK
QTVYLWRGIWYSNFLTAVWSMFPGTASGANVNLSGRLNAFAGNVVGTILNQEGRQYVYSG
ATATSTVGNNEGREYVLSGGITDGTVLNSGGLQAVSSGGKASATVINEGGAQFVYDGGQV
TGTNIKNGGTIRVDSGASALNIALSSGGNLFTSTGATLPELTTMAALSVSQNHASNIVLE
NGGLLRVTSGGTATDTTVNSAGRLRIDDGGTINGTTTINADGIVAGTNIQNDGNFILNLA
ENYDFETELSGSGVLVKDNTGIMTYAGTLTQAQGVNVKNGGIIFDSAVVNADMAVNQNAY
INISDQATINGSVNNNGSIVINNSIINGNITNDADLSFGTAKLLSATVNGSLVNNKNIIL
NPTKESAGNTLTVSNYTGTPGSVISLGGVLEGDNSLTDRLVVKGNTSGQSDIVYVNEDGS
GGQTRDGINIISVEGNSDAEFSLKNRVVAGAYDYTLQKGNESGTDNKGWYLTSHLPTSDT
RQYRPENGSYATNMALANSLFLMDLNERKQFRAMSDNTQPESASVWMKITGGISSGKLND
GQNKTTTNQFINQLGGDIYKFHAEQLGDFTLGIMGGYANAKGKTINYTSNKAARNTLDGY
SVGVYGTWYQNGENATGLFAETWMQYNWFNASVKGDGLEEEKYNLNGLTASAGGGYNLNV
HTWTSPEGITGEFWLQPHLQAVWMGVTPDTHQEDNGTVVQGAGKNNIQTKAGIRASWKVK
STLDKDTGRRFRPYIEANWIHNTHEFGVKMSDDSQLLSGSRNQGEIKTGIEGVITQNLSV
NGGVAYQAGGHGSNAISGALGIKYSF
>sp|P86223|VDAC2_MESAU Voltage-dependent anion-selective channel protein 2 (Fragments) OS=Mesocricetus auratus OX=10036 GN=VDAC2 PE=1 SV=1
DIFNKGFGFGLVKYKWCEYGLTFTEKLTFDTTFSPNTGKKSNFAVGYRTGDFQLHTNVNN
GTEFGGSIYQKVCEDFDTSVNLAWTSGTNCTRVNNSSLIGVGYTQTLRPGVKLTLSALVD
GK
>sp|P64744|SMASE_MYCBO Sphingomyelinase OS=Mycobacterium bovis (strain ATCC BAA-935 / AF2122/97) OX=233413 GN=BQ2027_MB0912 PE=3 SV=1
MDYAKRIGQVGALAVVLGVGAAVTTHAIGSAAPTDPSSSSTDSPVDACSPLGGSASSLAA
IPGASVPQVGVRQVDPGSIPDDLLNALIDFLAAVRNGLVPIIENRTPVANPQQVSVPEGG
TVGPVRFDACDPDGNRMTFAVRERGAPGGPQHGIVTVDQRTASFIYTADPGFVGTDTFSV
NVSDDTSLHVHGLAGYLGPFHGHDDVATVTVFVGNTPTDTISGDFSMLTYNIAGLPFPLS
SAILPRFFYTKEIGKRLNAYYVANVQEDFAYHQFLIKKSKMPSQTPPEPPTLLWPIGVPF
SDGLNTLSEFKVQRLDRQTWYECTSDNCLTLKGFTYSQMRLPGGDTVDVYNLHTNTGGGP
TTNANLAQVANYIQQNSAGRAVIVTGDFNARYSDDQSALLQFAQVNGLTDAWVQVEHGPT
TPPFAPTCMVGNECELLDKIFYRSGQGVTLQAVSYGNEAPKFFNSKGEPLSDHSPAVVGF
HYVADNVAVR
This is a portion of the output that I have for the first sequence.
The URL of "here" is linking to your blast output example picture.