Hi, I'd like to run blastx with my nucleotide sequences to find matching protein sequences. Here is my command:
blastx -db refseq_protein -query outputA.fasta -out blastx.fasta -evalue 0.001
I got more than 40,000 results, which is too many, so I want to apply a cut-off based on length. For example, I only want protein sequences from 300bp to 500bp. Is that possible? How should I modify this command?
Many thanks
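As far as I know blastx has no option that filters hits by length directly, so one possible approach is to write tabular output and post-filter it. This is only a sketch under assumptions: it supposes the search was rerun with `-outfmt "6 qseqid sseqid pident length slen evalue bitscore"` (these are standard BLAST+ format specifiers, but the choice and order of columns is mine, not from the original post), and the function name `filter_hits` and the example rows are made up for illustration. For a protein database `slen` is the subject length in amino acids; pass `column="length"` to filter on alignment length instead.

```python
import csv
import io

# Column order assumed to match the -outfmt string mentioned above
# (an assumption for this sketch, not part of the original command).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "slen", "evalue", "bitscore"]


def filter_hits(handle, min_len=300, max_len=500, column="slen"):
    """Yield hits (as dicts) whose `column` value lies in [min_len, max_len]."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if min_len <= int(row[column]) <= max_len:
            yield row


# Tiny made-up example standing in for a real BLAST tabular output file.
example = io.StringIO(
    "q1\tprotA\t85.0\t390\t410\t1e-50\t300\n"
    "q1\tprotB\t40.0\t120\t900\t1e-5\t60\n"
)
kept = [row["sseqid"] for row in filter_hits(example)]
print(kept)  # ['protA']
```

With a real file you would pass `open("blastx.tsv")` (a hypothetical file name) instead of the `io.StringIO` example.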
Thanks for the reply, Michael. Yes, you're correct. I thought about that before, but since I'd like to find as many species as possible, I want to start with a low value, see how it goes, and then search more stringently. I set the cut-off at 300 to 500 because the literature reports this protein as around 400, but of course we could use something like 350-450 instead; I'm just not sure whether that would filter out hits that are interesting. Here 300-500 refers to the "Query cover" in BLAST. I don't know if this makes it clearer to you.
It is still not clear to me what you are trying to accomplish here. Is this metagenomics?
Not yet. Right now I'm studying the evolution of a new gene in bacteria and trying to find out how this gene is distributed across bacteria. The first step was to get the RefSeq nucleotide sequences of this gene, which I've done, but only 58 genomes were found. So now I'm running BLAST against a protein database to see if I can find this gene in other species.