Hello!
Let's take the story from the beginning.
I am running a blastx search with the contig below as the query, restricted to the Oryza taxon and with an E-value threshold of 3.
Contig:
>Contig375
CGGGGATCTGAATGGACTTCTCTCATTTCTACCAGCATGCTGGTGGGAATCTTGTATATATAGAGATTTG
ACAATCAAGTAAGAAGTTTAAATAATTTGTAGCTTTCTTTTGTAATGCATACTTTTATCGATACCTAGAA
AAAATTACGTTTAGATCACTTATTAGAGTGACATTGTTGTCATACATTGGATGTTTATAAACCTGATGAT
CTGTTTGCATATTCCTGAACCAATGCCCCAAAGAGTGAGGGCTTCTCAATCAAACGTGAAGGCTTGTCAA
ATTCTTTTGCATACCCTGCATCAATGACTAAAACCCGATCACAGTCCATGACAGTAGGTATCCTATGAGC
TATGCTAACGATGGTA
As you can see (if you run the same job), it returns numerous hits, and the first one is the hit with the smallest E-value.
What I want instead is to get, as the first result, the sequence with the following characteristics:
Its length should be as great as possible. E.g. in our example the first hit has a length of 251 amino acids, while the second one has a length of 1278 amino acids.
It should be as near as possible to the 5' end, i.e. the start of the protein. By this I mean the alignment should begin close to the first amino acid (methionine). E.g. in our example some hits start from the 20th amino acid while others start from the 1200th.
In a nutshell, I want to filter the blastx results so that the top hit is the longest possible protein whose alignment also starts at, or close to, the beginning of that protein.
So is there any way to filter the results in such a way? Or maybe there is another database, other than NCBI's, to search for more complete protein sequences?
Thank you.
Are you sure that this works? I am running it and it returns the results in the same order.
Here is my initial XML file.
Ah yes, sorry, I forgot the attribute data-type="number". I updated the code.
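For anyone who prefers a script, here is a minimal Python sketch of the ordering asked for in the question. It assumes the results were saved in BLAST XML format and uses the element names Hit, Hit_def, Hit_len, Hsp and Hsp_hit-from from that format. The function name rank_hits and the three toy hits in SAMPLE are made up for illustration and are not real results. Sorting by length first and by start position second is only one possible reading of the two criteria, since they can conflict. The values are converted to integers before sorting, because comparing them as text would not order them numerically.

```python
import xml.etree.ElementTree as ET

# Toy data (NOT real BLAST output): three hypothetical hits that reuse the
# lengths and start positions mentioned in the question.
SAMPLE = """
<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>
  <Hit><Hit_def>toy hit A</Hit_def><Hit_len>251</Hit_len>
    <Hit_hsps><Hsp><Hsp_hit-from>20</Hsp_hit-from></Hsp></Hit_hsps></Hit>
  <Hit><Hit_def>toy hit B</Hit_def><Hit_len>1278</Hit_len>
    <Hit_hsps><Hsp><Hsp_hit-from>1200</Hsp_hit-from></Hsp></Hit_hsps></Hit>
  <Hit><Hit_def>toy hit C</Hit_def><Hit_len>1278</Hit_len>
    <Hit_hsps><Hsp><Hsp_hit-from>3</Hsp_hit-from></Hsp></Hit_hsps></Hit>
</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>
"""


def rank_hits(root):
    """Return (definition, protein length, earliest start) tuples,
    longest protein first, then smallest start position on the protein."""
    rows = []
    for hit in root.iter("Hit"):
        # int() matters: as strings, "251" would sort after "1278".
        hit_len = int(hit.findtext("Hit_len"))
        starts = [int(hsp.findtext("Hsp_hit-from")) for hsp in hit.iter("Hsp")]
        if not starts:
            continue  # skip hits without any alignment
        rows.append((hit.findtext("Hit_def"), hit_len, min(starts)))
    # Descending length, then ascending start position.
    rows.sort(key=lambda row: (-row[1], row[2]))
    return rows


ranked = rank_hits(ET.fromstring(SAMPLE))
for definition, length, start in ranked:
    print(definition, length, start)
```

For a real results file, the root element could be obtained with ET.parse("results.xml").getroot(), where results.xml is a placeholder file name. If closeness to the first amino acid matters more than length, swapping the two parts of the sort key gives the opposite priority.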