Hello everyone,
I have two FASTA files: one with a hundred thousand nucleotide sequences, and another with the same number of amino acid sequences.
I built a BLAST database from the amino acid FASTA file and ran blastx with the nucleotide file as the query.
What I want to do is extract, from the blastx result file, the translated amino acid sequences that correspond to the blastxed nucleotide sequences.
Could you suggest some ways to do that?
Well you could use
grep -A 100 URSEQID input.fa
where -A refers to the number of lines printed after the line that matches the sequence ID. Alternatively, you could use a Perl call or an awk command, or you can import everything into a database and use SQL queries. There are many ways to do this. The question is: which OS are you running, have your sequence IDs remained the same, and could you provide a quick example of your files?
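For reference, here is a minimal Python sketch of the same idea as the grep approach: pulling records out of a FASTA file by sequence ID. Unlike grep -A 100, it does not depend on a fixed line count, so it will not cut off long records or spill into the next one. The function names (read_fasta, extract_by_id) and the example IDs are made up for illustration, and it assumes the ID is the first whitespace-delimited token after ">".

```python
import io


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from an open FASTA handle.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    seq_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def extract_by_id(handle, wanted_ids):
    """Return {id: sequence} for the records whose ID is in wanted_ids."""
    wanted = set(wanted_ids)
    return {sid: seq for sid, seq in read_fasta(handle) if sid in wanted}


if __name__ == "__main__":
    # Hypothetical two-record file held in memory; with a real file use
    # open("input.fa") instead of io.StringIO.
    demo = io.StringIO(">seq1 some description\nACGT\nACGT\n>seq2\nGGCC\n")
    for sid, seq in extract_by_id(demo, ["seq1"]).items():
        print(sid, len(seq))
```

If the IDs changed between the FASTA file and the BLAST output (for example, truncated at the first space), the lookup key would need to be adjusted accordingly.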
Hi @mxs,
My OS is Ubuntu. The code I used is:
And the resultant file is:
Aha. Well, that is not what I understood. So you want
in nucleotide mode, and you want exactly the aligned region (1125-3824), right?
Extracting the aligned region would be good too, but it would be even better to extract the whole translated amino acid sequence corresponding to each blastxed query nucleotide sequence (for the example sequences, not just the 1125-3824 part, but the whole 1-4444).
Could that be feasible?
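One possible way to get the whole-length translation, sketched in Python under a few assumptions: blastx reports the reading frame of the query for each hit (in BLAST+ tabular output it can be requested with the qframe column), so the full nucleotide sequence can be translated in that frame rather than only the aligned part. The sketch below uses the standard genetic code only, writes stop codons as "*" and any codon with ambiguous bases as "X", and the function names are made up for illustration. This is not how blastx itself produces its output, just a way to reproduce a full-length translation in a given frame.

```python
# Standard genetic code, built in TCAG order (assumption: NCBI table 1).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (other letters kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq, frame=1):
    """Translate the whole nucleotide sequence in a blastx-style frame.

    frame is one of 1, 2, 3 (forward strand) or -1, -2, -3 (reverse strand).
    A trailing partial codon is dropped; unknown codons become 'X'.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of 1, 2, 3, -1, -2, -3")
    seq = seq.upper().replace("U", "T")
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(start, len(seq) - 2, 3)
    )


if __name__ == "__main__":
    # Toy sequence, not taken from the files discussed above.
    print(translate("ATGGCCTAA", frame=1))
```

For each query, the frame would be taken from its blastx hit and the nucleotide sequence from the original FASTA file. Note that a full-length translation in one frame will usually contain stop codons outside the coding region.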