Hi, I have been trying to extract a specific gene sequence from a batch of multiple genome files. Could anyone help with this, please?
If you know the gene id you can use seqkit grep.
Supposing you have all your fasta files in the same directory and that they are not linearized (sequence in more than one line):
cat *.fasta | awk 'NR==1 {print ; next} {printf /^>/ ? "\n"$0"\n" : $1} END {printf "\n"}' | grep -A1 ACC > ACC.fasta
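With cat you concatenate all the files, then with awk you linearize the sequences (each sequence ends up on a single line), and then with grep you retain only the header matching your target accession plus the line that follows it, which is its sequence. Replace ACC with your accession. Hope it helps.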
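A variation on the one-liner above, in case it is useful: this sketch does the linearizing and the matching in a single awk pass, and matches the ID exactly (the first word after ">") instead of grepping for a substring. The function name extract_record is made up for illustration, and it assumes plain, uncompressed FASTA files.

```shell
# extract_record ID FILE...  (hypothetical helper name)
# Prints every FASTA record whose header ID (first word after ">")
# equals ID, with the sequence joined onto a single line.
extract_record() {
    target=$1
    shift
    awk -v id="$target" '
        /^>/ {
            if (keep) printf "\n"      # close the sequence of the previous kept record
            name = substr($1, 2)       # header ID without the leading ">"
            keep = (name == id)
            if (keep) print            # print the full header line
            next
        }
        keep { printf "%s", $1 }       # append this sequence chunk, no newline
        END  { if (keep) printf "\n" }
    ' "$@"
}
```

Usage would be something like: extract_record ACC *.fasta > ACC.fasta, where ACC stands for your accession as in the command above.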
Please add example data and desired output. What did you try and where are the problems?
Hi, For example, I have a gene X in fasta format and I want to blast this gene against a bunch of genome fastas and retrieve the matches from each genome. Most of the examples given are for searching genes using IDs. In my case I do not have an ID but have the gene sequence in a file instead.
Hi all, I tried a way of using makeblastdb
Accordingly ran
Then did blastdbcmd using
However, getting the error as follows
It is still unclear what you are trying to achieve. Based on this latest post, your title does not quite describe the desired outcome: "How to extract a fasta sequence from a batch of multiple fasta genome files". It appears that you are interested in a specific gene sequence, either from databases or from your own genome sequences. Is that correct? What are the contents of AI1198.fasta? E. coli genomes?
blastdbcmd retrieves the full sequences present in your blast database. If there are genomes in your database then it is going to retrieve entire genomes, not just the gene sequence you are interested in.
If you need to get gyrA genes from E. coli genomes then: choose "Result by taxon" and select E. coli. The gyrA gene from all entries you could just download directly after the search.
If you have genomes and need the gene sequence from those, then you will need to take a representative gyrA gene sequence. Blastn or blat against your genomes of interest and then parse the blast results to retrieve the sequences you need.
Hi, Yes you are right. "It appears that you are interested in a specific gene sequence, either from databases or from your own genome sequences. Is that correct?" --- this is what I am trying to achieve.
"If you have genomes and need the gene sequence from those, then you will need to take a representative gyrA gene sequence. Blastn or blat against your genomes of interest and then parse the blast results to retrieve the sequences you need." -- I will try this to retrieve the gene of interest from my genomes, and will post the results.
Thanks.
Please edit the original question and add this information there. Also update the title of this post to reflect that you are looking for sequences of a specific gene from genome data you have.
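For the blast-and-parse route suggested earlier in the thread, here is a rough sketch of the parsing step. It assumes the hits are in BLAST tabular format (-outfmt 6, where column 2 is the subject ID and columns 9 and 10 are the subject start and end), that the subject IDs match the first word of the genome FASTA headers, and that the genomes are small enough to hold in memory. The function name extract_hits and all file names are made up for illustration.

```shell
# extract_hits HITS.tsv GENOME.fasta [GENOME.fasta ...]  (hypothetical helper)
# HITS.tsv is assumed to be BLAST tabular output, produced for example with:
#   blastn -query gene.fasta -subject genome.fasta -outfmt 6 > hits.tsv
extract_hits() {
    hits=$1
    shift
    awk -v hits="$hits" '
        # Reverse-complement a nucleotide string; unknown symbols become N.
        function revcomp(s,    i, c, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                c = (c in comp) ? comp[c] : "N"
                out = out c
            }
            return out
        }
        BEGIN {
            n = split("A C G T a c g t", from, " ")
            split("T G C A t g c a", to, " ")
            for (i = 1; i <= n; i++) comp[from[i]] = to[i]
            # Load the hit table: subject ID, subject start, subject end.
            while ((getline line < hits) > 0) {
                if (split(line, f, "\t") < 10) continue
                nh++
                sid[nh] = f[2]; sstart[nh] = f[9] + 0; send[nh] = f[10] + 0
            }
            close(hits)
        }
        /^>/ { id = substr($1, 2); next }   # header: remember the current sequence ID
        { seq[id] = seq[id] $1 }            # sequence line: append to that ID
        END {
            for (h = 1; h <= nh; h++) {
                if (!(sid[h] in seq)) continue
                s = sstart[h]; e = send[h]
                if (s <= e) {               # hit on the plus strand
                    region = substr(seq[sid[h]], s, e - s + 1); strand = "+"
                } else {                    # start > end means a minus-strand hit
                    region = revcomp(substr(seq[sid[h]], e, s - e + 1)); strand = "-"
                }
                printf ">%s:%d-%d(%s)\n%s\n", sid[h], s, e, strand, region
            }
        }
    ' "$@"
}
```

Usage would be something like: extract_hits hits.tsv genome1.fasta genome2.fasta > gene_hits.fasta. Note that this prints one record per hit line, so a gene that blast splits into several alignments comes out as several pieces, and you would probably want to filter the hits by identity and alignment length first.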