Hi guys,
I have gene IDs from 5 different species in multiple text files.
species1.fasta, species2.fasta, species3.fasta, species4.fasta, species5.fasta
fileA.txt contains gene ids from some or all of the above species
fileB.txt contains gene ids from some or all of the above species
fileC.txt contains gene ids from some or all of the above species
fileD.txt contains gene ids from some or all of the above species
fileE.txt contains gene ids from some or all of the above species
How can I extract the sequences for all gene IDs in fileA.txt and save them as fileA.fasta?
And how can I then do the same for all .txt files using a for loop?
What have you tried?
Also, why is bioinformatics the tag you chose? Every question on the forum is related to bioinformatics, and there are better, more specific subject-matter tags you could choose.
Sorry, what tags do you suggest?
There's fasta and shell for starters, but finding relevant tags is also an exercise: if you were to think about it or Google a bit, you'd see that grep and awk are relevant, and you might have stumbled upon bioawk, solving your problem as you typed the question. That has happened to me multiple times: writing down a problem in a reproducible manner and listing everything I tried reveals something that might work, ultimately solving the problem even before creating the post.
Please clarify whether the "gene" information is already in the headers of the speciesN.fasta files, or whether it is something you first need to find by doing, e.g., a BLAST search.
Yes, the "gene" information is already in the headers of the speciesN.fasta files.
Use bioawk and process the header to extract gene information. You're going to need to search the forum for ideas and previous solutions - this topic has been addressed a ton of times already. This will be a great learning exercise for you; I hope no one gives you a ready-to-use answer and hurts your learning process.
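For later readers, here is a minimal sketch of the header-matching idea using plain awk rather than bioawk. It rests on assumptions the thread does not confirm: each .txt file holds one gene ID per line, and the gene ID is the first whitespace-delimited word of the FASTA header (right after ">"). The function name extract_by_ids is made up for illustration. If your headers carry the gene ID elsewhere, the header-parsing line is the part to adapt.

```shell
# Hypothetical helper: print every FASTA record whose ID is listed in the ID file.
# Usage: extract_by_ids IDS.txt FASTA [FASTA...]
# Assumes one ID per line, and that the ID is the first word of the header.
extract_by_ids() {
    ids_file=$1
    shift
    awk '
        # First file (the ID list): remember each non-empty ID.
        NR == FNR { if ($1 != "") want[$1]; next }
        # Header line: strip the leading ">" and check whether the ID is wanted.
        /^>/      { id = substr($1, 2); keep = (id in want) }
        # Print header and sequence lines while inside a wanted record.
        keep
    ' "$ids_file" "$@"
}

# Loop over all ID lists; fileA.txt -> fileA.fasta, and so on.
for ids in file*.txt; do
    # Skip missing or empty ID files (an empty first file would confuse NR == FNR).
    [ -s "$ids" ] || continue
    extract_by_ids "$ids" species*.fasta > "${ids%.txt}.fasta"
done
```

Multi-line sequences are handled because the keep flag stays set until the next header. Note that the match is exact, so IDs with trailing whitespace or Windows line endings in the .txt files would need cleaning first.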