4.5 years ago
clarkanna150
Is there a quick way to convert FASTA files into plain text files? I need to convert whole-genome sequences into .txt files for some software I am using. That means removing the scaffold assignments (the header lines), so that the structure is the species name followed by the entire sequence on one line.
You're describing a custom plain text format, possibly space or tab separated with the species in the first field and the whole sequence in the second field, correct?
Where would you get the species name from?
Yes, that is correct. Preferably space-delimited, with the species name derived from the filename that the sequence comes from.
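A minimal sketch of that exact format, assuming one genome per file and that files are named like Homo_sapiens.fa (the .fa extension is an assumption; adjust it for .fasta). The function name fasta_to_line is hypothetical. It streams through awk rather than holding the genome in a shell variable, drops every header line and joins all scaffolds into a single sequence.

```shell
# Hypothetical helper: print "<species> <sequence>" for one FASTA file.
# The species name is the file name minus its .fa extension (assumed naming).
fasta_to_line() {
  awk -v sp="$(basename "$1" .fa)" '
    BEGIN { printf "%s ", sp }                         # species name, then one space
    /^>/  { next }                                     # skip scaffold/contig headers
          { gsub(/[ \t\r]/, ""); printf "%s", $0 }     # append sequence, no line breaks
    END   { printf "\n" }
  ' "$1"
}
```

To build one file for all species, loop over the inputs: for f in *.fa; do fasta_to_line "$f"; done > all_species.txt. Note that concatenating scaffolds this way creates artificial junctions between them, which may or may not matter for the downstream software.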
Use this code, courtesy of @Pierre. You can then cat the linearized files together. Change \t to a space if you need that.
Linearize a FASTA sequence:
awk -f linearizefasta.awk < input.fa
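The contents of linearizefasta.awk are not reproduced here, so the following is an independent sketch of the same idea, not Pierre's script. It prints each record as "header<TAB>sequence" on one line. The function name linearize_fasta is hypothetical; it reads the given files, or standard input if none are given.

```shell
# Hypothetical helper (not the original linearizefasta.awk):
# print each FASTA record as "header<TAB>sequence" on a single line.
linearize_fasta() {
  awk '
    /^>/ { if (n++) printf "\n"; printf "%s\t", $0; next }  # new record: end previous line
         { printf "%s", $0 }                                 # sequence line: append
    END  { if (n) printf "\n" }                              # terminate the last record
  ' "$@"
}
```

For space-delimited output, pipe the result through tr, for example: linearize_fasta input.fa | tr '\t' ' '.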
Or, to format back to FASTA (only if you know your FASTA headers are shorter than 60 characters):
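One way to reverse the linearization, assuming tab-separated "header<TAB>sequence" lines (this is a sketch, not necessarily the original snippet; tab_to_fasta is a hypothetical name). It also shows why the header-length caveat exists: fold wraps every line, so a header longer than 60 characters would be split as well.

```shell
# Hypothetical helper: turn "header<TAB>sequence" lines back into FASTA.
# Sequences are wrapped at 60 columns; fold also wraps headers, so this
# assumes every header is shorter than 60 characters.
tab_to_fasta() {
  tr '\t' '\n' | fold -w 60
}
```

Usage: tab_to_fasta < input.tab > output.fa.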