I have a FASTA file containing millions of sequences, and I want a simple script to convert this file into one long sequence, i.e. delete all headers and remove any spaces and line breaks. I can always add a ">seq_name" to the first line afterwards, so keeping the top header is not necessary.
I've searched the forums but can only find scripts that do the reverse. I'm using millions of reads as a substitute for a complete genome, and my current pipeline cannot handle that, so I want to trick it into thinking that this is one long genome sequence.
Thanks for any help!!!
Homework?
Haha, not homework. Actual work being attempted by a below-average programmer (me).
This appears to work perfectly, thanks!
This removes all line breaks as well.
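For anyone who prefers Python, here is a minimal sketch of the same idea: skip every header line, strip spaces and line breaks from the rest, and write everything out as one record. The function name `concatenate_fasta` and the default header `seq_name` are just placeholders I made up for illustration. It streams line by line, so it should cope with millions of reads without loading the whole file into memory.

```python
import io


def concatenate_fasta(in_handle, out_handle, name="seq_name"):
    """Write all sequences from a FASTA stream as one single-line record.

    `name` is a placeholder header for the merged record (an assumption,
    not something the pipeline requires).
    """
    out_handle.write(f">{name}\n")
    for line in in_handle:
        if line.startswith(">"):
            continue  # drop every original header
        # str.split() with no arguments splits on spaces, tabs and line
        # breaks, so joining the pieces removes all whitespace
        out_handle.write("".join(line.split()))
    out_handle.write("\n")


if __name__ == "__main__":
    # Tiny in-memory demo instead of a real file
    demo = io.StringIO(">read1\nACGT\nAC GT\n>read2\nTTAA\n")
    result = io.StringIO()
    concatenate_fasta(demo, result)
    print(result.getvalue(), end="")
```

For a real file, pass open file handles instead of the `StringIO` objects, e.g. `with open("reads.fasta") as fin, open("joined.fasta", "w") as fout: concatenate_fasta(fin, fout)` (both file names are hypothetical).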
Hi guys, I also want to remove the line breaks in a multiline FASTA file, but I can't get it to work. Can anyone clarify this for me? I am very new to bioinformatics. Thanks in advance.
using seqkit:
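If seqkit is not installed, the unwrapping can also be sketched in plain Python. This is a rough illustration, not a replacement for a tested tool: it keeps every header and joins the wrapped sequence lines under it into a single line. The function name `unwrap_fasta` is hypothetical.

```python
import io


def unwrap_fasta(in_handle, out_handle):
    """Rewrite a multiline FASTA stream with one sequence line per record."""
    seq_open = False  # True while a sequence line is still being written
    for raw in in_handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_open:
                out_handle.write("\n")  # finish the previous sequence line
                seq_open = False
            out_handle.write(line + "\n")  # headers are kept unchanged
        else:
            out_handle.write(line)  # append without a line break
            seq_open = True
    if seq_open:
        out_handle.write("\n")  # finish the last sequence line


if __name__ == "__main__":
    # Tiny in-memory demo instead of a real file
    demo = io.StringIO(">seq1\nACGT\nACGT\n>seq2\nTTAA\nGG\n")
    result = io.StringIO()
    unwrap_fasta(demo, result)
    print(result.getvalue(), end="")
```

Unlike the single-sequence approach earlier in the thread, this keeps one header per record, so downstream tools still see the individual sequences.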
Please post this as a new question, and try one or all of the solutions provided above before posting.