7.3 years ago
Bulbul Ahmed
I have a FASTA file in this format (header and sequence on one line):
>accession1 GGGGAGCTACGGCAGCGGCGGCGGGGTGCTGCCGCTGGCGTCGCTTAA
>accession2 TTCCGGTAGAAAATCCATTATTGCCAATGGAAGAAGTGA
How can I convert it into the format below (sequence on a separate line), using a Perl script or any other way?
>accession1
GGGGAGCTACGGCAGCGGCGGCGGGGTGCTGCCGCTGGCGTCGCTTAA
>accession2
TTCCGGTAGAAAATCCATTATTGCCAATGGAAGAAGTGA
Substitute the tab or space with a newline, using Unix tr.
Which command should I use on Red Hat?
Although we can't see which whitespace is between your accession identifier and the actual sequence.
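Since the delimiter is unknown, here is a minimal sketch that handles both spaces and tabs. The function name split_fasta is made up for illustration, and it assumes the header line holds only the accession. If the header had a description containing spaces, each word would end up on its own line.

```shell
# Hypothetical helper (name chosen for illustration): reads one-line records
# on stdin and writes header and sequence on separate lines.
# Assumption: the only whitespace on a line is the gap between the
# accession and the sequence.
split_fasta() {
    # Translate spaces and tabs to newlines; -s squeezes runs of newlines,
    # so several consecutive spaces still produce a single line break.
    tr -s ' \t' '\n\n'
}

# Small demonstration with one space-delimited and one tab-delimited record.
printf '>accession1 GGGGAGCTACGG\n>accession2\tTTCCGGTAGAAA\n' | split_fasta
```

On a real file this would be run as split_fasta < input > output.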
Thank you so much, sir. I will try this; hopefully it will work.
Maybe
sed -r 's#\s+#\n#' input >output
then?
Bah, I prefer:
So, a different delimiter?
Exactly ;-)
[just some slight Friday night trolling]
Strictly speaking, this is not really bioinformatics.
I don't know... it seems like an awful lot of bioinformatics is just reformatting text files :)
Personally, in this case, I would copy and paste into Notepad++, which allows search/replace of \t for \n. But then I had never seen "tr" before, so I learned something from the thread!
tr is good, but I use it more for squeezing consecutive whitespace (tr -s) or for quick deletion (tr -d) than for replacing. I prefer sed for all replace operations, as it gives more granular control.
Then it's not a FASTA. While it's not a bioinformatics question per se, the OP is at least using a file with sequence information.
Yeah, it satisfies that, but really? A find+replace operation?