Hi
I have a FASTA file that starts with
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
I want a FASTA file whose sequence is a single-line character string that keeps just the nucleotide characters.
Basically, nothing other than T, C, G, A or N should remain: any other character should be replaced with "N".
I have tried this, but it gives an empty file:
cat input_fasta.fa | sed -r 's/[RYKMSWBVHD]/N/g' > output_fasta.fa
Can you help me?
Thank you so much
input:
output:
Linearize your FASTA file using @Pierre's code (you can easily find it by searching for "linearize fasta"; it should be the first hit). Then remove the first column to leave just the sequence.
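If you would rather not look up the linearizing snippet, here is a rough sketch of the same idea using only `grep` and `tr`. This is not @Pierre's code. The function name `fasta_to_one_line` is made up for illustration, and it assumes you want every record in the file joined into one sequence line, with lowercase bases converted to uppercase before anything outside A, C, G, T, N is replaced.

```shell
# Hypothetical helper (not from the original post): read FASTA on stdin and
# print one line containing only A, C, G, T and N.
fasta_to_one_line() {
    grep -v '^>' |                      # drop header lines such as ">chr1"
        tr -d '\r\n' |                  # join all sequence lines into one
        tr '[:lower:]' '[:upper:]' |    # assumption: soft-masked bases become uppercase
        tr -c 'ACGTN' 'N'               # every character that is not A, C, G, T or N becomes N
    echo                                # finish the output with a newline
}
```

You would call it as `fasta_to_one_line < input_fasta.fa > output_fasta.txt`. Make sure the output file is not the same file as the input: the shell truncates the redirect target before anything is read, which can leave you with an empty file.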