2.7 years ago
genomes_and_MGEs
Hi everyone,
When I want to calculate the sequence length of fasta nucleotide files, I use
awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' file.fasta
However, my new fasta file contains a non-nucleotide character in the sequences: besides A, C, G, and T, it has the character 'X'. I would like to adapt my code so that it counts only A, C, G, and T, or equivalently excludes 'X' from the count. Can someone help me out?
Thanks!
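One possible adaptation, as a minimal sketch: awk's gsub() returns the number of substitutions it made, so deleting every A, C, G, or T from a line and adding up the return value counts only those characters. The file name example.fasta and its contents are made up for illustration, and matching lowercase bases is an assumption that was not part of the question.

```shell
# Hypothetical sample file, created only for illustration
printf '>seq1\nACGTXX\nACXT\n>seq2\nXXAC\n' > example.fasta

# Same structure as the one-liner in the question, but instead of length($0)
# add the number of A/C/G/T characters on each line (gsub returns the match count).
acgt_lengths=$(awk '/^>/{if (l!="") print l; print; l=0; next}
                    {l+=gsub(/[ACGTacgt]/,"")}
                    END{print l}' example.fasta)
echo "$acgt_lengths"
```

If the goal is instead to drop only 'X' and keep counting everything else (for example N or other ambiguity codes), a variant under the same assumptions would replace the middle action with {s=$0; gsub(/X/,"",s); l+=length(s)}.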
Thanks a lot! Both work. If I have a multi-fasta nucleotide file and want the total sequence length for the whole file, how can I avoid printing the length of each individual sequence?
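A single total for the whole file could be sketched as follows: skip the header lines, accumulate one running sum, and print it only in the END block. This assumes the total should still count only A, C, G, and T; the file name example_multi.fasta and its contents are hypothetical.

```shell
# Hypothetical multi-fasta file, created only for illustration
printf '>seq1\nACGTXX\nACXT\n>seq2\nXXAC\n' > example_multi.fasta

# Ignore header lines, sum the A/C/G/T count of every sequence line,
# and print one number at the end (total+0 prints 0 for an empty file).
total_length=$(awk '!/^>/{total+=gsub(/[ACGTacgt]/,"")}
                    END{print total+0}' example_multi.fasta)
echo "$total_length"
```

To count every character in the sequences rather than only A, C, G, and T, the gsub(...) call can be swapped for length($0).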