Last week I started a course in computing for ecology/evolution and am very new to this. We've been asked to compute the AT/GC ratio within an E. coli FASTA file. However, we have to do this just in the Mac terminal, without using shell scripting or Python. It has to be done using single-line solutions piped together. I've been searching online for hours but I'm still totally lost. Can anyone give me any advice? Thank you so much!
There are going to be many ways to do this. Another hint, on top of Mensur's answer, is that AWK's gsub function returns the number of substitutions it makes. Look up what it does, and then try to adapt it to work with your Escherichia coli FASTA file, whose sequence may be spread across multiple lines. You may also have to deal with masked bases, and with upper- and lower-case bases.
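As a rough sketch of how the gsub hint could be used (not necessarily what your course expects): the one-liner below skips header lines, adds up gsub's return value for A/T and for G/C on every sequence line, and prints the two totals and their ratio at the end. The file name toy.fasta and its contents are made up so the example runs on its own. I've assumed that lower-case (soft-masked) bases should be counted and that N and any other symbols should be ignored; change the character classes if your assignment says otherwise.

```shell
# Hypothetical toy input so the example is self-contained; use your own FASTA file instead.
printf '>toy_record\nATGCatgc\nNNAATTGG\n' > toy.fasta

# For every non-header line, gsub deletes the matching bases and returns how many it deleted.
# The END block prints the AT total, the GC total, and AT/GC (guarding against division by zero).
awk '!/^>/ { at += gsub(/[ATat]/, ""); gc += gsub(/[GCgc]/, "") } END { if (gc > 0) print at, gc, at / gc }' toy.fasta
```

Because the totals are accumulated line by line, it doesn't matter that the sequence is wrapped across many lines.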
Another approach using grep :)
grep -v ">" fastafile | grep -o . | sort | uniq -c
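That pipeline gives one count per distinct character, so a and A end up on separate rows. A possible extension, assuming you want the two cases merged and then a single AT/GC number, is to put tr in front to upper-case everything and awk at the end to add up the rows. The file name toy2.fasta and its contents are invented just so this runs as written.

```shell
# Hypothetical toy input so the example is self-contained; use your own FASTA file instead.
printf '>toy_record\nATGCatgc\nNNAATTGG\n' > toy2.fasta

# Per-base counts with lower case folded into upper case.
grep -v ">" toy2.fasta | tr 'a-z' 'A-Z' | grep -o . | sort | uniq -c

# Same pipeline, with awk summing the A/T rows and the G/C rows (column 1 is the count, column 2 the base).
grep -v ">" toy2.fasta | tr 'a-z' 'A-Z' | grep -o . | sort | uniq -c | awk '$2 ~ /[AT]/ { at += $1 } $2 ~ /[GC]/ { gc += $1 } END { if (gc > 0) print at, gc, at / gc }'
```

Rows for N or other symbols are still shown by the first command but are left out of the ratio by the second. Splitting into one character per line is slow on a whole genome, but it should be fine for a single E. coli file.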