How to compute the AT/GC ratio for an E. coli sequence in a FASTA file?
5.2 years ago
botinky • 0

Last week I started a course in computing for ecology/evolution and am very new to this. We've been asked to compute the AT/GC ratio within an E. coli FASTA file. However, we have to do this in the Mac terminal only, without shell scripting or Python, using single-line commands piped together. I've been searching online for hours but I'm still totally lost. Can anyone give me any advice? Thank you so much!

sequencing fasta • 1.1k views

There are many ways to do this. Another hint, on top of Mensur's answer, is that AWK's gsub function returns the number of substitutions it made. Look up what it is doing, and then try to adapt it to work with your Escherichia coli FASTA file, whose sequence may be spread across multiple lines:

echo "CCGCATGCAAGCTAGCTGACTGACTGACTGACTAGCTATGC" | \
  awk '{
    print "A bases:\t"gsub(/A/,"",$0);
    print "T bases:\t"gsub(/T/,"",$0);
    print "G bases:\t"gsub(/G/,"",$0);
    print "C bases:\t"gsub(/C/,"",$0)}'

A bases:    10
T bases:    9
G bases:    10
C bases:    12

You may also have to deal with masked bases, and upper- and lower-case bases.
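
As a rough sketch of how the gsub hint might be adapted to a multi-line FASTA file with mixed-case bases: the file name demo.fasta and its contents are made up for illustration (the same 41 bases as above, split over two lines with the second in lower case), and the variable names at and gc are arbitrary. It assumes the header lines start with ">" and that ambiguous bases such as N should simply be ignored.

```shell
# Hypothetical demo file, created only so the example is self-contained.
printf '>demo_sequence\nCCGCATGCAAGCTAGCTGAC\ntgactgactgactagctatgc\n' > demo.fasta

at_gc=$(awk '
  /^>/ { next }                     # skip FASTA header lines
  {
    seq = toupper($0)               # fold lower-case (soft-masked) bases to upper case
    at += gsub(/[AT]/, "", seq)     # gsub returns the number of substitutions made
    gc += gsub(/[GC]/, "", seq)
  }
  END { if (gc > 0) printf "%d %d %.4f\n", at, gc, at / gc }
' demo.fasta)

# Prints: AT count, GC count, AT/GC ratio
echo "$at_gc"
```

For the demo file this should print 19 22 0.8636, which agrees with the per-base counts above (10 + 9 and 10 + 12). The awk program can be written on one line if the comments are dropped.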


Another approach using grep :)

grep -v ">" fastafile | grep -o . | sort | uniq -c
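
A possible variant of the same pipeline, sketched under a few assumptions: the file demo_grep.fasta is a made-up example, lower-case bases should count the same as upper-case ones, and anything other than A, C, G, or T (for example N) should be left out. The final awk step only tidies the padded output of uniq -c into "base count" pairs.

```shell
# Hypothetical demo file, created only so the example is self-contained.
printf '>demo_sequence\nCCGCATGCAAGCTAGCTGAC\ntgactgactgactagctatgc\n' > demo_grep.fasta

base_counts=$(grep -v '^>' demo_grep.fasta |
  tr 'a-z' 'A-Z' |        # treat lower-case bases like upper-case ones
  grep -o '[ACGT]' |      # one base per line, dropping N and other symbols
  sort | uniq -c |        # count each distinct base
  awk '{ print $2, $1 }') # normalise to "base count"

echo "$base_counts"
```

From these four counts, the AT/GC ratio is (A + T) divided by (G + C).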

5.2 years ago
Mensur Dlakic ★ 28k

Count how many times As and Ts appear in your FASTA file (excluding the header line) and divide the sum of those two numbers by the total number of ACGT letters. Maybe this search will help. You can try using tr or sed instead of awk.
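
One way the tr suggestion might look, as a sketch rather than a definitive answer: demo_tr.fasta is a made-up file, and at, gc, and result are arbitrary variable names. Dividing A+T by all ACGT letters gives the AT fraction, while dividing A+T by G+C gives the AT/GC ratio from the question, so the sketch prints both.

```shell
# Hypothetical demo file, created only so the example is self-contained.
printf '>demo_sequence\nCCGCATGCAAGCTAGCTGAC\ntgactgactgactagctatgc\n' > demo_tr.fasta

# tr -cd deletes every character that is NOT in the given set; wc -c counts what is left.
# The trailing tr -d strips the padding that some wc implementations add.
at=$(grep -v '^>' demo_tr.fasta | tr -cd 'ATat' | wc -c | tr -d ' ')
gc=$(grep -v '^>' demo_tr.fasta | tr -cd 'GCgc' | wc -c | tr -d ' ')

# First number: AT/GC ratio. Second number: AT fraction of all ACGT letters.
result=$(echo "$at $gc" | awk '{ printf "%.4f %.4f", $1 / $2, $1 / ($1 + $2) }')
echo "$result"
```

For the demo file, at is 19 and gc is 22, so this should print 0.8636 0.4634.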
