Finding how many times a nucleotide appears at the same position
3.7 years ago
ran • 0

Hello, I'm new to Python and I'm trying to solve a problem in which I am given a few DNA sequences, for example: sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC", "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

I want to know how many times the nucleotide "A" occurs at the first position (index 0) across all of those sequences (the answer should be 1 in this case). I'm trying to use a for loop but don't really know how to move forward. I'd appreciate any help. Thank you!

Beginer Nucleotide Python DNA • 1.4k views

Hi. It is not a rule, but it is good practice to post your own attempt first so that other members can correct it or suggest an alternative. You said you are a Python beginner, so don't worry about being judged: you can post any attempt, no matter how badly it went. ;)

#!/usr/bin/env python3

sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC", "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

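def count_first_base(sequences=sequences, base="A"):
    # One flag per sequence: 1 if it starts with `base`, 0 otherwise
    count = [int(seq[0] == base) for seq in sequences]
    # Return the per-sequence flags together with their total
    return [count, sum(count)]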

print(*count_first_base(), sep="\n")

[0, 0, 0, 1, 0, 0]

1
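
Since the question mentions a for loop, the same count can also be written as an explicit loop. This is only a minimal sketch: the function name `count_base_at` and its `pos` parameter are made up for illustration, and skipping sequences that are too short to have the requested position is an assumption about how ragged input should be handled.

```python
sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC",
             "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

def count_base_at(seqs, base="A", pos=0):
    """Count how many sequences have `base` at index `pos` (hypothetical helper)."""
    count = 0
    for seq in seqs:
        # Assumption: a sequence shorter than pos + 1 is skipped, not an error.
        if pos < len(seq) and seq[pos] == base:
            count += 1
    return count

print(count_base_at(sequences, "A", 0))
```

Changing `pos` counts the same base at any other index, for example `count_base_at(sequences, "A", 1)`.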

3.7 years ago
Dunois ★ 2.8k

Here's a small Python function you can build on:

def count_quer_at_pos_in_seq(seqs, quer = "A"):

    #Initialize a list of zeroes, and make it 
    #as long as the longest input sequence.
    #This will be used to store the counts for the 
    #character counts at each position (along the 
    #length of the sequences).
    out = [0]*max([len(seq) for seq in seqs])

    #For each sequence in the list sequences:
    for seq in seqs:

        #For each position in the current sequence:
        for pos in range(len(seq)):

            #Check if the character at the current position 
            #is identical to the query character supplied by 
            #the user.
            if seq[pos] == quer:
                #If it is, increment the count in the list "out" 
                #by one.
                out[pos] += 1

    #Return out to the calling environment.
    return out


#----

#Test run.
sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC", "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]
print(count_quer_at_pos_in_seq(sequences, quer = 'A'))

#[1, 4, 1, 0, 0, 3, 4, 1, 1, 3, 0, 2, 0]
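
If counts for every nucleotide are wanted at once, one possible variation is to tally each column with `collections.Counter`. This is a sketch rather than part of the answer above: `column_counts` is a hypothetical helper name, and `itertools.zip_longest` is used on the assumption that sequences may differ in length (its `None` padding is dropped before counting).

```python
from collections import Counter
from itertools import zip_longest

sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC",
             "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

def column_counts(seqs):
    """Return one Counter per position across all sequences (hypothetical helper)."""
    # zip_longest(*seqs) yields one tuple per position; shorter sequences
    # are padded with None, which is filtered out before counting.
    return [Counter(ch for ch in column if ch is not None)
            for column in zip_longest(*seqs)]

counts = column_counts(sequences)

# A Counter returns 0 for a nucleotide that never occurs at a position.
print(counts[0]["A"])
print([c["A"] for c in counts])
# [1, 4, 1, 0, 0, 3, 4, 1, 1, 3, 0, 2, 0]
```

The per-position list for "A" matches the output of the function above, while `counts[pos]` also gives the totals for "C", "G" and "T" without another pass over the data.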