Hello everybody! I need some help with my Python script. I wrote a script that reads a multi-FASTA file and counts the dinucleotides in each sequence. Here is my script:
from Bio import SeqIO

# Initializes a list of all the possible dinucleotides
# (each line needs a trailing comma, otherwise 'AG' 'TT' is joined into 'AGTT')
dinucleotides = ['AA','AT','AC','AG',
                 'TT','TA','TC','TG',
                 'CC','CA','CT','CG',
                 'GG','GA','GT','GC']

def Counting(seq):
    """This function gets a cDNA and returns the frequency of each dinucleotide"""
    counting = {}  # A dictionary where the dinucleotides are the keys and the counts are the values.
    for dinuc in dinucleotides:
        for i in range(len(seq) - 1):  # len(seq)-1 so that the last dinucleotide is included
            if seq[i:i+2] == dinuc:
                if counting.get(dinuc, None) != None:  # If the dictionary at the specific dinucleotide is not empty.
                    counting[dinuc] += 1  # Adds the number of times that the dinucleotide appears.
                else:
                    counting[dinuc] = 1  # Initializes the dinucleotide with 1 instance.
    return counting

def cDNA(Id, seq):  # This function has to be written.
    pass  # placeholder so that the script parses

# File path to FASTA file
path_to_file = raw_input("file name: ")
with open(path_to_file, mode='r') as handle:
    for record in SeqIO.parse(handle, 'fasta'):
        # Extract individual parts of the FASTA record
        identifier = record.id
        #description = record.description
        sequence = record.seq
        cDNA(identifier, sequence)
What I want the output to be is a file that contains something like this:
Id        length  AA   AT   AC   AG
CCE57618  2786    58   450  45   101
CCE57619  1140    12   3    70   98
etc. for all the dinucleotides of all the sequences.
Thanks!!!
CSV is short for "Comma Separated Values". What you specify that you want as output appears to be tab-delimited and right-justified. These are two different formats. Can you clarify your question?
I did so. I don't think it matters whether the values are separated by commas. What matters is that I can read the file back later, line by line.
I often just do it manually for simple CSVs without too many fields, e.g.:
Nice and easy.
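For the dinucleotide table in the question, writing the rows by hand could look roughly like the sketch below. It is only a sketch: the helper names count_dinucleotides and format_row are made up for illustration, tabs are assumed as the separator, and it takes plain strings so it runs without Biopython (with SeqIO you would pass record.id and str(record.seq)).

```python
from itertools import product

# All 16 dinucleotides in a fixed order, so every row has the same columns.
DINUCLEOTIDES = ["".join(pair) for pair in product("ATCG", repeat=2)]


def count_dinucleotides(seq):
    """Count overlapping dinucleotides in a sequence (hypothetical helper)."""
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    seq = str(seq).upper()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing N or other symbols are skipped
            counts[pair] += 1
    return counts


def format_row(identifier, seq):
    """Build one tab-separated row: Id, length, then the 16 counts."""
    counts = count_dinucleotides(seq)
    fields = [identifier, str(len(seq))]
    fields += [str(counts[d]) for d in DINUCLEOTIDES]
    return "\t".join(fields)


# Header line followed by one row for a made-up sequence.
print("\t".join(["Id", "length"] + DINUCLEOTIDES))
print(format_row("example_id", "AATTCG"))
```

Because every dinucleotide starts at zero, sequences that lack a given pair still get a value in that column, which keeps the rows aligned when the file is read back line by line.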
So in this case, if you have a number of fasta files to analyse, I personally would write my script to handle 1 sequence, output a single row of data, then cat all the rows together after I'd run the script for every fasta file.
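If you would rather stay in Python for the last step, the cat part can be sketched as below. This assumes each run of the script wrote its row(s) to a separate file; combine_rows and the file names are hypothetical, and the three-column rows are just a shortened version of the table in the question.

```python
import os
import tempfile


def combine_rows(row_paths, out_path, header=None):
    """Concatenate per-file row outputs into one table, much like cat."""
    with open(out_path, "w") as out:
        if header is not None:
            out.write(header + "\n")
        for path in row_paths:
            with open(path) as rows:
                for line in rows:
                    # Make sure every row ends with exactly one newline.
                    out.write(line.rstrip("\n") + "\n")


# Usage with two made-up row files in a temporary directory.
workdir = tempfile.mkdtemp()
paths = []
for name, row in [("a.tsv", "CCE57618\t2786\t58"), ("b.tsv", "CCE57619\t1140\t12")]:
    path = os.path.join(workdir, name)
    with open(path, "w") as fh:
        fh.write(row + "\n")
    paths.append(path)

combined = os.path.join(workdir, "all.tsv")
combine_rows(paths, combined, header="Id\tlength\tAA")
with open(combined) as fh:
    print(fh.read())
```

Writing the header once in the combining step means the per-sequence script only ever has to emit data rows.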