Quick help: How can I find and replace a specific nucleotide within a gene sequence?
1
0
Entering edit mode
6.9 years ago
mokunf • 0

So I have a

usr/bin/python
from collections import defaultdict
import re, sys, random
from Bio import SeqI

- - - - - H E A D E R - - - - - - - - - - - - - - - - -

Objectives:
1. Read in a sequence
2. Find a specific segment of that sequence
3. Change a letter (mutation)
4. Output the sequence with the mutation

- - - - - U S E R V A R I A B L E S - - - - - - - -

mssg = " Search and Destroy"
genFile  = 'P1.txt'
inFile   = 'P1.txt'
inFolder = '.'
site   = ""  # what the mutation is 
outFile  = "Project1-Out.txt"
GenSeqs = defaultdict(lambda: "my own unknown" )
CT        = defaultdict(lambda: 'nada')
GeneSeqs   = defaultdict(lambda: 'nada')
CX  = defaultdict(lambda: 0)
count = defaultdict(lambda: 0)
NTs    = ['a', 't', 'g', 'c']
afreq        = defaultdict(lambda: -1.1 )
tfreq        = defaultdict(lambda: -1.1 )
gfreq        = defaultdict(lambda: -1.1 )
cfreq       = defaultdict(lambda: -1.1 )
seq = 'P1.txt'

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

- - - - - M A I N - - - - - - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

print("\n\n", mssg, ". . . . ")

IN1 = open( "P1.txt", 'r')
number = 1
for line in IN1:
    if (re.match('>', line)):
        header = line.rstrip()   # remove right white space
    else:
        GenSeqs[header] = line.rstrip()   # Dict[key] = value, key = header, value = sequence
        number += 0

print("There are %d gene sequences in file %s" % (number, inFile))


for records in SeqIO.parse ('P1.txt','fasta'):
    if 'a' in records.seq:
        print (records)

RNA-Seq gene sequence genome • 2.2k views
ADD COMMENT
5
Entering edit mode

To answer your question, you can use CRISPR.

ADD REPLY
1
Entering edit mode

I prefer CRISPR with vim bindings.

ADD REPLY
2
Entering edit mode

I'm a big fan of cRispr, crispy, and crispl

ADD REPLY
1
Entering edit mode

Format your code properly so we can easily read it and be more specific with your question and you'll get a lot more help.

ADD REPLY
0
Entering edit mode

I have tidied it a fair bit to how [I believe..] it should look

ADD REPLY
1
Entering edit mode

Now we need to figure out what the actual question is. So much for quick help...

ADD REPLY
0
Entering edit mode

It does not appear that you are searching for a pattern but replacing in a specific location. Do you need to use a python program for this (unless this is an assignment).

ADD REPLY
0
Entering edit mode
6.9 years ago
mokunf • 0

I apologize for not responding sooner. I'm not sure how to enter the script that I've created so far. Here it is again, with the sequence that i'm trying to analyze. Below the code is the sequence that I'm focusing on. I've installed Biopython and it seems to be helpful in my analysis. I could be mistaken as I am still new to this. Any help that's offered is greatly appreciated.

#!/usr/bin/python
 input_file = open('P1.txt', 'r')
OUT = open('P1-Out.txt','w') 
OUT.write('Gene_Name\tA\tC\tG\tT\tTotal_Length\tCG%\n') 
from Bio import SeqIO
from Bio import IUPAC

 print ("......... START........... ")

for cur_record in SeqIO.parse(input_file, "fasta") :
#count nucleotides in this record...
    gene_name = cur_record.name 
   A_count = cur_record.seq.count('a') 
   C_count = cur_record.seq.count('c') 
   G_count = cur_record.seq.count('g') 
   T_count = cur_record.seq.count('t') 
   length = len(cur_record.seq) 
   cg_percentage = 100 * float(C_count + G_count) / length

print ("A_Count %d" % A_count)
print ("C_Count %d" % C_count)
print ("G_Count %d" % G_count)
print ("T_Count %d" % T_count)
print ("CG_Percentage %d" % cg_percentage)

cur_seq = SeqIO('input_file', IUPAC.unambiguous_gene)
cur_seq.alphabet

counter = 0
for cur_record in SeqIO.parse(input_file, "fasta"):
counter+= cur_record.seq.count ('a')


print ("............. FINISH .............")
OUT.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (gene_name, A_count, C_count, G_count, T_count, length, 
       cg_percentage))
 OUT.close()

Example sequence

>reverse translation of sp|P68871|HBB_HUMAN Hemoglobin subunit beta OS=Homo sapiens GN=HBB PE=1 SV=2 to a 441 base sequence of most likely codons.
atggtgcatctgaccccggaagaaaaaagcgcggtgaccgcgctgtggggcaaagtgaac
gtggatgaagtgggcggcgaagcgctgggccgcctgctggtggtgtatccgtggacccag
cgcttttttgaaagctttggcgatctgagcaccccggatgcggtgatgggcaacccgaaa
gtgaaagcgcatggcaaaaaagtgctgggcgcgtttagcgatggcctggcgcatctggat
aacctgaaaggcacctttgcgaccctgagcgaactgcattgcgataaactgcatgtggat
ccggaaaactttcgcctgctgggcaacgtgctggtgtgcgtgctggcgcatcattttggc
aaagaatttaccccgccggtgcaggcggcgtatcagaaagtggtggcgggcgtggcgaac
gcgctggcgcataaatatcat
ADD COMMENT
0
Entering edit mode

How do you specify which location from this sequence needs to be changed? Do you always know the length of input sequence in advance or is this location relative to something or is it based on a motif/pattern?

ADD REPLY

Login before adding your answer.

Traffic: 1707 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6