Entering edit mode
7.1 years ago
bio90029
▴
10
Hi, I am working on a python script to curate the DNA sequence of a few genes. I initially perform a local blast, and now I am trying to add the missing bases. I have been trying and changing my script but I cannot get the right solution, and I keeping messing up and getting frustrated. I write below the part of the script that supposedly sort out the gene:
my_seq_1=None
if seq_length==ref_length:
print 'Sequence length are equal'
extracted_genes=(genes)
output_handle.write(extracted_genes)
if seq_length!=ref_length:
contig_files=(SeqIO.index(f + '/contigs.fa', 'fasta'))
#to add missing bases
if seq_start>seq_end:
print 'The sequence is in strand -1 and need to be reversed first'
if ref_start==1:
#to calculate the missing bases
add_bases=int(ref_length) - int(len(contig_files[contig_name].seq[seq_end:seq_start]))
end=int(seq_end)-int(add_bases) #this is in reality the beginning of the seq
my_seq_reverse=contig_files[contig_name].seq[end:seq_start]
my_seq_1=my_seq_reverse.reverse_complement
extracted_genes=(genes)
output_handle.write(extracted_genes)
if ref_start!=1:
add_bases_1=int(ref_length) - int(len(contig_files[contig_name].seq[seq_end:seq_start]))
#the end but the beginning of the reversed seq
end=int(seq_end)-int(add_bases_1)
start=int(seq_start)-int(ref_start)+add_bases_1
my_seq_reverse_2=contig_files[contig_name].seq[end:start]
my_seq_1=my_seq_reverse_2.reverse_complement()
extracted_genes=(genes)
output_handle.write(extracted_genes)
if seq_start<seq_end:
print 'The sequence is in + strand'
if ref_start==1: #this part of the script works well
#only need to add the number of missing bases at the end of the seq
add_end_2=int(ref_length) - int(len(contig_files[contig_name].seq[seq_start:seq_end]))
end_2=int(seq_end)+int(add_end_2)
# needs to start the seq one bases before due to the way python counts.
my_seq_1=contig_files[contig_name].seq[seq_start-1:end_2-1]
extracted_genes=(genes)
output_handle.write(extracted_genes)
#as the ref_start!=1. I need to add this bases at the start_seq and
if ref_start!=1:
#the number of bases missing to have both seq and ref equal length
add_end_3=int(ref_length) - int(len(contig_files[contig_name].seq[seq_start:seq_end]))
# need to add the ref_start to the seq_start. So the seq stars few bases before
start=int(seq_start)-int(ref_start)
#need to remove the same number of bases at the end of seq but adding the missing bases.
end=int(seq_end)-int(ref_start)+int(add_end_3)
my_seq_1=contig_files[contig_name].seq[start:end]
extracted_genes= (genes)
output_handle.write(extracted_genes)
I will be really greatful if someone can point me where are my mistakes as the script is not working as it is supposed to.
Thanks