How to add the misssing bases to a DNA sequence according to the hsp.start, and missing bases
0
0
Entering edit mode
7.1 years ago
bio90029 ▴ 10

Hi, I am working on a python script to curate the DNA sequence of a few genes. I initially perform a local blast, and now I am trying to add the missing bases. I have been trying and changing my script but I cannot get the right solution, and I keeping messing up and getting frustrated. I write below the part of the script that supposedly sort out the gene:

my_seq_1=None
if seq_length==ref_length:
    print 'Sequence length are equal'                               
    extracted_genes=(genes)
    output_handle.write(extracted_genes)
if seq_length!=ref_length:
    contig_files=(SeqIO.index(f + '/contigs.fa', 'fasta'))
    #to add missing bases
    if seq_start>seq_end:
        print  'The sequence is in strand -1 and need to be reversed first'
        if ref_start==1:
            #to calculate the missing bases                                     
            add_bases=int(ref_length) - int(len(contig_files[contig_name].seq[seq_end:seq_start]))
            end=int(seq_end)-int(add_bases)   #this is in reality the beginning of the seq
            my_seq_reverse=contig_files[contig_name].seq[end:seq_start]
            my_seq_1=my_seq_reverse.reverse_complement
            extracted_genes=(genes)
            output_handle.write(extracted_genes)
        if ref_start!=1:
            add_bases_1=int(ref_length) - int(len(contig_files[contig_name].seq[seq_end:seq_start]))
            #the end but the beginning of the reversed seq
            end=int(seq_end)-int(add_bases_1)
            start=int(seq_start)-int(ref_start)+add_bases_1                                                 
            my_seq_reverse_2=contig_files[contig_name].seq[end:start]                                                                               
            my_seq_1=my_seq_reverse_2.reverse_complement()
            extracted_genes=(genes)
            output_handle.write(extracted_genes)                                        
if seq_start<seq_end:
        print 'The sequence is in + strand'
        if ref_start==1: #this part of the script works well
            #only need to add the number of missing bases at the end of the seq
            add_end_2=int(ref_length) - int(len(contig_files[contig_name].seq[seq_start:seq_end]))
            end_2=int(seq_end)+int(add_end_2)
            # needs to start the seq one bases before due to the way python counts.
            my_seq_1=contig_files[contig_name].seq[seq_start-1:end_2-1]
            extracted_genes=(genes)
            output_handle.write(extracted_genes)
            #as the ref_start!=1. I need to add this bases at the start_seq and 
        if ref_start!=1:
            #the number of bases missing to have both seq and ref equal length
            add_end_3=int(ref_length) - int(len(contig_files[contig_name].seq[seq_start:seq_end]))
            # need to add the ref_start to the seq_start. So the seq stars few bases before
            start=int(seq_start)-int(ref_start)
            #need to remove the same number of bases at the end of seq but adding the missing bases.
            end=int(seq_end)-int(ref_start)+int(add_end_3)
            my_seq_1=contig_files[contig_name].seq[start:end]                                       
            extracted_genes= (genes)                                                                                    
            output_handle.write(extracted_genes)
I will be really greatful if someone can point me where are my mistakes as the script is not working as it is supposed to.

Thanks

python biopython • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 2086 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6