Alignment of Split ORFs
14.1 years ago
Liam Thompson ▴ 140

Hi everyone

I have all the genomic sequences of Hepatitis B virus listed in GenBank. I am trying to reannotate the four genes (X, polymerase, surface, precore-core) myself, as the GenBank records are often incorrectly annotated. I have extracted the type sequences of the four genes from all the type isolates using Biopython, and am now trying to align them against all the genomic sequences so that I can extract the positions and reannotate my little database, again with Biopython.

My problem is that the polymerase and precore-core genes are split. They mostly start near the middle or end of the genomic sequence and continue from the beginning, so when I align the type sequences against the genomic sequences, I only get half the gene. The EMBOSS programs I've tried (needle, water, wordfinder) are evidently not designed for circular genomes. The genes are also in different reading frames, and every gene overlaps one or more of the others, so I am doing this one gene at a time.

I was hoping someone could suggest a program or an alternative method for reannotating a batch of sequences based on a type sequence. I have used the Genome Annotation Transfer Utility, but as far as I can see it handles only one sequence at a time, which is not practical for 2500 genomic sequences.

Thanks,
Liam

multiple biopython split orf • 2.5k views
14.1 years ago
Ketil 4.1k

Why can't you simply concatenate the genome with itself, and use that as the "genome" for aligning your genes? This will of course get you two copies of everything, but the genes spanning the breakpoint should show up in one piece.
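A minimal sketch of the idea, in plain Python. It assumes an exact substring search as a stand-in for whatever aligner you actually use, and the function name `locate_on_circular` and the toy sequences are made up for illustration. The point is only the coordinate handling: search the doubled sequence, then map the hit back onto the original circular genome.

```python
def locate_on_circular(genome: str, gene: str):
    """Return 1-based (start, end, wraps) of gene on a circular genome, or None.

    Assumption: an exact match is good enough for illustration; a real
    pipeline would replace str.find with an alignment step.
    """
    n = len(genome)
    doubled = genome + genome      # a gene crossing the origin is contiguous here
    pos = doubled.find(gene)       # 0-based position of the first match
    if pos == -1 or pos >= n:
        return None                # not found (matches starting in copy 2 are duplicates)
    start = pos + 1                # 1-based start on the original genome
    end_on_doubled = pos + len(gene)   # 1-based end on the doubled sequence
    wraps = end_on_doubled > n         # True if the gene runs past the origin
    end = end_on_doubled - n if wraps else end_on_doubled
    return start, end, wraps


# Toy example: the "gene" starts at the end of the record and continues at the start.
genome = "GGGTTTCCCAAA"
print(locate_on_circular(genome, "AAAGGG"))   # gene spanning the breakpoint
print(locate_on_circular(genome, "TTTCCC"))   # gene inside the linear record
```

A hit with `wraps` set to true corresponds to a feature that would be written as a join of two intervals (start to genome end, then 1 to end) in the annotation.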


Hmm, yes, good idea. It does seem to work, and there is not too much extra coding needed to filter out the duplicated sequence.
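For anyone wanting to see what that filtering might look like, here is a rough sketch. It assumes the hits come back as 0-based, half-open `(start, end)` pairs on the doubled sequence; the function name `filter_doubled_hits` is hypothetical, and your aligner's output format will likely differ.

```python
def filter_doubled_hits(hits, genome_length):
    """Drop second-copy hits and map the rest back to circular coordinates.

    hits: iterable of (start, end), 0-based half-open, on the doubled genome.
    Returns (start, end) pairs on the original genome; end <= start means
    the feature wraps around the origin.
    """
    kept = []
    for start, end in sorted(hits):
        if start >= genome_length:
            continue  # starts in the second copy: duplicate of a first-copy hit
        if end > genome_length:
            end -= genome_length  # runs past the origin: wrap the end coordinate
        kept.append((start, end))
    return kept


# A 12-base genome doubled to 24 bases; each feature shows up at most twice.
hits = [(3, 9), (15, 21), (9, 15)]
print(filter_doubled_hits(hits, 12))
```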


Neat workaround, Ketil. I love this sort of solution.
