Question

How to determine gene orientation: head to tail or tail to head?

0

8.3 years ago

Sruthi • 0

I have been trying to understand how to determine the orientation of a gene. For example, in the sequence gene1 -gene2 gene3 gene4, genes 1, 3 and 4 are oriented head to tail and gene 2 is oriented tail to head.

In all genome rearrangement scenarios, a negative sign represents tail to head, but given a nucleotide sequence such as a FASTA file, how can I determine which end is the head (>) and which is the tail of a gene? I need to determine this in order to perform genome rearrangement operations on two genomes with similar genes, transforming one into the other.

Is the orientation of a gene based on which strand it lies on at the DNA level (5'-3' or 3'-5')? If so, how can we determine it given a sequence in a FASTA file?

Thanks for all the help!

gene fasta sequencing genome • 4.3k views

ADD COMMENT • link 8.3 years ago by Sruthi • 0

1

Orientation would be based on the frame that is coding for the gene/protein. Since DNA is always represented in the 5'-->3' orientation, one would not be able to determine gene orientations from a single FASTA sequence file for the example above. If it were a multi-FASTA file, then perhaps the FASTA headers could be used to indicate strand origin (with a - sign or reversed coordinates).
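To illustrate why a bare sequence does not carry orientation on its own, here is a minimal Python sketch of reverse-complementing a sequence. A gene on the opposite strand reads 5'-->3' as the reverse complement of the strand shown in the file, and both strings are equally valid DNA, so only annotation tells you which one is the coding direction. The function name `reverse_complement` and the example string are illustrative and not taken from any particular tool.

```python
# Complement table for the four unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5'->3'."""
    # Complement every base, then reverse so the result is again 5'->3'.
    return seq.translate(COMPLEMENT)[::-1]


forward = "ATGCTAGCATCG"
print(forward)                      # strand as written in the file
print(reverse_complement(forward))  # same locus, read from the other strand
```

Applying the function twice returns the original string, which is a quick sanity check that nothing was lost.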

ADD REPLY • link 8.3 years ago by GenoMax 147k

0

So, I have a multi-FASTA file with a number of genes, and one of the headers is as below:

lcl|AP006852.1_cds_BAE44532.1_1 [gene=CaJ7.0001] [protein=hypothetical protein] [protein_id=BAE44532.1] [location=complement(97..1155)]

So there should be a - sign in the header of the gene to show its orientation?

I found one slight difference between the listed gene headers: the word "complement" in the location field. Does this essentially indicate reversed coordinates or a - sign?

lcl|AP006852.1_cds_BAE44533.1_2 [gene=CaJ7.0003] [protein=hypothetical protein] [protein_id=BAE44533.1] [location=complement(1246..2040)]
lcl|AP006852.1_cds_BAE44534.1_3 [gene=CaJ7.0004] [protein=hypothetical protein] [protein_id=BAE44534.1] [location=2278..2769]

ADD REPLY • link 8.3 years ago by Sruthi • 0

0

In your file it appears that the strand is indicated by location=complement(97..1155). This is a third way of designating strand information, besides the two I mentioned above.

This matches what NCBI uses for a GenBank record, as you can see from the page here:

"If a feature is located on the complementary strand, the word complement will appear before the base span."
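As a rough sketch of how the strand could be pulled out of headers like the ones quoted in this thread, the Python below uses two regular expressions. The names `CdsLocation` and `parse_header` are made up for illustration, and the pattern assumes only simple `start..end` spans, optionally wrapped in `complement(...)`; compound locations such as `join(...)` are not handled and return `None`.

```python
import re
from typing import NamedTuple, Optional


class CdsLocation(NamedTuple):
    gene: str
    start: int
    end: int
    strand: int  # +1 = forward strand, -1 = complementary strand


GENE_RE = re.compile(r"\[gene=([^\]]+)\]")
# Optional "complement(", optional "<" / ">" partial markers, then start..end.
LOCATION_RE = re.compile(r"\[location=(complement\()?<?(\d+)\.\.>?(\d+)\)?\]")


def parse_header(header: str) -> Optional[CdsLocation]:
    """Extract gene name, coordinates and strand from one FASTA header."""
    gene = GENE_RE.search(header)
    loc = LOCATION_RE.search(header)
    if gene is None or loc is None:
        return None  # missing fields or an unsupported location such as join(...)
    strand = -1 if loc.group(1) else 1
    return CdsLocation(gene.group(1), int(loc.group(2)), int(loc.group(3)), strand)


headers = [
    "lcl|AP006852.1_cds_BAE44532.1_1 [gene=CaJ7.0001] [protein=hypothetical protein] [protein_id=BAE44532.1] [location=complement(97..1155)]",
    "lcl|AP006852.1_cds_BAE44533.1_2 [gene=CaJ7.0003] [protein=hypothetical protein] [protein_id=BAE44533.1] [location=complement(1246..2040)]",
    "lcl|AP006852.1_cds_BAE44534.1_3 [gene=CaJ7.0004] [protein=hypothetical protein] [protein_id=BAE44534.1] [location=2278..2769]",
]

for h in headers:
    print(parse_header(h))
```

For the three headers above, the first two come out with strand -1 and the third with strand +1.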

ADD REPLY • link 8.3 years ago by GenoMax 147k

0

Thanks!

So, for example:

gene1 location=complement(1..1000) atgctagcatcg....
gene2 location=1001..2000 atgctagcatcg....
gene3 location=complement(2001..3000) atgctagcatcg....

Overall genome: -gene1 (tail to head) gene2 (head to tail) -gene3 (tail to head)
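A small Python sketch of that last step, turning per-gene locations into a signed gene order, might look like the following. It assumes each gene has already been reduced to a `(name, start, end, strand)` tuple with strand +1 or -1; the function name `signed_gene_order` is hypothetical, and the forward strand is written with an explicit `+` here purely for readability.

```python
def signed_gene_order(genes):
    """Sort genes by start coordinate and prefix each name with its strand sign.

    genes: iterable of (name, start, end, strand), strand being +1 or -1.
    """
    ordered = sorted(genes, key=lambda g: g[1])  # order along the genome
    return [("-" if strand < 0 else "+") + name
            for name, _start, _end, strand in ordered]


# The three genes from the example above, deliberately given out of order.
example = [
    ("gene2", 1001, 2000, 1),
    ("gene3", 2001, 3000, -1),
    ("gene1", 1, 1000, -1),
]

print(" ".join(signed_gene_order(example)))  # -gene1 +gene2 -gene3
```

Doing the same for the second genome gives two signed orderings that rearrangement methods working on signed permutations can take as input, provided the shared genes have been given matching names.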