How to determine gene orientation: head to tail or tail to head?
8.4 years ago
Sruthi • 0

I have been trying to understand how to determine the orientation of a gene. For example, take the sequence: gene1 -gene2 gene3 gene4. In this sequence, the orientation of gene1, gene3 and gene4 is head to tail, and that of gene2 is tail to head.

In all genome rearrangement scenarios, a negative sign represents tail to head. But given a nucleotide sequence, as in a FASTA file, how can we determine which end is the head and which is the tail of a gene? I need to determine this in order to perform genome rearrangement operations on two genomes with similar genes, to transform one into the other.

Is the orientation of a gene based on which strand it is on at the DNA level (5'-3' or 3'-5')? If so, how can we determine it from a sequence in a FASTA file?

Thanks for all the help!

gene fasta sequencing genome • 4.3k views

Orientation would be based on the strand and frame that code for the gene/protein. Since DNA is by convention written in 5'-->3' orientation, one would not be able to determine gene orientations from a single FASTA sequence like the example above. If it is a multi-FASTA file, then the FASTA headers may indicate the strand of origin (with a - sign or reversed coordinates).


So, I have a multi-FASTA file with a number of genes, and one of the headers is as below:

lcl|AP006852.1_cds_BAE44532.1_1 [gene=CaJ7.0001] [protein=hypothetical protein] [protein_id=BAE44532.1] [location=complement(97..1155)]

So should there be a - sign in the header of the gene to show orientation?

I found one slight difference between the headers of the listed genes: the word "complement" in the location field. Does this essentially indicate reversed coordinates or a - sign?

lcl|AP006852.1_cds_BAE44533.1_2 [gene=CaJ7.0003] [protein=hypothetical protein] [protein_id=BAE44533.1] [location=complement(1246..2040)]
lcl|AP006852.1_cds_BAE44534.1_3 [gene=CaJ7.0004] [protein=hypothetical protein] [protein_id=BAE44534.1] [location=2278..2769]


In your file it appears that the strand is indicated by location=complement(97..1155). This is a third way of designating strand information, besides the two I mentioned above.

This matches the convention NCBI uses for GenBank records, which the NCBI documentation describes as follows:

"If a feature is located on the complementary strand, the word complement will appear before the base span."
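As a rough illustration of reading the strand from headers like the ones quoted in this thread, here is a minimal Python sketch. The function name `parse_cds_header` is made up for this example, and the sketch assumes every header carries a `[location=...]` field in the style shown above. It simply treats a location that starts with `complement(` as the minus strand and takes the smallest and largest numbers as the span, which may be too simplistic for partial features.

```python
import re

def parse_cds_header(header):
    """Return (gene, strand, start, end) from a CDS FASTA header.

    Hypothetical helper: assumes the '[gene=...]' and '[location=...]'
    bracketed fields seen in the headers quoted in this thread.
    """
    gene = re.search(r"\[gene=([^\]]+)\]", header)
    location = re.search(r"\[location=([^\]]+)\]", header)
    if location is None:
        raise ValueError("no [location=...] field in header")
    loc = location.group(1)
    # 'complement(...)' marks a feature on the complementary (minus) strand.
    strand = "-" if loc.startswith("complement(") else "+"
    # Collect every coordinate; the span runs from the smallest to the largest.
    coords = [int(n) for n in re.findall(r"\d+", loc)]
    return (gene.group(1) if gene else None, strand, min(coords), max(coords))

headers = [
    "lcl|AP006852.1_cds_BAE44532.1_1 [gene=CaJ7.0001] [protein=hypothetical protein] "
    "[protein_id=BAE44532.1] [location=complement(97..1155)]",
    "lcl|AP006852.1_cds_BAE44533.1_2 [gene=CaJ7.0003] [protein=hypothetical protein] "
    "[protein_id=BAE44533.1] [location=complement(1246..2040)]",
    "lcl|AP006852.1_cds_BAE44534.1_3 [gene=CaJ7.0004] [protein=hypothetical protein] "
    "[protein_id=BAE44534.1] [location=2278..2769]",
]

for h in headers:
    print(parse_cds_header(h))
```

With the three headers from this thread, the first two genes come out on the "-" strand and CaJ7.0004 on the "+" strand.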


Thanks!

So for example:

gene1 location=complement(1..1000) atgctagcatcg....
gene2 location=1001..2000 atgctagcatcg....
gene3 location=complement(2001..3000) atgctagcatcg....

Overall genome: -gene1 (tail to head), gene2 (head to tail), -gene3 (tail to head)


That looks correct.
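For the rearrangement step, the signed ordering confirmed above could be built roughly as follows. This is only a sketch: the function name `signed_gene_order` and the `(name, strand, start)` tuple layout are assumptions made for this example, and genes are simply sorted by their start coordinate on a single sequence.

```python
def signed_gene_order(genes):
    """Build a signed gene order from (name, strand, start) tuples.

    Hypothetical helper: genes on the '-' strand (tail to head) get a
    leading minus sign; genes on the '+' strand (head to tail) get none.
    """
    # Order genes by their position along the genome.
    ordered = sorted(genes, key=lambda g: g[2])
    return [("-" if strand == "-" else "") + name
            for name, strand, _start in ordered]

# The three-gene example from this thread, deliberately given out of order.
genes = [("gene3", "-", 2001), ("gene1", "-", 1), ("gene2", "+", 1001)]
print(" ".join(signed_gene_order(genes)))  # -gene1 gene2 -gene3
```

Two genomes encoded this way can then be compared as signed permutations, provided the gene names have been matched between them beforehand.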
