I have been trying to understand how to determine the orientation of a gene. For example, take the sequence: gene1 -gene2 gene3 gene4. In this sequence, gene1, gene3 and gene4 are oriented head to tail, and gene2 is oriented tail to head.
In all genome rearrangement scenarios, a negative sign represents tail to head. But given a nucleotide sequence, as in a FASTA file, how can we determine what is the head (>) or the tail of a gene? I need to determine this in order to perform genome rearrangement operations on two genomes with similar genes, transforming one into the other.
Is the orientation of a gene based on which strand it is on at the DNA level (5'-3' or 3'-5')? If so, how can we determine it given a sequence in a FASTA file?
Thanks for all the help!
Orientation would be based on the frame that is coding for the gene/protein. Since DNA is always represented in the 5'-->3' orientation, one would not be able to determine gene orientations from a single FASTA sequence file for the example above. If it were a multi-FASTA file, then perhaps the FASTA headers could be used to indicate strand origin (with a - sign or reversed coordinates).
So, I have a multi-FASTA file with a number of genes, and one of the headers is as below: lcl|AP006852.1_cds_BAE44532.1_1 [gene=CaJ7.0001] [protein=hypothetical protein] [protein_id=BAE44532.1] [location=complement(97..1155)]
So there should be a - sign in the header of the gene to show its orientation?
I found one slight difference between the listed gene headers: the word "complement" in the location field. Does this essentially indicate reversed coordinates or a - sign?
In your file it appears that the strand location is being indicated by location=complement(97..1155). This can be a third way of designating strand information besides the two I had mentioned above. This matches what NCBI uses for a GenBank record, as you can see from the page here.
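A minimal Python sketch of reading the strand from such a header, assuming every header carries a `[gene=...]` and a `[location=...]` field in the same style as the one quoted above. The function name `parse_header` and the returned dictionary keys are made up for illustration; they are not part of any NCBI tool. It treats any location containing `complement(` as minus strand, which is a simplification (rare mixed-strand `join(...)` locations would need more care).

```python
import re

# Patterns for the bracketed fields seen in the header quoted in this thread.
LOCATION_RE = re.compile(r"\[location=([^\]]+)\]")
GENE_RE = re.compile(r"\[gene=([^\]]+)\]")


def parse_header(header):
    """Return gene name, strand and coordinates from one FASTA header line.

    Hypothetical helper: returns None when the header has no location field.
    """
    loc = LOCATION_RE.search(header)
    if loc is None:
        return None
    location = loc.group(1)
    gene = GENE_RE.search(header)
    # Assumption: "complement(" anywhere in the location means minus strand.
    strand = "-" if "complement(" in location else "+"
    # Collect every number in the location; this also copes with partial
    # markers such as "<97..>1155" and with join(...) segments.
    coords = [int(n) for n in re.findall(r"\d+", location)]
    return {
        "gene": gene.group(1) if gene else None,
        "strand": strand,
        "start": min(coords),
        "end": max(coords),
    }


header = (
    "lcl|AP006852.1_cds_BAE44532.1_1 [gene=CaJ7.0001] "
    "[protein=hypothetical protein] [protein_id=BAE44532.1] "
    "[location=complement(97..1155)]"
)
print(parse_header(header))
```

For the header above this reports the gene CaJ7.0001 on the minus strand, spanning 97 to 1155.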
Thanks!
So, for example:
Overall genome: -gene1 (tail to head) gene2 (head to tail) -gene3 (tail to head)
That looks correct.
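Once each gene has a strand and a start coordinate, the signed gene order used in rearrangement analyses can be sketched as below. This assumes all genes lie on the same sequence; the record layout, the function name `signed_gene_order`, and the coordinates for gene2 and gene3 are invented for illustration.

```python
def signed_gene_order(records):
    """Sort genes by start coordinate and prefix minus-strand genes with '-'.

    Each record is a dict with hypothetical keys "gene", "strand", "start".
    """
    ordered = sorted(records, key=lambda r: r["start"])
    return [("-" if r["strand"] == "-" else "") + r["gene"] for r in ordered]


# Made-up example data: three genes given in arbitrary order.
records = [
    {"gene": "gene2", "strand": "+", "start": 1200},
    {"gene": "gene1", "strand": "-", "start": 97},
    {"gene": "gene3", "strand": "-", "start": 2500},
]
print(" ".join(signed_gene_order(records)))  # prints: -gene1 gene2 -gene3
```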