I am using GENSCAN to predict a gene from a nucleotide sequence.
I am also using Python alongside it to merge the exons that GENSCAN finds.
Python code:
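# dna1 is assumed to hold the input nucleotide sequence as a string.
# Python slices are 0-based and half-open, so a feature that GENSCAN reports
# at begin..end (1-based, inclusive) corresponds to dna1[begin-1:end].
Promoter = dna1[66:106]
Initial_Exon = dna1[139:259]
Internal_Exon = dna1[520:722]
Terminal_Exon = dna1[806:861]
PolyA_Signal = dna1[960:966]
#print(Promoter)
#print(PolyA_Signal) #consensus: AATAAA
Gene = Initial_Exon + Internal_Exon + Terminal_Exon
print(len(Gene))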
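The slicing step above can be wrapped in a small helper so that the coordinate conversion happens in one place. This is only a sketch: it assumes the coordinates are 1-based and inclusive (as GENSCAN prints them) and that the prediction is on the forward strand. The names extract and join_exons and the toy sequence are made up for illustration.

```python
def extract(seq, begin, end):
    """Return the subsequence for 1-based, inclusive coordinates begin..end."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"coordinates {begin}..{end} fall outside the sequence")
    # Shift only the start: Python slices are 0-based and exclude the end index.
    return seq[begin - 1:end]


def join_exons(seq, exons):
    """Concatenate exons given as (begin, end) pairs, ordered 5' to 3'."""
    return "".join(extract(seq, begin, end) for begin, end in exons)


# Toy forward-strand sequence (made up): flank, exon, intron, exon, flank.
toy = "GG" + "ATGGCC" + "GTAAGTTTCAG" + "TTTTAA" + "CC"
toy_exons = [(3, 8), (20, 25)]

cds = join_exons(toy, toy_exons)
print(cds, len(cds))

# A complete coding sequence should have a length divisible by 3.
print("length is a multiple of 3:", len(cds) % 3 == 0)
```

A prediction on the reverse strand would additionally need to be reverse-complemented before joining, which this sketch does not do.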
Can I say that this combination of exons is a gene? Or should I include the promoter and PolyA_Signal in it?
According to the Biostar Handbook definition, "Gene is a region (or regions) of DNA that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions, and other functional sequence regions." According to this definition, we must include the promoter and polyA signal in the gene.
5'-UTRs and 3'-UTRs may (and often do) contain regulatory elements, and I already told you that they may be included depending on what you are trying to present. Those two regions are enough to satisfy the definition given by the Biostar Handbook without including promoters.
There may be people who are interested only in the coding sequence for the purpose of translating it into protein, and for them anything other than the CDS would be noise. I don't think anyone would accuse you of being inaccurate if you included only a coding sequence and called it a gene. By the same token, including these other elements (not the promoter) can be useful as well, as long as they are clearly delineated. Naturally, you are welcome to present this however you want.
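One way to keep the elements clearly delineated is to store each feature under its own name and build whichever view is needed from that table. The sketch below is only an illustration: the coordinates are the 0-based, half-open slice bounds from the question, the names FEATURES, coding_sequence and locus are made up, and dna1 here is a placeholder string rather than the real sequence.

```python
# Slice bounds (0-based, half-open) copied from the code in the question.
FEATURES = {
    "promoter": (66, 106),
    "initial_exon": (139, 259),
    "internal_exon": (520, 722),
    "terminal_exon": (806, 861),
    "polyA_signal": (960, 966),
}
EXONS = ("initial_exon", "internal_exon", "terminal_exon")


def coding_sequence(seq):
    """Joined exons only, which is what GENSCAN reports as the predicted CDS."""
    return "".join(seq[start:end] for start, end in (FEATURES[n] for n in EXONS))


def locus(seq, include_flanking=False):
    """Contiguous genomic span, introns included.

    With include_flanking=True the span runs from the promoter start
    to the end of the poly-A signal instead of first exon to last exon.
    """
    names = FEATURES if include_flanking else EXONS
    start = min(FEATURES[n][0] for n in names)
    end = max(FEATURES[n][1] for n in names)
    return seq[start:end]


dna1 = "ACGT" * 250  # placeholder 1000-nt sequence, NOT real data

views = {
    "CDS (exons joined)": coding_sequence(dna1),
    "locus (first to last exon)": locus(dna1),
    "locus with promoter and poly-A signal": locus(dna1, include_flanking=True),
}
for label, sequence in views.items():
    print(f"{label}: {len(sequence)} nt")
```

Reporting each view under an explicit label lets the reader see exactly which elements were counted, whichever definition of "gene" is being used.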
Why shouldn't I include the promoter region, given that GENSCAN shows it as part of the gene? The promoter initiates transcription and turns the gene on or off, so I think it must be part of the gene.
Not sure why you are taking a combative tone. I already told you my opinion, and that you are welcome to include whatever you want.
Basically, the definition of a gene confuses me every time. In earlier classes we learnt that a gene is something that codes for a protein. But now professors and professionals give different definitions. I think there should be only one standard definition of a gene on which we all agree.