Dear all, I am new to bioinformatics and am trying to work with some miRNA data. There are a few things I am not sure about. I downloaded the human 3' UTR sequences from BioMart. Since a gene has several transcripts, I am not sure whether to use all of them or only the shortest UTR. I am also unsure about genes on the reverse strand: should I reverse their UTR sequences? Thanks for your replies.
Dear IV, I really appreciate your help. I don't know much about miRNA, so I just downloaded some miRNA data sequenced on an Illumina 2000 from the SRA database to practice on as preparation. Thank you for your patient reply. I have another question: some genes are on the reverse strand and their UTR sequences run 3'->5' as well. Do I need to reverse the UTR sequences? Best regards, hua.peng
Can you tell me a bit more about what you are trying to accomplish?
We do a lot of miRNA-related analysis in the lab, and I might be able to help you further if I have more information.
I usually work with local genome files, but from what I recall, Ensembl returns the correct UTR sequence regardless of the strand: it automatically reverse-complements the UTR sequence if it's on the (-) strand. However, if you download the coordinates and use them to extract the sequence from the genome, then you have to reverse-complement it yourself.
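For the coordinate-based route, a minimal Python sketch follows. It assumes 1-based, inclusive coordinates and a chromosome sequence already loaded as a plain string of the forward strand. The names `reverse_complement` and `extract_utr` and the toy sequence are made up for illustration; they are not part of any Ensembl tool.

```python
# Complement table for DNA, covering N and lowercase (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_utr(chrom_seq: str, start: int, end: int, strand: int) -> str:
    """Cut [start, end] (assumed 1-based, inclusive) out of the
    forward-strand chromosome sequence and return it 5'->3'."""
    region = chrom_seq[start - 1:end]
    # Genome FASTA files store only the forward (+) strand, so a feature
    # on the (-) strand must be reverse-complemented after slicing.
    return reverse_complement(region) if strand == -1 else region


toy_chrom = "AAACCGTGATTT"  # hypothetical toy sequence, not real data
print(extract_utr(toy_chrom, 4, 9, 1))   # (+) strand: slice returned as-is
print(extract_utr(toy_chrom, 4, 9, -1))  # (-) strand: reverse complement
```

Note that reversing alone or complementing alone is not enough for a (-) strand feature; both steps are needed to read the transcript 5'->3'.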
Cheers
Thanks a lot. I downloaded the UTR sequences from Ensembl BioMart, and the header information looks like this: ">ENSG00000001630|ENST00000003100|CYP51A1|91741465|91742978|91741465|91763844". The UTR start is the same as the transcript start, so I thought the sequence was just the complement of the UTR and not the reverse complement. Now I know I was wrong: the UTR sequence is reverse-complemented automatically. By the way, a kind friend told me that the transcript with the smallest transcript ID of a gene is the dominant transcript. Is that right? Thanks again for your kindness and patience. It really helps me a lot. Best wishes
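The observation about the header can be turned into a quick check. The sketch below is only an illustration and assumes the seven fields are gene ID, transcript ID, gene name, 3' UTR start, 3' UTR end, transcript start and transcript end; the real order depends on the attributes selected in BioMart, so this is a guess based on the example. It also assumes one start and one end per UTR. `UtrHeader`, `parse_header` and `infer_strand` are hypothetical names. The idea: a 3' UTR sits at the 3' end of its transcript, which is the highest genomic coordinate on the (+) strand and the lowest on the (-) strand.

```python
from typing import NamedTuple


class UtrHeader(NamedTuple):
    # Field order is an assumption inferred from the example header.
    gene_id: str
    transcript_id: str
    gene_name: str
    utr_start: int
    utr_end: int
    transcript_start: int
    transcript_end: int


def parse_header(line: str) -> UtrHeader:
    """Split a '>'-prefixed, pipe-delimited FASTA header into typed fields."""
    gene_id, tx_id, name, *coords = line.strip().lstrip(">").split("|")
    return UtrHeader(gene_id, tx_id, name, *map(int, coords))


def infer_strand(h: UtrHeader) -> int:
    """Return 1 or -1 depending on which transcript end the 3' UTR touches,
    or 0 if that cannot be decided from the coordinates alone."""
    at_low_end = h.utr_start == h.transcript_start
    at_high_end = h.utr_end == h.transcript_end
    if at_low_end and not at_high_end:
        return -1  # 3' end at the lowest coordinate: (-) strand
    if at_high_end and not at_low_end:
        return 1   # 3' end at the highest coordinate: (+) strand
    return 0


header = ">ENSG00000001630|ENST00000003100|CYP51A1|91741465|91742978|91741465|91763844"
print(infer_strand(parse_header(header)))
```

In practice it is simpler and safer to add the strand attribute to the BioMart export than to infer it this way.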
From what I know (and from what the relevant Ensembl help page states: http://www.ensembl.org/Help/View?id=151 ), transcript numbers also indicate the level of curation: Gold transcripts start with 0, Ensembl transcripts start with 2 and Vega/Havana transcripts start with 6. So the smallest ID does not by itself mark the dominant transcript. Ensembl suggests starting with the CCDS and gold transcripts and cross-checking your transcripts against EST and other expression data to identify the transcripts expressed in your specific tissue or cell line, since the primary transcript is not constant across cell types and tissues. Ensembl offers a wealth of external identifiers that make such a task possible.