6.1 years ago
solo7773
I'm reading an article that has a figure showing how many reads there are around the 5' splice site, and I want to reproduce this plot in my own research. I am a newbie at analyzing sequencing data. I Googled but could not find out how to define the 5' splice sites and write them to a BED file, or where to download this sort of data from a publicly available database.
What I do know is that computeMatrix from deepTools can prepare the data for this visualization once it has the 5' splice site regions. Can anybody help me out? Many thanks.
One way to do this is to get all the transcript (CDS) coordinates from the GFF file. Once you have the CDS coordinates, align them by their 5' ends, define the flanking regions (e.g. +/- 500 bp), and write them to a BED file.
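A minimal plain-Python sketch of the windowing step described above (not code from the original reply). It assumes GFF-style 1-based inclusive coordinates and writes 0-based half-open BED intervals; the names five_prime_window and to_bed6, the toy features, and the 500 bp flank are illustrative choices only.

```python
def five_prime_window(chrom, start, end, strand, flank=500):
    """Return a BED interval (0-based, half-open) centred on the feature's 5' end.

    start/end are assumed to be GFF-style 1-based inclusive coordinates.
    The window covers the 5' base plus `flank` bases on each side.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # The 5' end is the start on the plus strand and the end on the minus strand.
    anchor = start if strand == "+" else end
    bed_start = max(0, anchor - 1 - flank)  # clamp at the chromosome start
    bed_end = anchor + flank
    return chrom, bed_start, bed_end, strand


def to_bed6(windows):
    """Format windows as BED6 lines; the name and score columns are placeholders."""
    return [f"{c}\t{s}\t{e}\t.\t0\t{st}" for c, s, e, st in windows]


# Toy features with made-up coordinates, not taken from any real annotation.
features = [("chr1", 1000, 2000, "+"), ("chr1", 5000, 6000, "-")]
bed_lines = to_bed6(five_prime_window(*f) for f in features)
print("\n".join(bed_lines))
```

If the BED file holds only the single-base sites, deepTools computeMatrix in reference-point mode can add the flanks itself through its -b and -a options, so pre-computing the windows is optional.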
Thanks Chirag. I noticed that the GTF files from GENCODE, UCSC, and RefSeq differ from each other. For now I am sticking with the GENCODE GTF file. Some genes have several transcripts, and a transcript may not include all the parts (e.g. UTR, CDS) that a mature mRNA should have. Do I need to select one isoform per gene, and also make sure that the selected isoform is a mature mRNA?
I would suggest taking all the isoforms for a given gene, because you want to look at the overall profile between conditions anyway. If there are any differences, they will show up in the profile. However, if you want to do the same analysis at the gene level, then you will need to decide which isoform to use.
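When all isoforms are pooled, the same splice site often appears in several transcripts, so it may help to count each site only once. A rough plain-Python sketch, assuming GTF exon lines that carry a transcript_id attribute (as GENCODE files do). Here donor_sites is a hypothetical helper name, the toy lines are made up, and a 5' splice site is reported as the last exonic base before the intron (1-based).

```python
import re
from collections import defaultdict


def donor_sites(gtf_lines):
    """Return sorted, unique (chrom, pos, strand) 5' splice sites.

    pos is the last exonic base before each intron, in 1-based coordinates.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (match.group(1), fields[0], fields[6])
        exons[key].append((int(fields[3]), int(fields[4])))

    sites = set()  # a set removes sites shared between isoforms
    for (_tx, chrom, strand), ex in exons.items():
        ex.sort()
        if strand == "+":
            # Every exon except the last one ends at a 5' splice site.
            sites.update((chrom, end, strand) for _start, end in ex[:-1])
        elif strand == "-":
            # On the minus strand the last exon has the lowest coordinates.
            sites.update((chrom, start, strand) for start, _end in ex[1:])
    return sorted(sites)


# Toy GTF lines (invented): two isoforms sharing their first exon, plus a minus-strand transcript.
toy_gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx2";',
    'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "g1"; transcript_id "tx2";',
    'chr2\tsrc\texon\t1000\t1100\t.\t-\t.\tgene_id "g2"; transcript_id "tx3";',
    'chr2\tsrc\texon\t1300\t1400\t.\t-\t.\tgene_id "g2"; transcript_id "tx3";',
]
for site in donor_sites(toy_gtf):
    print(site)
```

Whether to deduplicate depends on the question being asked: keeping duplicates weights each site by the number of isoforms that use it.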
Thanks! Sounds reasonable. I'll take your advice.
Hello, may I please have a little more detail on the tools that can extract the coordinates from a GFF file, and on how to align them afterwards?
Hi, I use the R package GenomicFeatures to deal with GFF files. See the example below to get coordinates from a GFF file using GenomicFeatures. I used a Candida glabrata GFF file for example purposes. genes is an object of class GRanges. You can easily convert it into an R tibble or data.frame containing the gene coordinates in separate columns.
If you want the transcription start site along with flanking regions, use the promoters function from the GenomicFeatures library. For example, I want to get 500 bp upstream and 500 bp downstream of the transcription start site.
Hello, thanks for your reply and for sharing the code, that's very clear. I just want to ask how I can get coordinates around the 3' splice site instead of the TSS. What can I use instead of promoters from GenomicFeatures?
Thanks
You can use the function flank. See the example below. I am using data from the same example given in the previous reply.
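For readers not working in R, the same idea can be sketched in plain Python. This is not the flank call from the reply, only an illustration under stated assumptions: exon coordinates are 1-based inclusive (GFF/GTF style), the 3' splice site is taken as the first base of every exon except the first in transcript order, and acceptor_windows, the toy exons, and the 50 bp flank are hypothetical choices.

```python
def acceptor_windows(chrom, strand, exons, flank=50):
    """Return BED6 lines around the 3' splice sites of one transcript.

    exons: list of (start, end) pairs, 1-based inclusive (assumed).
    Each window covers the first exonic base after an intron plus
    `flank` bases on each side, written as 0-based half-open BED.
    """
    ordered = sorted(exons)
    if strand == "+":
        # Skip the first exon: it has no intron upstream of it.
        anchors = [start for start, _end in ordered[1:]]
    elif strand == "-":
        # On the minus strand the first exon has the highest coordinates.
        anchors = [end for _start, end in ordered[:-1]]
    else:
        raise ValueError("strand must be '+' or '-'")

    lines = []
    for pos in anchors:
        bed_start = max(0, pos - 1 - flank)  # clamp at the chromosome start
        bed_end = pos + flank
        lines.append(f"{chrom}\t{bed_start}\t{bed_end}\t.\t0\t{strand}")
    return lines


# Toy transcript with three exons (made-up coordinates).
toy_exons = [(100, 200), (300, 400), (500, 600)]
print("\n".join(acceptor_windows("chr1", "+", toy_exons)))
```

Keeping the strand in column 6 matters here, because downstream tools need it to orient the windows in the direction of transcription.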