11.0 years ago
liux.bio
Hello, Biostars. I am using Bioconductor to get the sequences of introns, UTRs, and flanking regions for a set of genes, and I want these sequences to be gene-centered. I am using packages such as Biostrings and GenomicFeatures. I can get the sequences of UTRs, flanking regions, and introns for each transcript of a gene, but I don't know how to remove the overlapping (repeated) sequences. Maybe this is a naive question. Any suggestions? Many thanks and Happy New Year!
Presuming you have a GRangesList containing exons with transcript information split by gene, have you tried reduce()?
Yes, I tried. I read the help but I can't understand it. For a GRangesList split by transcript, it removes all the overlaps between the transcripts and within the transcripts, right? I will read the help carefully. Thanks!
Ah, you have things split by transcript. You might instead split things by gene and then use reduce() (this can be easily done if you directly import the GFF/GTF file into GenomicRanges; I don't use GenomicFeatures, so I can't say how things work there). If you have things split by gene, then what reduce() will do is merge overlapping exons between transcripts to create a "union gene model" (i.e., what you would get if you collapsed all of the transcripts together), which sounds like what you want. That way, there are no repeated regions.
Edit: I'll add that you can unlist() a GRangesList that's split by transcript and then split() it by gene.
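
For what it's worth, here is a minimal sketch of that unlist()/split()/reduce() sequence on toy data. The coordinates, the gene name "geneA", and the metadata column names tx_id and gene_id are made up for illustration; your object may store the transcript and gene identifiers under different names. It also shows why reducing a transcript-split list is not enough: reduce() on a GRangesList works within each list element, so overlaps between transcripts only get merged once the ranges are grouped by gene.

```r
library(GenomicRanges)

# Toy exons for two transcripts of one hypothetical gene ("geneA").
# tx_id and gene_id are assumed metadata column names, not a fixed convention.
exons <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(100, 300, 150, 300),
                     end   = c(200, 400, 250, 400)),
  strand   = "+",
  tx_id    = c("tx1", "tx1", "tx2", "tx2"),
  gene_id  = rep("geneA", 4)
)

# The starting point from the question: a GRangesList split by transcript.
by_tx <- split(exons, exons$tx_id)

# reduce() here only merges within each transcript, so the exons that
# tx1 and tx2 share are still present once per transcript.
reduced_by_tx <- reduce(by_tx)

# Flatten back to a single GRanges, then regroup by gene.
flat    <- unlist(by_tx, use.names = FALSE)
by_gene <- split(flat, flat$gene_id)

# Now reduce() merges overlapping exons across transcripts of the same gene,
# giving the "union gene model" with no repeated regions.
union_model <- reduce(by_gene)
union_model[["geneA"]]
```

The reduced ranges can then be used in place of the per-transcript ones when extracting sequences, so each region of a gene is retrieved only once.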