Get Intron sequence
0
1
Entering edit mode
2.4 years ago
Lillian • 0

Recently I'm studying alternative splicing events on RNA-seq.

I'm wondering if there is a way or tools to retrieve the intron sequence of the skip exon events.

I have the fastq files, and also I have already done SUPPA to get the position of the skip exon events.

Here is the picture of skip exon events: skip exon events

And I'm wondering if I can get the sequence of those intron(pink line!) maybe using getSeq() or other tools?

splicing intron fastq alternative • 1.1k views
ADD COMMENT
0
Entering edit mode

If you load your annotation file as a TxDb object in R, you can use GenomicFeatures::intronsByTranscripts to get the genomic coordinates of all introns, and then can subset this to get the introns of interest.

To get the sequences for those introns use Rsamtools::FaFile to open your genome assembly file, and getSeq to retrieve the sequences for the introns you previously subset.

ADD REPLY
0
Entering edit mode

Hi! Thank you for answering my questions!

I have tried to use the GenomicFeatures::intronsByTranscripts, however, I'm I bit confused about the output.

> txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene #shorthand (for convenience) txdb
> txdb_intron <- intronsByTranscript(txdb)
> txdb_intron
GRangesList object of length 232184:
$`1`
GRanges object with 2 ranges and 0 metadata columns:
      seqnames      ranges strand
         <Rle>   <IRanges>  <Rle>
  [1]     chr1 12228-12612      +
  [2]     chr1 12722-13220      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$`2`
GRanges object with 5 ranges and 0 metadata columns:
      seqnames      ranges strand
         <Rle>   <IRanges>  <Rle>
  [1]     chr1 12058-12178      +
  [2]     chr1 12228-12612      +
  [3]     chr1 12698-12974      +
  [4]     chr1 13053-13220      +
  [5]     chr1 13375-13452      +
  -------
  seqinfo: 595 sequences (1 circular) from hg38 genome

$`3`

The output give me a list of Granges objects. I'm wondering what is this list mean, is that in each Granges objects, it shows the intron position in each chromosome? For example, there is an intron sequence in 12228-12612 ?

I'm trying to find the intron sequence with the Skip exon events.

For example, I have an event :

"ENSG00000143194.12;SE:chr1:167005397-167012369:167012511-167016222:+"

and according to SUPPA2:

enter image description here

I know that the skip exon position is here: 167012369:167012511 (I usually use getSeq() to get the exon sequence)

> seqs_up <- getSeq(BSgenome.Hsapiens.UCSC.hg38, gr_up)
> seqs_up
DNAStringSet object of length 289:
      width seq                                                                                                          names               
  [1]   212 ATTTTAGGAGGCCAAGGCAGGAGGATCACTTGGAGCCAGGAGTTTGAGACCAG...AGGATCACAGTGACCTATAATGAGCCGTTGTGCTCCAGCCTGGGCGACAGAA 1
  [2]    91 CTCATCTCTGTATACAGTATGAAGTCCTAGAAGCTGCCCAGCTGGCCATAGAATCCTTGGATGGTATTCTGGTAGATGGTATCTGCATCAA                  2
  [3]   210 CTTGATTCCCTTTTGGCCAGCAGTTTTTAGGTCTGTCAGTACTGCACTGCAAG...AACCGGCAATGAGGGCCGCGTGTCTGTGGAAAACATCAAGCAGCTGTTGCAA 3

And the intron sequence I want to get are in the position 167005397-167012369 and 167012511-167016222.

So if I want to get the sequence should I use these two position to match the intron position in the GenomicFeatures::intronsByTranscripts ?

Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 2038 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6