Question

How to splice intron sequences for each transcript.

7.5 years ago

Tim Padvitski • 0

Hi all,

Could you please help me with the following issue? I have a BED file with the coordinates of all introns in the mouse genome (based on the GENCODE annotation), and
I want to obtain a FASTA file that contains the full intronic sequence for each transcript, i.e. for each transcript the separate introns are concatenated in the right order.

Do you have an idea how this can be done? I couldn't find an appropriate command in bedtools.

Thank you in advance, Regards, Tim

sequence intron genome bedtools fasta • 2.5k views

ADD COMMENT • link updated 7.5 years ago by WouterDeCoster 47k • written 7.5 years ago by Tim Padvitski • 0

This sounds like a job for bedtools getfasta to get the intron sequences.
The concatenation is less common and might require a custom script.

EDIT: getfasta has a -split argument to concatenate blocks from bed12 format.
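
To make the idea behind -split concrete, here is a minimal Python sketch of cutting each BED12 block out of a chromosome sequence and joining the pieces in order. It is not bedtools code: the function names and the `genome` dictionary (chromosome name to sequence string) are hypothetical, and reverse-complementing minus-strand records is an assumption about what you would want for transcripts.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]


def spliced_sequence(bed12_line, genome):
    """Concatenate the blocks of one BED12 record.

    genome is a hypothetical dict: chromosome name -> sequence string.
    """
    f = bed12_line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = f[0], int(f[1]), f[5]
    # Columns 11 and 12 hold comma-separated block sizes and block starts;
    # block starts are relative to chromStart. A trailing comma is allowed.
    sizes = [int(x) for x in f[10].split(",") if x]
    starts = [int(x) for x in f[11].split(",") if x]
    pieces = []
    for size, rel_start in zip(sizes, starts):
        begin = chrom_start + rel_start
        pieces.append(genome[chrom][begin:begin + size])
    seq = "".join(pieces)
    # Assumption: minus-strand transcripts should be reverse-complemented.
    return revcomp(seq) if strand == "-" else seq
```

For real data the genome would be read from a FASTA file rather than held in a hand-written dictionary.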

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

Thanks! I've actually tried bedtools getfasta. I was just wondering if there is a ready-made solution for my task, but it seems it's easier to write a script.

ADD REPLY • link 7.5 years ago by Tim Padvitski • 0

I was just editing my comment when you replied. How about the -split argument?

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

I think that may work, but how do I get bed12 for all introns?

ADD REPLY • link 7.5 years ago by Tim Padvitski • 0

The problem is, I don't know how to get bed12 for introns...

ADD REPLY • link 7.5 years ago by Tim Padvitski • 0

I guess you need to google for a gtf of introns; I found this one as the first hit: https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome/VYBl3k3IX4I

Then you could look into converting gtf to bed12.
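
If the intron coordinates are already in a BED file, one alternative to going through gtf is to group the per-intron records by transcript and write one bed12 line per transcript. Below is a rough Python sketch of that grouping. It assumes six-column input whose name column (column 4) is exactly the transcript ID; if your file carries a per-intron suffix in the name, you would need to strip it first. The function name `introns_to_bed12` is made up for this example.

```python
from collections import defaultdict


def introns_to_bed12(bed6_lines):
    """Group per-intron BED6 records by transcript and emit BED12 lines.

    Assumption: column 4 (name) is the transcript ID, identical for all
    introns of the same transcript.
    """
    groups = defaultdict(list)
    for line in bed6_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, name, _score, strand = fields[:6]
        groups[(chrom, name, strand)].append((int(start), int(end)))

    out = []
    for (chrom, name, strand), introns in groups.items():
        introns.sort()  # BED12 blocks must be in ascending genomic order
        tx_start, tx_end = introns[0][0], introns[-1][1]
        sizes = ",".join(str(e - s) for s, e in introns)
        starts = ",".join(str(s - tx_start) for s, _ in introns)
        out.append("\t".join([
            chrom, str(tx_start), str(tx_end), name, "0", strand,
            str(tx_start), str(tx_end), "0",   # thickStart, thickEnd, itemRgb
            str(len(introns)), sizes, starts,  # blockCount, blockSizes, blockStarts
        ]))
    return out
```

The blocks here are the introns themselves, so the resulting file could then be passed to getfasta with -split. Single-intron transcripts simply become one-block records.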

It's maybe not straightforward, but neither is what you are trying to achieve with an unconventional tool for that job :p

I would just map to the genome and use htseq-count or featureCounts for counting per intron (using a bed file).

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

Thanks, I've seen that post, but I hadn't thought about making bed12 from a gtf that was made from bed %) but that might actually work.

Regarding the mapping, I did exactly what you said, but from that point transcript quantification is problematic. I should think about the problem; maybe I'll stick to the gene level for now.

ADD REPLY • link 7.5 years ago by Tim Padvitski • 0

I forgot to mention: I want to use the resulting fasta file to generate an index for Kallisto, so that I can quantify transcripts using exclusively reads originating from intronic regions (or from intron-exon junctions).

ADD REPLY • link 7.5 years ago by Tim Padvitski • 0

Hello Tim,

are you sure that Kallisto is the correct tool for your purposes? It seems to me that an alignment to the genome would be preferable in your case, since reads originating from intron-exon junctions will not map properly to the transcriptome.

ADD REPLY • link 7.5 years ago by stefanos.bamopoulos ▴ 40

Hello Stefanos,

Thanks for your reply. I have ribo-minus RNA-seq data from 2 conditions from a specific cell type. What I want to do is compare the expression of mature mRNA and pre-mRNA in my conditions, to separate transcriptional from post-transcriptional regulation.
The idea is not new and is described here, for example: http://www.nature.com/nbt/journal/v33/n7/full/nbt.3269.html
Eventually I will integrate this data with ChIP-seq on transcription factors (TF) from the same cell type, and I want to test whether levels of nascent RNAs can be better explained by TF binding than total expression levels.

Frankly, I don't know whether kallisto is the best (or even suitable) tool for my purpose. I think I should ask this question in the kallisto group.

Regards

ADD REPLY • link 7.5 years ago by Tim Padvitski • 0