7.6 years ago
Tim Padvitski
Hi all,
Can you please help me with the following issue?
I have a BED file with the coordinates of all introns from the mouse genome (based on the GENCODE annotation), and
I want to obtain a FASTA file that contains the full intronic sequence for each transcript, i.e. for each transcript the separate introns are concatenated in the right order.
Do you have any idea how this can be done? I couldn't find an appropriate command in bedtools.
Thank you in advance, Regards, Tim
This sounds like a job for bedtools getfasta to get the introns.
The concatenating is less "common" and might require a custom script.
EDIT: getfasta has a -split argument to concatenate blocks from bed12 format.
Thanks! I've actually tried bedtools getfasta. I was just wondering if there is a ready-made solution for my task, but it seems it's easier to write a script.
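For reference, a minimal sketch of what such a custom script might look like, in Python. It assumes a BED6 intron file in which column 4 holds the transcript ID and column 6 the strand, and it assumes the genome is already available as a dict of chromosome name to sequence (in practice it would be loaded from the genome FASTA). The function name `concat_introns` and the toy data are made up for illustration.

```python
from collections import defaultdict

# Translation table for complementing DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def concat_introns(bed_lines, genome):
    """Return {transcript_id: introns concatenated in 5'->3' order}.

    Assumes BED6 lines (name = transcript ID) and that all introns of a
    transcript lie on the same chromosome and strand.
    """
    by_tx = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, _score, strand = line.rstrip("\n").split("\t")[:6]
        by_tx[name].append((chrom, int(start), int(end), strand))

    result = {}
    for tx, introns in by_tx.items():
        # Concatenate in genomic order first.
        introns.sort(key=lambda rec: rec[1])
        seq = "".join(genome[c][s:e] for c, s, e, _ in introns)
        # For minus-strand transcripts, reverse-complementing the whole
        # concatenation yields the introns in transcript order.
        if introns[0][3] == "-":
            seq = reverse_complement(seq)
        result[tx] = seq
    return result


# Toy data, purely illustrative.
genome = {"chr1": "AAACCCGGGTTTACGT"}
bed = [
    "chr1\t3\t6\ttx1\t0\t+",
    "chr1\t9\t12\ttx1\t0\t+",
    "chr1\t0\t3\ttx2\t0\t-",
    "chr1\t12\t16\ttx2\t0\t-",
]
for tx, seq in concat_introns(bed, genome).items():
    print(f">{tx}\n{seq}")
```

BED coordinates are 0-based and half-open, so they map directly onto Python slices. For a real genome, holding every chromosome in memory as a string may be too heavy, and an indexed FASTA reader would probably be a better fit.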
I was just editing my comment when you replied. How about the -split argument?
I think that may work, but how do I get bed12 for all introns?
The problem is, I don't know how to get bed12 for introns...
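One possible way to get there directly from the per-intron BED file, without going through a GTF, is sketched below in Python. It assumes BED6 input in which column 4 is the transcript ID and in which the introns of one transcript do not overlap. The function name `introns_to_bed12` is hypothetical, and the score and itemRgb fields are simply set to 0.

```python
from collections import defaultdict


def introns_to_bed12(bed_lines):
    """Collapse per-intron BED6 lines into one BED12 line per transcript.

    Each intron becomes one block; the BED12 record spans from the start
    of the first intron to the end of the last one.
    """
    by_tx = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, _score, strand = line.rstrip("\n").split("\t")[:6]
        by_tx[(chrom, name, strand)].append((int(start), int(end)))

    out = []
    for (chrom, name, strand), blocks in by_tx.items():
        blocks.sort()
        tx_start = blocks[0][0]
        tx_end = blocks[-1][1]
        # blockSizes and blockStarts are comma-separated lists;
        # blockStarts are relative to the record start.
        sizes = ",".join(str(e - s) for s, e in blocks)
        starts = ",".join(str(s - tx_start) for s, _ in blocks)
        fields = [chrom, tx_start, tx_end, name, 0, strand,
                  tx_start, tx_end, 0, len(blocks), sizes, starts]
        out.append("\t".join(map(str, fields)))
    return out


# Toy example, purely illustrative.
example = [
    "chr1\t100\t150\ttx1\t0\t+",
    "chr1\t300\t420\ttx1\t0\t+",
]
for record in introns_to_bed12(example):
    print(record)
```

The resulting file could then be passed to bedtools getfasta with -split, plus -s for strand awareness and -name to keep the transcript IDs. It is worth checking a minus-strand transcript by hand to confirm the blocks come out in the expected order.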
I guess you need to google for a gtf of introns, I found this one as a first hit: https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome/VYBl3k3IX4I
Then you could look into converting gtf to bed12.
It's maybe not straightforward, but neither is what you are trying to achieve with an unconventional tool for that job :p
I would just map to the genome and use htseq-count or featureCounts for counting per intron (using a bed file).
Thanks, I've seen that post, but I hadn't thought about making bed12 from a gtf that was made from bed %) but that might actually work.
Regarding the mapping, I did exactly what you said, but from that point transcript quantification is problematic. I should think about the problem; maybe I'll stick to the gene level for now.
I forgot to mention: I want to use the resulting FASTA file to generate an index for kallisto, so I can quantify transcripts using exclusively reads originating from intronic regions (or from intron-exon junctions).
Hello Tim,
Are you sure that kallisto is the correct tool for your purposes? It seems to me that an alignment to the genome would be preferable in your case, since reads originating from intron-exon junctions will not map properly to the transcriptome.
Hello Stefanos,
Thanks for your reply. I have ribo-minus RNA-seq data from 2 conditions from a specific cell type. What I want to do is compare the expression of mature mRNA and pre-mRNA reads between my conditions, to separate transcriptional and post-transcriptional regulation.
The idea is not new and is described here, for example: http://www.nature.com/nbt/journal/v33/n7/full/nbt.3269.html
Eventually I will integrate this data with ChIP-seq on transcription factors (TF) from the same cell type, and I want to test whether levels of nascent RNAs can be better explained by TF binding than total expression levels.
Frankly, I don't know whether kallisto is the best (or even suitable) tool for my purpose. I think I should ask this question in the kallisto group.
Regards