Hi,
I am quite new to bioinformatics. I have a GTF file and also a BED file that lists the transcript_name, start, and end position. For every transcript's start and end position, I would like to extract all the exons present between the start and end coordinates of the transcript. For example, imagine a GTF file like the following:
Scaffold1 cuff transcript 344 540 100 + geneid "cuff_45"
Scaffold1 cuff exon 344 400 100 + geneid "cuff_45"
Scaffold1 cuff exon 484 540 100 + geneid "cuff_45"
Scaffold1 cuff transcript 800 1200 100 + geneid "cuff_46"
Scaffold1 cuff exon 800 928 100 + geneid "cuff_46"
Scaffold1 cuff exon 980 1100 100 + geneid "cuff_46"
Scaffold1 cuff exon 1100 1200 100 + geneid "cuff_46"
Scaffold2 cuff transcript 1 500 1000 - gene_id "cuff_47"
Scaffold2 cuff exon 1 500 1000 - gene_id "cuff_47"
and a BED file like the following:
Scaffold1 344 540
Then I would like to extract the Scaffold1 transcript entry and its exons from the GTF file, like the following:
Scaffold1 cuff transcript 344 540 100 + geneid "cuff_45"
Scaffold1 cuff exon 344 400 100 + geneid "cuff_45"
Scaffold1 cuff exon 484 540 100 + geneid "cuff_45"
Can someone suggest a tool to achieve my goal?
Thanks in advance.
Use unix utility:
grep "Scaffold1" your.gtf > scaf1.gtf
Do you also want a BED file from your.gtf, or do you already have that?
Thanks for your reply. I already tried grep, but it returns every entry that contains Scaffold1, whereas I need only the entries between the transcript start and end. I have updated the GTF file in my question. Kindly take a look and guide me.
There are 7 lines that have Scaffold1 in your example. The grep above would get all 7. So instead of all 7, you just want the lines that match the interval in your BED file?
Exactly. I just need to extract all the exons confined within the transcript start and end, as shown in the example above.
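One possible way to do this with plain unix tools is an awk filter that reads the BED intervals first and then keeps only the GTF lines lying fully inside one of them. This is only a rough sketch under a few assumptions: the file names regions.bed and filtered.gtf are placeholders, the columns are whitespace- or tab-separated, and the BED coordinates are written in the same 1-based convention as the GTF (as in the example in the question). A standard BED file has a 0-based start, in which case the start value would need 1 added before comparing.

```shell
# Sample inputs copied from the question (file names are placeholders).
cat > your.gtf <<'EOF'
Scaffold1 cuff transcript 344 540 100 + geneid "cuff_45"
Scaffold1 cuff exon 344 400 100 + geneid "cuff_45"
Scaffold1 cuff exon 484 540 100 + geneid "cuff_45"
Scaffold1 cuff transcript 800 1200 100 + geneid "cuff_46"
Scaffold1 cuff exon 800 928 100 + geneid "cuff_46"
Scaffold1 cuff exon 980 1100 100 + geneid "cuff_46"
Scaffold1 cuff exon 1100 1200 100 + geneid "cuff_46"
Scaffold2 cuff transcript 1 500 1000 - gene_id "cuff_47"
Scaffold2 cuff exon 1 500 1000 - gene_id "cuff_47"
EOF

cat > regions.bed <<'EOF'
Scaffold1 344 540
EOF

# First file (BED): remember each interval.
# Second file (GTF): print a line if its scaffold matches and its
# start (column 4) and end (column 5) fall inside any stored interval.
awk '
  NR == FNR { chrom[NR] = $1; lo[NR] = $2 + 0; hi[NR] = $3 + 0; n = NR; next }
  {
    for (i = 1; i <= n; i++)
      if ($1 == chrom[i] && ($4 + 0) >= lo[i] && ($5 + 0) <= hi[i]) { print; break }
  }
' regions.bed your.gtf > filtered.gtf
```

With the example files from the question, filtered.gtf ends up holding the cuff_45 transcript line and its two exon lines, and nothing from cuff_46 or Scaffold2.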