parsing through gtf file
8.2 years ago
a.rex ▴ 350

I have a GTF file with the following layout: a 'transcript' feature (i.e. the full length of the transcript) followed by the exons within that transcript. For example:

C7123483        cam  transcript         1       8268    .       +       .       gene_id "00001"; transcript_id "00001";
C7123483        cam    exon              1       206      .       +       .       gene_id "00001"; transcript_id "00001";
C7123483        cam    exon             263     749     .       +       .       gene_id "00001"; transcript_id "00001";

Since this file only contains the coordinates of the exons, I would also like it to include the intron coordinates. Presumably each intron runs from one base after the end of the previous exon to one base before the start of the next exon. Does anyone have experience doing this? Are there any tools that do this automatically? I am struggling to write a script.

I need the exon/intron coordinates because I have a separate BED file whose coordinates I need to match with the exon/intron/transcript_id/gene_id information from the GTF file.

I hope this makes sense - I am very new to bioinformatics, and any help would be very much appreciated.


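Look into bedtools complement. Assuming you have only exons in your file, this may work, although the complement also covers the regions outside transcripts, not just the introns. If you need everything in a single file, you can then concatenate the exon and complement files and sort the result; note that bedtools merge combines overlapping or adjacent intervals rather than files, so it would collapse neighbouring exons and introns into a single interval.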
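If a small script is preferred over a tool, the gap-between-exons idea from the question can be sketched in Python. This is only a minimal sketch under some assumptions: the function name `add_introns` is made up for illustration, columns are split on any whitespace (so a source or feature name containing spaces would break it), and introns are defined purely as gaps between the exons of one transcript_id, with no check against junction evidence.

```python
import re
from collections import defaultdict


def add_introns(gtf_lines):
    """Return the input GTF lines followed by one 'intron' line per gap
    between the exons of each transcript (hypothetical helper)."""
    exons = defaultdict(list)  # transcript_id -> [(start, end, fields), ...]
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        out.append(line)
        # Split into at most 9 columns so the attribute column stays intact.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append((int(fields[3]), int(fields[4]), fields))

    for items in exons.values():
        items.sort(key=lambda item: (item[0], item[1]))
        prev_end = items[0][1]
        for start, end, fields in items[1:]:
            # GTF is 1-based and inclusive, so the intron spans
            # prev_end + 1 .. start - 1.
            if start > prev_end + 1:
                intron = fields[:2] + ["intron", str(prev_end + 1), str(start - 1)] + fields[5:]
                out.append("\t".join(intron))
            prev_end = max(prev_end, end)
    return out


if __name__ == "__main__":
    example = [
        'C7123483\tcam\ttranscript\t1\t8268\t.\t+\t.\tgene_id "00001"; transcript_id "00001";',
        'C7123483\tcam\texon\t1\t206\t.\t+\t.\tgene_id "00001"; transcript_id "00001";',
        'C7123483\tcam\texon\t263\t749\t.\t+\t.\tgene_id "00001"; transcript_id "00001";',
    ]
    for row in add_introns(example):
        print(row)
```

For the example in the question, this appends a single intron from 207 to 262. The intron lines are added at the end, so the output would still need sorting if position order matters.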


I would suggest looking into the MISO annotations for all annotated introns, since the region between two exons may not be annotated as an "intron" and could instead be some other sequence (5' UTR, 3' UTR, snRNA, etc.) that has not yet been annotated. To call a region an intron you need evidence from the intron/exon junctions (i.e. its expression depends on the flanking exons). Have a look here:

https://miso.readthedocs.io/en/fastmiso/annotation.html

Furthermore, you can use http://rnaseqlib.readthedocs.org/ to make your own annotation.

8.2 years ago
Jeffin Rockey ★ 1.3k

Another alternative:

There are a couple of posts describing the -addintrons option of the gt gff3 tool from the GenomeTools suite. That should be quite useful for you.

For the second requirement, you can subsequently use bedtools intersect.
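To make the matching step concrete, here is a rough Python sketch of what an overlap lookup between BED intervals and GTF features involves. It is an illustration under assumptions rather than a replacement for bedtools intersect: the names `gtf_to_interval` and `annotate` are hypothetical, strand is ignored, and the nested loop is quadratic, so it only suits small files. The main point it shows is the coordinate conversion: GTF is 1-based and inclusive, while BED is 0-based and half-open.

```python
import re


def gtf_to_interval(line):
    """Parse one GTF line into a dict with 0-based, half-open coordinates."""
    fields = line.rstrip("\n").split(None, 8)
    gene = re.search(r'gene_id "([^"]+)"', fields[8])
    transcript = re.search(r'transcript_id "([^"]+)"', fields[8])
    return {
        "chrom": fields[0],
        "feature": fields[2],
        "start": int(fields[3]) - 1,  # GTF start is 1-based, BED start is 0-based
        "end": int(fields[4]),        # inclusive 1-based end == exclusive 0-based end
        "gene_id": gene.group(1) if gene else None,
        "transcript_id": transcript.group(1) if transcript else None,
    }


def annotate(bed_lines, gtf_lines):
    """Return one tuple per (BED interval, overlapping GTF feature) pair."""
    features = [
        gtf_to_interval(line)
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    ]
    hits = []
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        for feat in features:
            # Two half-open intervals overlap when each starts before the other ends.
            if feat["chrom"] == chrom and start < feat["end"] and feat["start"] < end:
                hits.append((chrom, start, end, feat["feature"],
                             feat["gene_id"], feat["transcript_id"]))
    return hits
```

Each returned tuple carries the BED coordinates together with the feature type, gene_id and transcript_id of one overlapping GTF line, which is the pairing the question asks for.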

8.2 years ago
Marge ▴ 320

There are already multiple posts on Biostars that discuss conversion from GTF to BED, e.g.:

How To Convert Gencode Gtf Into Bed Format ?

Converting gtf format to bed format

How To Convert Hg19_Known_Gene From Text Format To Gtf Or Bed?

How to convert GTF format to BED format?

Did you try any of those solutions, and if so, what is not working for you?

Best, Marge
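For reference, the core of a GTF to BED conversion is small enough to sketch in Python. This is a minimal illustration and not the method from any of the linked posts: the function name `gtf_to_bed` is hypothetical, the BED name column is filled with the transcript_id, and a missing GTF score (".") is written as 0. It assumes whitespace-separated columns with the attributes in the ninth column.

```python
import re


def gtf_to_bed(gtf_lines, feature="exon"):
    """Convert GTF lines of one feature type to BED6 lines (hypothetical helper)."""
    bed = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        name = match.group(1) if match else "."
        score = fields[5] if fields[5] != "." else "0"
        # GTF: 1-based, inclusive. BED: 0-based, half-open.
        # So only the start shifts by one; the end value is unchanged.
        bed_start = str(int(fields[3]) - 1)
        bed.append("\t".join([fields[0], bed_start, fields[4], name, score, fields[6]]))
    return bed
```

Calling `gtf_to_bed(lines, feature="transcript")` instead of the default gives the full transcript spans, which can then be compared against the exon intervals.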


These links don't answer a critical part of the original question, which is how to find intervals for the introns and include them in the same file.

8.2 years ago
Marge ▴ 320

Apologies for missing the point. I assumed that getting the full length and exon coordinates in bed format would automatically allow one to find which coordinates are falling in the introns.


Please use ADD COMMENT/ADD REPLY when responding to existing posts. SUBMIT ANSWERS should only be used for new answers to original question.
