Question

Getting RNA Sequences from GFF and FA Files

11.3 years ago

MarkR

Hi. I have a folder full of .fa files and a .gff file. The gff file contains information about which loci look like they code for RNA sequences, and the .fa files contain the DNA sequences for a set of human chromosomes. I want to extract all the sequences that code for RNA, as defined by the gff file, from the DNA in the fasta files. I also have a file telling me which RNA types have higher priority (lincRNA is higher priority than miRNA, for example); this tells me which are more important and how I should decide between RNAs for overlapping reads in the gff.
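
For reference, a GFF file is plain text with one feature per line and nine tab-separated columns: seqid, source, type, start, end, score, strand, phase and attributes; lines starting with `#` are comments or directives. Below is a minimal parsing sketch, written in Python for brevity (the logic carries over to F# directly). It assumes GFF3-style `key=value` attributes and that the RNA class (miRNA, lincRNA, ...) sits in the type column; the coral annotation files were not inspected here, so the class may instead live in an attribute. `Feature`, `parse_gff` and `parse_attributes` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    seqid: str         # column 1: sequence name, should match a FASTA header
    source: str        # column 2: program or database that made the call
    feature_type: str  # column 3: assumed here to hold the RNA class
    start: int         # column 4: 1-based, inclusive
    end: int           # column 5: 1-based, inclusive
    strand: str        # column 7: '+', '-' or '.'
    attributes: dict   # column 9, parsed into key -> value

def parse_attributes(text: str) -> dict:
    """Parse GFF3-style 'key=value;key=value' attributes.
    GTF/GFF2 files use 'key "value";' instead and would need a different split."""
    attrs = {}
    for part in text.strip().split(";"):
        part = part.strip()
        if part:
            key, _, value = part.partition("=")
            attrs[key] = value
    return attrs

def parse_gff(lines):
    """Yield one Feature per data line, skipping comments, directives and blanks."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line (e.g. embedded FASTA); a stricter parser might raise
        seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
        yield Feature(seqid, source, ftype, int(start), int(end), strand,
                      parse_attributes(attrs))
```

Because `parse_gff` takes any iterable of lines, an open file handle works directly, e.g. `for feat in parse_gff(open("annotation.gff")): ...` (the file name is a placeholder).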

I have been trying to write my own little program in F# that reads these files and gives me each RNA read defined in the gff along with its corresponding DNA. However, I am a bit confused about how the two formats fit together. Do the start and end of each feature in the gff file refer to character positions in the corresponding .fa file? Are they 1- or 0-indexed? Does the strand ('+' or '-') matter for my purposes?
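
On the coordinate questions: GFF start and end are 1-based and inclusive (both end positions belong to the feature), and they count bases along the chromosome sequence with the FASTA line breaks removed. The strand does matter if you want the RNA sequence itself: a '-' strand feature is the reverse complement of the '+' strand sequence stored in the FASTA file. A small sketch of the conversion, assuming the chromosome is already loaded as one string (`extract_feature` and `reverse_complement` are illustrative names):

```python
# Complement table covering upper- and lower-case bases (lower case is
# often used in genome FASTA files to mark repeats).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def extract_feature(chrom_seq: str, start: int, end: int, strand: str) -> str:
    """Return a feature's sequence read 5' to 3' on its own strand.

    chrom_seq: the whole chromosome as one string (FASTA line breaks removed).
    start, end: GFF coordinates, 1-based and inclusive.
    strand: '+', '-' or '.'; anything other than '-' is returned as-is.
    """
    if not 1 <= start <= end <= len(chrom_seq):
        raise ValueError(f"coordinates {start}-{end} out of range")
    # 1-based inclusive [start, end] is Python's 0-based half-open [start-1, end)
    sub = chrom_seq[start - 1:end]
    return reverse_complement(sub) if strand == "-" else sub

chrom = "AACCGGTTAC"
print(extract_feature(chrom, 7, 10, "+"))  # TTAC
print(extract_feature(chrom, 7, 10, "-"))  # GTAA
```

The result is still in the DNA alphabet; replace T with U if you need RNA letters.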

Ultimately my goal is to get a collection of RNAs with their corresponding types (miRNA, lincRNA, snRNA, etc.) to do some computations on.
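
One simple way to keep each RNA's type attached to its sequence for later computations is to write the results back out as FASTA with the type in the header line. The `type=` header convention and the `to_fasta` name below are hypothetical choices, not a standard:

```python
def to_fasta(records, width=60):
    """Format (feature_id, rna_type, sequence) tuples as FASTA text.

    The RNA type goes into the header after the ID; sequence lines are
    wrapped at `width` characters.
    """
    lines = []
    for feature_id, rna_type, seq in records:
        lines.append(f">{feature_id} type={rna_type}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n" if lines else ""
```

Keeping the type in the header means most FASTA-aware tools can still read the file, while your own code can recover the type by splitting the header.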

My question is this: what is the easiest way to get them out of the data I have?

The data I am using is freely available here: http://wanglab.pcbi.upenn.edu/coral/ under the heading "Annotation packages" if anyone is interested or needs specifics.

Thank you!

gff rna samtools bedtools

Updated 11.3 years ago by Malcolm.Cook • written 11.3 years ago by MarkR

score 0 · Answer 1 · 2013-08-24

11.3 years ago

Malcolm.Cook ★ 1.5k

Perhaps my answer to A: Extract cds fastas from a gff annotation + reference sequence will serve your purposes?

Thank you! But I don't think this will break ties between overlapping reads. Also, I would prefer to stick with F#. I'm not looking for a prebaked solution; I want to know the structure of .fa and .gff files so I can code it myself.

11.3 years ago by MarkR
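
On the .fa side: a FASTA file is a series of records, each consisting of a header line starting with `>` (the sequence name, optionally followed by a description) and then one or more sequence lines, usually wrapped at a fixed width. The name before the first space is what the seqid column of the GFF should match. A minimal reader sketch (`read_fasta` is an illustrative name); since the data comes as one file per chromosome, loading one file at a time keeps memory manageable, as the whole human genome is roughly 3 GB as plain text:

```python
def read_fasta(lines):
    """Return {sequence name: sequence} from FASTA-formatted lines.

    The name is the header text after '>' up to the first whitespace;
    sequence lines are concatenated with line breaks removed, so position
    i (1-based) of a record is index i - 1 of the returned string.
    """
    records = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```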

Hmmm, your Q said nothing about "reads"... Anyway, good luck with the F#. You might want to reword your Q to indicate this is required.

11.3 years ago by Malcolm.Cook

"I also have a file telling me which RNA types have higher priority (lincRNA is higher priority than miRNA, for example); this tells me which are more important and how I should decide between RNAs for overlapping reads in the gff"

11.3 years ago by MarkR
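
For the tie-breaking itself, one simple policy is greedy: visit features from highest to lowest priority and keep a feature only if it does not overlap one already kept. The thread doesn't show the priority file's format, so this sketch assumes it has already been read into a list of type names, most important first; `Feat`, `resolve_overlaps` and the strand-agnostic overlap test are all assumptions, and this is only one possible reading of "deciding between" overlapping RNAs:

```python
from typing import NamedTuple

class Feat(NamedTuple):
    seqid: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    strand: str
    rna_type: str

def overlaps(a: Feat, b: Feat) -> bool:
    # Closed intervals on the same chromosome overlap if each starts
    # before the other ends. Strand is deliberately ignored here.
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end

def resolve_overlaps(features, priority):
    """Greedily keep the highest-priority feature among overlapping ones.

    priority: RNA types, most important first (assumed input format).
    Types missing from the list rank below every listed type.
    """
    rank = {t: i for i, t in enumerate(priority)}
    by_priority = sorted(
        features, key=lambda f: (rank.get(f.rna_type, len(rank)), f.seqid, f.start)
    )
    kept = []
    for f in by_priority:
        if not any(overlaps(f, k) for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: (f.seqid, f.start))

feats = [
    Feat("chr1", 100, 200, "+", "miRNA"),
    Feat("chr1", 150, 400, "+", "lincRNA"),
    Feat("chr1", 500, 520, "-", "snRNA"),
]
print([f.rna_type for f in resolve_overlaps(feats, ["lincRNA", "miRNA", "snRNA"])])
# ['lincRNA', 'snRNA']
```

This compares every feature against every kept one, which is fine for a test but slow genome-wide; sorting per chromosome or using an interval tree scales better. Add a strand comparison in `overlaps` if features on opposite strands should not compete.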