Obtain all ncDNA sequences using a GTF
Asked by R.Blues, 7.4 years ago

Hi everyone.

I am asking something that may seem trivial, but I would like to know how others would do it.

I need to work with all the non-coding DNA sequences of M. musculus, so my first crucial step is retrieving these sequences.

Let's say I am using the Ensembl GTF for the most recent mouse genome assembly (release 89). The features included in this annotation are exon, CDS, stop_codon, five_prime_utr, three_prime_utr, transcript, gene and start_codon.

First of all:

-Would you use this or another annotation? Maybe others include more information that could be useful.

-My intention is to remove all but the exon features from the GTF, use the BEDTools suite to get the coordinates complementary to these exonic regions (which, as far as I know, can be considered ncDNA), and then obtain their sequences with getfasta. Is this a good way to proceed? Do you see any issues with this?
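
The exon-complement step described above can be sketched in Python as follows. This is a minimal illustration, not a replacement for BEDTools: it assumes a plain-text Ensembl GTF (tab-separated, 1-based inclusive coordinates) and a dictionary of chromosome lengths, and the function names and toy data are hypothetical. It mirrors what `bedtools merge` followed by `bedtools complement -i exons.bed -g genome.txt` would produce; the resulting intervals would then go to `bedtools getfasta -fi genome.fa -bed noncoding.bed` (file names are placeholders).

```python
from collections import defaultdict


def exon_intervals(gtf_lines):
    """Collect exon features as 0-based, half-open (BED-style) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GTF header lines such as "#!genome-build"
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        # GTF is 1-based inclusive; BED is 0-based half-open, so only the start shifts.
        by_chrom[chrom].append((start - 1, end))
    return by_chrom


def merge(intervals):
    """Merge overlapping or book-ended intervals (similar to `bedtools merge`)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def complement(exons_by_chrom, chrom_sizes):
    """Return the regions not covered by any exon (similar to `bedtools complement`)."""
    result = []
    for chrom, size in chrom_sizes.items():
        cursor = 0
        for start, end in merge(exons_by_chrom.get(chrom, [])):
            if start > cursor:
                result.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            result.append((chrom, cursor, size))
    return result


# Hypothetical toy annotation: two overlapping transcripts on chromosome "1".
EXAMPLE_GTF = [
    "1\tensembl\tgene\t100\t500\t.\t+\t.\tgene_id \"g1\";",
    "1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
    "1\tensembl\texon\t150\t250\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t2\";",
    "1\tensembl\texon\t401\t500\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
]

if __name__ == "__main__":
    print(complement(exon_intervals(EXAMPLE_GTF), {"1": 1000}))
    # [('1', 0, 99), ('1', 250, 400), ('1', 500, 1000)]
```

Merging the exons of all transcripts first means a base counts as "non-exonic" only if no annotated transcript uses it.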

Thank you all. Looking forward to your answers! :)


What kind of non-coding sequence are you looking for? Ensembl already provides the ncRNA sequences; is there some other reason to go for the non-exonic regions?


What about alternative splicing? One transcript's exon is (potentially) another one's intron and vice versa. What exactly are you looking for? Do you explicitly need introns or are intergenic regions sufficient?
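
To make the alternative-splicing point concrete, here is a small Python sketch under the same assumptions (tab-separated GTF, 1-based inclusive coordinates, a `transcript_id` attribute in column 9); the function names and toy data are hypothetical. It derives each transcript's introns from its own exons and then removes any part that is exonic in another transcript. In the toy data, part of t1's intron is an exon of t2, so only the remainder is not exonic in any transcript.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines):
    """Group exon intervals (0-based, half-open) by (chromosome, transcript_id)."""
    tx = defaultdict(list)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # also skips header/comment lines, which have fewer than 9 fields
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        tx[(fields[0], match.group(1))].append((int(fields[3]) - 1, int(fields[4])))
    return tx


def introns(exons):
    """Gaps between consecutive exons of a single transcript."""
    ordered = sorted(exons)
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
            if b_start > a_end]


def subtract(interval, blockers):
    """Remove every blocker interval from `interval`, returning the leftover pieces."""
    start, end = interval
    pieces, cursor = [], start
    for b_start, b_end in sorted(blockers):
        if b_end <= cursor or b_start >= end:
            continue  # blocker does not overlap the remaining part of the interval
        if b_start > cursor:
            pieces.append((cursor, b_start))
        cursor = max(cursor, b_end)
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def strictly_intronic(gtf_lines):
    """Per transcript: intronic pieces that are not exonic in any transcript."""
    tx = exons_by_transcript(gtf_lines)
    exons_on = defaultdict(list)
    for (chrom, _), ivs in tx.items():
        exons_on[chrom].extend(ivs)
    result = {}
    for (chrom, tid), ivs in tx.items():
        pieces = []
        for intron in introns(ivs):
            pieces.extend(subtract(intron, exons_on[chrom]))
        result[tid] = pieces
    return result


# Hypothetical toy data: t1 has exons 100-200 and 401-500; t2 has one exon, 150-250.
EXAMPLE_GTF = [
    "1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
    "1\tensembl\texon\t150\t250\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t2\";",
    "1\tensembl\texon\t401\t500\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";",
]

if __name__ == "__main__":
    print(strictly_intronic(EXAMPLE_GTF))
    # {'t1': [(250, 400)], 't2': []}
```

Whether the overlapping part (here 200-250) should count as intronic or exonic depends on the question being asked, which is why it matters whether introns are explicitly needed.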


Save yourself the trouble and get them from RNAcentral (applying additional filters as needed).
