Extract FASTA sequence based on genomic coordinates
4.2 years ago

Hello everyone. I have cuffcompare output in which several lines share the same gene name and transcript name but cover different genomic intervals. How can I get the FASTA sequence for each of these intervals with bedtools or gffread?

1   Cufflinks   exon    58474   61195   .   +   .   gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1   Cufflinks   exon    61423   61573   .   +   .   gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "2"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1   Cufflinks   exon    61669   61794   .   +   .   gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "3"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1   Cufflinks   exon    163041  164107  .   +   .   gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1"; gene_name "Os01g0102850"; oId "CUFF.36.1"; nearest_ref "Os01t0102850-00"; class_code "o"; tss_id "TSS43";
1   Cufflinks   exon    163041  164107  .   +   .   gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1"; gene_name "Os01g0102850"; oId "CUFF.36.1"; nearest_ref "Os01t0102850-00"; class_code "o"; tss_id "TSS43";
sequence assembly • 967 views

Your GFF file is reporting exons, so for each line you have the genomic range for an exon of a particular gene. Do you want to get the sequence for each individual exon? Or do you want the sequence of the entire gene region on DNA?
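To make the exon-versus-transcript distinction concrete, here is a minimal Python sketch that groups exon lines by transcript_id and compares the number of exon lines with the number of distinct transcripts. The function name `exons_by_transcript` is hypothetical, and the sketch assumes the first eight columns contain no whitespace and that attributes look like the ones pasted in the question.

```python
import re
from collections import defaultdict


def exons_by_transcript(gtf_lines):
    """Group exon intervals (1-based, inclusive) by transcript_id."""
    groups = defaultdict(list)
    for line in gtf_lines:
        # Split into the 8 fixed columns plus the attribute column.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip blank lines, comments, and non-exon features
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        interval = (fields[0], int(fields[3]), int(fields[4]), fields[6])
        groups[match.group(1)].append(interval)
    return dict(groups)


# Shortened copies of lines from the question (attributes trimmed).
example = [
    '1 Cufflinks exon 58474 61195 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1";',
    '1 Cufflinks exon 61423 61573 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "2";',
    '1 Cufflinks exon 61669 61794 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "3";',
    '1 Cufflinks exon 163041 164107 . + . gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1";',
]

groups = exons_by_transcript(example)
n_exons = sum(len(intervals) for intervals in groups.values())
print(n_exons, "exon lines,", len(groups), "transcripts")
```

A file with many exon lines per transcript will therefore always have more lines than transcripts, which matters when comparing line counts with the number of FASTA records a tool returns.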

My goal is lncRNA detection. The GTF file in which I want to identify lncRNAs has about 50,000 lines, but when I extract the FASTA file with gffread I get only about 17,000 transcripts. That is, gffread writes one sequence per transcript, not one per genomic interval of each transcript.

The getfasta tool from bedtools might help you; it extracts one sequence per interval.
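As a rough sketch of how this could be set up, the Python snippet below converts a GTF exon line into a BED6 line, which can then be passed to something like `bedtools getfasta -fi genome.fa -bed exons.bed -s -name -fo exons.fa`. The file names are placeholders, the function name `gtf_exon_to_bed` and the `<transcript_id>_exon<number>` naming scheme are assumptions for illustration, and the exact FASTA header produced by `-name` differs between bedtools versions.

```python
import re


def gtf_exon_to_bed(line):
    """Convert one GTF exon line to a BED6 line; return None for other lines."""
    # Split into the 8 fixed columns plus the attribute column.
    fields = line.rstrip("\n").split(None, 8)
    if len(fields) < 9 or fields[2] != "exon":
        return None
    chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
    # Parse key "value" pairs from the attribute column.
    attrs = dict(re.findall(r'(\w+) "([^"]*)"', fields[8]))
    name = "{}_exon{}".format(
        attrs.get("transcript_id", "NA"), attrs.get("exon_number", "NA")
    )
    # GTF is 1-based and inclusive; BED is 0-based and half-open,
    # so only the start coordinate shifts by one.
    return "\t".join([chrom, str(start - 1), str(end), name, "0", strand])


gtf_line = (
    "1\tCufflinks\texon\t58474\t61195\t.\t+\t.\t"
    'gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1";'
)
bed_line = gtf_exon_to_bed(gtf_line)
print(bed_line)
```

Giving each interval its own name keeps exons of the same transcript distinguishable in the resulting FASTA file. By contrast, gffread with `-w` writes the spliced exons of each transcript as a single record, which explains getting fewer sequences than GTF lines.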

hello. amazing, as usual.
