4.2 years ago
yaghoub.amraei
Hello everyone. I have a cuffcompare output in which several records share the same gene name and transcript name, but the genomic interval of each record is different. How can I get the FASTA sequence of each genomic interval with BEDTools or the gffread package?
1 Cufflinks exon 58474 61195 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1 Cufflinks exon 61423 61573 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "2"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1 Cufflinks exon 61669 61794 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "3"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1 Cufflinks exon 163041 164107 . + . gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1"; gene_name "Os01g0102850"; oId "CUFF.36.1"; nearest_ref "Os01t0102850-00"; class_code "o"; tss_id "TSS43";
1 Cufflinks exon 163041 164107 . + . gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1"; gene_name "Os01g0102850"; oId "CUFF.36.1"; nearest_ref "Os01t0102850-00"; class_code "o"; tss_id "TSS43";
Your GFF file is reporting exons, so for each line you have the genomic range for an exon of a particular gene. Do you want to get the sequence for each individual exon? Or do you want the sequence of the entire gene region on DNA?
My goal is lncRNA detection. The GTF file in which I want to identify lncRNAs has about 50,000 lines, but when I extract the FASTA file with gffread, I get only about 17,000 transcripts. That is, gffread writes the FASTA file per transcript, not per genomic interval of each line.
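One possible explanation for the mismatch, assuming the 50,000 figure counts exon lines: gffread writes one spliced FASTA record per transcript_id, so a transcript with several exon lines still yields a single sequence. The sketch below counts exon lines per transcript to check this. The sample lines are copied from the post (attributes shortened, fields tab-separated as in a real GTF), and the function name exons_per_transcript is made up for illustration.

```python
import re
from collections import Counter

# Sample exon lines from the post, tab-separated as in a real GTF file.
GTF_TEXT = "\n".join([
    '1\tCufflinks\texon\t58474\t61195\t.\t+\t.\tgene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1";',
    '1\tCufflinks\texon\t61423\t61573\t.\t+\t.\tgene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "2";',
    '1\tCufflinks\texon\t61669\t61794\t.\t+\t.\tgene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "3";',
    '1\tCufflinks\texon\t163041\t164107\t.\t+\t.\tgene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1";',
])

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_per_transcript(gtf_text):
    """Return a Counter mapping each transcript_id to its number of exon lines."""
    counts = Counter()
    for line in gtf_text.splitlines():
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GTF record has 9 columns; only exon features are counted.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


counts = exons_per_transcript(GTF_TEXT)
print(sum(counts.values()), "exon lines,", len(counts), "transcripts")
# prints: 4 exon lines, 2 transcripts
```

If the two numbers from the full file come out near 50,000 and 17,000, the FASTA output is probably complete and simply grouped by transcript.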
The getfasta function from bedtools might help you.
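To use getfasta on whole transcript regions rather than single exons, the exon lines first have to be collapsed into one interval per transcript. Here is a minimal sketch of that step, assuming a tab-separated GTF with transcript_id attributes as in the post. The function name transcript_spans_as_bed is hypothetical. Note that GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so the start is shifted by one.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_spans_as_bed(gtf_lines):
    """Collapse exon lines into one BED6 line per transcript (min start to max end)."""
    spans = {}  # transcript_id -> [chrom, start, end, strand], 1-based GTF coordinates
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue
        tid = match.group(1)
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        if tid in spans:
            # Extend the span to cover this exon.
            spans[tid][1] = min(spans[tid][1], start)
            spans[tid][2] = max(spans[tid][2], end)
        else:
            spans[tid] = [chrom, start, end, strand]
    # Convert to BED: subtract 1 from the start, keep the end as is.
    return [
        f"{chrom}\t{start - 1}\t{end}\t{tid}\t0\t{strand}"
        for tid, (chrom, start, end, strand) in spans.items()
    ]


# Exon lines taken from the post (attributes shortened).
example = [
    '1\tCufflinks\texon\t58474\t61195\t.\t+\t.\tgene_id "XLOC_000007"; transcript_id "TCONS_00022739";',
    '1\tCufflinks\texon\t61423\t61573\t.\t+\t.\tgene_id "XLOC_000007"; transcript_id "TCONS_00022739";',
    '1\tCufflinks\texon\t61669\t61794\t.\t+\t.\tgene_id "XLOC_000007"; transcript_id "TCONS_00022739";',
    '1\tCufflinks\texon\t163041\t164107\t.\t+\t.\tgene_id "XLOC_000015"; transcript_id "TCONS_00007850";',
]
bed_lines = transcript_spans_as_bed(example)
print("\n".join(bed_lines))
```

Saved to a file, the result could then be passed to something like: bedtools getfasta -fi genome.fa -bed spans.bed -s -name -fo spans.fa (the file names here are placeholders). The extracted region includes introns, unlike the spliced sequences that gffread writes.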