9.1 years ago
super
Hi All
I ran Cufflinks on my TopHat output and then used gffread to extract the transcript sequences:
cufflinks -p 8 -u -g reference_genome_genes.gtf -o outdir accepted_hits.bam
cd outdir
gffread -w transcripts.fa -g reference_genome.fa transcripts.gtf
This gave me transcripts.fa, with headers like these:
>CUFF.5657.1 gene=CUFF.5657
>ENSGALT00000015891 gene=CUFF.5841
>CUFF.5841.1 gene=CUFF.5841
>CUFF.5844.1 gene=CUFF.5844
>CUFF.5848.1 gene=CUFF.5848
>CUFF.5841.3 gene=CUFF.5841
>CUFF.5851.1 gene=CUFF.5851
>ENSGALT00000015914 gene=CUFF.5729
>ENSGALT00000015903 gene=ENSGALG00000009778
>ENSGALT00000015896 gene=ENSGALG00000009775
If I open transcripts.gtf, the format is:
$ head transcripts.gtf
10 Cufflinks transcript 13726 13952 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; FPKM "2.7667603544"; frac "1.000000"; conf_lo "1.287865"; conf_hi "4.245656"; cov "15.974601"; full_read_support "yes";
10 Cufflinks exon 13726 13952 1000 - . gene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1"; FPKM "2.7667603544"; frac "1.000000"; conf_lo "1.287865"; conf_hi "4.245656"; cov "15.974601";
10 Cufflinks transcript 14653 14823 1000 + . gene_id "CUFF.2"; transcript_id "CUFF.2.1"; FPKM "11.2588291433"; frac "1.000000"; conf_lo "7.992090"; conf_hi "14.525568"; cov "71.347614"; full_read_support "yes";
However, many of the identifiers are CUFF.*, and where gene= is not a CUFF ID it is an Ensembl gene ID. I know we could use BioMart to translate an Ensembl ID (e.g. ENSGALG00000009778) into a gene symbol.
(1) Is there any other option that would directly put the gene symbol (e.g. MX2, RSAD2) in gene= instead?
(2) What is CUFF.*? Are those novel genes/transcripts found by Cufflinks?
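For (1), what I have in mind is roughly the Python sketch below. It assumes the reference GTF carries a gene_name attribute next to gene_id (Ensembl GTFs usually do, but I have not verified mine), and the function names are just placeholders I made up. Any gene= value that is not in the reference, such as the CUFF.* IDs, is left unchanged.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def build_symbol_map(gtf_lines):
    """Map gene_id -> gene_name from reference GTF lines.

    Assumption: the reference GTF has a gene_name attribute.
    Records without both attributes are skipped.
    """
    symbols = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene_id = attrs.get("gene_id")
        gene_name = attrs.get("gene_name")
        if gene_id and gene_name:
            symbols[gene_id] = gene_name
    return symbols


def relabel_header(header, symbols):
    """Replace the gene=<id> field of a FASTA header with its symbol, if known."""
    def swap(match):
        gene_id = match.group(1)
        return "gene=" + symbols.get(gene_id, gene_id)
    return re.sub(r"gene=(\S+)", swap, header)


def relabel_fasta(fasta_lines, symbols):
    """Yield FASTA lines with header lines relabeled; sequence lines pass through."""
    for line in fasta_lines:
        yield relabel_header(line, symbols) if line.startswith(">") else line


# Hypothetical usage with the file names from this post:
# with open("reference_genome_genes.gtf") as gtf:
#     symbols = build_symbol_map(gtf)
# with open("transcripts.fa") as fa, open("transcripts.symbols.fa", "w") as out:
#     out.writelines(relabel_fasta(fa, symbols))
```

This only touches the header lines, so the sequences written by gffread stay exactly as they were.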
Thanks!