The GTF file used for Cufflinks should provide you with the "biotype" information.
For example, from the Ensembl Homo_sapiens GRCh37 71 GTF file:
11 snRNA exon 10420739 10420864 . - . gene_id "ENSG00000221574"; transcript_id "ENST00000408647"; exon_number "1"; gene_name "U6atac"; gene_biotype "snRNA"; transcript_name "U6atac.23-201"; exon_id "ENSE00001565282";
11 snoRNA exon 10823014 10823155 . - . gene_id "ENSG00000238622"; transcript_id "ENST00000459187"; exon_number "1"; gene_name "SNORD97"; gene_biotype "snoRNA"; transcript_name "SNORD97-201"; exon_id "ENSE00001806941";
Then map each transcript or gene to its biotype.
A quick awk script (save it as get.gtf.ensg.biotypes.awk):
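# Print gene_id, transcript_id and gene biotype for each feature line of a GTF file.
# Assumes gene_id and transcript_id are the first two attributes in column 9,
# as in the Ensembl lines above.
BEGIN {OFS=FS="\t"}
# Skip comment/header lines marked with "#".
(substr($1,1,1)!="#" && substr($1,2,1)!="#") {
# Column 9 holds the attributes, separated by ";".
split($9,format,";");
for (i in format){
# Ensembl names the attribute gene_biotype; GENCODE names it gene_type.
if (format[i] ~ /gene_biotype|gene_type/){
# Strip attribute names, quotes and spaces so that only the values remain.
gsub(/gene_biotype|gene_type|"| /,"",format[i]);
gsub(/gene_id|"| /,"",format[1]);
gsub(/transcript_id|"| /,"",format[2]);
# For gene-level output only, use: print format[1], format[i];
print format[1], format[2], format[i];
}
}
}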
Run it:
awk -f get.gtf.ensg.biotypes.awk Homo_sapiens.GRCh37.71.gtf
Sample output:
ENSG00000210049 ENST00000387314 Mt_tRNA
ENSG00000211459 ENST00000389680 Mt_rRNA
ENSG00000210077 ENST00000387342 Mt_tRNA
ENSG00000210082 ENST00000387347 Mt_rRNA
ENSG00000209082 ENST00000386347 Mt_tRNA
ENSG00000198888 ENST00000361390 protein_coding
ENSG00000198888 ENST00000361390 protein_coding
ENSG00000210100 ENST00000387365 Mt_tRNA
ENSG00000210107 ENST00000387372 Mt_tRNA
ENSG00000210112 ENST00000387377 Mt_tRNA
ENSG00000198763 ENST00000361453 protein_coding
ENSG00000198763 ENST00000361453 protein_coding
.....
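The script prints one line per GTF feature, so the same gene/transcript pair is repeated for every exon, as in the sample output above. A minimal sketch for collapsing the repeats and tallying transcripts per biotype follows; the helper names unique_biotypes and count_biotypes and the output file name are made up here for illustration and are not part of the original answer.

```shell
# Hypothetical helper: collapse repeated gene/transcript/biotype lines.
# Reads the three-column, tab-separated output of get.gtf.ensg.biotypes.awk on stdin.
unique_biotypes() {
    sort -u
}

# Hypothetical helper: count unique gene/transcript pairs per biotype.
count_biotypes() {
    unique_biotypes |
        awk 'BEGIN {FS=OFS="\t"} {n[$3]++} END {for (b in n) print b, n[b]}' |
        sort
}

# Usage (script and GTF names as above; the .tsv name is an assumption):
# awk -f get.gtf.ensg.biotypes.awk Homo_sapiens.GRCh37.71.gtf | unique_biotypes > ensg.enst.biotype.tsv
# count_biotypes < ensg.enst.biotype.tsv
```

Keeping the deduplicated table in a file makes it easy to join against Cufflinks output by gene or transcript ID later.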
Yes, a gene can have both coding and non-coding transcripts. It's unlikely that there will be three different "active" transcript types from a single gene, but you can certainly get, for example, protein_coding and processed_transcript. For example: http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000121879;r=3:178865902-178957881
This gene has four transcripts, three of which are coding and one of which is a retained intron.
Okay, I got it. Thanks so much Emily_Ensembl.
@HBR: The biotype presented corresponds to the feature type on that line; it can be gene or transcript. FYI: please reply as a comment, NOT as an Answer, unless you are answering your own question...
Rm - sorry about that, I will make sure to reply. Thanks