Hi,
I have generated raw read counts per gene from RNA-seq data using htseq-count. Now I want to separate this table into coding RNAs and non-coding RNAs.
I am new to NGS data analysis.
Can anyone help me or suggest how to do this?
Thank you,
Naresh
What do you mean by coding and non-coding RNAs? Do you mean separating counts for coding and non-coding transcripts? Or do you mean separating counts for coding (exonic) and non-coding (intronic, UTR) regions of a given transcript?
@Ashutosh Pandey, yes I want to separate the counts for coding and non-coding transcripts.
For separating coding and non-coding regions, there is a tool called RSeQC.
Well, RSeQC will give you the numbers or fractions of reads aligned to different genic features, but it won't separate the counts. What you need is the annotation of transcripts (genes) by biotype. If your count table uses Ensembl gene IDs, you can use BioMart (http://www.ensembl.org/biomart) to download the "Biotype" for each gene and then annotate the genes in the count file as protein-coding, rRNA, tRNA, snoRNA, miRNA, etc.
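Once you have the BioMart export, the split itself can be done with a few lines of Python. This is only a rough sketch under some assumptions: the BioMart file is tab-separated with columns named "Gene stable ID" and "Gene type" (check the header of your own export, the names may differ), and the count file is the usual two-column htseq-count output, whose last lines are summary counters starting with "__" (such as __no_feature). The gene IDs and counts below are made up, and the function names are just for illustration.

```python
import csv
import io

# Toy stand-ins for the two files (made-up IDs and counts, for illustration only).
# With real data, pass open("counts.txt") and open("mart_export.txt") instead.
COUNTS_TSV = (
    "GENE_A\t120\nGENE_B\t0\nGENE_C\t45\nGENE_D\t7\n"
    "__no_feature\t300\n__ambiguous\t12\n"
)
BIOTYPES_TSV = (
    "Gene stable ID\tGene type\n"
    "GENE_A\tprotein_coding\nGENE_B\tlncRNA\nGENE_C\tmiRNA\n"
)


def load_biotypes(handle, id_col="Gene stable ID", type_col="Gene type"):
    """Map gene ID -> biotype from a BioMart TSV export (column names are assumed)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: row[type_col] for row in reader}


def split_counts(handle, biotypes):
    """Split htseq-count rows into coding, non-coding, and unannotated groups."""
    groups = {"coding": [], "noncoding": [], "unannotated": []}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        gene_id, count = row
        if gene_id.startswith("__"):
            continue  # htseq-count summary counters, not genes
        biotype = biotypes.get(gene_id)
        if biotype is None:
            key = "unannotated"  # gene ID missing from the BioMart export
        elif biotype == "protein_coding":
            key = "coding"
        else:
            key = "noncoding"
        groups[key].append((gene_id, int(count), biotype))
    return groups


biotypes = load_biotypes(io.StringIO(BIOTYPES_TSV))
groups = split_counts(io.StringIO(COUNTS_TSV), biotypes)
for name, rows in groups.items():
    print(name, rows)
```

Keeping the unannotated genes in their own group lets you see how many IDs failed to match. A common cause is version suffixes on the IDs in the GTF (for example ".5" at the end) that the BioMart export does not have. Note also that treating everything other than protein_coding as non-coding is a simplification, since biotypes such as pseudogenes will end up in that group too.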
Thank you. I will try your suggestion and let you know.