Hello,
Currently, I have BAM files sent to me from a sequencing company (I also have access to the FASTQ files if those are required), and I generated a count matrix using the featureCounts() function from the Rsubread package. I have also run DESeq2 on the count matrix and produced a filtered list of significant DEGs. However, I am noticing that a fair portion of the DEGs do not correspond to mRNA transcripts.
My question is: is there a way during the alignment process to label the reads being counted? For example, if gene X is a protein-coding gene according to a reference annotation, then once the count table is generated there would be some metadata column or output denoting the type of gene X. Ultimately, I want to be able to select the types of genes being analyzed downstream, e.g. mRNA.
Thank you in advance, Yeeshouw Wang
This information is annotated in GTF files. You can get them for almost every annotated species from Ensembl. There is a column gene_biotype or gene_type that is protein_coding or other types of genes. You can use that for filtering.
Thank you very much for the information! I am able to see the column you mention.
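For anyone who wants to apply this filter to a count table, here is a minimal sketch in plain Python. It assumes the GTF keeps gene_biotype (or gene_type) as a quoted key "value" pair in the ninth, attributes field, and that the row names of the count matrix are the same gene_id values as in the GTF used for counting. The function names, the example gene IDs, and the counts are hypothetical and only serve as an illustration.

```python
import re

# Matches quoted key "value" pairs in the GTF attributes field (9th column).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines):
    """Return a dict mapping gene_id -> biotype, read from GTF 'gene' records."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        attrs = dict(ATTR_RE.findall(fields[8]))
        # The attribute name differs between annotations, so try both.
        biotype = attrs.get("gene_biotype") or attrs.get("gene_type")
        if "gene_id" in attrs and biotype:
            biotypes[attrs["gene_id"]] = biotype
    return biotypes


def filter_counts(counts, biotypes, keep="protein_coding"):
    """Keep only the rows of a {gene_id: counts} table with the wanted biotype."""
    return {gene: row for gene, row in counts.items() if biotypes.get(gene) == keep}


# Hypothetical GTF lines and counts, for illustration only.
gtf = [
    '#!genome-build example\n',
    '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_name "A"; gene_biotype "protein_coding";\n',
    '1\tensembl\tgene\t2000\t2500\t.\t-\t.\tgene_id "GENE_B"; gene_name "B"; gene_biotype "lncRNA";\n',
    '1\tensembl\texon\t100\t300\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1"; gene_biotype "protein_coding";\n',
]
counts = {"GENE_A": [12, 30], "GENE_B": [5, 0]}

biotypes = gene_biotypes(gtf)
coding_counts = filter_counts(counts, biotypes)
print(coding_counts)
```

With a real annotation you would pass an open file handle instead of the list, e.g. gene_biotypes(open("annotation.gtf")), where the file name is a placeholder. Whether to filter the count matrix before running DESeq2 or to filter the DEG list afterwards is a separate analysis decision that this sketch does not settle.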