Question

RNASeq gene labeling and mRNA filter from bulkRNA data.

0

2.2 years ago

Yeeshouw ▴ 10

Hello,

I have BAM files sent to me by a sequencing company (I also have access to the fastq files if those are required), and I generated a count matrix using the featureCounts() function from the Rsubread package. I have also run DESeq2 on the count matrix and produced a filtered list of significant DEGs. However, I am noticing that a fair portion of the DEGs do not correspond to mRNA transcripts.

My question is whether there is a way during the alignment process to label the reads being counted. For example, if gene X is a protein-coding gene according to the reference annotation, then once the count table is generated there would be some metadata column or output denoting the type of gene X. Ultimately, I want to be able to select the types of genes analyzed downstream, e.g. mRNA.

Thank you in advance, Yeeshouw Wang

RNA-Seq RSubreads • 1.7k views

ADD COMMENT • link 2.1 years ago by Yeeshouw ▴ 10

2

This information is annotated in GTF files. You can get them for almost every annotated species from Ensembl. There is an attribute, gene_biotype (Ensembl) or gene_type (GENCODE), whose value is protein_coding or another type of gene. You can use that for filtering.
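As a rough illustration of that filtering, here is a minimal Python sketch that reads the attribute field of GTF "gene" rows and keeps only protein-coding gene IDs. The GTF rows, the gene IDs (`GENE_A`, `GENE_B`) and the helper names are hypothetical, made up for this example. It assumes the usual `key "value";` attribute layout.

```python
import re

# Hypothetical GTF rows in the Ensembl style; a real file has many more.
GTF_TEXT = '''\
1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "GENE_A"; gene_name "A"; gene_biotype "protein_coding";
1\thavana\tgene\t14404\t29570\t.\t-\t.\tgene_id "GENE_B"; gene_name "B"; gene_biotype "lncRNA";
1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id "GENE_A"; transcript_id "T1"; gene_biotype "protein_coding";
'''

# Matches attribute pairs written as: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def gene_biotypes(gtf_text, keys=("gene_biotype", "gene_type")):
    """Map gene_id -> biotype using only the 'gene' rows of a GTF."""
    biotypes = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        # Accept either attribute name, whichever the file uses.
        biotype = next((attrs[k] for k in keys if k in attrs), None)
        if gene_id is not None and biotype is not None:
            biotypes[gene_id] = biotype
    return biotypes


biotypes = gene_biotypes(GTF_TEXT)

# Hypothetical list of significant DEG IDs to be filtered.
degs = ["GENE_A", "GENE_B"]
coding_degs = [g for g in degs if biotypes.get(g) == "protein_coding"]
print(coding_degs)
```

The same lookup table could be joined to a count matrix or a results table by gene ID, as long as the IDs match the gene_id values in the GTF that was used for counting.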

ADD REPLY • link 2.2 years ago by ATpoint 88k

0

Thank you very much for the information! I am able to see the column you mention.

ADD REPLY • link 2.1 years ago by Yeeshouw ▴ 10

score 1 · Answer 1 · 2023-06-03

1

2.1 years ago

rfran010 ★ 1.6k

featureCounts can extract this information and output a column for you during counting, presuming you input a GTF file.

I believe the option is --extraAttributes.

ADD COMMENT • link 2.1 years ago by rfran010 ★ 1.6k

0

Yes, I am inputting a GTF file. I reviewed the featureCounts documentation and could not find this extraAttributes parameter. I do see GTF.attrType and GTF.featureType; would one of these be the parameter that you mention? I suspect it is GTF.attrType, set to "gene_biotype" or "gbkey"?

ADD REPLY • link 2.1 years ago by Yeeshouw ▴ 10

1

What version are you running? I use v2.0.3 on the command line.

It looks like this option may have been added in a later version, as GTF.attrType.extra: https://rdrr.io/bioc/Rsubread/man/featureCounts.html

The parameters you mentioned determine how featureCounts selects and groups annotation lines. featureType is which lines it counts, e.g. if it is set to exon, it only counts 'exon' lines. attrType is how it groups them, so if it is set to gene_id, it will count all lines with gene_id "ENSG00000245848" as the same gene.
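To make the select-then-group idea concrete, here is a toy Python sketch. It is not how featureCounts is implemented, only an illustration under simplifying assumptions: reads are single positions, and strand and chromosome are ignored. The rows, read positions and the `count_reads` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical annotation rows: (feature type, start, end, attributes).
ROWS = [
    ("gene", 100, 500, {"gene_id": "GENE_A"}),
    ("exon", 100, 200, {"gene_id": "GENE_A"}),
    ("exon", 400, 500, {"gene_id": "GENE_A"}),
    ("exon", 800, 900, {"gene_id": "GENE_B"}),
]

# Hypothetical reads, simplified to one genomic position each.
READ_POSITIONS = [150, 180, 450, 300, 850]


def count_reads(rows, read_positions, feature_type="exon", attr_type="gene_id"):
    """Count reads per group: select rows by type, then group by attribute."""
    groups = defaultdict(list)
    for ftype, start, end, attrs in rows:
        # Selection step: keep only rows of the requested feature type.
        if ftype == feature_type and attr_type in attrs:
            # Grouping step: rows sharing the attribute value are pooled.
            groups[attrs[attr_type]].append((start, end))
    counts = {}
    for key, intervals in groups.items():
        # A read counts once per group, however many pooled rows it hits.
        counts[key] = sum(
            any(s <= pos <= e for s, e in intervals) for pos in read_positions
        )
    return counts


counts = count_reads(ROWS, READ_POSITIONS)
print(counts)
```

In this toy data the read at position 300 falls between the two GENE_A exons, so it is only counted if `feature_type` is switched to `"gene"`.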

ADD REPLY • link 2.1 years ago by rfran010 ★ 1.6k

1

I see. I believe I was running an older version or missed the option; I have updated to v2.10.5 and can see it now. Thank you, I see now what .attrType is for.

I have tried setting GTF.attrType.extra to "gbkey"; however, the count matrix does not seem to contain any extra column or information for this parameter. Is the input formatting or name incorrect? I have double-checked the GTF file, and it does have an attribute labeled "gbkey".

Edit: I have found that this information is stored in the $annotation element of the object that featureCounts returns. Thank you for your help.

ADD REPLY • link 2.1 years ago by Yeeshouw ▴ 10