Question

Check whether we have RNAseq raw counts

4.6 years ago

mi_kappa • 0

Hi,

In an R dataset, the assays contain both counts and normalized counts. Is it safe to assume that counts holds the raw counts? If not, is there a way to check whether I have the raw counts?

RNA-Seq R • 1.5k views

ADD COMMENT • link updated 4.6 years ago by ATpoint 85k • written 4.6 years ago by mi_kappa • 0

Have you checked to see if you have integers only or real numbers? Raw counts should generally be whole integers.

ADD REPLY • link 4.6 years ago by GenoMax 147k

These would be integers, right?

ENSG00000000419 29534 23742 24648 18752 16204

ADD REPLY • link 4.6 years ago by mi_kappa • 0

These are indeed integers, but tools like tximport aggregate transcript-level abundances to raw gene-level counts, and those are floats, so integer values alone are not a complete test. Is there any background on these data and how they were produced?

ADD REPLY • link 4.6 years ago by ATpoint 85k
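The integer check suggested above can be sketched in a few lines. This is only an illustration in plain Python (the thread itself works in R), the function name `looks_like_raw_counts` is hypothetical, and the second row is made up to show a failing case. As noted in the reply above, a matrix that fails this check is not necessarily normalized, since tximport-style gene-level counts are also non-integer.

```python
import math


def looks_like_raw_counts(matrix):
    """Return True if every value is a finite, non-negative whole number."""
    for row in matrix:
        for value in row:
            if not math.isfinite(value) or value < 0:
                return False
            if value != round(value):
                return False
    return True


# Row for ENSG00000000419 as posted in the thread.
posted_row = [[29534, 23742, 24648, 18752, 16204]]
# Made-up values, only to show what a non-integer matrix looks like to the check.
made_up_row = [[27981.4, 24310.9, 23877.2, 19102.6, 16550.3]]

print(looks_like_raw_counts(posted_row))
print(looks_like_raw_counts(made_up_row))
```

Passing this check is consistent with raw counts but does not prove it; rounded normalized values would pass as well.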

I did check whether the medians align in the counts matrix, and indeed they don't.

The quality of the raw reads was checked using FastQC, the adaptors were clipped using cutadapt, and Sickle was used to trim low-quality ends from the reads. Read alignment was performed using STAR, and mapping statistics from the BAM files were acquired through SAMtools flagstat. Expression at the gene, exon, exon ratio and poly(A) ratio levels was based on the Ensembl v.71 annotation.

The reason I am asking is that after running DESeq on what I believed to be raw counts, I get some very weird results. I know that DESeq requires raw counts, and I wanted to make sure there is no issue with the counts matrix. I am using DESeqDataSetFromMatrix. From what I described, do you think I should be using something else?

Note: there are no outliers in the dataset and I have pre-filtered low count genes.

ADD REPLY • link 4.6 years ago by mi_kappa • 0

Please forget what I previously said about the medians. I did not properly test that suggestion, so it might not be correct. Do you have access to the raw fastq files? If so, why not simply rerun everything to be sure?

ADD REPLY • link 4.6 years ago by ATpoint 85k

Thank you for your help! I do not have access to the FASTQ files. I have been given access to what are called counts and norm.counts in the assays because I am performing differential gene expression analysis. For that I am using counts (assuming these are the raw counts). Do you by any chance know whether DESeq would give me an error or a warning if I were not using raw counts?

ADD REPLY • link 4.6 years ago by mi_kappa • 0

It is always best to ask the source of data for clarification in cases such as this because a wrong assumption can ruin your work.

Based on your description it does sound like counts are raw. You can test by normalizing them in DESeq2. Why are you filtering low counts? Use the data as is (which is what the original person may have done).

ADD REPLY • link 4.6 years ago by GenoMax 147k
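One way to read the suggestion above is: normalize counts yourself and compare the result with the supplied norm.counts. The sketch below illustrates that idea in plain Python with a median-of-ratios size factor, which is the approach DESeq2 uses by default. It is not DESeq2's own code, all function names are hypothetical, and the toy matrix is made up. It also assumes norm.counts was produced by this kind of normalization, which the thread does not confirm, so a mismatch would not be conclusive.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column)."""
    n_samples = len(counts[0])
    # Per-gene mean of log counts (log geometric mean); genes with any zero are skipped.
    log_geo_means = [
        sum(math.log(v) for v in row) / n_samples if all(v > 0 for v in row) else None
        for row in counts
    ]
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(row[j]) - lgm
            for row, lgm in zip(counts, log_geo_means)
            if lgm is not None
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each column by its size factor."""
    sf = size_factors(counts)
    return [[v / s for v, s in zip(row, sf)] for row in counts]


def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Made-up toy matrix (genes x samples); sample 2 is sequenced twice as deeply.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 4]]
recomputed = normalize(toy_counts)

# Stand-in for the supplied norm.counts: here just the recomputed values rounded.
supplied_norm = [[round(v, 2) for v in row] for row in recomputed]
print(max_abs_difference(recomputed, supplied_norm) < 0.01)
```

If the recomputed values agree closely with norm.counts, that is consistent with counts being raw. If they do not, the normalized values may simply come from a different method, so asking the data provider remains the safer route.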

I am waiting to get clarification; I was just wondering whether there was a way to check myself. Pre-filtering of low-count genes is an optional step before you run DESeq. I am only removing rows with no reads or nearly no reads (0 or 1) to reduce the memory size of the dds object and to speed up the transformation and testing functions within DESeq.

ADD REPLY • link 4.6 years ago by mi_kappa • 0
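The pre-filter described above (dropping rows whose total is 0 or 1) can be sketched as follows. This is a plain Python illustration rather than the R code the poster ran. The function name `prefilter` and the two `hypothetical_gene_*` rows are made up; only the ENSG00000000419 row comes from the thread.

```python
def prefilter(counts_by_gene, min_total=2):
    """Keep genes whose summed count across samples is at least min_total."""
    return {
        gene: row
        for gene, row in counts_by_gene.items()
        if sum(row) >= min_total
    }


counts_by_gene = {
    "ENSG00000000419": [29534, 23742, 24648, 18752, 16204],  # from the thread
    "hypothetical_gene_A": [0, 0, 0, 0, 0],  # no reads, dropped
    "hypothetical_gene_B": [0, 1, 0, 0, 0],  # a single read, dropped
}

kept = prefilter(counts_by_gene)
print(list(kept))
```

A filter like this only removes rows and leaves the remaining values unchanged, so it does not affect whether the retained counts are raw.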