RNA-Seq Data Quality Assessment: Boxplot Interpretation
4.3 years ago
Aynur ▴ 60

Hello,

Please help me understand my boxplot.

Here is the boxplot I got for my RNA-Seq data.

Box Plot

My data looks like this:

head(rawCountTable)
                   con-1  con-2    a-1    a-2    b-1    b-2     c-1    c-2    d-1    d-2
ENSMUSG0000000000     0      0      0      0      0      0       0      0      0      0
ENSMUSG00000000028    854    937   1143   1029    912    856    809    754    513    520
ENSMUSG00000000031 822918 817451 716860 691396 763705 829274 838094 819312 717935 730879

The code that prepares the data for the boxplot is below:

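library(reshape2)  # provides melt()
library(ggplot2)   # used for the plots

pseudoCount = log2(rawCountTable + 1)  # +1 avoids log2(0)
df = melt(pseudoCount, variable.name = "Samples", 
      value.name = "count") # reshape to long format
# strip the replicate suffix so "con-1" -> "con" and "a-1" -> "a"
df = data.frame(df, Condition = sub("[-.][0-9]+$", "", df$Samples))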
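
The snippet above stops after building the long-format table and does not show the plotting call itself. A minimal sketch of that step follows, using base R's boxplot() instead of ggplot2 so that it runs without extra packages. The toy count matrix and the names toyCounts, pseudo, long and condPerSample are made up for illustration and are not the poster's data; deriving the condition by stripping a trailing replicate number is an assumption about the sample naming scheme.

```r
# Toy counts standing in for rawCountTable (simulated values, illustration only)
set.seed(1)
samples <- c("con-1", "con-2", "a-1", "a-2")
toyCounts <- matrix(rnbinom(400, mu = 100, size = 1), ncol = 4,
                    dimnames = list(paste0("gene", 1:100), samples))

# Same transform as in the post: log2 with a pseudocount of 1
pseudo <- log2(toyCounts + 1)

# Long format: one row per gene/sample pair (as.vector() is column-major,
# so repeating each column name nrow() times keeps values and labels aligned)
long <- data.frame(
  Samples = factor(rep(colnames(pseudo), each = nrow(pseudo)),
                   levels = colnames(pseudo)),
  count = as.vector(pseudo)
)

# Assumed naming scheme: condition = sample name minus the replicate suffix
long$Condition <- sub("[-.][0-9]+$", "", as.character(long$Samples))
condPerSample <- sub("[-.][0-9]+$", "", colnames(pseudo))

# One box per sample, coloured by condition
bp <- boxplot(count ~ Samples, data = long,
              col = as.integer(factor(condPerSample)) + 1,
              xlab = "Samples", ylab = "log2(count + 1)")
```

As a rough reading guide, medians and box heights that line up across samples suggest comparable count distributions, while a sample whose box sits well apart from the rest is worth a closer look before differential expression analysis.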

Here is my code for the density plot.

ggplot(df, aes(x = count, colour = Samples, fill = Samples)) +
  ylim(c(0, 0.17)) +
  geom_density(alpha = 0.2, size = 1.25) +
  facet_wrap(~ Condition) + theme(legend.position = "top") +
  xlab(expression(log[2](count + 1)))

The density plot is:

Density Plot

So, my question is: how should I interpret these plots, and what do they say about my data quality? If you can recommend an article on understanding these plots and assessing my data, I would appreciate it.

Thank you very much!

alignment sequencing RNA-seq R

The image links are broken. Try hosting and embedding them by pressing the image button in the post.


I've fixed it. The OP had pasted the embed code into the image's direct-link field.

4.3 years ago

I don't think those plots are necessarily very informative about quality. For a general idea of the quality of the sequencing reads, use a program like FastQC. The alignment statistics from your aligner will then give you a good idea of the complexity of your library. If you plan on running differential expression analysis, you can generate PCA and heatmap plots, which are a good first indicator of replicate concordance; from those plots you can sometimes already see differences between conditions. DESeq2 is a good resource for making these plots.
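
To make the PCA and heatmap suggestion concrete, here is a minimal sketch in base R. It uses prcomp() on log2(count + 1) values as a stand-in for the variance-stabilised data one would normally feed to DESeq2's own plotting helpers, so it is not the DESeq2 workflow itself. The counts are simulated, and the names condition, samples, counts, logCounts, pca and sampleCor are assumptions made for illustration.

```r
# Simulated data: 200 genes x 6 samples, two replicates per condition
set.seed(42)
condition <- rep(c("con", "a", "b"), each = 2)
samples <- paste0(condition, "-", rep(1:2, times = 3))
nGenes <- 200

# Per-gene baseline expression plus a condition effect on a subset of genes
baseMean <- rexp(nGenes, rate = 1 / 200) + 5
effect <- matrix(1, nGenes, 3, dimnames = list(NULL, c("con", "a", "b")))
effect[1:40, "a"] <- 4      # genes 1-40 up in condition a
effect[41:80, "b"] <- 0.25  # genes 41-80 down in condition b
mu <- baseMean * effect[, condition]

counts <- matrix(rnbinom(length(mu), mu = as.vector(mu), size = 20),
                 nrow = nGenes,
                 dimnames = list(paste0("gene", seq_len(nGenes)), samples))

logCounts <- log2(counts + 1)

# PCA with samples as rows; replicates of one condition should sit close together
pca <- prcomp(t(logCounts))
varExplained <- pca$sdev^2 / sum(pca$sdev^2)
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     col = as.integer(factor(condition)) + 1,
     xlab = sprintf("PC1 (%.0f%%)", 100 * varExplained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * varExplained[2]))
text(pca$x[, 1], pca$x[, 2], labels = samples, pos = 3)

# Sample-to-sample correlation heatmap as a second view of replicate concordance
sampleCor <- cor(logCounts, method = "spearman")
heatmap(sampleCor, symm = TRUE)
```

If replicates of the same condition fall close together on the PCA plot and cluster together in the heatmap, that is a reasonable first sign of concordance; a replicate that groups with a different condition deserves investigation.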


Alright. I have already run FastQC and aligned the reads with STAR. I was making these plots to look at the between-sample distributions before DEG analysis with DESeq2. These plots are mentioned in tutorials, and I was not sure whether they are needed.
If they are not telling me anything I should be aware of, then I will go on to make PCA, MA, and DEG plots. Thanks.
