2.6 years ago
jaqx008
Hi all, I downloaded RNA-seq data from NCBI for gene expression analysis. When I generated counts with bedtools for my genes of interest, the variability between replicates was too high. Does that mean this data is unreliable? What could have caused this? And does that mean I can't discard the bad data and use only the samples from the same research group that I feel are closer?
See part of the data below as an example, before normalization.
        Female_rep1  Female_rep2  Male_rep1  Male_rep2
gene1   149          11           125        30
gene2   108          122          388        68
gene3   18           30           393        44
gene4   170          91           1270       179
gene5   86           3            176        2
gene    254          311          898        215
Thanks
You must mean biological replicates. Technical reps would be the same library sequenced multiple times. Almost no one does that, since it is not needed for Illumina sequencing.
Would you advise that I continue working with this data, given the differences in gene expression between the replicates above?
You know that bedtools is really not the software of choice for generating gene counts?
I have always used the multicov option. Could you mention the one you think is best?
Most people use featureCounts or RSEM.
I've used bedtools coverage before for gene expression analysis, but it's important to know that these are raw counts and not normalized by sequencing depth (FPM/TPM). I would look into making a correlation matrix, heat map, or PCA plot to see how your samples cluster before moving forward with any differential gene expression analysis.
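As a rough illustration of the depth normalization and correlation check suggested above, here is a minimal Python sketch using the six rows posted in the question. It is only a sketch under assumptions: the column sums of this small excerpt stand in for the real library sizes (which would come from the full count table), the names `log_cpm` and `pearson` are made up for this example, and six genes are far too few to judge sample quality.

```python
import math

# Raw counts copied from the table in the question (rows are genes).
samples = ["Female_rep1", "Female_rep2", "Male_rep1", "Male_rep2"]
counts = {
    "gene1": [149, 11, 125, 30],
    "gene2": [108, 122, 388, 68],
    "gene3": [18, 30, 393, 44],
    "gene4": [170, 91, 1270, 179],
    "gene5": [86, 3, 176, 2],
    "gene": [254, 311, 898, 215],  # label kept exactly as posted
}

# ASSUMPTION: the column sums of this excerpt stand in for library size.
# With real data, use the total assigned reads per sample instead.
lib_sizes = [sum(row[i] for row in counts.values()) for i in range(len(samples))]


def log_cpm(row):
    """Counts per million for one gene, log2-transformed with a pseudocount of 1."""
    return [math.log2(c / n * 1e6 + 1) for c, n in zip(row, lib_sizes)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


log_cpm_table = {gene: log_cpm(row) for gene, row in counts.items()}

# One vector of log-CPM values per sample, then all pairwise correlations.
columns = [[log_cpm_table[g][i] for g in counts] for i in range(len(samples))]
corr = [[pearson(a, b) for b in columns] for a in columns]

for name, row in zip(samples, corr):
    print(f"{name:12s}", " ".join(f"{v:6.2f}" for v in row))
```

If replicates of the same sex correlate no better with each other than with the other group, that is a hint to look at the samples more closely (for example with a PCA plot on the full table) before running any differential expression analysis.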
Ok. thanks for your recommendation.