I'm not sure my bulk RNA-seq read counts extracted from FASTQ files are correct
3.1 years ago
Simon Ahn ▴ 10

Hi. I'm new to bioinformatics and I'm trying to extract read counts from FASTQ files.

I used STAR for alignment with GENCODE annotation files.

(I didn't trim my reads because I heard that trimming is optional.)

Then I used featureCounts to get my read count matrix.

However, my counts are different from the original read counts provided by the paper's authors.

They used Bowtie2 and TopHat to obtain their read counts.

Now I'm confused, because there seems to be no standard method for getting counts from bulk RNA-seq data.

People use many different tools for trimming, alignment, and building the count matrix.

Which data should I trust? Or how can I be sure my data is reliable?

fastq RNAseq raw-count • 1.5k views

Well, how different? Are most of the gene counts within 5% of each other?
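One way to put a number on "how different" is to compute the fraction of shared genes whose counts agree within a tolerance. This is only a minimal sketch: the function name `fraction_within`, the toy gene IDs, and the counts are all hypothetical, and it assumes both matrices have been reduced to one `{gene_id: count}` mapping per sample with gene IDs spelled identically in both.

```python
def fraction_within(counts_a, counts_b, tolerance=0.05):
    """Fraction of shared genes whose counts differ by at most `tolerance`.

    The difference is relative to the larger of the two counts.
    """
    shared = sorted(set(counts_a) & set(counts_b))
    if not shared:
        raise ValueError("no shared gene IDs between the two count tables")
    close = 0
    for gene in shared:
        a, b = counts_a[gene], counts_b[gene]
        largest = max(a, b)
        # Two zero counts agree trivially; otherwise compare relatively.
        if largest == 0 or abs(a - b) / largest <= tolerance:
            close += 1
    return close / len(shared)


# Hypothetical toy data, not taken from any real experiment.
mine = {"GENE_A": 100, "GENE_B": 2000, "GENE_C": 0, "GENE_D": 50}
paper = {"GENE_A": 103, "GENE_B": 200, "GENE_C": 0, "GENE_D": 51}

print(fraction_within(mine, paper))  # 3 of 4 genes agree within 5% -> 0.75
```

If only a few genes fall outside the tolerance, the two pipelines mostly agree; if nearly all of them do, the difference is systematic and worth tracing back to the annotation or counting settings.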


More than 10 times as much.


Ask the manuscript authors for the scripts used in their analysis, explaining why you need them. Authors are usually happy to provide them. If not, contact the publisher; authors are expected to make the scripts used in the analysis available.

3.1 years ago

Without knowing the exact context, I would assume there is nothing wrong with your data. As in many (all?) other fields and analyses, if you use different tools or approaches you will get different results. However, that does not mean one is more correct than another. Yes, the read counts themselves will differ, but it depends on the "level" at which you compare the outputs. It is quite possible that in the end, e.g. after differential gene expression analysis, you will get the same (or roughly the same) list of DEGs. If so, the read counts differ but the biological conclusion remains broadly consistent.
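To compare at the DEG "level" rather than the raw-count level, one simple option is the overlap between the two DEG lists. The sketch below uses the Jaccard index (shared genes divided by all genes in either list); the function name `jaccard` and the gene names are hypothetical, and it assumes each DEG list has already been produced by whatever differential expression tool you use.

```python
def jaccard(degs_a, degs_b):
    """Jaccard index of two gene lists: |intersection| / |union|."""
    a, b = set(degs_a), set(degs_b)
    union = a | b
    if not union:
        # Two empty lists are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical DEG lists, for illustration only.
my_degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
paper_degs = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]

print(jaccard(my_degs, paper_degs))  # 3 shared / 5 total -> 0.6
```

A value near 1 would suggest the two pipelines lead to much the same biological conclusion even though the raw counts differ.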

3.1 years ago
husensofteng ▴ 410

I think it is best to focus on the QC output plots and decide from those whether the input data is OK.

I would recommend running the nf-core/rnaseq pipeline, which is very handy and provides nice QC visualizations in a single HTML file.
