Question

I'm not sure my bulk RNA-seq read counts extracted from FASTQ files are correct

3.1 years ago

Simon Ahn ▴ 10

Hi. I'm new to bioinformatics and I'm trying to extract read counts from FASTQ files.

I aligned the reads with STAR, using GENCODE annotation files.

(I didn't trim my reads because I heard that trimming is optional.)

Then I used featureCounts to get my read count matrix.
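For anyone who wants to compare a featureCounts table against a published matrix, here is a minimal Python sketch of loading it. It assumes the standard featureCounts layout (a leading `#` comment line, then a header with `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, followed by one column per BAM file). The function name `read_featurecounts`, the `strip_version` option, and the example rows are illustrative only, not taken from the original post.

```python
import csv
import io

ANNOTATION_COLUMNS = 6  # Geneid, Chr, Start, End, Strand, Length


def read_featurecounts(handle, strip_version=True):
    """Parse a featureCounts table into (sample_names, {gene_id: [counts]})."""
    # Skip the '#' comment line(s) that featureCounts writes at the top.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    samples = header[ANNOTATION_COLUMNS:]
    counts = {}
    for row in rows:
        if not row:
            continue
        gene_id = row[0]
        if strip_version:
            # Assumption: GENCODE-style IDs such as ENSG00000000003.15.
            # Dropping the version helps match unversioned IDs, but it can
            # collapse IDs that carry a suffix (e.g. "_PAR_Y").
            gene_id = gene_id.split(".")[0]
        counts[gene_id] = [int(value) for value in row[ANNOTATION_COLUMNS:]]
    return samples, counts


# Tiny made-up table, only to show the expected shape of the input.
example = (
    "# comment line written by featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsampleA.bam\tsampleB.bam\n"
    "ENSG00000000003.15\tchrX\t100\t200\t-\t101\t120\t98\n"
    "ENSG00000000005.6\tchrX\t300\t400\t+\t101\t0\t3\n"
)

samples, counts = read_featurecounts(io.StringIO(example))
print(samples)                    # ['sampleA.bam', 'sampleB.bam']
print(counts["ENSG00000000003"])  # [120, 98]
```

Matching gene identifiers between the two matrices is worth checking before comparing any numbers, since the published matrix may use a different annotation or ID style.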

However, my counts are different from original read count provided by paper researchers.

They used Bowtie2 and TopHat to extract read count.

Now, i'm confused because there seems no standard extraction method for bulk RNAseq.

People use a lot of tools for trimming, alignment, and getting count matrix.

Which data should I trust? or How can I be sure my data is reliable?

fastq RNAseq raw-count • 1.5k views

updated 3.1 years ago by cpad0112 21k • written 3.1 years ago by Simon Ahn ▴ 10

Well, how different? Are most of the gene counts within 5% of each other?
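One way to put a number on "how different" is sketched below in Python. It assumes both count sets are already available as `{gene_id: count}` dictionaries for the same sample; the function name `compare_counts`, the gene names, and the values are made up for illustration.

```python
import statistics


def compare_counts(mine, published, tolerance=0.05):
    """Summarise agreement between two {gene_id: count} dictionaries."""
    shared = sorted(set(mine) & set(published))
    within = 0
    ratios = []
    for gene in shared:
        a, b = mine[gene], published[gene]
        # "Within tolerance" is taken relative to the larger of the two counts.
        if abs(a - b) <= tolerance * max(a, b):
            within += 1
        # The ratio is only defined when both counts are non-zero.
        if a > 0 and b > 0:
            ratios.append(a / b)
    return {
        "n_shared": len(shared),
        "fraction_within": within / len(shared) if shared else float("nan"),
        "median_ratio": statistics.median(ratios) if ratios else float("nan"),
    }


# Made-up counts for one sample.
mine = {"geneA": 100, "geneB": 1050, "geneC": 40, "geneD": 7}
published = {"geneA": 102, "geneB": 1000, "geneC": 400, "geneE": 5}

summary = compare_counts(mine, published)
print(summary)
```

A roughly constant ratio across most genes would suggest a systematic difference (for example, normalized values rather than raw counts) rather than gene-by-gene disagreement, but that is only a hint, not a diagnosis.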

3.1 years ago by swbarnes2 14k

The difference is more than 10-fold.

3.1 years ago by Simon Ahn ▴ 10

Ask the manuscript authors for the scripts used in the analysis, and explain why you need them. Authors will usually be happy to provide them. If not, contact the publisher; authors are supposed to make the scripts used in their analysis available.

3.1 years ago by cpad0112 21k

score 3 · Accepted Answer · 2021-10-19

Without knowing the exact context, I would assume there is nothing wrong with your data. As in many (all?) other fields and analyses, different tools or approaches will give different results. That does not mean one is more correct than another. Yes, the read counts themselves will differ, but it depends on the "level" at which you compare the outputs. It is quite possible that in the end, e.g. after differential gene expression analysis, you will get the same (or roughly the same) list of DEGs. If so, the read counts differ but the biological conclusion stays largely consistent.
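To compare at the DEG level rather than the raw-count level, a simple overlap measure between the two DEG lists is often enough as a first check. The Python sketch below uses the Jaccard index; the function name `deg_overlap` and the gene names are hypothetical, and each list is assumed to come from running the same differential expression method and thresholds on each count matrix.

```python
def deg_overlap(degs_a, degs_b):
    """Return the shared genes and the Jaccard index of two DEG lists."""
    a, b = set(degs_a), set(degs_b)
    shared = a & b
    union = a | b
    # Two empty lists are treated as identical.
    jaccard = len(shared) / len(union) if union else 1.0
    return shared, jaccard


# Hypothetical DEG lists from the two pipelines.
degs_star = ["geneA", "geneB", "geneC", "geneD"]
degs_tophat = ["geneB", "geneC", "geneD", "geneE"]

shared, jaccard = deg_overlap(degs_star, degs_tophat)
print(sorted(shared))  # ['geneB', 'geneC', 'geneD']
print(jaccard)         # 0.6
```

A high overlap would support the point above: the counts differ, but the biological conclusion is largely the same.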

3

3.1 years ago

husensofteng ▴ 410

I think it is best to focus on the QC output plots and use them to decide whether the input data are OK.

I would recommend running the nf-core/rnaseq pipeline, which is very handy and provides nice QC visualizations in a single HTML file.

3.1 years ago by husensofteng ▴ 410