Question

How can I be sure that raw read counts are well processed from fastq files?

3.7 years ago

Simon Ahn ▴ 10

Hi. I'm new to bioinformatics and am trying to process fastq files to get a raw read count matrix.

I downloaded fastq files from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452

1. Used fasterq-dump to download the fastq files for the SRR accessions.
2. Aligned the fastq files, without any trimming, using the Ensembl reference files Homo_sapiens.GRCh38.104.chr.gtf (annotation) and Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa (genome).
3. Extracted the raw count matrix from the BAM files using featureCounts.
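For the last step, a minimal sketch of loading the featureCounts table into a gene-by-sample count matrix is shown here. It assumes the default featureCounts output layout (a leading `#` comment line, then six annotation columns followed by one column per BAM file) and integer counts. The function name `read_featurecounts` and the example rows are made up for illustration and are not taken from GSE63452.

```python
import csv
import io

# Annotation columns that featureCounts writes before the per-sample counts.
ANNOTATION_COLUMNS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [count per sample]}) from a featureCounts table."""
    # The first line is a '#' comment recording the program call; skip it.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    n_annotation = len(ANNOTATION_COLUMNS)
    samples = header[n_annotation:]
    counts = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        # Assumption: default counting mode, so every count is an integer.
        counts[row[0]] = [int(value) for value in row[n_annotation:]]
    return samples, counts


# Tiny made-up table, for illustration only.
example = (
    "# Program:featureCounts; Command: (omitted)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\tsample2.bam\n"
    "GENE_A\t1\t100\t900\t+\t800\t12\t30\n"
    "GENE_B\t2\t500\t1500\t-\t1000\t0\t7\n"
)
samples, counts = read_featurecounts(io.StringIO(example))
print(samples)
print(counts)
```

With an Ensembl GTF, the `Geneid` column normally holds Ensembl gene IDs rather than gene symbols, which matters when matching rows against another matrix.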

To check whether my data were processed correctly, I normalized my read count matrix to CPM (counts per million), since a normalized data matrix is available from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452.
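As a minimal sketch of the CPM step: each count is divided by its sample's library size (here assumed to be the sum of all counts in that sample's column) and multiplied by one million. The helper name `counts_per_million` and the numbers are hypothetical. The post does not say how the GEO matrix itself was normalized, so this may not be the same method.

```python
def counts_per_million(counts):
    """Convert {gene: [raw count per sample]} into {gene: [CPM per sample]}."""
    n_samples = len(next(iter(counts.values())))
    # Library size per sample = column sum of the raw counts (an assumption).
    library_sizes = [
        sum(row[i] for row in counts.values()) for i in range(n_samples)
    ]
    return {
        gene: [
            1e6 * count / size if size else 0.0
            for count, size in zip(row, library_sizes)
        ]
        for gene, row in counts.items()
    }


# Made-up counts for three genes in two samples.
raw = {"GENE_A": [10, 0], "GENE_B": [30, 50], "GENE_C": [60, 50]}
cpm = counts_per_million(raw)
print(cpm["GENE_A"])
```

After this transformation, every sample column sums to one million, so values are comparable across samples with different sequencing depths.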

I compared my data with the normalized count data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452, but the two differ much more than I expected.

I expected small differences, since I used different tools to get my result, but look at some of the results:

enter image description here

The left one is my data and the right one is the normalized data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452. For the A1BG gene, for example, there is a huge difference between the two data sets. What can I do to fix this problem? It does not seem reasonable to have to use exactly the same tools every time I extract raw counts from fastq files.
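To put a number on the difference instead of looking at single genes such as A1BG, one option is to correlate the two versions of the same sample over all shared genes. The sketch below computes a Pearson correlation on log2(x + 1) values. It assumes both matrices are keyed by the same kind of gene identifier and are on a comparable scale; the function name `log_correlation` and all values are invented for illustration.

```python
import math


def log_correlation(mine, theirs):
    """Pearson correlation of log2(x + 1) values over genes present in both inputs.

    mine, theirs: {gene: normalized value} for one sample.
    Returns (correlation, number_of_shared_genes).
    """
    shared = sorted(set(mine) & set(theirs))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    x = [math.log2(mine[g] + 1) for g in shared]
    y = [math.log2(theirs[g] + 1) for g in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / math.sqrt(var_x * var_y), len(shared)


# Made-up values for one sample; GENE_D exists in only one matrix and is ignored.
mine = {"A1BG": 3.0, "GENE_B": 7.0, "GENE_C": 15.0}
theirs = {"A1BG": 7.0, "GENE_B": 15.0, "GENE_C": 31.0, "GENE_D": 5.0}
r, n_shared = log_correlation(mine, theirs)
print(r, n_shared)
```

A high correlation with a constant offset would point to a difference in normalization, while a low correlation or very few shared genes would point to a mismatch earlier in the pipeline, such as in gene identifiers or sample order.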

fastq RNAseq raw-count • 595 views
