Incompatibility between the output of STAR and RSEM
0
0
Entering edit mode
6.9 years ago
wayj86 ▴ 40

Hi all,

I aligned my data with STAR, and then calculated the expression with RSEM:

STAR --runMode alignReads --runThreadN 2 --genomeDir ~/ref_ensemble_ERCC/star_ref/ --readFilesIn ~/WJC-RNAseq/WJC-3/WJC-3_clean_r1.fq  ~/WJC-RNAseq/WJC-3/WJC-3_clean_r2.fq --quantMode TranscriptomeSAM GeneCounts --outSAMtype BAM Unsorted --outSAMreadID Number --outSAMattrIHstart 0 --outSAMattributes All --outFilterMismatchNoverReadLmax 0.05 --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonical --outFilterMultimapNmax 1 --outReadsUnmapped Fastx --outFileNamePrefix ./WJC-3

rsem-calculate-expression --alignments --paired-end -p 2 --estimate-rspd ~/WJC-RNAseq/WJC-3/WJC-3Aligned.toTranscriptome.out.bam ~/ref_ensemble_ERCC/rsem_ref/ensemble_ERCC ./WJC-3

The "WJC-3ReadsPerGene.out.tab" generated by STAR contains the read counts mapped to genes, while the "WJC-3.genes.results" contains the FPKM values. However, I found Incompatibility between the output of STAR and RSEM:

gene_id read_counts FPKM
ENSG00000275708 44146   0
ENSG00000207442 1263    0
ENSG00000264462 738 0
ENSG00000207616 638 0
ENSG00000201512 425 0
ENSG00000264063 422 0
ENSG00000206688 354 0
ENSG00000269688 315 0
ENSG00000228437 311 0
ENSG00000248347 290 0
ENSG00000210196 279 0
ENSG00000207554 248 0
ENSG00000224574 248 0
ENSG00000258380 246 0
ENSG00000282732 235 0
ENSG00000266897 217 0
ENSG00000240970 215 0
ENSG00000225091 213 0
ENSG00000207549 212 0
ENSG00000269981 206 0
ENSG00000236871 181 0
ENSG00000226942 180 0
ENSG00000243433 180 0
...

As shown in the table above, there were 4776 genes with read_counts > 0 but FPKM = 0; meanwhile there were 800 genes with read_counts n = 0 but FPKM > 0 (data not shown). When using the different expression tools (e.g. DEseq, EdgeR), which require the read counts N, those genes whose read counts > 0 would be regarded as expressed; when using WGCNA, which requires FPKM or TPM values, those genes with read counts > 0 but FPKM = 0 would be regarded as no expression.

So comes the question: how should I get the read counts for DEseq, EdgeR? Extract it from the "ReadsPerGene.out.tab" file? Or recalculated it with featureCounts? Or extract it from the "genes.results" file, which also contains the "expected_count" values?

Thank you very much for your attention.

Stanley

RNA-Seq • 2.4k views
ADD COMMENT
0
Entering edit mode

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY
0
Entering edit mode

Thanks. I have marked my post up. This is the my first time to post a topic in Biostars: )

ADD REPLY

Login before adding your answer.

Traffic: 2264 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6