Question

STAR pair-end gene count discrepancy vs. RSEM

0

Entering edit mode

6 months ago

cropero • 0

I have run the mapping with STAR for a PE data with the following code:

STAR --runThreadN 20 \ --genomeDir /genome_index150 \ --readFilesCommand zcat \ --readFilesIn "$file" "$file2" \ --outFileNamePrefix "$OUTPUT_DIR$base_name" \ --outSAMtype BAM SortedByCoordinate \ --sjdbOverhang 149 \ --alignIntronMax 500 \ --outReadsUnmapped Fastx \ --outFilterMismatchNoverLmax 0.05 \ --quantMode GeneCounts TranscriptomeSAM GeneCounts

I get the following number of overlapping genes (blue) when I also run the --quantMode GeneCounts (as indicated before):

enter image description here

However, when I run RSEM for counting using the transcriptome BAM files from STAR I get a total number of mapped reads that correspond aprox. to half the uniquely mapped reads with STAR default htseq-count. enter image description here

This is the code used for RSEM:

rsem-calculate-expression --bam --no-bam-output --paired-end --forward-prob 0 --estimate-rspd --append-names\
        "$bam_file" \
        "$rsem_index" \
        "$sample_output_dir/$base_name_trimmed" \
        -p "$num_threads"

theoretically the RSEM output should be equivalent to the uniquely overlapping reads from htseeq-count, right? What is happing while counting with STAR geneCounts parameter?

star genecount rsem htseq illumina • 339 views

ADD COMMENT • link 6 months ago by cropero • 0