Entering edit mode
3 months ago
Emy Alade
•
0
I am analyzing 3' RNA-seq single-end (65 bp) data for gene expression analysis, with 3 replicates for the control group and 3 replicates for the treated group. I performed trimming with Cutadapt, alignment with STAR, and gene expression quantification with RSEM.
Here is my pipeline:
STAR --genomeDir $reference_dir \
--sjdbGTFfile $reference_GTF \
--runThreadN 20 \
--readFilesIn $file \
--outFileNamePrefix $base \
--twopassMode Basic \
--sjdbOverhang 64 \
--outSAMtype BAM SortedByCoordinate \
--outBAMsortingThreadN 20 \
--readFilesCommand zcat \
--outSAMunmapped Within \
--outFilterType BySJout \
--quantMode TranscriptomeSAM
rsem-calculate-expression -p 20 --bam --no-bam-output \
$alignedToTranscriptomeFile \
$reference_RSEM \
$resultats_RSEM
After running the pipeline, I identified 4 genes with differential expression (padj < 0.05 . However, when I checked the reads aligned to these genes, there were not many reads mapped. How is this possible
Row.names log2FoldChange lfcSE stat pvalue padj
ENSG00000183742 -3.34239155250741 0.521491417693734 -6.40929349765515 1.46195516837849E-10 7.5933951445579E-07
Visually, the peaks in the first three are both higher and broader. So that supports the overexpression call (assuming its treatment - control, because it seems higher in control, hence lower in treatment). Note that these tracks are not normalized for total read count, so they have limited value judging your results. What's exactly the issue?
Okay, I understand that I need to normalize the BAM files in order to make accurate comparisons on IGV