17 months ago
wyt1995
Hello, I am a novice graduate student conducting RNA-seq analysis. Below are the STAR command I ran and the output files it produced.
# STAR
STAR \
--readFilesIn run_clean_1.fastq.gz run_clean_2.fastq.gz \
--genomeDir STAR_genomeGenerate \
--readFilesCommand zcat \
--runThreadN 10 \
--twopassMode Basic \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.1 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--outFilterType BySJout \
--outFilterScoreMinOverLread 0.33 \
--outFilterMatchNminOverLread 0.33 \
--limitSjdbInsertNsj 1200000 \
--outFileNamePrefix STAR/run \
--outSAMstrandField intronMotif \
--outFilterIntronMotifs None \
--alignSoftClipAtReferenceEnds Yes \
--quantMode TranscriptomeSAM GeneCounts \
--outSAMtype BAM Unsorted \
--outSAMunmapped Within \
--genomeLoad NoSharedMemory \
--chimSegmentMin 15 \
--chimJunctionOverhangMin 15 \
--chimOutType Junctions SeparateSAMold WithinBAM SoftClip \
--chimOutJunctionFormat 1 \
--chimMainSegmentMultNmax 1 \
--outSAMattributes NH HI AS nM NM ch
# Output
# run.Aligned.out.bam
# run.Aligned.toTranscriptome.out.bam
# run.Chimeric.out.junction
# run.Chimeric.out.sam
# run.Log.final.out
# run.Log.out
# run.Log.progress.out
# run.ReadsPerGene.out.tab
# run.SJ.out.tab
Among these results, I would like to use DESeq to identify DEGs (differentially expressed genes), and I understand that ReadsPerGene is commonly used for that purpose. Are the remaining files, such as Aligned.out.bam and Aligned.toTranscriptome.out.bam, needed for my analysis?
Not really, unless you are counting your reads into features (again). The BAM files are just the results of mapping your reads to your reference. They have other uses (e.g. visualisation of expression within a gene body, variant calling, etc.), but they are not really necessary for differential gene expression analysis :)
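To make the ReadsPerGene point a bit more concrete, here is a minimal shell sketch of pulling one count column out of that file before loading it into DESeq. Each ReadsPerGene.out.tab has four tab-separated columns (gene ID, unstranded counts, counts for reads on the same strand as the gene, counts for reads on the opposite strand) and starts with four summary rows whose names begin with N_. The function name extract_counts is a hypothetical helper, not part of STAR, and which column is correct depends on the strandedness of your library prep, so treat the default below as an assumption.

```shell
# Hypothetical helper (not part of STAR): print "gene_id<TAB>count"
# from a STAR ReadsPerGene.out.tab file.
#   $1 = path to the ReadsPerGene.out.tab file
#   $2 = column to keep (optional, default 2):
#        2 = unstranded, 3 = same strand as the gene, 4 = opposite strand
extract_counts() {
    # Skip the summary rows (N_unmapped, N_multimapping, N_noFeature,
    # N_ambiguous) and keep the gene ID plus the chosen count column.
    awk -F '\t' -v col="${2:-2}" '$1 !~ /^N_/ { print $1 "\t" $col }' "$1"
}
```

You would call it as something like `extract_counts run.ReadsPerGene.out.tab 2 > run.counts.tsv` (the output name is just an example), once per sample, and then join the per-sample files into one count matrix.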
Oh, thanks. If possible, could you provide me with a website or give a brief explanation about the possible analysis of the remaining files? I apologize for being shameless, but I would appreciate your assistance.
IGV is a tool you can use to visualize your reads: https://software.broadinstitute.org/software/igv/BAM
or maybe something like this: https://expert.cheekyscientist.com/how-to-do-variant-calling-from-rnaseq-ngs-data/
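One practical note on the IGV route: the command above writes an unsorted BAM (--outSAMtype BAM Unsorted), and IGV needs a coordinate-sorted BAM with an index next to it. A minimal sketch using samtools, assuming it is installed and on your PATH, could look like this. The function name sort_and_index_bam and the default of 4 extra threads are my own choices for illustration.

```shell
# Hypothetical helper: coordinate-sort and index a BAM so IGV can load it.
#   $1 = input BAM (e.g. STAR's unsorted Aligned.out.bam)
#   $2 = output path for the sorted BAM
#   $3 = number of additional sorting threads (optional, default 4)
sort_and_index_bam() {
    # samtools sort orders by coordinate by default;
    # samtools index then writes the .bai index next to the sorted BAM.
    samtools sort -@ "${3:-4}" -o "$2" "$1" && samtools index "$2"
}
```

For example, `sort_and_index_bam run.Aligned.out.bam run.Aligned.sorted.bam` would give you a file you can open in IGV, as long as you load the same genome that you aligned against.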