Hello
I am using STAR to map the reads of bulk RNA-seq data. My sample should contain some pre-mRNA, which includes intronic regions. However, when I checked the BAM file from the STAR output in IGV, I found no intronic reads in most of the genes.
How should I set the STAR parameters to keep all the intronic reads?
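To complement the visual check in IGV, you can count how many primary alignments in the BAM are spliced (the CIGAR string contains an N operation) and how many are unspliced. Reads that lie entirely inside an intron would fall in the unspliced group. This is a minimal sketch. It assumes samtools is installed to decode the BAM, and the function name count_spliced and the file name in the example are hypothetical.

```shell
# Hypothetical helper: summarise spliced vs. unspliced primary alignments from
# SAM text on stdin (for example piped from `samtools view`).
# A spliced alignment has an "N" (skipped region) in its CIGAR (field 6).
count_spliced() {
  awk -F'\t' '
    /^@/ { next }                      # skip header lines, if any
    $6 == "*" { next }                 # skip unmapped records (no CIGAR)
    int($2 / 256) % 2 == 1 { next }    # skip secondary alignments (FLAG 0x100)
    int($2 / 2048) % 2 == 1 { next }   # skip supplementary alignments (FLAG 0x800)
    { if ($6 ~ /N/) spliced++; else unspliced++ }
    END { printf "spliced\t%d\nunspliced\t%d\n", spliced, unspliced }
  '
}

# Example (hypothetical file name):
#   samtools view sample_Aligned.sortedByCoord.out.bam | count_spliced
```

A very low unspliced count relative to spliced would point to the library or the alignment filtering. A reasonable unspliced count would suggest the intronic reads are in the BAM but sparse at the IGV zoom level you looked at.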
I generated the reference using:
STAR --runMode genomeGenerate --runThreadN 16 --genomeDir ./ \
--genomeFastaFiles \
/Users/Shared/reference/FASTA/hg20/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
--sjdbGTFfile /Users/Shared/reference/gtf/Homo_sapiens.GRCh38.90.gtf --sjdbOverhang 99
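The STAR manual recommends setting --sjdbOverhang to the maximum read length minus 1, so 99 corresponds to 100 bp reads. The following sketch derives that value from the reads themselves. It assumes plain, uncompressed 4-line FASTQ records on stdin. The function name and the file name in the example are hypothetical.

```shell
# Hypothetical helper: print (maximum read length - 1) from FASTQ on stdin.
# In a 4-line FASTQ record, the sequence is on line 2 of each record.
sjdb_overhang_from_fastq() {
  awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
       END { print max - 1 }'
}

# Example (hypothetical file name):
#   gunzip -c sample_R1.fastq.gz | sjdb_overhang_from_fastq
```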
Are my mapping parameters correct for keeping reads from intronic regions?
STAR --genomeDir /Users/Shared/reference/STAR_Homo_sapiens.GRCh38.90 \
--readFilesCommand gunzip -c \
--outFilterMultimapNmax 5 \
--readFilesIn $1 $2 \
--outFileNamePrefix STAR_one_pass_out/$3"_hg20_" \
--runThreadN 20 \
--outSAMstrandField intronMotif \
--outSAMunmapped Within \
--outFilterIntronMotifs RemoveNoncanonicalUnannotated \
--outSAMattributes All \
--outSAMtype BAM Unsorted \
--quantMode GeneCounts \
--limitSjdbInsertNsj 2000000 \
--outStd BAM_Unsorted | sambamba sort -t 6 -m 10G -o STAR_one_pass_out/$3"_hg20_Aligned.sortedByCoord.out.bam" /dev/stdin
sambamba index -p -t 6 STAR_one_pass_out/$3"_hg20_Aligned.sortedByCoord.out.bam"
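With --quantMode GeneCounts, STAR writes a ReadsPerGene.out.tab file. Its first four rows are N_unmapped, N_multimapping, N_noFeature and N_ambiguous. Column 2 holds unstranded counts, and columns 3 and 4 hold the two strand-specific counts. The manual says these counts match htseq-count with default parameters, so reads that overlap no annotated exon (intronic as well as intergenic) should land in N_noFeature. Their share is therefore a rough indicator of non-exonic signal. This is a sketch. It assumes column 2 is the right one, so switch to column 3 or 4 for a stranded library. The function name is hypothetical.

```shell
# Hypothetical helper: from ReadsPerGene.out.tab on stdin, print
#   N_noFeature / (N_noFeature + sum of per-gene counts)
# using column 2 (unstranded counts).
nofeature_fraction() {
  awk -F'\t' '
    $1 == "N_noFeature" { nf = $2; next }
    $1 ~ /^N_/ { next }                # skip N_unmapped, N_multimapping, N_ambiguous
    { assigned += $2 }                 # per-gene counts
    END {
      total = nf + assigned
      if (total > 0) printf "%.3f\n", nf / total; else print "NA"
    }
  '
}

# Example (hypothetical file name):
#   nofeature_fraction < STAR_one_pass_out/sample_hg20_ReadsPerGene.out.tab
```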
Thanks
As a side note, if you use STARsolo for single-cell sequencing, its GeneFull mode (--soloFeatures GeneFull) does this by counting reads over the full gene body, introns included.
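As far as I know, bulk --quantMode offers only GeneCounts and TranscriptomeSAM, with no GeneFull equivalent. One way to approximate it is to count reads against whole gene bodies instead of exons. The sketch below converts the "gene" records of a GTF into a BED file of gene bodies, which could then be passed to any read-overlap counter. It assumes an Ensembl-style GTF with a gene_id attribute. The function name and file names are hypothetical.

```shell
# Hypothetical helper: turn GTF "gene" records on stdin into a 6-column BED
# (chrom, start, end, gene_id, score, strand) covering whole gene bodies.
# GTF is 1-based inclusive; BED is 0-based half-open, so start is shifted by 1.
gtf_genes_to_bed() {
  awk -F'\t' -v OFS='\t' '
    /^#/ || $3 != "gene" { next }      # skip comments and non-gene features
    {
      id = "."
      # Extract the value of gene_id "..." from the attribute column (field 9).
      if (match($9, /gene_id "[^"]+"/)) id = substr($9, RSTART + 9, RLENGTH - 10)
      print $1, $4 - 1, $5, id, 0, $7
    }
  '
}

# Example (hypothetical file names):
#   gtf_genes_to_bed < Homo_sapiens.GRCh38.90.gtf > gene_bodies.bed
```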