Hi, I'm new to analyzing RNA-seq data. I started by using hisat2 to align RNA-seq reads. I think my main goal is to do differential gene expression analysis comparing multiple control samples vs. case samples.
I did a test run with hisat2 with basically these options: --dta-cufflinks --rna-strandness. I noticed that the number of alignments in the BAM file is greater than the number of reads in the original FASTQ file. Puzzled by this, I searched around and found that there is an option -k with a default value of 5, so there can be up to 5 alignments for one read.
-k <int>
It searches for at most <int> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.
I think this is the reason the number of alignments is greater than the number of reads.
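One way to check this is to count alignment records and distinct reads separately. Below is a rough Python sketch that does this on SAM text. The records in EXAMPLE_SAM and the function name count_alignments are made up for illustration; the only things taken from the SAM format are the FLAG bits (0x4 for unmapped, 0x40/0x80 for first/second mate).

```python
import io

# Hypothetical SAM records (tab-separated), invented for illustration only.
EXAMPLE_SAM = "\n".join([
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "read2\t0\tchr1\t200\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "read2\t256\tchr2\t300\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "read2\t256\tchr5\t400\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\tYT:Z:UU",
])


def count_alignments(sam_handle):
    """Return (alignment records, distinct aligned reads) for SAM text."""
    n_records = 0
    reads = set()
    for line in sam_handle:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:  # unmapped reads are not alignments
            continue
        n_records += 1
        # Keep the mate bits so the two mates of a pair count separately.
        reads.add((qname, flag & 0xC0))
    return n_records, len(reads)


records, reads = count_alignments(io.StringIO(EXAMPLE_SAM))
print(records, reads)  # 4 2
```

Here read2 has three records, so there are 4 alignment records for only 2 aligned reads. On real data the same function could read the text output of samtools view instead of the in-memory example.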
So I'm curious: what would be the ideal -k to use, and how does this option affect downstream analysis such as gene counts?
Thanks!
Thanks for the input. I'm not so confident in what I'm doing yet, as I don't have a good understanding of how each step works. If downstream analysis ignores multimapping reads, then this option won't affect the results later.
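To make that last point concrete, here is a small Python sketch that keeps only uniquely mapped reads, using the NH:i tag (number of reported alignments for the read) in the SAM output. The two record lists are hypothetical, standing in for the same reads aligned with a smaller and a larger -k; the names RUN_SMALL_K, RUN_LARGE_K and uniquely_mapped are invented for this example. It assumes the counting step drops every read with NH greater than 1, which depends on the counting tool and its settings.

```python
# Hypothetical records for the same two reads under two different -k settings.
RUN_SMALL_K = [
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "read2\t0\tchr1\t200\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "read2\t256\tchr2\t300\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
]
RUN_LARGE_K = [
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "read2\t0\tchr1\t200\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "read2\t256\tchr2\t300\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "read2\t256\tchr5\t400\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
]


def nh_value(fields):
    """Return the integer value of the NH:i tag, or None if it is absent."""
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def uniquely_mapped(sam_lines):
    """Return names of mapped reads whose NH tag equals 1."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:  # skip unmapped reads
            continue
        if nh_value(fields) == 1:
            kept.append(fields[0])
    return kept


print(uniquely_mapped(RUN_SMALL_K))  # ['read1']
print(uniquely_mapped(RUN_LARGE_K))  # ['read1']
```

Both runs keep the same read, because read2 is flagged as multimapping either way. I would only expect this to hold for -k of 2 or more: with -k 1 there may be nothing left in the output to mark a read as multimapping, so that case is worth checking on real data.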