Question

how to set reporting options for RNA-seq reads alignment with hisat2?

9.1 years ago

epigene ▴ 590

Hi, I'm new to analyzing RNA-seq data. I started by using hisat2 to align RNA-seq reads. My main goal is to do differential gene expression analysis comparing multiple control samples vs. case samples.

I did a test run with hisat2 using basically these options: --dta-cufflinks --rna-strandness. I noticed that the number of alignments in the BAM file is larger than the number of reads in the original FASTQ file. Puzzled by this, I searched around and found the -k option, which has a default value of 5, so up to 5 alignments can be reported for one read.

-k <int>
It searches for at most <int> distinct, primary alignments for each read. Primary alignments mean alignments whose alignment score is equal or higher than any other alignments.

I think this is the reason for the number of alignments being larger than the number of reads.

So I'm curious what the ideal -k would be, and how this option impacts downstream analysis such as gene counts?
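One way to check this is to count distinct read names against alignment records. The following is only a minimal sketch on made-up SAM records, assuming single-end reads (with paired-end data both mates share a name, so the counting would need adjusting). The toy data and the function name `count_reads_and_alignments` are hypothetical, not part of hisat2.

```python
from collections import Counter

# Toy SAM records (made-up data): r1 aligns once, r2 aligns to three loci.
# Flag 256 (0x100) marks a secondary alignment in the SAM specification.
SAM_TEXT = """\
@HD\tVN:1.0\tSO:unsorted
r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1
r2\t0\tchr1\t500\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3
r2\t256\tchr2\t900\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3
r2\t256\tchr5\t70\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3
"""

def count_reads_and_alignments(sam_text):
    """Return (number of distinct mapped reads, number of alignment records)."""
    per_read = Counter()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = read is unmapped, so it is not an alignment
            continue
        per_read[fields[0]] += 1  # fields[0] is the read name (QNAME)
    return len(per_read), sum(per_read.values())

n_reads, n_alignments = count_reads_and_alignments(SAM_TEXT)
print(n_reads, n_alignments)  # prints: 2 4
```

Here two reads produce four alignment records, because r2 is reported at three loci.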

Thanks!

RNA-Seq hisat2 • 2.6k views

updated 9.1 years ago by WouterDeCoster 48k • written 9.1 years ago by epigene ▴ 590

score 1 · Answer 1 · 2016-08-03

9.1 years ago

WouterDeCoster 48k

Gene counting tools commonly ignore multimapping reads. That's a pity, but a sensible decision, since these reads cannot be reliably attributed to a single gene. However, you can rescue some of them using the method described here: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0734-x

You should only change the default values if you are confident in what you are doing. If you don't know what the ideal value is, the default is probably just fine; otherwise it wouldn't be the default.
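To illustrate what ignoring multimapping reads can look like, here is a rough sketch that keeps only records whose NH tag (number of reported alignments for the read) equals 1. It assumes the aligner writes an NH:i tag, which hisat2 does. The toy records and the helper names `get_nh` and `keep_unique` are made up for illustration, and real counting tools may apply different or additional criteria.

```python
# Toy SAM records (made-up data): r1 and r3 map uniquely, r2 maps to two loci.
SAM_LINES = [
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t500\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r2\t256\tchr2\t900\t1\t50M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t16\tchr3\t42\t60\t50M\t*\t0\t0\t*\t*\tNH:i:1",
]

def get_nh(fields):
    """Return the NH tag value from the optional fields, or None if absent."""
    for tag in fields[11:]:  # optional tags start at the 12th column
        if tag.startswith("NH:i:"):
            return int(tag[len("NH:i:"):])
    return None

def keep_unique(sam_lines):
    """Return read names of mapped records with exactly one reported alignment."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        if int(fields[1]) & 0x4:  # skip unmapped reads
            continue
        if get_nh(fields) == 1:
            kept.append(fields[0])
    return kept

print(keep_unique(SAM_LINES))  # prints: ['r1', 'r3']
```

Both records of r2 are dropped, so that read would not contribute to any gene's count under this filter.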

Thanks for the input. I'm not so confident in what I'm doing yet, as I don't have a good understanding of how each step works. If downstream analysis ignores multimapping reads, then this option won't affect the gene counts later.

9.1 years ago by epigene ▴ 590