Question

HISAT - StringTie

8.5 years ago

AW ▴ 350

I want to use the HISAT-Stringtie approach to quantify expression for my paired-end Illumina RNA-seq data.

I see that by default HISAT reports up to 5 alignments for each read: “Default mode: search for one or more alignments, report each” (-k 5).

If there are multiple alignments for a given read in the SAM file, how does StringTie use them to quantify expression? Does this mean that a read can be counted multiple times, or does StringTie somehow pick the best alignment and ignore the others?

I want to avoid a read being counted multiple times, so how should I filter the SAM file to include only one alignment per read, as TopHat used to report? I see I cannot just specify -k 1, because “HISAT does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, HISAT does not guarantee that the N alignments reported are the best possible in terms of alignment score.” How else should I do this?
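A minimal Python sketch of one way to do this filtering, assuming plain-text SAM input, that HISAT flags every alignment other than the primary one with the SAM secondary bit (0x100), and that it writes the standard NH:i tag (number of reported alignments for the read). The names `filter_sam`, `nh_value` and `unique_only` are hypothetical. Keeping only the primary record does not get around the caveat quoted above: the primary alignment is whichever one HISAT flagged, not necessarily the best-scoring one. Setting `unique_only=True` sidesteps the question by discarding multi-mapped reads entirely.

```python
# Sketch: keep at most one alignment per read from HISAT SAM output.
# Assumptions: plain-text SAM; extra alignments carry the 0x100 flag;
# the aligner writes the NH:i tag. All names here are hypothetical.
from typing import Iterable, Iterator, List, Optional

SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary (chimeric) alignment


def nh_value(fields: List[str]) -> Optional[int]:
    """Return the NH:i tag value of a split SAM record, or None if absent."""
    for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def filter_sam(lines: Iterable[str], unique_only: bool = False) -> Iterator[str]:
    """Yield header lines and primary alignments only.

    With unique_only=True, also drop records whose NH tag is greater than 1
    (records without an NH tag are kept).
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & (SECONDARY | SUPPLEMENTARY):
            continue  # not the primary record for this read
        if unique_only:
            nh = nh_value(fields)
            if nh is not None and nh > 1:
                continue  # read was reported at more than one locus
        yield line
```

For large files, `samtools view -F 0x900` should apply the same primary-only filter without custom code.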

I also noticed this in the StringTie manual: 'Every spliced read alignment (i.e. an alignment across at least one junction) in the input SAM file must contain the tag XS to indicate the genomic strand that produced the RNA from which the read was sequenced. Alignments produced by TopHat and HISAT2 (when ran with --dta option) already include this tag.'

When using HISAT I want to make sure the XS tag is present. However, I cannot find the --dta option.
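Whichever aligner options are used, the output can be checked directly. A minimal sketch, assuming plain-text SAM input and treating any alignment whose CIGAR string contains an N (skipped region, i.e. intron) operation as spliced. It looks specifically for `XS:A:` (a strand character), because Bowtie2-derived aligners such as HISAT may also write an integer `XS:i:` tag that holds an alignment score rather than a strand. The function names are hypothetical.

```python
# Sketch: count spliced alignments that lack the XS strand tag.
# Assumptions: plain-text SAM; "spliced" means the CIGAR has an N operation.
import re
from typing import Iterable, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """True if the CIGAR contains at least one N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def count_missing_xs(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (spliced alignments, spliced alignments without XS:A)."""
    spliced = missing = 0
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if not is_spliced(fields[5]):  # "*" (unmapped) has no operations
            continue
        spliced += 1
        # Only XS:A:+ / XS:A:- carry strand; in Bowtie2-style output
        # XS:i:<n> is an alignment score and does not count here.
        if not any(tag.startswith("XS:A:") for tag in fields[11:]):
            missing += 1
    return spliced, missing
```

A result such as `(n, 0)` means every spliced alignment carries a strand tag.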

Thanks!

Alison

RNA-Seq

Have you read the StringTie paper?

8.5 years ago by Devon Ryan 104k

Hi, thanks for your comment! Yes, I have read the paper, but it doesn't answer these questions, mainly because the paper uses the output of TopHat2 as the input to StringTie, and TopHat2 reports only one alignment per read.

8.5 years ago by AW ▴ 350

Answer 1 · 2016-05-19

8.5 years ago

AW ▴ 350

Found the answer to the second question. I was using HISAT, but the StringTie developers recommend using HISAT2. HISAT2 has the option --dta/--downstream-transcriptome-assembly: “Report alignments tailored for transcript assemblers including StringTie. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short-anchors, which helps transcript assemblers improve significantly in computation and memory usage.”
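To make the "short-anchor" wording concrete, here is a sketch that reports the shortest anchor in one alignment. It assumes an anchor means the number of aligned read bases (M, = or X CIGAR operations) in the exon segment on either side of an intron (N). The function `min_anchor` is hypothetical, and the actual threshold HISAT2 applies with --dta is neither shown nor assumed here.

```python
# Sketch: shortest splice-junction anchor in a single alignment.
# Assumption: anchor = aligned bases (M/=/X) in each exon segment between Ns.
import re
from typing import Optional

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
ALIGNED = {"M", "=", "X"}


def min_anchor(cigar: str) -> Optional[int]:
    """Return the smallest exon-segment anchor, or None if not spliced."""
    segments = [0]  # aligned bases counted per exon segment
    for length, op in CIGAR_OP.findall(cigar):
        if op == "N":
            segments.append(0)  # intron: start a new exon segment
        elif op in ALIGNED:
            segments[-1] += int(length)
        # S, H, I, D and P do not add aligned bases to an anchor here
    if len(segments) == 1:
        return None  # no N operation, so no junction to anchor
    return min(segments)
```

An alignment like `3M500N97M` has only 3 bases supporting the junction on one side; alignments of this kind are what the manual says --dta reduces.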

It's still not clear to me how StringTie handles the multiple alignments reported by HISAT2, though.

For quantification, Cufflinks used an EM (expectation-maximization) approach, so the presumption is that StringTie follows the same procedure. Whether that's actually the case, only the authors can answer.

8.5 years ago by Devon Ryan 104k
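A toy illustration of how an EM approach avoids counting a multi-mapped read more than once: each read is split fractionally across the transcripts it is compatible with, in proportion to their current estimated abundances, and the abundances are then re-estimated from those fractional counts. This is the generic expectation-maximization idea, not StringTie's (or Cufflinks') actual implementation, and as a simplifying assumption it ignores transcript length and fragment-length effects. `em_counts` is a hypothetical name.

```python
# Toy EM for distributing multi-mapping reads among transcripts.
# NOT StringTie's algorithm; lengths and fragment sizes are ignored.
from typing import Dict, List, Set


def em_counts(reads: List[Set[str]], iterations: int = 100) -> Dict[str, float]:
    """Return expected read counts per transcript.

    Each read is given as the set of transcripts it is compatible with.
    Every read contributes a total of exactly 1 across that set.
    """
    transcripts = sorted(set().union(*reads))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    counts: Dict[str, float] = {}
    for _ in range(iterations):
        # E-step: split each read in proportion to current abundances.
        counts = {t: 0.0 for t in transcripts}
        for compatible in reads:
            total = sum(theta[t] for t in compatible)
            for t in compatible:
                counts[t] += theta[t] / total
        # M-step: new abundances are the normalised expected counts.
        theta = {t: counts[t] / len(reads) for t in transcripts}
    return counts
```

For example, with reads compatible with {A}, {A} and {A, B}, the shared read ends up assigned almost entirely to A, and the counts sum to 3 rather than the 4 that counting every alignment naively would give.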