I want to use the HISAT-StringTie approach to quantify expression for my paired-end Illumina RNA-seq data.
I see that by default HISAT reports up to 5 alignments for each read: “Default mode: search for one or more alignments, report each. -k 5”
If there are multiple alignments for a given read in the SAM file, how does StringTie use them to quantify expression? Does this mean that reads can be counted multiple times, or does StringTie somehow pick the best alignment and ignore the others?
I want to avoid counting a read more than once, so how should I filter the SAM file to include only one alignment per read, as TopHat used to report? I see that I cannot just specify -k 1, because “HISAT does not "find" alignments in any specific order, so for reads that have more than N distinct, valid alignments, HISAT does not guarantee that the N alignments reported are the best possible in terms of alignment score.” How else should I do this?
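To make the kind of filter I am asking about concrete, here is a minimal sketch, not a recommended or verified method. It assumes the aligner marks extra alignments with the standard SAM FLAG bits 0x100 (secondary) and 0x800 (supplementary), so that exactly one record per mate is left unflagged. I have not confirmed that HISAT sets these bits that way, and the function name `keep_primary` is made up for illustration.

```python
SECONDARY = 0x100      # SAM FLAG bit: not the primary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def keep_primary(sam_lines):
    """Yield header lines and only those records that are primary alignments.

    Assumption: the aligner flags all but one alignment of each mate as
    secondary (0x100) or supplementary (0x800).
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # drop secondary and supplementary records
        yield line
```

For a file on disk this could be used as `out.writelines(keep_primary(open("in.sam")))`, where `in.sam` is a placeholder name. Note that for paired-end data this keeps one record per mate, and it says nothing about whether the kept alignment is the best-scoring one.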
I also noticed this in the StringTie manual: 'Every spliced read alignment (i.e. an alignment across at least one junction) in the input SAM file must contain the tag XS to indicate the genomic strand that produced the RNA from which the read was sequenced. Alignments produced by TopHat and HISAT2 (when ran with --dta option) already include this tag'.
When using HISAT, I want to make sure the XS tag is present. However, I cannot see the --dta option.
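In the meantime, one way to check an existing SAM file for the tag could look like the sketch below. It assumes that a spliced alignment is one whose CIGAR string contains an `N` operation (a skipped region, as defined in the SAM format) and that the strand tag is written as `XS:A:+` or `XS:A:-`. The function name `spliced_without_xs` is hypothetical.

```python
def spliced_without_xs(sam_lines):
    """Return the names of spliced alignments that carry no XS:A strand tag.

    Assumption: "spliced" means the CIGAR string (column 6) contains 'N'.
    """
    missing = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        cigar = fields[5]
        optional_tags = fields[11:]  # optional fields start at column 12
        has_xs = any(tag.startswith("XS:A:") for tag in optional_tags)
        if "N" in cigar and not has_xs:
            missing.append(fields[0])
    return missing
```

An empty list would mean every spliced record in the input carries the tag; unspliced records are ignored because the manual only requires XS on spliced alignments.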
Thanks!
Alison
Have you read the StringTie paper?
Hi, thanks for your comment! Yes, I have read the paper, but it doesn't answer these questions, mainly because the paper uses the output of TopHat2 as the input to StringTie, and TopHat2 reports only one alignment per read.