Question

Adding the XS tag to HISAT2 SAM files prior to StringTie


7.1 years ago

Farbod ★ 3.4k

Dear Biostars, Hi

I have 6 SAM files (3 for cond1 and 3 for cond2) produced by HISAT2 from mapping HiSeq 2000 RNA-seq data to a newly released draft genome.

Now I want to use StringTie and then proceed to DEG analysis, but the StringTie manual says:

"Every spliced read alignment (i.e. an alignment across at least one junction) in the input SAM file must contain the tag XS to indicate the genomic strand that produced the RNA from which the read was sequenced. Alignments produced by TopHat and HISAT2 (when run with --dta option) already include this tag, but if you use a different read mapper you should check that this XS tag is included for spliced alignments."

I did not use the "--dta" option, but when I checked my SAM files there are some XS tags in them (e.g. "YS:i:0 YT:Z:CP XS:A:- NH:i:1").
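A check along these lines can be sketched in shell. This is only a rough sketch: the function name count_xs is made up for illustration, and it assumes a spliced alignment is one whose CIGAR string (column 6) contains an N operation and that the strand tag looks like XS:A:+ or XS:A:-. It counts how many spliced alignments carry the tag, which is what the manual asks you to verify.

```shell
# count_xs (hypothetical helper): reads SAM text on stdin and reports how many
# spliced alignments (CIGAR contains N) carry an XS:A:+ or XS:A:- tag.
count_xs() {
  awk -F'\t' '
    /^@/ { next }                      # skip SAM header lines
    $6 ~ /N/ {                         # spliced alignment: CIGAR has an N operation
      spliced++
      has_xs = 0
      for (i = 12; i <= NF; i++)       # optional fields start at column 12
        if ($i ~ /^XS:A:[+-]$/) has_xs = 1
      if (has_xs) tagged++
    }
    END { printf "spliced=%d with_XS=%d\n", spliced, tagged }
  '
}

# Example usage on one of the files from the question:
#   count_xs < /RNA_Seq_Data/C1.sam
```

If the two numbers match, every spliced alignment in that file has a strand tag. That only addresses the XS requirement, not the other effects of "--dta" on which alignments are reported.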

Q: So, what should I do? Map all reads to the reference again from the beginning, using HISAT2 with the --dta option, or . . . ?

NOTE: my mapping command for each paired-end sample:

./hisat2 -p 6 -x ht2_base_salmon_genome -1 '/RNA_Seq_Data/C1_clean_left.fq' -2 '/RNA_Seq_Data/C1_clean_right.fq' -S '/RNA_Seq_Data/C1.sam' &> C1.sam.info

RNA-Seq alignment hisat2 stringtie samtools • 3.3k views


score 2 · Answer 1 · 2018-06-30


7.1 years ago

lakhujanivijay 5.9k

Hi

--dta stands for --downstream-transcriptome-assembly. Using this option means that HISAT2 reports alignments tailored for transcript assemblers such as StringTie. With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short anchors, which helps transcript assemblers improve significantly in computation and memory usage.

StringTie issues the warning below at this link.

NOTE: be sure to run HISAT2 with the --dta option for alignment, or your results will suffer.

I would say use the --dta option, i.e. map the reads once again.
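As a rough sketch of what the remapping could look like: the helper below only prints the HISAT2 command for each sample, so it can be inspected before anything is run. The function name hisat2_dta_cmd and the variable names are made up for illustration. The index name, data directory, thread count and the "_clean_left.fq" / "_clean_right.fq" naming are taken from the C1 command in the question, and it is an assumption that the other samples follow the same pattern.

```shell
# Values copied from the command in the question; adjust to your setup.
INDEX=ht2_base_salmon_genome
DATA_DIR=/RNA_Seq_Data
THREADS=6

# hisat2_dta_cmd (hypothetical helper): print the HISAT2 command for one sample.
# --dta is placed before -x so that -x is still followed directly by the
# index basename, which is the argument it expects.
hisat2_dta_cmd() {
  sample=$1
  printf './hisat2 -p %s --dta -x %s -1 %s/%s_clean_left.fq -2 %s/%s_clean_right.fq -S %s/%s.sam\n' \
    "$THREADS" "$INDEX" "$DATA_DIR" "$sample" "$DATA_DIR" "$sample" "$DATA_DIR" "$sample"
}

# Dry run: only C1 is named in the question; add the other sample names here
# (assumed to follow the same file naming).
for s in C1; do
  hisat2_dta_cmd "$s"
done
```

Once the printed commands look right, they can be piped to sh or pasted into a script. StringTie expects coordinate-sorted alignments, so the resulting SAM files would still need sorting (for example with samtools sort) before assembly.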



Dear @Vijay Lakhujani, Hi and thank you. What do you think about my new script?

./hisat2 -p 6 --dta -x ht2_base_salmon_genome -1 '/RNA_Seq_Data/C1_clean_left.fq' -2 '/RNA_Seq_Data/C1_clean_right.fq' -S '/RNA_Seq_Data/C1.sam' &> C1.sam.info

Or should I add "--ss" and "--exon" to it, too?

Reply • 7.1 years ago by Farbod ★ 3.4k