Question

HISAT2 splice site file vs TopHat

0


5.6 years ago

shrutidabral ▴ 10

I am using HISAT2 for RNA-seq data and have some questions:

1) There is an option --phred33: "Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines."

Does this mean I don't have to run Trimmomatic, and that with this option reads are selected for alignment at a Phred quality of 33? Please explain.

2) The command I am using:

hisat2 -p 24 --dta --known-splicesite-infile splicesites.txt  --phred33  -x hg38  -1 31_1.fastq.gz -2 /media31_2.fastq.gz -S H031.sam --summary-file H031_summary.txt --novel-splicesite-outfile H031_outfile --novel-splicesite-infile H031_infile

In the H031_outfile file I am getting results like:

1   15037   15795   -

Can anyone explain how to interpret this result? It is totally different from the TopHat junction file. Does it give the appropriate information? How can I get information about the reads that were skipped?

Assembly RNA-Seq • 1.7k views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 5.6 years ago by shrutidabral ▴ 10

score 0 · Answer 1 · 2019-05-10

0


5.6 years ago

Bastien Hervé 5.9k

1) No, the --phred33 option is used by default and will not filter out any reads based on their quality. It only names the encoding table for the quality scores: older sequencing machines reported scores with the Phred+64 table, while current machines use the Phred+33 table. I suggest not using this option at all if your reads come from a recent machine.

See also: http://drive5.com/usearch/manual/quality_score.html
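To illustrate why the encoding is not a filter, here is a minimal Python sketch of how a FASTQ quality string is decoded. The function names are mine (hypothetical, not part of HISAT2); the only facts used are the two ASCII offsets, 33 and 64, and the standard Phred definition Q = -10 * log10(P).

```python
def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset=33 corresponds to Phred+33, offset=64 to the older Phred+64.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(phred):
    """Probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-phred / 10)


# The same scores look different on disk depending on the offset,
# but no read is removed by choosing one encoding or the other.
phred33_scores = decode_qualities("II5#")             # Phred+33 string
phred64_scores = decode_qualities("hhT", offset=64)   # Phred+64 string
print(phred33_scores, phred64_scores)
```

Quality trimming (for example with Trimmomatic) is a separate step that acts on these decoded scores; telling the aligner which offset to use does not replace it.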

2) The output of the --novel-splicesite-outfile option is explained in the HISAT2 manual:

chromosome name <tab> genomic position of the flanking base on the left side of an intron <tab> genomic position of the flanking base on the right <tab> strand (+, -, or .). '.' indicates an unknown strand for non-canonical splice sites.
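As a rough illustration, the example line from the question can be parsed like this in Python. The names below are hypothetical, and the sketch assumes the positions are zero-based, as the manual describes for the --known-splicesite-infile format; please check that assumption against your HISAT2 version before relying on the converted coordinates.

```python
from typing import NamedTuple


class SpliceSite(NamedTuple):
    chrom: str
    left_flank: int    # last exonic base before the intron (assumed 0-based)
    right_flank: int   # first exonic base after the intron (assumed 0-based)
    strand: str        # '+', '-' or '.'


def parse_splice_site(line):
    """Parse one whitespace-separated line of the splice site file."""
    chrom, left, right, strand = line.split()[:4]
    return SpliceSite(chrom, int(left), int(right), strand)


def intron_length(site):
    """Number of intronic bases strictly between the two flanking bases."""
    return site.right_flank - site.left_flank - 1


def intron_interval_1based(site):
    """Intron as a 1-based inclusive interval, ASSUMING 0-based flanks."""
    return site.left_flank + 2, site.right_flank


site = parse_splice_site("1\t15037\t15795\t-")
print(site, intron_length(site), intron_interval_1based(site))
```

Under that assumption, the line describes an intron on the minus strand of chromosome 1 between the two flanking exonic bases. Note that TopHat's junctions.bed is a BED12 file whose start and end columns include the read overhang blocks, so its raw columns are not expected to equal these flanking positions.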

ADD COMMENT • link 5.6 years ago by Bastien Hervé 5.9k

0


Thank you for the explanation. For the second question, I know the definition of the output of --novel-splicesite-outfile, but it does not give the number of reads that contain each junction, as TopHat does in its junction file. That is why I asked how to interpret this result. Furthermore, I used regtools to get results in TopHat format, but I am not satisfied, because the chromosome positions in the HISAT2 output file and in the regtools output are totally different.

Any suggestions?

ADD REPLY • link 5.5 years ago by shrutidabral ▴ 10

0


The information you're looking for isn't present in the file you're looking in. If the novel sites are that disparate then it's likely that neither tool should be relied upon. Have a look at STAR, which generally performs better in comparisons for finding novel splice sites.
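If you only need per-junction read counts, they can be derived from the alignments themselves rather than from the splice site file. Below is a minimal Python sketch, not a replacement for a dedicated tool: it reads SAM text lines, treats every `N` operation in the CIGAR string as an intron, and counts alignment records per junction. The function names are hypothetical, the flanking positions are reported zero-based (an assumption made so they can be compared with the splice site file), and there is no filtering for secondary alignments or mapping quality.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions_in_alignment(chrom, pos_1based, cigar):
    """Yield (chrom, left_flank, right_flank) for each N in the CIGAR.

    Flanking positions are 0-based: the last aligned base before the
    intron and the first aligned base after it.
    """
    ref_pos = pos_1based - 1  # SAM POS is 1-based
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (chrom, ref_pos - 1, ref_pos + length)
        if op in REF_CONSUMING:
            ref_pos += length


def count_junction_reads(sam_lines):
    """Count alignment records supporting each junction."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped read
            continue
        counts.update(junctions_in_alignment(chrom, pos, cigar))
    return counts


# Made-up records for illustration only (not real data).
example = [
    "@HD\tVN:1.0",
    "r1\t0\t1\t15029\t60\t10M757N40M\t*\t0\t0\t*\t*",
    "r2\t16\t1\t15019\t60\t20M757N30M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
junction_counts = count_junction_reads(example)
print(junction_counts)
```

On a real run you would pass an open file handle, e.g. the H031.sam file from the question, instead of the example list. Keep in mind that for paired-end data both mates are counted separately if each spans the junction.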

ADD REPLY • link 5.5 years ago by Devon Ryan 104k