I am using HISAT2 for RNA-seq data and have some questions:
1) There is an option --phred33:
- Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines.
Does this mean I don't have to run Trimmomatic, and that with this option the reads selected for alignment will have a quality of Phred 33? Please explain.
2) The command I am using:
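As a small illustration of what the "Phred+33" description above means (a sketch in Python, not HISAT2 code): each quality character is turned into a score by subtracting 33 from its ASCII code. The option describes how qualities are encoded in the FASTQ file; it is not a quality cutoff of 33, and as far as I know it does not trim or filter reads, so trimming with a tool such as Trimmomatic remains a separate decision. The function name below is made up for the example.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string that uses the Phred+33 encoding.

    Each character's ASCII code minus 33 is the Phred quality score.
    """
    return [ord(ch) - 33 for ch in quality_string]


def mean_quality(quality_string):
    """Average Phred score of one read (hypothetical helper for illustration)."""
    scores = phred33_to_scores(quality_string)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, the character "I" (ASCII 73) decodes to a score of 40, and "!" (ASCII 33) decodes to 0.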
hisat2 -p 24 --dta --known-splicesite-infile splicesites.txt --phred33 -x hg38 -1 31_1.fastq.gz -2 /media31_2.fastq.gz -S H031.sam --summary-file H031_summary.txt --novel-splicesite-outfile H031_outfile --novel-splicesite-infile H031_infile
In the H031_outfile file I am getting results like:
1 15037 15795 -
Can anyone explain how to interpret this result? It is totally different from the TopHat junction file. Does it give the appropriate information? How can I get information on the reads spanning each junction?
Thank you for the explanation. For the second question, I know the definition of the output of --novel-splicesite-outfile, but it does not give the number of reads that contain the junction, as TopHat does in its junction file; that is why I asked how to interpret these results. Furthermore, I used regtools to get the results in TopHat format, but I am not satisfied, because the chromosome positions in the HISAT2 output file and in the regtools output are totally different.
Any suggestions?
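One thing worth checking before concluding that the two outputs disagree is the coordinate convention. The sketch below rests on two assumptions that should be verified against the tools' documentation: that the regtools BED12 output follows the TopHat junctions.bed layout, where chromStart and chromEnd include the read overhangs stored in blockSizes, and that the HISAT2 splice-site columns are 0-based positions of the bases flanking the intron. Under those assumptions, a BED12 junction line can be converted to HISAT2-style columns for a like-for-like comparison. The function name is hypothetical.

```python
def bed12_junction_to_flanks(bed_line):
    """Convert one TopHat-style BED12 junction line to flanking-base coordinates.

    Assumes the two BED blocks are the left and right overhangs, so the
    intron lies between them. Returns (chrom, left_flank, right_flank,
    strand, score), where the flanks are 0-based positions of the last
    base before the intron and the first base after it.
    """
    fields = bed_line.rstrip("\n").split("\t")
    chrom, chrom_start, chrom_end = fields[0], int(fields[1]), int(fields[2])
    score, strand = int(fields[4]), fields[5]
    # blockSizes may carry a trailing comma, e.g. "40,35,"
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    left_flank = chrom_start + block_sizes[0] - 1   # last exonic base on the left
    right_flank = chrom_end - block_sizes[-1]       # first exonic base on the right
    return chrom, left_flank, right_flank, strand, score
```

In this layout the score column (the fifth field) is the number of alignments supporting the junction, which is the count the HISAT2 splice-site file does not contain.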
The information you're looking for isn't present in the file you're looking in. If the novel sites are that disparate then it's likely that neither tool should be relied upon. Have a look at STAR, which generally performs better in comparisons for finding novel splice sites.
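If per-junction read counts are all that is needed, they can be recovered from the alignments themselves, since a spliced alignment carries an N operation in its CIGAR string. The following is a minimal sketch in plain Python, assuming an uncompressed SAM file such as the H031.sam written by the command above; the function name is hypothetical, and it counts alignments rather than fragments, so both mates of a pair and any secondary alignments are each counted.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_junctions(sam_lines):
    """Count spliced alignments per intron from an iterable of SAM lines.

    Keys are (chrom, intron_start, intron_end) with 1-based, inclusive
    coordinates of the intron itself (the span of the CIGAR 'N' operation).
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":    # unmapped read
            continue
        ref = pos                          # current 1-based reference position
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op == "N":                  # skipped region = intron
                counts[(chrom, ref, ref + length - 1)] += 1
            if op in "MDN=X":              # operations that consume the reference
                ref += length
    return counts
```

A possible way to use it would be `with open("H031.sam") as fh: counts = count_junctions(fh)`, then compare the keys against the splice-site file after adjusting for its coordinate convention.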