Hi,
I have paired-end (R1.fastq, R2.fastq) and singleton (singletons.fastq) files for the same samples. What parameters should I use to align the reads against a genome of interest?
1) hisat2 -1 R1.fastq -2 R2.fastq -U singletons.fastq
or just treat all the files as unpaired and use
2) hisat2 -U R1.fastq,R2.fastq,singletons.fastq
TopHat had options for using paired-end and single reads together, but I am unsure how hisat2 handles paired-end and single reads from the same samples. I will be using the BAM files generated by hisat2 for downstream differential expression analysis with stringtie.
Should one ignore the singleton reads and use only the paired-end reads (only -1 and -2, no -U), or treat all reads as unpaired and use -U R1.fastq,R2.fastq,singletons.fastq?
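To make the variants concrete, here is a minimal dry-run sketch that only prints the three command lines rather than running them. The index basename `genome_index` and the output file names are hypothetical placeholders, and `--dta` is my addition (the hisat2 option that reports alignments tailored for transcript assemblers such as stringtie), not something stated above.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the hisat2 command lines instead of executing them.
# ASSUMPTIONS: "genome_index" and the *.sam output names are hypothetical.

IDX="genome_index"   # hypothetical basename of a prebuilt hisat2 index
R1="R1.fastq"
R2="R2.fastq"
SE="singletons.fastq"

hisat2_cmd() {
  # $1 selects the variant: mixed | paired_only | all_unpaired
  case "$1" in
    mixed)
      # pairs via -1/-2, singletons via -U, in one run
      echo "hisat2 --dta -x $IDX -1 $R1 -2 $R2 -U $SE -S mixed.sam" ;;
    paired_only)
      # singletons are simply left out
      echo "hisat2 --dta -x $IDX -1 $R1 -2 $R2 -S paired_only.sam" ;;
    all_unpaired)
      # multiple -U files are comma-separated, with no spaces
      echo "hisat2 --dta -x $IDX -U $R1,$R2,$SE -S all_unpaired.sam" ;;
    *)
      echo "unknown variant: $1" >&2
      return 1 ;;
  esac
}

for variant in mixed paired_only all_unpaired; do
  hisat2_cmd "$variant"
done
```

Note that in the `all_unpaired` variant the mate relationship between R1 and R2 is discarded, so the aligner cannot use pairing information for those reads.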
How would stringtie calculate transcript counts from hisat2 alignments produced with the different parameters mentioned above?
(I looked at the count tables generated by methods 1 and 2 and got completely different results. For example, with method 1 I get a high number of counts for certain transcripts in sampleX, while with method 2 I get no counts for the same transcripts in the same sample; in other samples it is sometimes the other way round.)
I found a similar post here but couldn't find a definitive conclusion: https://github.com/feltus/OSG-GEM/issues/10
Any advice will be appreciated, thanks.
Estimating the insert size is a great idea; I will do that.
I have read the stringtie manual but couldn't find out whether it assembles transcripts from both paired-end and singleton reads, and whether the generated counts also include singletons. In short, does stringtie take singletons into consideration for transcript assembly and count estimation?
Many thanks for your reply!
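This does not answer what stringtie does internally, but one way to at least check what it is given is to count how many alignment records in the hisat2 output are flagged as paired versus unpaired. The sketch below assumes SAM text on stdin and relies only on the SAM FLAG field (column 2), where bit 0x1 marks a read that was sequenced as part of a pair; the function name `count_pairing` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: tally paired vs. unpaired alignment records in SAM text.
# Counts records, not unique reads, so multi-mapped reads are counted
# once per reported alignment.

count_pairing() {
  awk -F'\t' '
    /^@/ { next }                 # skip SAM header lines
    {
      # FLAG bit 0x1 set (odd FLAG value) means "template has multiple segments"
      if ($2 % 2 == 1) paired++
      else             unpaired++
    }
    END { printf "paired=%d unpaired=%d\n", paired, unpaired }
  '
}

# Possible usage, assuming samtools is installed and sample.bam exists:
#   samtools view -h sample.bam | count_pairing
```

If everything is passed through -U, as in method 2, I would expect all records to land in the unpaired bucket, which may be worth keeping in mind when comparing the two count tables.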
I am quite sure that it uses every resource available; it would be stupid otherwise. :) Regarding what I said about insert size estimation, on second thought some clarification is needed: