Hi folks,
I would like to ask whether any of you have run into a similar problem and what the best way to handle it is. I have 9 paired-end transcriptome samples, from which I trimmed bases with Q<20 and discarded short reads (<35 bases). I then used HISAT2 to map the samples to a genome before using featureCounts to summarize the mapped reads (both commands are below).
The HISAT2 mapping rates were OK (about 80% on average for 8 samples; one sample was lower at 65%). However, when I ran featureCounts, the sample with the 65% mapping rate had only 1% successfully assigned alignments. The others were between 67% and 72%, so the problem is limited to this one sample. I thought the lab had probably done something wrong, so I tried featureCounts with -s 1 (i.e. as if the library were forward-stranded, although it was prepared with TruSeq Stranded, which should be -s 2), and the successfully assigned alignment rate went up to 44%. I got the same result when I treated the sample as not strand-specific (omitted -s).
At this point, I am not sure how to deal with this sample. Would it be OK to treat this particular sample as non-strand-specific, should I just discard it, or is there a smarter way to handle this issue? Thanks a lot in advance for your suggestions. :)
The commands I used for HISAT2 and featureCounts are below:
hisat2 -q -x /home/ck/Desktop/EAB/BATG05/BATG05 --rna-strandness RF --phred33 -p 12 -1 sample_1.fastq.gz -2 sample1_2.fastq.gz -S sample1_hisat2.bam
featureCounts -p -t exon -g gene_id -T 12 -a BATG05/F_excelsior_38873_TGAC_v2_AGAT.gtf -o counts_sample1_hisat2_bam.txt sample1_hisat2.bam -B --countReadPairs -s 2
If only one of the samples from this group (which I assume were collected, treated and made into libraries at the same time) is showing an issue, then that sample has failed in some way. If the rest of the samples behave as expected (a stranded library that counts correctly with -s 2), then you can't treat this one sample as something else, such as non-stranded.
Take a look at a PCA plot to see whether you get the expected separation among the samples that are behaving well.
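To compare the samples side by side before deciding, the assignment rate can be pulled from each featureCounts summary file. This is only a minimal sketch: it assumes each summary sits next to its counts file as `<output>.summary`, follows the naming in the command above, and holds a single sample column. The helper name `assigned_pct` is made up for illustration, and the percentage is taken over all status categories in the summary (including unmapped), so it may differ slightly from the figure featureCounts prints on screen.

```shell
#!/usr/bin/env bash
# assigned_pct: print the percentage of "Assigned" entries in one
# featureCounts .summary file (tab-separated: status, count).
# Hypothetical helper; assumes one sample column per summary file.
assigned_pct() {
    awk -F'\t' '
        NR == 1 { next }                 # skip the "Status <file>" header
        { total += $2 }                  # sum every status category
        $1 == "Assigned" { assigned = $2 }
        END {
            if (total > 0) printf "%.1f\n", 100 * assigned / total
            else print "NA"
        }
    ' "$1"
}

# Loop over the per-sample summaries (the file naming is an assumption).
for f in counts_sample*_hisat2_bam.txt.summary; do
    [ -e "$f" ] || continue             # no matching files: do nothing
    printf '%s\t%s\n' "$f" "$(assigned_pct "$f")"
done
```

If one sample stands far apart from the rest under the same -s setting, as described above, that points to a problem with that library and not with the counting options.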