8.3 years ago
sslee1015
Hi, I'm fairly new to RNA-seq, and I don't really know how to explain these results.
Here is a generic sample of my HISAT2 command:
hisat2 -x mm10/genome -1 sample1_R1.fastq -2 sample1_R2.fastq -S sample1out.sam
The genome reference I used is mouse, mm10, and the directory contains .ht2 files. sample1_R1.fastq is one of the paired end reads, and sample1_R2.fastq is the other. For sample 1, I received 8 different fastq files, 4 of them from R1 and the other 4 R2, so I concatenated the R1's and R2's into the fastq files I input into hisat2. This was my hisat2 summary:
32832172 reads; of these:
32832172 (100.00%) were paired; of these:
32326312 (98.46%) aligned concordantly 0 times
393332 (1.20%) aligned concordantly exactly 1 time
112528 (0.34%) aligned concordantly >1 times
----
32326312 pairs aligned concordantly 0 times; of these:
6101 (0.02%) aligned discordantly 1 time
----
32320211 pairs aligned 0 times concordantly or discordantly; of these:
64640422 mates make up the pairs; of these:
64313845 (99.49%) aligned 0 times
208508 (0.32%) aligned exactly 1 time
118069 (0.18%) aligned >1 times
2.06% overall alignment rate
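As a sanity check on how the headline number relates to the lines above it, here is a small sketch of the arithmetic. It assumes the overall rate counts individual mates (two per pair) and adds up the concordant pairs, the discordant pairs, and the mates that aligned on their own; the variable names are made up for illustration.

```shell
# Counts copied from the summary above.
pairs=32832172
conc_once=393332
conc_multi=112528
disc_once=6101
mate_once=208508
mate_multi=118069

# Each aligned pair contributes two mates; singly aligned mates count once.
aligned_mates=$(( 2 * (conc_once + conc_multi + disc_once) + mate_once + mate_multi ))
total_mates=$(( 2 * pairs ))

# Shell arithmetic is integer-only, so awk does the percentage.
rate=$(awk -v a="$aligned_mates" -v t="$total_mates" 'BEGIN { printf "%.2f", 100 * a / t }')
echo "aligned mates: $aligned_mates of $total_mates ($rate%)"
```

Under that assumption the figures reproduce the reported 2.06%, so the summary is internally consistent and the low rate comes from the reads themselves rather than from a reporting quirk.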
2.06% seems really low. Did I do something wrong?
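One thing worth ruling out for the concatenation step described above is that the R1 and R2 files were joined in different orders, which would leave the mates out of sync. The following is only a rough sketch: `count_reads` and `check_pairing` are hypothetical helper names, and it assumes uncompressed four-line FASTQ records whose read names match up to the first space or "/".

```shell
# Hypothetical helper: number of records in a FASTQ file (4 lines per record).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Hypothetical helper: succeeds only if both files hold the same number of
# reads and the read names agree record by record.
check_pairing() {
    r1=$1
    r2=$2
    [ "$(count_reads "$r1")" = "$(count_reads "$r2")" ] || return 1
    paste "$r1" "$r2" | awk -F '\t' '
        BEGIN { bad = 0 }
        NR % 4 == 1 {
            # Compare header lines, ignoring anything after a space or "/".
            split($1, a, "[ /]")
            split($2, b, "[ /]")
            if (a[1] != b[1]) { bad = 1; exit }
        }
        END { exit bad }'
}

# Usage with the file names from the post:
#   check_pairing sample1_R1.fastq sample1_R2.fastq && echo "mates in sync"
```

If the check fails, re-concatenating the per-lane files in the same order for R1 and R2 would be the first thing to try.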
Something must be terribly wrong: 32 million read pairs, and only 0.39 million aligned concordantly exactly once? Post the FastQC report.
The sample might not be what you think it is, so you may be aligning to the wrong genome. Try blasting a few of the unmapped reads.
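To get a handful of unmapped reads into a form you can paste into BLAST, something like the sketch below might do. `unmapped_to_fasta` is a hypothetical helper name; it assumes a plain-text SAM file such as sample1out.sam and treats a read as unmapped when bit 0x4 of the FLAG column is set.

```shell
# Hypothetical helper: print the first N unmapped reads of a SAM file as FASTA.
# Usage: unmapped_to_fasta sample1out.sam 5
unmapped_to_fasta() {
    sam=$1
    n=${2:-5}
    awk -F '\t' -v n="$n" '
        /^@/ { next }                # skip SAM header lines
        int($2 / 4) % 2 == 1 {       # FLAG bit 0x4 set: read is unmapped
            print ">" $1
            print $10
            if (++seen >= n) exit
        }' "$sam"
}
```

If most of the hits come back as something other than mouse, that would point to contamination or a sample mix-up rather than an alignment setting.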
Fastq Screen should provide you with a quick and easy way of telling what genome a sequence file comes from.
Here are screenshots of most of the FastQC graphs
You have some serious problems at the 5' end of your reads. The first 3 bases are 100% GC and 70% G. A disturbed GC profile is pretty normal at the start of an RNA-seq read, but I've never seen anything this extreme before. If you look at your enriched k-mers, you'll see a massive enrichment for all the different homopolymer runs, in particular homo-G at the start.
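If you want to confirm the skew outside of FastQC, a rough per-position tally of the first few bases is easy to sketch. `first_base_composition` is a hypothetical helper name, and it assumes an uncompressed four-line FASTQ file; bases other than A, C, G and T (such as N) are counted as reads but not reported.

```shell
# Hypothetical helper: percentage of A/C/G/T at each of the first K read positions.
# Usage: first_base_composition sample1_R1.fastq 3
first_base_composition() {
    fq=$1
    k=${2:-3}
    awk -v k="$k" '
        NR % 4 == 2 {                # sequence line of each FASTQ record
            reads++
            for (i = 1; i <= k; i++) count[i, substr($0, i, 1)]++
        }
        END {
            if (reads == 0) exit
            for (i = 1; i <= k; i++) {
                printf "pos %d:", i
                for (j = 1; j <= 4; j++) {
                    b = substr("ACGT", j, 1)
                    printf " %s=%.0f%%", b, 100 * count[i, b] / reads
                }
                printf "\n"
            }
        }' "$fq"
}
```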
You could try clipping off the first 10 bases or so of each read and see if that helps, but I'd be a bit nervous because you don't know the cause. There is definitely either something wrong with the libraries, or something wrong with the sequencing. I would contact your sequencing company and discuss it with them.
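For the clipping experiment, HISAT2 can trim bases from the 5' end of every read at alignment time with `-5`/`--trim5`, so the FASTQ files do not need to be rewritten. A minimal sketch reusing the command from the question; the wrapper name `align_trim5` and the output file name are made up for illustration.

```shell
# Hypothetical wrapper: rerun the original alignment with N bases trimmed
# from the 5' end of each read.
align_trim5() {
    n=$1
    hisat2 --trim5 "$n" -x mm10/genome \
        -1 sample1_R1.fastq -2 sample1_R2.fastq \
        -S "sample1_trim${n}.sam"
}

# Usage: align_trim5 10
```

Comparing the alignment summary of the trimmed run against the original should show quickly whether the 5' bases are the main problem or only a symptom.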
Did you trim off adapters and poly-A tails? Which sequencer? What is your read length?
I believe the company that gave us the raw reads did that for me already.