Hi dear all, I have several confused result for my RNAseq data alignment and analysis. I used Hisat2 to map my Bacillus 168 RNAseq data into downloaded genome (fasta) from NCBI. I found all none of alignment is concordantly. When I used this output to counting the number of read through HTseq, the report said I only got not-aligned read. I changed several parameters, but it didn't improve. Could you help me please? I will write down my scripts and process. Thank you very much.
1. I downloaded genome from NCBI and used Hisat2-building to make genome index.
module load HISAT2/2.1.0-intel-2017A-Python-2.7.12
pe_1="genomereference"
genome_index_prefix='/scratch/user/guotingfeng/RNAseq/genomereference/sequence.fasta'
**hisat2-build -p $threads -f -c $genome_index_prefix $pe_1**
Then I got these 8 files:
genomereference.1.ht2
genomereference.2.ht2
genomereference.3.ht2
genomereference.4.ht2
genomereference.5.ht2
genomereference.6.ht2
genomereference.7.ht2
genomereference.8.ht2
2. I used my fastq.zg data and genome index to do the mapping through Hisat2 and samtools.
module load HISAT2/2.1.0-intel-2017A-Python-2.7.12
module load SAMtools/1.6-GCCcore-6.3.0
pe_1='/scratch/user/guotingfeng/RNAseq/1Mg_S5_L002_R1_001.fastq.gz'
pe_2='/scratch/user/guotingfeng/RNAseq/1Mg_S5_L002_R2_001.fastq.gz'
genome_index_prefix='/scratch/user/guotingfeng/RNAseq/genomereference/genomereference'
id='1Mggroup'
library='bacillus'
platform='ILLUMINA'
sample='1Mg'
output_bam="/scratch/user/guotingfeng/RNAseq/Hisat2/${sample}_pe_aln.sam"
**hisat2 --dta -p $threads --rg-id "$id" --rg "LB:$library" --rg "SM:$sample" --rg "PL:$platform" -x $genome_index_prefix -q -1 $pe_1 -2 $pe_2 -S out.sam**
**samtools view -h -bS out.sam | samtools sort -m 2G -@ $threads -T $sample -n -o $output_bam**
rm out.sam
The output report is wird here:
**25854795 reads; of these:**
25854795 (100.00%) were paired; of these:
**25854795 (100.00%) aligned concordantly 0 times,**
0 (0.00%) aligned concordantly exactly 1 time,
0 (0.00%) aligned concordantly >1 times.**
----
25854795 pairs aligned concordantly 0 times; of these:
0 (0.00%) aligned discordantly 1 time,
----
25854795 pairs aligned 0 times concordantly or discordantly; of these:
51709590 mates make up the pairs; of these:
51709590 (100.00%) aligned 0 times,
0 (0.00%) aligned exactly 1 time,
0 (0.00%) aligned >1 times.
0.00% overall alignment rate.
3. I still used this output to do the counting by HTseq-count.
module load HTSeq/0.9.1-intel-2017A-Python-2.7.12
pe1_1='/scratch/user/guotingfeng/RNAseq/Hisat2/1_pe_aln.sam'
pe1_2='/scratch/user/guotingfeng/RNAseq/genomereference/Bacillus_subtilis_subsp_subtilis_str_168.ASM904v1.37.gtf'
sample='/scratch/user/guotingfeng/RNAseq/HTseq/Hisat2/bwamem_1sam.txt'
**htseq-count -f sam -s yes -m union -r name -t exon -i gene_id --additional-attr=gene_name $pe1_1 $pe1_2 > $sample**
The output is weird: I didn't get any read in each genes and it said all the reads are not aligned.
**gene:BSU38850 yxkC 0**
**gene:BSU38860 galE 0**
**gene:BSU38870 yxkA 0**
**gene:BSU38880 yxjO 0**
**gene:BSU38890 yxjN 0**
__no_feature 0
__ambiguous 0
__too_low_aQual 0
**__not_aligned 24291367**
__alignment_not_unique 0
So anyone have any ideas? Thank you very much. Have a good day.
The HTSeq part makes sense; that it's not reporting anything if 0% of reads aligned. Can you try to rerun HISAT2 with the most basic parameters? e.g. something like this to start:
hisat2 -p $threads -x $genome_index_prefix -q -1 $pe_1 -2 $pe_2 -S out.sam
If that works, I would try adding your other parameters one by one (
--dta
, then read groups, platform etc.) and see what causes the issue.Hi manuel.belmadani, Thank you for your reply.
I did as what you said to delete all other parameters and rerun the program. The result is totally same.
hisat2 -p $threads -x $genome_index_prefix -q -1 $pe_1 -2 $pe_2 -S out.sam
samtools view -h -bS out.sam | samtools sort -m 2G -@ $threads -T $sample -n -o $output_bam
The output of Sam is here:
@HD VN:1.0 SO:coordinate @SQ SN:0 LN:33 @PG ID:hisat2 PN:hisat2
VN:2.1.0 CL:"/sw/eb/software/HISAT2/2.1.0-intel-2017A-Python-2.7.12/bin/hisat2-align-s --wrapper basic-0 -p 20 -x /scratch/user/guotingfeng/RNAseq/genomereference/genomereference -q -S out.sam -1 /tmp/2611.inpipe1 -2 /tmp/2611.inpipe2"
K00333:90:H3TMTBBXY:2:1101:20070:1947 77 * 0 0 * * 0 0 CCTGTGTAATCTGCCCGCGGGCGGGTCATCCGGAGATTGTTAAACTGCTCAAGAATATTAAAGAAGCAGGCTGAC YT:Z:UP
K00333:90:H3TMTBBXY:2:1101:20070:1947 141 * 0 0 * * 0 0 NGCATTACTGAATAAATCGGTGATGTTAAAGGGACTCTTCACGTAGGAACTTATAAAAGGGTAAGTACAATTTTG YT:Z:UP
Report is still same,
25854795 reads; of these:
25854795 (100.00%) were paired; of these:
I found SN:0 in the Sam file, is that normal? Does it mean there is some problem for my genome reference data or index file? How can I check that? Thank you very much. Have a good night.