hisat2 --sra-acc with paired reads producing single read output
6.5 years ago
avp25 ▴ 10

Hello there,

I am trying to use the --sra-acc option of hisat2 with paired-end data. I have installed both hisat2 and the sra-toolkit successfully, and the mapping itself runs fine, but the SAM output shows the reads mapped as if they were single-end. My hisat2 command looks like:

hisat2  --no-mixed --no-discordant -x ../ref//hg38/genome --sra-acc <accession> -S output.sam

where <accession> is a single accession number, and this SRA accession links to both 1.fastq.gz and 2.fastq.gz.

Is there any way to tell hisat2 that the accession refers to paired-end reads?

Thanks!

Anna

hisat2 sra-tools • 3.5k views

Check whether the SRA data really contain paired-end reads. I have seen single-end data marked as paired-end; in that case, it was a faulty upload. The bottom line is: don't trust SRA blindly.

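Paired-end reads are aligned together. Many aligners drop the read designation (such as /1 and /2) from the read name and encode that information in the SAM flags instead. For example, flag 83 marks a first-in-pair read and flag 147 a second-in-pair read, both properly paired and on the reverse strand.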
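
To check what the flags in your own output say, the pairing bits can be decoded directly. The following is a minimal sketch in POSIX shell; the function names describe_sam_flag and count_paired_records are hypothetical helpers (not part of hisat2 or any toolkit), and the bit values are taken from the SAM specification.

```shell
# Hypothetical helper: describe the pairing-related bits of a SAM FLAG value.
# Bits (SAM specification): 0x1 paired, 0x2 proper pair, 0x10 reverse strand,
# 0x40 first in pair, 0x80 second in pair.
describe_sam_flag() (
    flag=$1
    if [ $(( $flag & 1 )) -eq 0 ]; then
        # Bit 0x1 unset: the aligner treated this read as single-end.
        echo "unpaired"
    else
        desc="paired"
        if [ $(( $flag & 2 )) -ne 0 ]; then desc="$desc proper_pair"; fi
        if [ $(( $flag & 16 )) -ne 0 ]; then desc="$desc reverse"; fi
        if [ $(( $flag & 64 )) -ne 0 ]; then desc="$desc read1"; fi
        if [ $(( $flag & 128 )) -ne 0 ]; then desc="$desc read2"; fi
        echo "$desc"
    fi
)

# Hypothetical helper: count alignment records with and without the
# "paired" bit (0x1) set. Header lines start with "@"; FLAG is column 2.
count_paired_records() {
    awk -F '\t' '!/^@/ { if ($2 % 2 == 1) paired++; else single++ }
        END { printf "paired=%d single=%d\n", paired, single }' "$1"
}

describe_sam_flag 83    # prints: paired proper_pair reverse read1
describe_sam_flag 147   # prints: paired proper_pair reverse read2
```

Running count_paired_records output.sam on the file from the question would report paired=0 if hisat2 really did treat every read as single-end.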

I don't know about the hisat2 --sra-acc option, but I had the same issue when downloading data with the NCBI SRA Toolkit: the fastq-dump utility gave me a single FASTQ file for a paired entry. The --split-files option of fastq-dump solved this.
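
As a workaround along these lines, the mates can be dumped to separate files first and then passed to hisat2 explicitly with -1 and -2 instead of --sra-acc. This is only a sketch: run_cmd and align_paired_from_sra are hypothetical helper names, SRR0000000 is a placeholder accession, and the _1/_2 file names assume the usual fastq-dump --split-files naming.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: dump both mates of an SRA run into separate gzipped
# FASTQ files, then align them as pairs.
# Arguments: accession, hisat2 index prefix, output SAM path.
align_paired_from_sra() (
    acc=$1
    index=$2
    out=$3
    # --split-files writes <acc>_1.fastq.gz and <acc>_2.fastq.gz for a paired run.
    run_cmd fastq-dump --split-files --gzip "$acc" &&
    # -1/-2 tell hisat2 explicitly which file holds which mate.
    run_cmd hisat2 --no-mixed --no-discordant -x "$index" \
        -1 "${acc}_1.fastq.gz" -2 "${acc}_2.fastq.gz" -S "$out"
)

# Example (placeholder accession, commands printed only):
#   DRY_RUN=1 align_paired_from_sra SRR0000000 ../ref/hg38/genome output.sam
```

If fastq-dump produces only a _1 file, the run probably contains single-end data regardless of how it is labeled.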

5.4 years ago
poojasethiya ▴ 120

For single-end SRA data, hisat2 gives summary statistics like:

HISAT2 summary stats:
Total reads: 27870948
                 Aligned 0 time: 11196555 (40.17%)
                 Aligned 1 time: 15284312 (54.84%)
                 Aligned >1 times: 1390081 (4.99%)
         Overall alignment rate: 59.83%

For paired-end SRA data, hisat2 gives summary statistics like:

HISAT2 summary stats:
        Total pairs: 9113937
                Aligned concordantly or discordantly 0 time: 4233427 (46.45%)
                Aligned concordantly 1 time: 4493249 (49.30%)
                Aligned concordantly >1 times: 351445 (3.86%)
                Aligned discordantly 1 time: 35816 (0.39%)
        Total unpaired reads: 8466854
                Aligned 0 time: 6882022 (81.28%)
                Aligned 1 time: 1428012 (16.87%)
                Aligned >1 times: 156820 (1.85%)
        Overall alignment rate: 62.24%

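These results show that hisat2 treats paired-end and single-end data differently: a run aligned as pairs reports "Total pairs", while a single-end run reports only "Total reads". The summary therefore tells you how hisat2 interpreted the accession.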
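
This check can be scripted. The sketch below assumes the summary was saved to a file in the format shown above (for example by redirecting hisat2's standard error, or via its --summary-file option); hisat2_summary_mode is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a saved hisat2 summary describes a
# paired-end or single-end run, based on the header lines shown above.
hisat2_summary_mode() {
    if grep -q 'Total pairs:' "$1"; then
        echo "paired"
    elif grep -q 'Total reads:' "$1"; then
        echo "single"
    else
        # Some other summary format, or not a hisat2 summary at all.
        echo "unknown"
    fi
}

# Example: hisat2_summary_mode summary.txt
```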
