Hello there,
I am trying to use the --sra-acc option of hisat2 with paired-end data. I have installed both hisat2 and the sra-toolkit successfully. The mapping runs fine, but the SAM output shows the reads mapped as if they were single-end reads. My hisat2 command looks like:
hisat2 --no-mixed --no-discordant -x ../ref//hg38/genome --sra-acc <accession> -S output.sam
where <accession> is a single accession number, and this SRA accession links to both 1.fastq.gz and 2.fastq.gz.
Is there any way to tell hisat2 that the accession refers to paired-end reads?
Thanks!
Anna
Check whether the SRA data really contain paired-end reads. I have seen single-end data marked as paired-end; in that case, it was a faulty upload. The bottom line is: don't trust SRA blindly.
Paired-end reads are aligned together. Many aligners drop the read designations (such as /1 and /2) from the read name and encode that information in the SAM FLAG field instead, for example 83 (first in pair) and 147 (second in pair). So reads that look unpaired by name may still be paired according to their flags.
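To check what a given flag value says about pairing, the bits can be decoded by hand. Here is a minimal shell sketch; `describe_flag` is just a helper name made up for this post, not part of hisat2 or any other tool, and it only looks at the pairing-related bits defined in the SAM specification.

```shell
# Decode the pairing-related bits of a SAM FLAG value.
# describe_flag is a hypothetical helper, written in plain POSIX sh.
describe_flag() {
    flag=$1
    out=""
    [ $(( flag & 1 )) -ne 0 ]   && out="$out paired"          # 0x1: read is paired
    [ $(( flag & 2 )) -ne 0 ]   && out="$out proper_pair"     # 0x2: mapped in proper pair
    [ $(( flag & 16 )) -ne 0 ]  && out="$out reverse"         # 0x10: reverse strand
    [ $(( flag & 64 )) -ne 0 ]  && out="$out first_in_pair"   # 0x40: first mate
    [ $(( flag & 128 )) -ne 0 ] && out="$out second_in_pair"  # 0x80: second mate
    echo "${out# }"
}

describe_flag 83    # paired proper_pair reverse first_in_pair
describe_flag 147   # paired proper_pair reverse second_in_pair
```

If the second column of output.sam only ever shows values like 0, 4 or 16, the reads were treated as single-end. Assuming samtools is installed, `samtools view -c -f 1 output.sam` should count the records that have the paired bit set.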
I really don't know about the hisat2 --sra-acc option, but I had the same issue while downloading data with the NCBI SRA Toolkit. The fastq-dump utility of the toolkit gave me a single fastq file for a paired entry. This was solved by the --split-files option of fastq-dump.
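As a possible workaround, the reads can be downloaded first and then passed to hisat2 as explicit mate files. This is only a sketch under some assumptions: `align_paired` is a made-up function name, the output file names rely on fastq-dump writing `<accession>_1.fastq` and `<accession>_2.fastq` when --split-files is used, and the index path is the one from the question.

```shell
# Download a paired-end run and align it with explicit mate files.
# align_paired is a hypothetical wrapper, not part of hisat2 or the SRA Toolkit.
align_paired() {
    acc=$1      # SRA run accession
    index=$2    # hisat2 index basename, e.g. ../ref/hg38/genome

    # --split-files writes each mate to its own file: <acc>_1.fastq and <acc>_2.fastq
    fastq-dump --split-files "$acc" || return 1

    # -1 and -2 tell hisat2 explicitly which file holds which mate
    hisat2 --no-mixed --no-discordant -x "$index" \
        -1 "${acc}_1.fastq" -2 "${acc}_2.fastq" -S "${acc}.sam"
}

# Usage (accession left as a placeholder, as in the question):
# align_paired <accession> ../ref/hg38/genome
```

If fastq-dump produces only a `_1` file for the run, that would suggest the accession really holds single-end data.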