HISAT2 no properly paired alignments

3.1 years ago

cengiz • 0

Hi All!

I'm a wet-lab guy, quite new to data analysis, and would appreciate some help if possible!

Slowly I'm getting into the command line and understanding some of the workflow behind the analysis, but I've hit a bit of a wall. Following hisat2-build on the human genome (hg38), I run the following command and get the following output:

hisat2 -x genome/homo/homo.GRCm38 -U raw_data/NGS_211_553_fastq/_1_Cengiz.fastq.gz -S rna/lmsmc_ca_1.sam -p 3 -t

Time loading forward index: 00:00:09
Time loading reference: 00:00:05
Multiseed full-index search: 00:05:09
31656871 reads; of these:
  31656871 (100.00%) were unpaired; of these:
    375631 (1.19%) aligned 0 times
    29871475 (94.36%) aligned exactly 1 time
    1409765 (4.45%) aligned >1 times
98.81% overall alignment rate
Time searching: 00:05:17
Overall time: 00:05:26
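As a quick sanity check (a sketch, not part of the original workflow), the number of reads in the input FASTQ can be counted directly and compared with the "31656871 reads" that hisat2 reports. A FASTQ record is four lines, so the read count is the line count divided by four. This assumes standard four-line records and a gzip-compressed file; the function name count_fastq_reads is hypothetical.

```shell
# Hypothetical helper: count reads in a gzip-compressed FASTQ file.
# Assumes standard 4-line FASTQ records (header, sequence, '+', qualities).
count_fastq_reads() {
    local fastq_gz="$1"
    # gzip -cd decompresses to stdout; awk counts all lines and divides by 4
    gzip -cd "$fastq_gz" | awk 'END { print NR / 4 }'
}

# Usage with the file from the question:
# count_fastq_reads raw_data/NGS_211_553_fastq/_1_Cengiz.fastq.gz
```

If the printed number is not a whole number, the file probably does not contain plain four-line records.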

Sounds great, right? To me this reads like I'm doing OK!! Then I run samtools view sam > bam,

then

samtools sort bam > sorted.bam

and when I run samtools flagstat on the result:

samtools flagstat rna/lmsmc_ca_1.sorted.bam

34190647 + 0 in total (QC-passed reads + QC-failed reads)
2533776 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
33815016 + 0 mapped (98.90% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

So I'm a little confused as to why I have no properly paired sequences. I am a firm believer that I have done something wrong along the way! Any advice would be appreciated.

Many thanks, Cengiz
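The conversion commands above are paraphrased. A sketch of one common way to write these steps with explicit options follows; the function name and file layout are hypothetical, and it assumes samtools 1.x is on the PATH. In samtools view, -b requests BAM output (the default is SAM text) and -o sets the output file name.

```shell
# Hypothetical wrapper around standard samtools subcommands; file names are examples.
sam_to_sorted_bam() {
    local sam="$1"                    # e.g. rna/lmsmc_ca_1.sam
    local prefix="${sam%.sam}"        # strip the .sam extension
    # -b: write BAM instead of the default SAM text; -o: output file name
    samtools view -b -o "${prefix}.bam" "$sam" || return 1
    # sort alignments by coordinate; -o names the sorted output
    samtools sort -o "${prefix}.sorted.bam" "${prefix}.bam" || return 1
    # index the sorted BAM (writes ${prefix}.sorted.bam.bai)
    samtools index "${prefix}.sorted.bam" || return 1
    samtools flagstat "${prefix}.sorted.bam"
}

# Usage: sam_to_sorted_bam rna/lmsmc_ca_1.sam
```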

score 0 · Answer 1 · 2021-11-08

3.1 years ago

GenoMax 147k

Proper pairing only applies when you have paired-end reads. If the two reads of a pair (which sample the two ends of the same library fragment) align within the expected distance and orientation of each other, they are considered properly paired. You supplied your reads with -U as single-end reads (the hisat2 summary reports 100% unpaired), so pairing does not apply.
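The pairing counts in flagstat come from bits in each record's SAM FLAG field: 0x1 marks a read that is part of a pair and 0x2 marks a proper pair. In a single-end run like this one, bit 0x1 is never set, so "paired in sequencing" and "properly paired" are necessarily 0. The small helper below is a hypothetical illustration of decoding those bits; to check a real file, samtools view -c -f 1 file.bam counts records with bit 0x1 set (expected to be 0 here).

```shell
# Hypothetical helper: report pairing-related bits in a SAM FLAG value.
# Bit meanings per the SAM specification:
#   0x1   read is paired (template has multiple segments)
#   0x2   read is properly paired
#   0x100 secondary alignment
describe_flag() {
    local flag="$1"
    local paired=no proper=no secondary=no
    (( flag & 0x1 ))   && paired=yes
    (( flag & 0x2 ))   && proper=yes
    (( flag & 0x100 )) && secondary=yes
    echo "paired=$paired proper_pair=$proper secondary=$secondary"
}

describe_flag 0     # a primary single-end read mapped to the forward strand
describe_flag 256   # a secondary single-end alignment
describe_flag 99    # a paired-end read1 in a proper pair (99 = 64 + 32 + 2 + 1)
```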

Oooohhh, that's what it meant by 'properly paired'! I thought it meant properly paired against my reference genome. So, assuming there is nothing to worry about, I will continue!!

Thanks!

In your case, the following lines are the ones that matter. While you have some secondary alignments, your data is well aligned to the reference.

34190647 + 0 in total (QC-passed reads + QC-failed reads)
2533776 + 0 secondary
33815016 + 0 mapped (98.90% : N/A)
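These flagstat numbers can be reconciled with the hisat2 summary with a little arithmetic (a sketch; the numbers are copied from the outputs in this thread). flagstat counts alignment records, and each secondary alignment is an extra, mapped record. Assuming, as here, that there are no supplementary records and unaligned reads are kept in the output, subtracting the secondaries from the total gives the number of reads, and subtracting them from the mapped count gives the number of aligned reads.

```shell
# Numbers copied from the hisat2 and samtools flagstat outputs in this thread.
total=34190647        # flagstat: in total (all records, including secondary)
secondary=2533776     # flagstat: secondary
mapped=33815016       # flagstat: mapped (includes the secondary records)

primary=$(( total - secondary ))           # one primary record per read
primary_mapped=$(( mapped - secondary ))   # reads with at least one alignment

echo "reads (primary records): $primary"
echo "aligned reads: $primary_mapped"
awk -v m="$primary_mapped" -v t="$primary" \
    'BEGIN { printf "alignment rate: %.2f%%\n", 100 * m / t }'
```

This gives 31656871 reads and 31281240 aligned reads (98.81%), matching hisat2's read count, its 29871475 + 1409765 aligned reads, and its overall alignment rate. flagstat's 98.90% is slightly higher because the secondary records, all of which are mapped, are counted in both the numerator and the denominator.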