Question

High discordant alignment rate with HISAT2 (Mus musculus)

Posted by AHerik, 24 months ago

I ran HISAT2 on my reads after adapter removal and got a very high discordant alignment rate (about 79% of pairs) with very few concordant alignments:

81987410 reads; of these:
  81987410 (100.00%) were paired; of these:
    81584433 (99.51%) aligned concordantly 0 times
    92014 (0.11%) aligned concordantly exactly 1 time
    310963 (0.38%) aligned concordantly >1 times
    ----
    81584433 pairs aligned concordantly 0 times; of these:
      64183926 (78.67%) aligned discordantly 1 time
    ----
    17400507 pairs aligned 0 times concordantly or discordantly; of these:
      34801014 mates make up the pairs; of these:
        3133661 (9.00%) aligned 0 times
        9518238 (27.35%) aligned exactly 1 time
        22149115 (63.65%) aligned >1 times
98.09% overall alignment rate
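HISAT2 reports concordant pairs on two separate lines ("exactly 1 time" and ">1 times"), so the overall concordant rate is their sum divided by the number of pairs. For the summary above that is (92014 + 310963) / 81987410, about 0.49%. A minimal bash/awk sketch of that arithmetic, assuming the summary HISAT2 prints to stderr has been saved to a file (e.g. with `2> sample.log`); the function name `concordant_rate` is hypothetical:

```shell
#!/usr/bin/env bash
set -eu

# Overall concordant rate (%) from a saved HISAT2 paired-end summary:
# (pairs concordant exactly once + pairs concordant >1 times) / total pairs
concordant_rate() {
    awk '/were paired/                         { pairs = $1 }
         /aligned concordantly exactly 1 time/ { conc += $1 }
         /aligned concordantly >1 times/       { conc += $1 }
         END { printf "%.2f\n", 100 * conc / pairs }' "$1"
}
```

Usage: `concordant_rate sample.log` prints the percentage with two decimals.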

My reads are very short (about 50 bp). Initially I had two lanes, so I concatenated the two files. After concatenation, I used how_are_we_stranded_here to infer strandedness, which reported RF. I then ran HISAT2 with the following command, using an index I built from the mouse genome (GRCm38):

# Extract unique sample names (the part before _R1/_R2) and align each pair
for sample in $(ls adaptor_trimmed/*.fastq.gz | rev | cut -d "/" -f 1 | cut -d "_" -f 2 | rev | sort | uniq); do
    hisat2 -p 16 --rg-id=${sample} --rg PL:ILLUMINA \
        -x /RNA_references/Mus_musculus.GRCm38.dna.primary_assembly_index \
        --dta --rna-strandness RF \
        -1 /adaptor_trimmed/${sample}_R1.fastq.gz \
        -2 /adaptor_trimmed/${sample}_R2.fastq.gz \
        -S ./${sample}.sam
done

I have no idea why I am getting such a high discordant rate. Could it be that my reads are not RF? Should I try swapping R1 and R2? I would appreciate any insight or troubleshooting advice.

Thank you!

Tags: HISAT2, RNA-seq


The simplest thing is to try FR on a small subset of reads (e.g. the first 800,000 lines, i.e. 200,000 reads, of each fastq) and see whether that solves it. The overall alignment rate shows the data are not crap/problematic; more likely some option skews how concordance is defined relative to the expected strandedness. Alternatively, run a tool to detect strandedness. If you search for "hisat2 stranded" you will find many suggestions in older threads.
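A minimal bash sketch of taking such a subset, assuming gzipped FASTQ input with four lines per record (the function name `subset_fastq` is hypothetical). Taking the same number of reads from R1 and R2 keeps the mates paired, provided the full files are in sync:

```shell
#!/usr/bin/env bash
set -eu
# No "pipefail": head stops reading early, which can make the upstream
# gzip exit on SIGPIPE; that is expected here and not an error.

# Write the first N reads (4 FASTQ lines each) of a gzipped FASTQ to a new gzipped file.
subset_fastq() {
    local in=$1 out=$2 nreads=${3:-200000}   # default 200,000 reads = 800,000 lines
    gzip -dc "$in" | head -n "$((4 * nreads))" | gzip > "$out"
}
```

For example, with a hypothetical sample `sampleA`: `subset_fastq /adaptor_trimmed/sampleA_R1.fastq.gz sub_R1.fastq.gz` and the same for R2, then run the hisat2 command on `sub_R1.fastq.gz`/`sub_R2.fastq.gz` once with `--rna-strandness RF` and once with `FR` and compare the concordant rates.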

Reply by ATpoint, 24 months ago

-1 /adaptor_trimmed/${sample}_R1.fastq.gz

The folder name suggests the data is trimmed. If R1 and R2 were trimmed independently, reads dropped from one file but not the other would put the mates out of sync, which would explain the discordant alignments.

Please scan/trim paired-end data together. That should fix this problem.
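One way to test the out-of-sync hypothesis directly is to compare the read names of R1 and R2 record by record. A minimal bash sketch, assuming gzipped FASTQ and mate names that are identical up to the first space or end in `/1` and `/2` (the function name `check_pairs` is hypothetical; process substitution requires bash):

```shell
#!/usr/bin/env bash
set -eu

# Compare R1 and R2 read names pair by pair; report the first mismatch.
# Names are cut at the first whitespace, and a trailing /1 or /2 is stripped.
# Returns 0 if all pairs match, 1 otherwise (including unequal read counts).
check_pairs() {
    paste <(gzip -dc "$1" | awk 'NR % 4 == 1 { print $1 }') \
          <(gzip -dc "$2" | awk 'NR % 4 == 1 { print $1 }') |
    awk '{
             a = $1; b = $2
             sub(/\/[12]$/, "", a); sub(/\/[12]$/, "", b)
             if (a != b) {
                 printf "mismatch at pair %d: %s vs %s\n", NR, $1, $2
                 bad = 1
                 exit
             }
         }
         END {
             if (!bad) printf "OK: %d pairs in sync\n", NR
             exit bad
         }'
}
```

Running it on the trimmed (or merged) R1/R2 files shows whether the mates still line up before alignment.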

Reply by GenoMax, 24 months ago

Hi ATpoint and GenoMax, thank you for your replies! I trimmed the paired-end data together; here is my code:

for sample in $(ls /merged_files/*.fastq.gz | rev | cut -d "/" -f 1 | cut -d "_" -f 2 | rev | sort | uniq)
do
    fastp -i /merged_files/${sample}_R1.fastq.gz -I /merged_files/${sample}_R2.fastq.gz \
    -o /adaptor_trimmed/${sample}_R1.fastq.gz \
    -O /adaptor_trimmed/${sample}_R2.fastq.gz \
    -l 25 --adapter_fasta /illumina_multiplex.fa --json /adaptor_trimmed/${sample}.fastp.json \
    --html /adaptor_trimmed/${sample}.fastp.html \
    2>/adaptor_trimmed/${sample}.fastp.log
done

I also tried FR as you suggested, but got the same percentage of discordant alignments. Could this have to do with the way I wrote the loop? I'm kind of at a loss and would appreciate any troubleshooting guidance.
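To check the loop, the sample-name pipeline can be run on single paths. A minimal bash sketch (the file names `sampleA_R1.fastq.gz` and `mouse_1_R2.fastq.gz` are hypothetical): for names without an underscore it returns the sample correctly, so the loop is unlikely to be the cause under that assumption, but a sample name that contains an underscore is truncated to its last component. The parameter-expansion variant is shown as one possible alternative:

```shell
#!/usr/bin/env bash
set -eu

# The same rev/cut pipeline used in both loops, applied to one path.
sample_from_path() {
    printf '%s\n' "$1" | rev | cut -d "/" -f 1 | cut -d "_" -f 2 | rev
}

# Alternative with parameter expansion only: drop the directory, then the
# trailing _R1.fastq.gz / _R2.fastq.gz. Underscores inside the name survive.
sample_from_path_expansion() {
    local f=${1##*/}
    printf '%s\n' "${f%_R[12].fastq.gz}"
}

for p in /merged_files/sampleA_R1.fastq.gz /merged_files/mouse_1_R2.fastq.gz; do
    printf '%s -> %s | %s\n' "$p" "$(sample_from_path "$p")" "$(sample_from_path_expansion "$p")"
done
```

For `mouse_1_R2.fastq.gz` the rev/cut version yields `1` while the expansion yields `mouse_1`; two samples such as `mouse_1` and `rat_1` would collide under the first.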

Thank you

81987410 reads; of these:
  81987410 (100.00%) were paired; of these:
    81584433 (99.51%) aligned concordantly 0 times
    92014 (0.11%) aligned concordantly exactly 1 time
    310963 (0.38%) aligned concordantly >1 times
    ----
    81584433 pairs aligned concordantly 0 times; of these:
      64183926 (78.67%) aligned discordantly 1 time
    ----
    17400507 pairs aligned 0 times concordantly or discordantly; of these:
      34801014 mates make up the pairs; of these:
        3133661 (9.00%) aligned 0 times
        9518238 (27.35%) aligned exactly 1 time
        22149115 (63.65%) aligned >1 times
98.09% overall alignment rate

Reply by AHerik, 24 months ago

Accepted answer (2023-08-02)

Hi ATpoint! I solved the issue: it had to do with the way I concatenated my files. Either my concatenation script or the concatenation itself did something that affected the alignment. My data is indeed RF. Here is a sample alignment summary; does this look good?

41117254 reads; of these:
  41117254 (100.00%) were paired; of these:
    1533543 (3.73%) aligned concordantly 0 times
    35711647 (86.85%) aligned concordantly exactly 1 time
    3872064 (9.42%) aligned concordantly >1 times
    ----
    1533543 pairs aligned concordantly 0 times; of these:
      83249 (5.43%) aligned discordantly 1 time
    ----
    1450294 pairs aligned 0 times concordantly or discordantly; of these:
      2900588 mates make up the pairs; of these:
        1683799 (58.05%) aligned 0 times
        931448 (32.11%) aligned exactly 1 time
        285341 (9.84%) aligned >1 times
97.95% overall alignment rate
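Since the fix came down to concatenation, here is a minimal bash sketch of merging per-lane files so that R1 and R2 are concatenated in the same lane order, which keeps record n of R1 paired with record n of R2. The input naming `<sample>_L001_R1.fastq.gz`, `<sample>_L002_R1.fastq.gz`, ... and the function name `concat_lanes` are assumptions, not the poster's actual script:

```shell
#!/usr/bin/env bash
set -eu

# Concatenate per-lane gzipped FASTQs for one sample, using the SAME lane
# order for R1 and R2 so mates stay in sync.
concat_lanes() {
    local indir=$1 outdir=$2 sample=$3 read
    for read in R1 R2; do
        # Glob results are sorted, so L001 comes before L002 for both mates.
        # Concatenated gzip streams form a valid gzip file; no need to decompress.
        cat "$indir/${sample}"_L*_"${read}".fastq.gz > "$outdir/${sample}_${read}.fastq.gz"
    done
}
```

Afterwards, the read counts of the merged R1 and R2 files should be identical; a mismatch points to a problem in the merge.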