High discordant alignment with HISAT2 Mus musculus
1
0
Entering edit mode
16 months ago
AHerik ▴ 20

I ran HISAT2 on my reads post adaptor removal, and received very high discordant alignments (>80%) with low concordant alignments.

81987410 reads; of these: 81987410 (100.00%) were paired; of these: 81584433 (99.51%) aligned concordantly 0 times 92014 (0.11%) aligned concordantly exactly 1

time 310963 (0.38%) aligned concordantly >1 times

----
81584433 pairs aligned concordantly 0 times; of these:
  64183926 (78.67%) aligned discordantly 1 time
----
17400507 pairs aligned 0 times concordantly or discordantly; of these:
  34801014 mates make up the pairs; of these:
    3133661 (9.00%) aligned 0 times
    9518238 (27.35%) aligned exactly 1 time
    22149115 (63.65%) aligned >1 times

98.09% overall alignment rate

My reads are very short (about 50 bp), and initially I had 2 lanes so I concatenated the two files. Following concatenation, I used how_are_we_stranded_here to infer RF strandedness. I then ran HISAT2 using the following command, based on an index I generated from the mouse genome (v38):

for sample in $(ls adaptor_trimmed/*.fastq.gz | rev | cut -d "/" -f 1 | cut -d "_" -f 2 | rev | sort | uniq) do
    hisat2 -p 16 --rg-id=${sample} --rg PL:ILLUMINA -x /RNA_references/Mus_musculus.GRCm38.dna.primary_assembly_index --dta
--rna-strandness RF \
    -1 /adaptor_trimmed/${sample}_R1.fastq.gz \
    -2 /adaptor_trimmed/${sample}_R2.fastq.gz \
    -S ./${sample}.sam done

I have no idea why I am receiving such high discordant alignments. Could it be that my reads are not RF? Should I try reversing the order of the reads for R1 and R2? I would appreciate any insight or trouble shooting advice.

Thank you!

HISAT2 RNA-seq • 1.0k views
ADD COMMENT
1
Entering edit mode

The simplest thing is to try FR (use on a small subset of reads, maybe the first 800.000 lines of each fastq) to test whether this solves it. The overall alignment rate means data are not crap/problematic, but just some option skews up the definition of concordance with strandedness expectation. Or run a tool to detect strandedness. If you search for "hisat2 stranded" you find many suggestions in older threads.

ADD REPLY
0
Entering edit mode

-1 /adaptor_trimmed/${sample}_R1.fastq.gz

Folder name seems to indicate the data is trimmed. If the trimming was done independently for the paired-end data then this would explain discordant alignments.

Please scan/trim paired-end data together. That should fix this problem.

ADD REPLY
0
Entering edit mode

Hi ATpoint and GenoMax , thank you for your replies! I ran the paired-end data together for the trimming, I have attached my code:

for sample in $(ls /merged_files/*.fastq.gz | rev | cut -d "/" -f 1 | cut -d "_" -f 2 | rev | sort | uniq)
do
    fastp -i /merged_files/${sample}_R1.fastq.gz -I /merged_files/${sample}_R2.fastq.gz \
    -o /adaptor_trimmed/${sample}_R1.fastq.gz \
    -O /adaptor_trimmed/${sample}_R2.fastq.gz \
    -l 25 --adapter_fasta /illumina_multiplex.fa --json /adaptor_trimmed/${sample}.fastp.json \
    --html /adaptor_trimmed/${sample}.fastp.html \
    2>/adaptor_trimmed/${sample}.fastp.log
done

I also tried to use FR as you suggested, but this ended up with reads with the same % of discordancy. Could this have to do with the way I wrote the loop? I'm kind of at a loss, but I would appreciate any guidance on trouble shooting.

Thank you

81987410 reads; of these:
  81987410 (100.00%) were paired; of these:
    81584433 (99.51%) aligned concordantly 0 times
    92014 (0.11%) aligned concordantly exactly 1 time
    310963 (0.38%) aligned concordantly >1 times
    ----
    81584433 pairs aligned concordantly 0 times; of these:
      64183926 (78.67%) aligned discordantly 1 time
    ----
    17400507 pairs aligned 0 times concordantly or discordantly; of these:
      34801014 mates make up the pairs; of these:
        3133661 (9.00%) aligned 0 times
        9518238 (27.35%) aligned exactly 1 time
        22149115 (63.65%) aligned >1 times
98.09% overall alignment rate
ADD REPLY
1
Entering edit mode
15 months ago
AHerik ▴ 20

Hi ATpoint ! Solved the issue, it had to do with the way I concatenated my files. My script for concatenation or the concatenation itself did something to the alignment. My data is indeed RF. Here is a sample alignment, does this look good?

41117254 reads; of these:
  41117254 (100.00%) were paired; of these:
    1533543 (3.73%) aligned concordantly 0 times
    35711647 (86.85%) aligned concordantly exactly 1 time
    3872064 (9.42%) aligned concordantly >1 times
    ----
    1533543 pairs aligned concordantly 0 times; of these:
      83249 (5.43%) aligned discordantly 1 time
    ----
    1450294 pairs aligned 0 times concordantly or discordantly; of these:
      2900588 mates make up the pairs; of these:
        1683799 (58.05%) aligned 0 times
        931448 (32.11%) aligned exactly 1 time
        285341 (9.84%) aligned >1 times
97.95% overall alignment rate
ADD COMMENT
1
Entering edit mode

My script for concatenation or the concatenation itself did something to the alignment.

If you were concatenating multiple files (e.g. lane specific files for a sample) then the individual file order has to be maintained for R1/R2 files. Sounds like there was an issue with that order, which caused the reads to go out of sync in final files, which led to the discordant alignments. You forgot to mention this important bit.

I am going to move your comment to an answer to provide closure to this thread.

ADD REPLY

Login before adding your answer.

Traffic: 1655 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6