Question

How to align fastq files against a reference assembly

3.3 years ago

Priyanka ▴ 10

Hello,

I am trying to align FASTQ files against the bacterial MRSA ATCC 33591 reference genome. The reference assembly I have is a FASTA file containing multiple sequences, and after building an index from it with hisat2-build I am not getting a good alignment rate. In fact, most of the reads go unaligned.

Can hisat2 be used with a genome assembly directly, or does the assembly need to be converted to a single FASTA file first? Or am I doing something wrong?

I am new to bacterial genomes, and any advice on what to do or which tools to use would be helpful.

Thank you.

assembly alignment bacterial hisat2 • 4.2k views

ADD COMMENT • link updated 3.3 years ago by hafiz.talhamalik ▴ 350 • written 3.3 years ago by Priyanka ▴ 10

hisat2 does accept a multi-FASTA file.

ADD REPLY • link 3.3 years ago by hafiz.talhamalik ▴ 350

So there is no problem with using the genome assembly as a reference with hisat2, right?

ADD REPLY • link 3.3 years ago by Priyanka ▴ 10

Your genome assembly/reference genome should be within a single fasta file. You can easily join it together with cat. Does it work if you join it all?
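For anyone who prefers to script the join, here is a minimal Python sketch of the same idea as `cat`. It is only an illustration: the function name `merge_fasta` and the file names in the usage note are made up, not part of hisat2 or any other tool. Unlike a plain `cat`, it tolerates input files that lack a trailing newline and refuses duplicate sequence names, both of which can quietly corrupt a merged reference.

```python
from pathlib import Path


def merge_fasta(inputs, output):
    """Concatenate FASTA files into one file and return the number of sequences.

    Hypothetical helper for illustration only.
    """
    seen = set()
    with open(output, "w") as out:
        for path in inputs:
            for line in Path(path).read_text().splitlines():
                if not line.strip():
                    continue  # skip blank lines
                if line.startswith(">"):
                    parts = line[1:].split()
                    name = parts[0] if parts else ""
                    if name in seen:
                        raise ValueError(f"duplicate sequence name: {name!r}")
                    seen.add(name)
                # Always end each line with a newline, so a header from the
                # next file can never be glued onto the previous sequence.
                out.write(line + "\n")
    return len(seen)
```

Usage would look like `merge_fasta(["contig_a.fasta", "contig_b.fasta"], "reference.fasta")`, where the file names are placeholders; the merged file can then be passed to hisat2-build as usual.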

ADD REPLY • link 3.3 years ago by samuel.a.odonnell ▴ 580

My genome assembly is already in a single FASTA file, with a different header for each contig, and the alignment does run. However, the alignment rate is very low.

9557190 reads; of these:
  9557190 (100.00%) were paired; of these:
    9372275 (98.07%) aligned concordantly 0 times
    59908 (0.63%) aligned concordantly exactly 1 time
    125007 (1.31%) aligned concordantly >1 times
    ----
    9372275 pairs aligned concordantly 0 times; of these:
      19 (0.00%) aligned discordantly 1 time
    ----
    9372256 pairs aligned 0 times concordantly or discordantly; of these:
      18744512 mates make up the pairs; of these:
        18662286 (99.56%) aligned 0 times
        33977 (0.18%) aligned exactly 1 time
        48249 (0.26%) aligned >1 times
2.37% overall alignment rate
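
To make the last line of this summary easier to read, here is a small Python sketch that recomputes the overall rate from the counts above. It assumes the paired-end summary layout shown in this post; the function name `overall_alignment_rate` is made up for illustration. The rate is counted per mate: every pair aligned concordantly or discordantly contributes two mates, and mates aligned on their own contribute one each.

```python
import re

# Summary text copied from the hisat2 output in this post.
SUMMARY = """\
9557190 reads; of these:
  9557190 (100.00%) were paired; of these:
    9372275 (98.07%) aligned concordantly 0 times
    59908 (0.63%) aligned concordantly exactly 1 time
    125007 (1.31%) aligned concordantly >1 times
    ----
    9372275 pairs aligned concordantly 0 times; of these:
      19 (0.00%) aligned discordantly 1 time
    ----
    9372256 pairs aligned 0 times concordantly or discordantly; of these:
      18744512 mates make up the pairs; of these:
        18662286 (99.56%) aligned 0 times
        33977 (0.18%) aligned exactly 1 time
        48249 (0.26%) aligned >1 times
2.37% overall alignment rate
"""


def overall_alignment_rate(summary):
    """Recompute the overall alignment rate (%) from a paired-end summary."""

    def count(pattern):
        match = re.search(pattern, summary, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"line not found for pattern: {pattern}")
        return int(match.group(1))

    pairs = count(r"^\s*(\d+) \(\S+\) were paired")
    conc_once = count(r"^\s*(\d+) \(\S+\) aligned concordantly exactly 1 time")
    conc_multi = count(r"^\s*(\d+) \(\S+\) aligned concordantly >1 times")
    discordant = count(r"^\s*(\d+) \(\S+\) aligned discordantly 1 time")
    mates_once = count(r"^\s*(\d+) \(\S+\) aligned exactly 1 time")
    mates_multi = count(r"^\s*(\d+) \(\S+\) aligned >1 times")

    # Aligned pairs count twice (two mates each); single mates count once.
    aligned_mates = 2 * (conc_once + conc_multi + discordant) + mates_once + mates_multi
    return 100.0 * aligned_mates / (2 * pairs)


print(f"{overall_alignment_rate(SUMMARY):.2f}% overall alignment rate")
```

Here 452,094 of 19,114,380 mates aligned, which reproduces the 2.37% reported above; roughly 98% of the reads found no match anywhere in the reference.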

ADD REPLY • link 3.3 years ago by Priyanka ▴ 10

Check whether the alignments (whatever the rate is) cover the whole genome. You can view the BAM files in IGV or Tablet. Also check that you are using the same reference.

ADD REPLY • link 3.3 years ago by hafiz.talhamalik ▴ 350

Here is the BAM file with the poor alignment rate. There are regions on different contigs where reads are getting mapped, but the overall alignment rate is still very low, and I am not sure why. I will also try a different aligner to see whether the alignment rate changes.

ADD REPLY • link 3.3 years ago by Priyanka ▴ 10

A few things I might look at:

What happens if you BLAST one of your contigs in NCBI?

And if you BLAST a few of your reads (the unmapped ones perhaps)?

What are the regions where reads are aligning? Does it explain this very sparse alignment pattern?

Can you align the same reads that were used to generate the assembly?
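
To BLAST a handful of reads, they first need to be in FASTA format. A minimal sketch, assuming standard four-line FASTQ records (optionally gzipped); the function name `fastq_to_fasta` and the file names in the usage note are hypothetical. It simply takes the first few reads, so to look at unmapped reads specifically you would point it at a FASTQ containing only those.

```python
import gzip
import itertools


def fastq_to_fasta(fastq_path, fasta_path, n_reads=20):
    """Write the first n_reads FASTQ records as FASTA; return how many were written.

    Hypothetical helper for illustration; assumes 4 lines per FASTQ record.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    written = 0
    with opener(fastq_path, "rt") as fq, open(fasta_path, "w") as fa:
        while written < n_reads:
            record = list(itertools.islice(fq, 4))
            if len(record) < 4:
                break  # end of file or truncated record
            header = record[0].rstrip()
            sequence = record[1].rstrip()
            # Replace the leading "@" of the FASTQ header with ">".
            fa.write(">" + header[1:] + "\n" + sequence + "\n")
            written += 1
    return written
```

For example, `fastq_to_fasta("sample_R1.fastq.gz", "sample_reads.fasta", n_reads=10)` (placeholder file names) produces a small FASTA file that can be pasted into the NCBI BLAST web form.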

ADD REPLY • link 3.3 years ago by samuel.a.odonnell ▴ 580

Thank you for the suggestions Samuel.

I did BLAST the reads, and they gave hits to other bacteria, so I think the information I was given about the reference was perhaps not correct.

I am planning to map the reads to all the other bacterial genomes to see which genus my samples are mapping to the most.
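
Before mapping against many genomes, the BLAST hits themselves can be tallied by genus as a quick first look. A rough Python sketch, assuming each hit title starts with a "Genus species" name, which is common for NCBI nucleotide titles but not guaranteed. The function name `tally_genera` is made up, and the titles in the usage note are invented examples, not results from this thread.

```python
from collections import Counter


def tally_genera(hit_titles):
    """Count BLAST hits per genus, taking the first word of each title as the genus.

    Hypothetical helper for illustration; titles that do not start with a
    genus name will be miscounted.
    """
    return Counter(title.split()[0] for title in hit_titles if title.strip())
```

With made-up titles such as `["Genusone speciesa strain X", "Genusone speciesb", "Genustwo speciesc"]`, `tally_genera(...).most_common()` lists `Genusone` first with two hits. The genus with the most hits is a reasonable first candidate to download a reference for.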