3.3 years ago
Priyanka
Hello,
I am trying to align FASTQ files against the bacterial MRSA ATCC 33591 reference genome. The reference assembly I have is a FASTA file containing multiple sequences, and after building an index from it with hisat2-build I get a very poor alignment rate. In fact, most of the reads go unaligned.
I am curious to know whether hisat2 can be used with a genome assembly directly, or whether the assembly needs to be converted to a single FASTA file first. Or am I doing something wrong?
I am new to bacterial genomes, and any advice on what to do or which tools to use would be helpful.
Thank you.
hisat2 does accept multi-FASTA files.
So there is no problem with using the genome assembly as a reference with hisat2, right?
Your genome assembly/reference genome should be in a single FASTA file. If it is split across several files, you can easily join them with cat. Does it work if you join it all?
My genome assembly is in a single FASTA file with a different header for each contig, and the alignment does run and produce results. However, the alignment rate is very low.
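Before looking at the alignments, it can help to confirm that the multi-FASTA is what you expect (number of contigs, unique names, plausible total length). The following is a minimal sketch, not part of hisat2: the function name `contig_lengths` and the file name `reference.fasta` are made up for illustration, and it assumes a plain-text (uncompressed) FASTA.

```python
from typing import Dict, Iterable


def contig_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_name: sequence_length} for a multi-FASTA given as lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header without a name")
            # Use the first word of the header as the contig name.
            name = parts[0]
            if name in lengths:
                raise ValueError(f"duplicate contig name: {name}")
            lengths[name] = 0
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            lengths[name] += len(line)
    return lengths


# Example use (hypothetical file name):
# with open("reference.fasta") as handle:
#     lengths = contig_lengths(handle)
# print(len(lengths), "contigs,", sum(lengths.values()), "bases in total")
```

If the total length is far from what you would expect for a Staphylococcus aureus genome (roughly 2.8 Mb), that is a hint the reference file itself needs a closer look.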
Check whether the alignments (whatever the rate is) cover the whole genome: view the BAM files in IGV or Tablet. Also check that you are using the same reference.
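Besides a genome browser, a quick per-contig tally can show whether reads are spread across the assembly or piling up on a few contigs. The sketch below parses the tab-separated text printed by `samtools idxstats` (columns: reference name, length, mapped count, unmapped count, with a final `*` row for unplaced reads) on a sorted, indexed BAM. The function name `summarize_idxstats` is hypothetical, and the counts are alignment records, so treat the fraction as approximate.

```python
from typing import List, Tuple


def summarize_idxstats(text: str) -> Tuple[List[Tuple[str, int, int]], float]:
    """Parse `samtools idxstats` output.

    Returns (rows, mapped_fraction), where rows is a list of
    (contig_name, contig_length, mapped_count) and mapped_fraction is
    mapped / (mapped + unmapped) over all rows, including the '*' row.
    """
    rows: List[Tuple[str, int, int]] = []
    mapped_total = 0
    unmapped_total = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")
        mapped_total += int(mapped)
        unmapped_total += int(unmapped)
        if name != "*":  # '*' holds reads not placed on any contig
            rows.append((name, int(length), int(mapped)))
    total = mapped_total + unmapped_total
    fraction = mapped_total / total if total else 0.0
    return rows, fraction


# Example use: save the output of `samtools idxstats sample.bam` to a text
# file (file names here are placeholders), then:
# rows, fraction = summarize_idxstats(open("idxstats.txt").read())
# for name, length, mapped in sorted(rows, key=lambda r: -r[2])[:10]:
#     print(name, length, mapped)
```

Contigs that attract nearly all of the mapped reads (for example rRNA or other conserved regions) would be consistent with reads from a different organism hitting only shared sequence.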
Here is the BAM file for the poorly aligned sample. There are regions on different contigs where reads do get mapped, but the overall alignment rate is very low, and I am not sure why. I will also try a different aligner to see whether the alignment rate varies.
A few things I might look at:
What happens if you BLAST one of your contigs at NCBI?
And if you BLAST a few of your reads (the unmapped ones perhaps)?
What are the regions where reads are aligning? Does it explain this very sparse alignment pattern?
Can you align the same reads that were used to generate the assembly?
Thank you for the suggestions, Samuel.
I did BLAST the reads, and they gave hits to other bacteria, so I think the information I was given about the reference may not be correct.
I am planning to map the reads to the other bacterial genomes to see which genus my samples map to the most.
Take a few hundred random reads from your sample file and try BLAST on those.
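One way to draw such a random subset without loading the whole file into memory is reservoir sampling. This is only a sketch under stated assumptions: the FASTQ is uncompressed with exactly four lines per record, and the names `read_fastq`, `sample_reads`, `to_fasta` and the file names are invented for illustration. The output is FASTA, which can be pasted into or uploaded to BLAST.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (read name, sequence)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, sequence) from a 4-line-per-record FASTQ."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        seq = next(it, "").strip()
        next(it, "")  # skip the '+' separator line
        next(it, "")  # skip the quality line
        yield header[1:], seq


def sample_reads(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Pick up to k records uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec
    return reservoir


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Example use (placeholder file names):
# with open("sample.fastq") as handle:
#     picked = sample_reads(read_fastq(handle), k=300)
# with open("subset_for_blast.fasta", "w") as out:
#     out.write(to_fasta(picked))
```

Sampling from the whole file, rather than taking the first few hundred reads, avoids bias from reads at the start of the run.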