I am aligning NextSeq 500 2x150 PE reads to the reference mitochondrial genome. The reads were generated from a 550 bp library from a high-quality sample. The problem is that very few reads align: about 1,000 of 4,000,000, which is much lower than expected. Does anyone know what the issue may be?
Yes, I've run the data through FastQC, and most of the parameters were good. I had red flags for per-base sequence content and GC content, which were a bit unbalanced but not extreme, and for k-mer content. I used fastq-mcf from ea-utils, which removed a few reads and reduced the k-mers.
In BLAST I get a bunch of different results if I search against everything, but if I BLAST against the reference assembly I get hits with low E-values and high identity and query coverage.
I've also tried running the alignment against the whole reference genome rather than just the mt genome, and I get more mapped reads than there are input reads (presumably due to multiple mapping). This is unusual, as we would expect most reads to be mitochondrial. I think bwa mem should be fine for mt genome alignment, but does anyone know if there are extra parameters that should be set?
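One way to check whether the "more mapped reads than input reads" count comes from secondary or supplementary records is to tally alignments by SAM flag bits. The following is only a minimal sketch, assuming the alignments are available as plain-text SAM lines; the function name `count_alignments` and the toy records are made up for illustration, and only the flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format itself.

```python
# Flag bits defined by the SAM format.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_alignments(sam_lines):
    """Tally SAM records into primary mapped, unmapped, secondary, supplementary."""
    counts = {"primary_mapped": 0, "unmapped": 0, "secondary": 0, "supplementary": 0}
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        elif flag & UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["primary_mapped"] += 1
    return counts


# Hypothetical toy records: only the name and FLAG columns matter here.
toy_sam = [
    "@SQ\tSN:chrM\tLN:16569",
    "\t".join(["read1", "99", "chrM", "100", "60", "150M", "=", "400", "450", "*", "*"]),
    "\t".join(["read1", "147", "chrM", "400", "60", "150M", "=", "100", "-450", "*", "*"]),
    "\t".join(["read1", "2147", "chr1", "5000", "0", "50M100H", "=", "400", "0", "*", "*"]),
    "\t".join(["read2", "355", "chr1", "9000", "0", "150M", "=", "9300", "450", "*", "*"]),
    "\t".join(["read3", "77", "*", "0", "0", "*", "*", "0", "0", "*", "*"]),
    "\t".join(["read3", "141", "*", "0", "0", "*", "*", "0", "0", "*", "*"]),
]

toy_counts = count_alignments(toy_sam)
print(toy_counts)
```

If the primary mapped count is at or below the number of input reads, the excess is just extra records for the same reads. On a real BAM file, `samtools view -c -F 0x904 file.bam` should give the same primary mapped count, since `-F` excludes records with any of those three bits set.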
Have you run the raw data through fastqc?
Have you tried blasting a few?