I am trying to map some raw sequence reads to a reference genome and a reference mitochondrial genome. The reads are paired-end and have already been trimmed and filtered so that the two paired files contain only matched reads (ie mate pairs). We are using the same fastq files as input and the exact same commands for mapping, but the frustrating part is that the sam file with the whole genome reads looks just fine, while the sam file with the mitochondrial genome reads isn't storing any information about the reference file or the position of the reads. Both have been indexed using the "bwa index" command (using no flags). We have tried downloading different mitochondrial genome references but the outcome is the same. Any insight would be appreciated!
Here are the commands for the mitochondrial and whole genome mapping:
bwa mem bos_mito.fasta B10_trimmed.pair1.truncated B10_trimmed.pair2.truncated > B10_mito_bwa.sam
bwa mem bos_genome.fa B10_trimmed.pair1.truncated B10_trimmed.pair2.truncated > B10_genome_bwa.sam
This is an example line for the sam file using the whole genome reference:
A00835:368:HVTNKDSX2:1:2573:13114:26678 99 chr1 182 40 55S19M59S = 182 19 TCCACTTCGTCGCGCTGCGCGCGCGACGGGCGAGCCGCGCGCATGACCACGCGCACGACCCCCACGCTGCCCACGCGCACTCGGCCCACGCGCAGGAAGCTCCTGCGCACGCCGGAGACGGCCCGGGCGCCCT FFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFF NM:i:0 MD:Z:19 MC:Z:55S19M59S AS:i:19 XS:i:0
And the mito reference:
A00835:368:HVTNKDSX2:1:1101:1036:1000 77 * 0 0 * * 0 0 GGATTGAGCGCCGCCAGTCGCGCCGGGACATCCANNNNNNNCCTGAAACTTTCTCGATGGTTTCGCCGCCGCGGAGGTAGGCGACGGCGTACAACGCCTCGCCAACTGCGAAGAGGCCGACCGCGAGCACCACGACGTCGATGCCGTCGAG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF#######FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF AS:i:0 XS:i:0
Any insights as to what we can do to make the mitochondrial mapping work properly would be appreciated!
This means that your reads are not aligning to mitochondrial reference. Do you expect them to align for sure?
Not really an answer, but if you have WGS reads you should align to all chromosomes including the mito at once.
Also, use samtools stats OR flatstag OR idxstats, then multiqc to assess if your reads have really been aligned to the mitochondrion.
When just testing and getting used to the tools, you can just align single ended data to see if you get more alignments (PE more specific of course, but just for bug checking).
Maybe also get a bigger (?) public dataset for your organism and test vs your genome assembly as well (you can get data with sra-explorer https://sra-explorer.info/ very easily.
Lastly, with bwa mem, map using the -t option for a big speedup, eg
are you saying that none of the reads match? run and post the output of
samtools flagstat
orsamtools idxstats
for us to actually see the numbers