Question

Mapping to mtDNA and then align the unmapped

0

Entering edit mode

16 months ago

DanielEB_fisk ▴ 20

Hello all,

I have aligned my samples against the mitochondrion genome of the species I work with. My idea was that after this I would keep the unmapped ones (which would be the nuclear reads), and then align these against the reference genome.

However, and I did not think of this, the result of said process are bam files, and I have no idea how or if I can use bam files for a re-alignment.

Here is the code I have used for the mapping against the mtDNA:

#!/bin/bash

REFERENCE=../refs/mtDNA/mitochondrion-genome.fa
REFNAME=mitochondrion-genome
FORWARD=_1.fq.gz
REVERSE=_2.fq.gz

for i in ../trim/1stBatch-fastp/*_1.fq.gz
do
  SAMPLE=$(echo ${i} | sed "s/_1\.fq\.gz//" | rev | cut -d'/' -f-1 | rev)
  echo ${SAMPLE}_1.fq.gz ${SAMPLE}_2.fq.gz
  ## Define reference base name
  REFBASENAME="${REFERENCE%.*}"
  sbatch --account ACCOUNT --mem=64g -c 8 -o bowtie2-mtDNA.out -e bowtie2-mtDNA.err \
  --wrap="bowtie2 -q --phred33 --very-sensitive -p 8 -I 0 -X 1500 -x ${REFBASENAME} \
  -1 ../trim/1stBatch-fastp/${SAMPLE}${FORWARD} -2 ../trim/1stBatch-fastp/${SAMPLE}${REVERSE} -S ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.sam'
  samtools view -bS ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.sam' > ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.bam'
  rm -f ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.sam'
  samtools view -h -q 20 ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.bam' | samtools view -buS - | samtools sort -o ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'_minq20_sorted.bam'"
done

I was thinking that, as alternative, I could append the mtDNA to the nuclear genome and then use samtools to remove the reads aligned to the mtDNA. What do you think is a best strategy?

Thank you!

mtDNA Alignment Samtools Bowtie2 • 835 views

ADD COMMENT • link 16 months ago by DanielEB_fisk ▴ 20

score 3 · Answer 1 · 2023-07-14

3

Entering edit mode

16 months ago

ATpoint 85k

It would be better to align against the entire genome and then subset for what you need. For example, only mapping against mt will false-positively align nuclear homologs of mt genes with similar sequence mitochondrial genes to the mt genome.

As for the question, use samtools fastq to convert bam to fastq. If the bam is sorted then do samtools collate first since some aligners expect randomly-ordered fastq reads.