I thought I was finally in the home stretch, but lo and behold, a new issue has come up. Here is the whole pipeline I used:
bwa index refgenome.fa
bwa mem -B 2 -M refgenome.fa cert1.R1.fastq cert1.R2.fastq > cert1_aln.sam
samtools view -Sb cert1_aln.sam > cert1_aln.bam
# I sort here, but maybe I don't need to? I have seen many pipelines that sort and index at this point, so I did it too.
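# '&&' runs rm only after sort succeeds; piping sort into rm starts both at once, so rm could delete the input early
samtools sort -o cert1_aln_sorted.bam cert1_aln.bam && rm -f cert1_aln.bam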
samtools index cert1_aln_sorted.bam
# Gather some stats
samtools flagstat cert1_aln_sorted.bam > cert1_stats.txt
# Extract unmapped reads whose mate is mapped
samtools view -b -f 4 -F 264 cert1_aln_sorted.bam > cert1_tmp1_unmapped.bam
# Extract mapped reads whose mate is unmapped
samtools view -b -f 8 -F 260 cert1_aln_sorted.bam > cert1_tmp2_unmapped.bam
# Extract unmapped reads whose mate is also unmapped
samtools view -b -f 12 -F 256 cert1_aln_sorted.bam > cert1_tmp3_unmapped.bam
# Merge the three tmp files into 1
samtools merge cert1_unmapped.bam cert1_tmp*_unmapped.bam
# Extract reads where both the read and its mate are mapped
samtools view -b -F 12 cert1_aln_sorted.bam > cert1_mapped.bam
# Sort the BAM files by name
samtools sort -n cert1_unmapped.bam > cert1_unmapped_sorted.bam
samtools sort -n cert1_mapped.bam > cert1_mapped_sorted.bam
# Finally, convert to Fastq
bamToFastq -i cert1_unmapped_sorted.bam -fq cert1_unmapped.R1.fastq -fq2 cert1_unmapped.R2.fastq
bamToFastq -i cert1_mapped_sorted.bam -fq cert1_mapped.R1.fastq -fq2 cert1_mapped.R2.fastq
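The -f and -F values in the pipeline are sums of SAM FLAG bits (per the SAM specification: 1 = read paired, 4 = read unmapped, 8 = mate unmapped, 256 = secondary alignment, 2048 = supplementary alignment). samtools view -f keeps records that have all of the given bits set, and -F drops records that have any of them set. As a quick sanity check, here is a small pure-bash sketch that spells out which bits each value contains; decode_flag is a hypothetical helper for illustration, not part of samtools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the SAM FLAG bits set in a value.
# Only the bits relevant to this pipeline are named.
decode_flag() {
  local flag=$1
  local -a names=()
  (( flag & 1 ))    && names+=("paired")
  (( flag & 4 ))    && names+=("unmapped")
  (( flag & 8 ))    && names+=("mate_unmapped")
  (( flag & 256 ))  && names+=("secondary")
  (( flag & 2048 )) && names+=("supplementary")
  echo "${names[*]}"
}

# Decode the -f / -F values used in the pipeline above.
for value in 4 264 8 260 12 256; do
  printf '%-4s %s\n' "$value" "$(decode_flag "$value")"
done
```

For example, 264 decodes to "mate_unmapped secondary", so -f 4 -F 264 keeps unmapped reads whose mate is mapped while skipping secondary records.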
The error I get towards the end is:
seqname is marked as paired, but its mate does not occur next to it in your BAM file. Skipping
The warning floods my terminal and never seems to end. My FASTQ files are tiny and clearly missing sequences. Most Biostars posts about this say the problem is sorting without the -n flag, but I get the same error whether I sort with or without it. I just want one FASTQ file with the mapped read pairs, and another FASTQ file with all unmapped reads and their mates.
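Since the goal is two sets (mapped pairs, and everything involving an unmapped read), it may help to think of each primary record as belonging to exactly one set, with secondary and supplementary records excluded entirely. bamToFastq with -fq2 expects mates to be adjacent in a name-sorted file, so ideally each read name appears exactly twice. A minimal pure-bash sketch of that partition, assuming paired-end input; partition_flag is a hypothetical helper, not a samtools or bedtools function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide which output set a record belongs to from its FLAG.
# Assumes paired-end data; an unpaired mapped record would also land in mapped_pair.
partition_flag() {
  local flag=$1
  if (( flag & (256 | 2048) )); then
    echo "skip"          # secondary or supplementary: an extra copy of a read
  elif (( flag & (4 | 8) )); then
    echo "unmapped_set"  # this read or its mate is unmapped
  else
    echo "mapped_pair"   # primary record with both read and mate mapped
  fi
}

# A few example flags: proper pair, mate unmapped, read unmapped, both unmapped, secondary.
for flag in 99 147 73 133 77 355; do
  echo "$flag -> $(partition_flag "$flag")"
done
```

In samtools terms, the mapped_pair branch corresponds to samtools view -b -F 2316 (4 + 8 + 256 + 2048). The -F 12 filter used for cert1_mapped.bam keeps secondary alignments, which bwa mem -M emits for shorter split hits, so a read name may appear more than twice after sort -n. That is one possible reason for the "does not occur next to it" warnings on the mapped file, though it would not explain an empty merge output.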
What is the output of the following?
file *.bam
While you are checking, also show us the output of:
samtools view -H cert1.bam | head -8

The tmp#_unmapped.bam files give me "gzip compressed data, extra field". For cert1_unmapped.bam I just get "empty". It seems the merge step is what is actually giving me issues.