Error when trying to convert BAM file to FASTQ.
1
0
Entering edit mode
3.9 years ago
DNAngel ▴ 250

I thought I was finally in the stretch but lo and behold a new issue arises. Here is my whole pipeline that I used:

bwa index refgenome.fa
bwa mem -B 2 -M refgenome.fa cert1.R1.fastq cert1.R2.fastq > cert1_aln.sam
samtools view -Sb cert1_aln.sam > cert1_aln.bam

# I do a sort here but maybe I don't need to? I have just seen many pipelines that sort and index at this point so I did it too.
samtools sort cert1_aln.bam > cert1_aln_sorted.bam | rm -f cert1_aln.bam
samtools index cert1_aln_sorted.bam

# Gather some stats
samtools flagstat cert1_aln_sorted.bam > cert1_stats.txt

# Extract unmapped read whose mate is mapped
samtools view -b -f 4 -F 264 cert1_aln_sorted.bam > cert1_tmp1_unmapped.bam
# Extract mapped read whose mate is unmapped
samtools view -b -f 8 -F 260 cert1_aln_sorted.bam > cert1_tmp2_unmapped.bam
# Extract unmapped read with unmapped mate
samtools view -b -f 12 -F 256 cert1_aln_sorted.bam > cert1_tmp3_unmapped.bam

# Merge the three tmp files into 1
samtools merge cert1_unmapped.bam cert1_tmp*_unmapped.bam

# Extract mapped reads from BAM file
samtools view -b -F 12 cert1_aln_sorted.bam > cert1_mapped.bam

# Sort the BAM files by name
samtools sort -n cert1_unmapped.bam > cert1_unmapped_sorted.bam
samtools sort -n cert1_mapped.bam > cert1_mapped_sorted.bam

# Finally, convert to Fastq
bamToFastq -i cert1_unmapped_sorted.bam -fq cert1_unmapped.R1.fastq -fq2 cert1_unmapped.R2.fastq
bamToFastq -i cert1_mapped_sorted.bam -fq cert1_mapped.R1.fastq -fq2 cert1_mapped.R2.fastq

The error I get towards the end is:

seqname is marked as paired, but its mate does not occur next to it in your BAM file. Skipping

It spams my terminal and it's endless. My fastq files are tiny and clearly missing sequences. Most biostars posts on this say the issue is with sorting without -n flag, but I tried sorting with it and without it and I get the same error regardless. I just want one fastq file with paired mapped reads, and then another fastq file with all unmapped reads/mates.

samtools bedtools • 1.7k views
ADD COMMENT
0
Entering edit mode

what is the output of file *.bam ?

ADD REPLY
0
Entering edit mode

While you are checking it also show us output of samtools view -H cert1.bam | head -8.

ADD REPLY
0
Entering edit mode

The tmp#_unmapped.bam files give me "gzip compressed data, extra field". For cert1_unmapped.bam I just get "empty". Seems like the the merge step is what is giving me issues actually.

ADD REPLY
0
Entering edit mode
3.9 years ago

I don't think your merge command is right. You've only got three files, something isn't working so just do it the easy way, and the way the software expects...with the name of the output file listed first, then the three input files. If that works, you can see if cert1_tmp*_unmapped.bam will work

samtools merge cert1_unmapped.bam cert1_tmp1_unmapped.bam cert1_tmp2_unmapped.bam cert1_tmp3_unmapped.bam
ADD COMMENT
0
Entering edit mode

Eh that seemed to have worked! And it also worked using tmp* instead of having to list all the files out.

Thank you so much! I guess samtools merge has updated since the tutorial I am using? Tricky to catch these things.

ADD REPLY
0
Entering edit mode

It's always wise to just run the command once with --help so you can understand the command line you are putting together.

ADD REPLY
0
Entering edit mode

I have - but sometimes the help language is not very easy to understand for someone not experienced with bioinformatics. I have run into a new problem. I constantly get the warning "seq## is marked as paired, but its mate does not occur next to it in your BAM file. Skipping"

I am at the last couple of steps - it seems to be an issue with sorting. I've tried samtools sort with the -n flag and without it on cert_unmapped.bam. Again just reading --help doesn't really provide any clarity on this. Most tutorials/code I've seen on biostars use -n flag but it's not working. Might have to make this a separate post so I can post my whole code again.

ADD REPLY
0
Entering edit mode

Well, you sliced and diced your input bam, so it's no surprise that you've got loose ends. You are going to have to investigate those reads for yourself.

ADD REPLY
0
Entering edit mode

But extracting unmapped reads into separate files was how most tutorials online show it to work (like no novocraft). But none of them seem to report an issue once using bamToFastq after merging them

ADD REPLY
0
Entering edit mode

One thread, one question.

ADD REPLY
0
Entering edit mode

My new question got removed because they said it was too similar to this. So I just reposted it here.

ADD REPLY

Login before adding your answer.

Traffic: 1228 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6