Dear community,
I am trying to convert BAM to FASTQ because the archived dataset is aligned to the older version of the reference genome. I need to realign to the new version. SRA is not available (I know, devastating experience).
I tried two different approaches and both of them show errors:
Approach 1. Merge all BAM files into one (each BAM file holds the alignments for one chromosome). The error says that some BAM files already exist and asks me to force an overwrite.
Approach 2. Convert each BAM file separately and merge the results as FASTQ. With the second approach I was trying to avoid the error from approach 1.
When I convert BAM to FASTQ, both approaches produce millions of warnings like this one:
WARNING: Query [...] is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
Here is my code:
#Approach 1:
samtools merge -n -b all_sorted_merged.bam *.bam
bedtools bamtofastq -i all_sorted_merged.bam -fq $1_1.fq -fq2 $1_2.fq
gzip -c $1_*.fq > $1_\*.fastq.gz
#Approach 2:
samtools sort -n -o $1.sorted.bam $1.bam
bedtools bamtofastq -i $1.sorted.bam -fq $1_1.fq -fq2 $1_2.fq
gzip -c $1_\*.fq > $1_\*.fastq.gz
I am learning bioinformatics myself so I would be glad for any of your comments. Thank you!
That is strange. Unless you messed up some of the BAM files you should not get any errors with a simple merge.
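One possible explanation for the "already exists" message, assuming it is the usual samtools overwrite check: -b expects a text file that lists the input BAMs, so in the command above all_sorted_merged.bam is read as that list and the first file matched by *.bam is treated as the output. A minimal sketch of the merge without -b follows. The function name and file names are only illustrative, and I am assuming the per-chromosome files are coordinate-sorted, which is why -n is left off the merge and the name sort is done afterwards.

```shell
# merge_and_namesort OUT.bam IN1.bam IN2.bam ...
# Hypothetical helper (not part of samtools): merges per-chromosome BAMs,
# then name-sorts the result so that mates end up next to each other.
merge_and_namesort() {
    out=$1
    shift
    tmp="${out%.bam}.tmp.bam"
    # Output file comes first, inputs follow; no -b (that flag takes a list file).
    samtools merge "$tmp" "$@" || return 1
    # Name-sort so paired mates are adjacent for the FASTQ conversion.
    samtools sort -n -o "$out" "$tmp" || return 1
    rm -f "$tmp"
}
```

It would be called as something like merge_and_namesort merged/all.namesorted.bam chr*.bam, with the output kept outside the directory the glob matches so that a rerun does not feed the merged file back in as an input.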
Why are you not using samtools fastq to do the conversion to FASTQ? Take a look at the inline help for samtools fastq.
I found the reason for the bedtools warning. It shows up because the BAM files contain both single-end and paired-end reads, so the warning is raised for the single-end reads.
Is there any way to extract only the paired-end reads? I need only these data for further analysis.
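One way this could be done with samtools fastq, as a rough sketch and assuming a reasonably recent samtools: group the reads by name with samtools collate, write the two mates to separate files, and send everything that is not a complete pair to /dev/null. The function name and the output prefix are hypothetical; the flags are the ones described in the samtools fastq help.

```shell
# paired_fastq_from_bam IN.bam PREFIX
# Hypothetical helper: writes PREFIX_1.fastq.gz and PREFIX_2.fastq.gz
# containing only reads whose mate is also present in IN.bam.
paired_fastq_from_bam() {
    bam=$1
    prefix=$2
    # collate groups reads by name (-u: uncompressed, -O: write to stdout).
    # fastq options:
    #   -1 / -2  output files for first and second mates
    #   -s       singletons (mate missing), discarded here
    #   -0       reads flagged as neither or both of READ1/READ2
    #            (e.g. single-end reads), discarded here
    #   -n       keep read names unchanged (no /1 or /2 suffix)
    samtools collate -u -O "$bam" |
        samtools fastq -1 "${prefix}_1.fastq.gz" -2 "${prefix}_2.fastq.gz" \
            -s /dev/null -0 /dev/null -n -
}
```

Running this on the merged BAM rather than on each chromosome file is probably the safer choice, since a pair whose mates map to different chromosomes would otherwise look like two singletons and be dropped.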
Thank you, I will look for the tool! I just did not know about it!