9.3 years ago
bioguy24
You can use samtools bam2fq to produce a FASTQ file from a BAM file.
There is no such thing as an "unaligned" BAM file: a BAM file is an alignment file. The command above creates FASTQ files that you can align with another tool to create a different alignment file.
OK, I see what you mean. The confusion arises from having reads with no alignments in an alignment file; once these are all put into a separate file, one can call that alignment file "unaligned" ...
The file is still an alignment file, it just contains reads that did not align. Use the same command as above to extract the read sequences. You now have a new dataset that you can align with a different tool. Treat it as new data.
I think I understand, but I want to make sure.
Running the command above on the aligned bam file will give me both aligned and unaligned reads in a new file? Or should I request the unaligned bam as well to use with other tools? Thank you :).
I believe that you need to run the command on each file: one will give you the part of the original data that aligned successfully, the other the part that did not align with that method. You could combine the two or keep them separate.
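To make the aligned/unaligned distinction concrete: every SAM/BAM record carries a FLAG field, and bit 0x4 marks a read as unmapped. The sketch below is only an illustration of that idea on SAM text lines (the kind of output samtools view prints), not a replacement for samtools bam2fq. The function names are made up for this example, and it assumes single-end reads with SEQ and QUAL stored in every record.

```python
# Bits of the SAM FLAG field (column 2) used in this sketch.
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return (is_mapped, fastq_record) for one SAM line, or None to skip it."""
    if not line.strip() or line.startswith("@"):
        return None  # blank line or header line
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & (SECONDARY | SUPPLEMENTARY):
        return None  # extra alignment of a read that is emitted elsewhere
    if flag & REVERSE:
        # Reverse-strand alignments store the reverse complement; undo that
        # so the record matches the read as it was sequenced.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    record = f"@{name}\n{seq}\n+\n{qual}\n"
    return (not flag & UNMAPPED, record)


def split_reads(sam_lines):
    """Collect FASTQ records into (mapped, unmapped) lists."""
    mapped, unmapped = [], []
    for line in sam_lines:
        result = sam_line_to_fastq(line)
        if result is None:
            continue
        is_mapped, record = result
        (mapped if is_mapped else unmapped).append(record)
    return mapped, unmapped
```

Concatenating the two lists gives back every primary read in the input, which is why a file holding only the unmapped records covers just part of the original data.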
@Istvan Albert, the Ion machine provides data in different formats, and one of them is unaligned BAM. I was working with Ion data: I was given the unaligned BAM and converted these BAM files back to FASTQ with different tools (samtools, bedtools, Picard, etc.). The problem is that these FASTQ files don't match the FASTQ produced by the machine. I still wonder how to convert these unaligned BAM files to FASTQ.
Yes, in the meantime I also understood the purpose of an unaligned BAM file: it makes it possible to attach read groups, and hence sample-related information, to the data.
FWIW, that only demonstrates just how ill-defined our file formats are: we are using an alignment format to represent unaligned data because the regular format does not allow us to enter sample-related information.
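For illustration, the sample information mentioned here lives in @RG lines of the BAM header, which samtools view -H prints as tab-separated text. The sketch below parses such lines into a dictionary; the function name is hypothetical, and it assumes every @RG line has an ID tag, as the SAM specification requires.

```python
def read_groups(header_lines):
    """Map each read-group ID to a dict of the other tags on its @RG line."""
    groups = {}
    for line in header_lines:
        if not line.startswith("@RG"):
            continue  # other header record types (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")[1:]
        # Each field looks like TAG:value; split only on the first colon.
        tags = dict(field.split(":", 1) for field in fields)
        groups[tags.pop("ID")] = tags
    return groups
```

A plain FASTQ record has no standard place for tags such as SM (sample), which is the gap the unaligned BAM fills.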
But what if the aligner doesn't accept this kind of format as input? How do I convert it to standard FASTQ format? Is asking the sequencing service provider for FASTQ files the only way?
Of course, most aligners won't accept this format. This is a somewhat unexpected (even absurd) storage format where we store raw data as an "alignment" because the alignment format allows attaching sample information to the data.
As mentioned in the answer above, you would need to convert to FASTQ with samtools bam2fq before aligning.
As mentioned previously, the FASTQ files produced with these tools (samtools, bedtools, etc.) do not match the FASTQ produced by the Ion machine. This is the major problem. Initially I received an unaligned BAM for a sample and converted it to FASTQ; afterwards I received FASTQ from the machine for the same sample. The FASTQ I generated from the unaligned BAM is 720M, while the FASTQ from the machine is 23GB.
As far as I know (and according to what you have mentioned above), they have only given me the reads left unaligned after alignment with TMAP, in BAM format. Supporting this, the read IDs from the unaligned BAM are not found in the TMAP alignment file. This would be a major problem for the entire experiment if one proceeded as if that unaligned BAM were the whole raw data.
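A check like the one described (are the read IDs of one file present in the other?) can be sketched in a few lines. This is a minimal illustration, assuming well-formed FASTQ with exactly four lines per record and enough memory to hold all read IDs; the function names and file paths are placeholders. Comparing read counts and IDs is more informative than comparing file sizes, which also depend on compression.

```python
import gzip


def fastq_ids(path):
    """Return the set of read IDs in a FASTQ file (plain text or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # first line of each 4-line record: "@id description"
                ids.add(line[1:].split()[0])
    return ids


def compare_ids(path_a, path_b):
    """Count read IDs unique to each file and shared by both."""
    a, b = fastq_ids(path_a), fastq_ids(path_b)
    return {"only_a": len(a - b), "only_b": len(b - a), "shared": len(a & b)}
```

If the FASTQ derived from the unaligned BAM shares no IDs with the reads in the aligned BAM but is a subset of the machine's FASTQ, that would support the reading that it holds only the leftover unaligned reads.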