For testing purposes, I'm interested in working with a small number of reads that align well to a given genomic region, also small, so that computation time and memory requirements are as low as possible.
This could be done (I guess) if the aligned reads in the BAM file could be associated with specific reads in the original FASTQ file.
I don't know if this is even possible. So far, I've found nothing.
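The association asked about here rests on the read name: the QNAME column of each alignment record matches the name in the FASTQ header line. Below is a minimal sketch of that idea in plain Python, assuming the alignments are available as SAM text (for example from samtools view) and the FASTQ uses standard 4-line records. The function names and the toy data are hypothetical, for illustration only; for real data sizes a dedicated tool will be much faster.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def ref_length(cigar):
    """Number of reference bases covered by an alignment, from its CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def names_in_region(sam_lines, chrom, start, end):
    """Names of mapped reads overlapping chrom:start-end (1-based, inclusive)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or rname != chrom or cigar == "*":
            continue  # unmapped, or mapped elsewhere
        aln_end = pos + ref_length(cigar) - 1
        if pos <= end and aln_end >= start:
            names.add(qname)
    return names


def filter_fastq(fastq_lines, names):
    """Yield 4-line FASTQ records whose read name is in `names`."""
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        name = lines[i][1:].split()[0]  # drop '@' and any header comment
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]  # older-style mate suffixes
        if name in names:
            yield lines[i:i + 4]


# Toy data: r1 overlaps the region, r2 maps elsewhere, r3 is unmapped.
sam = [
    "@SQ\tSN:chr1\tLN:10000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t5000\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
fastq = [
    "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@r2", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@r3", "ACGTACGTAC", "+", "IIIIIIIIII",
]
selected_names = names_in_region(sam, "chr1", 90, 200)
selected_records = list(filter_fastq(fastq, selected_names))
```

Going back to the original FASTQ by name, rather than converting the BAM records, returns the reads exactly as they were sequenced (original orientation, no clipping).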
Yes, reads can be extracted using e.g. samtools fastq. It is recommended to randomize the BAM before doing that, as many alignment tools expect random fastq order (for paired-end data). This could be done with samtools collate or samtools sort -n, followed by samtools fastq. For other threads on this please use the search function, there are many.
Two questions:
What will samtools fastq do with hard-clipped reads/secondary alignments? Output shortened reads? I assume some kind of filtering should be applied before.
Do you have any reference for the random fastq order requirement? I have not heard of this before. That would be good to know for certain applications...
Edit: formatting
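On the filtering question: in the SAM format, hard-clipped bases are not stored in the SEQ field, so a record whose CIGAR contains H carries only part of the original read. Secondary and supplementary alignments are marked by the FLAG bits 0x100 and 0x800. The sketch below shows that check in plain Python; the function names are hypothetical. As far as I know, recent samtools versions document -F 0x900 as the default for samtools fastq, which would already drop secondary and supplementary records, but please check the manual of your installed version.

```python
SECONDARY = 0x100      # FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # FLAG bit: supplementary alignment


def is_primary(flag):
    """True if the record is neither secondary nor supplementary."""
    return not flag & (SECONDARY | SUPPLEMENTARY)


def has_hard_clip(cigar):
    """True if the CIGAR has a hard clip, i.e. SEQ is shorter than the read."""
    return "H" in cigar


def keep_for_fastq(flag, cigar):
    """Keep only records that still carry the full-length original read."""
    return is_primary(flag) and not has_hard_clip(cigar)
```

Soft-clipped records (S in the CIGAR) keep the full sequence in SEQ, so they pass this check.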
Yes. You can filter your BAM file to separate aligned reads and then convert them back to fastq.
You can first filter your BAM with samtools view region to get the region you need. Then use reformat.sh from the BBMap suite to retrieve reads:
If you have single-end data then just use out=read.fq.gz instead.
So you would like to convert bam to fastq? Have you googled for that?
Not with those words, my mistake (I'm not a native speaker). Now I've found a way with bedtools, thank you.
Don't worry. Most here aren't. Neither WouterDeCoster nor I am.
Were you able to retrieve your original reads file from the BAM files? Can you share the process and commands? I really need help with this: I mixed up my data and need to retrieve the raw reads from the BAM files.
There are multiple suggestions on how to do this in this thread. Suggest you pick one and try it.
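For anyone who wants a starting point, the samtools route suggested earlier in the thread (restrict to a region, shuffle or group by name, convert to FASTQ) can be sketched as a small Python wrapper. This is only a sketch under assumptions: samtools is on the PATH, the input BAM is coordinate-sorted and indexed (needed for the region query), the data are paired-end, and the output file names built from the hypothetical prefix argument are placeholders.

```python
import subprocess


def build_commands(bam, region, prefix):
    """Return the three samtools calls as argument lists (nothing is run here)."""
    region_bam = f"{prefix}.region.bam"
    collated_bam = f"{prefix}.collated.bam"
    return [
        # 1. Keep only alignments overlapping the region (BAM must be indexed).
        ["samtools", "view", "-b", "-o", region_bam, bam, region],
        # 2. Group mates together without a full name sort.
        ["samtools", "collate", "-o", collated_bam, region_bam],
        # 3. Write mates to R1/R2, singletons to their own file,
        #    and discard reads flagged as neither or both mates.
        ["samtools", "fastq",
         "-1", f"{prefix}_R1.fq.gz", "-2", f"{prefix}_R2.fq.gz",
         "-0", "/dev/null", "-s", f"{prefix}_single.fq.gz",
         "-n", collated_bam],
    ]


def run_pipeline(bam, region, prefix):
    """Run the commands in order, stopping at the first failure."""
    for cmd in build_commands(bam, region, prefix):
        subprocess.run(cmd, check=True)


# Example call (requires samtools and an indexed BAM):
# run_pipeline("aln.bam", "chr1:1000-2000", "subset")
```

Note that when a region is cut out of paired-end data, reads whose mate aligned outside the region lose their partner and end up in the singleton file.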