Hi,
I've been given a reference genome and one or more BAM files containing reads aligned to that reference. This is the first time I've worked with the BAM/SAM formats. For each BAM file, I'd like to generate the full sequence from the reads in that file and the reference they are aligned to, ending up with a FASTA-format alignment of the reference plus one sequence per BAM file. What are the considerations? Off the top of my head, for example, it seems to matter whether the reads are paired-end and what their strandedness is.
I'd like to know if this can be done in R / Bioconductor, as I have some familiarity with it (mostly using Biostrings).
Thanks,
Ben.
I re-read your question, and I think you want a FASTA-format alignment. You probably don't actually want that. You're new to BAM/SAM, so you may not realize yet how painful it would be to have the data in a different alignment format. Take a look at the actual file, or at least its first several lines, and get a feel for what it means before moving on to something more familiar like FASTA.
e.g.,
samtools view file.bam | head -n 999 | column -t | less -S
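Once you've looked at those columns, the two things that answer the strand/pairing question are FLAG (column 2) and CIGAR (column 6). Below is a rough sketch, not a real consensus caller, that decodes FLAG bits and walks each CIGAR string to build a naive majority-vote sequence in reference coordinates from `samtools view` output. The helper names `decode_flag` and `consensus_from_sam` are hypothetical. It assumes a single reference contig and reads that don't run off its end. It drops insertions so the result stays column-aligned with the reference, writes `-` for deletions and `N` for uncovered positions, and skips unmapped, secondary, duplicate and supplementary records. Note that SEQ is already stored on the forward reference strand even for reverse-strand (0x10) reads, so no reverse-complementing is needed. Paired mates are just separate records, so overlapping mates are counted twice here.

```python
import re
from collections import Counter

# FLAG bits as defined in the SAM specification
FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "first_in_pair",
    0x80: "second_in_pair", 0x100: "secondary", 0x200: "qc_fail",
    0x400: "duplicate", 0x800: "supplementary",
}
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag):
    """Hypothetical helper: names of all FLAG bits set in `flag`."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def consensus_from_sam(sam_lines, reference,
                       skip_bits=0x4 | 0x100 | 0x400 | 0x800):
    """Hypothetical helper: naive majority vote per reference position.

    Assumes every record aligns to the single contig `reference`.
    """
    votes = [Counter() for _ in reference]
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
        if flag & skip_bits or cigar == "*" or seq == "*":
            continue
        # SEQ is already on the forward reference strand, even if 0x10 is set.
        ref_i, read_i = pos - 1, 0  # POS is 1-based
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "M=X":  # aligned bases consume read and reference
                for k in range(n):
                    votes[ref_i + k][seq[read_i + k]] += 1
                ref_i += n
                read_i += n
            elif op == "D":  # deletion from the reference
                for k in range(n):
                    votes[ref_i + k]["-"] += 1
                ref_i += n
            elif op == "N":  # skipped reference region (e.g. intron): no vote
                ref_i += n
            elif op in "IS":  # insertion / soft clip consume the read only
                read_i += n
            # H (hard clip) and P (padding) consume neither
    return "".join(v.most_common(1)[0][0] if v else "N" for v in votes)
```

In practice you'd feed it the lines from `samtools view file.bam` rather than hand-written records. For a real analysis, proper tools are a better bet: recent samtools releases include a `samtools consensus` subcommand, and on the R side Rsamtools and GenomicAlignments (e.g. `stackStringsFromBam()`) cover this ground.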