I have a fairly good understanding of the SAM format, but there are some things I haven't quite grasped. I'm not sure how obvious these questions are to the rest of you, but they're what have come to mind.
I'm interested in recovering the original sequenced read after alignment has been done, and I'd like to know which reads or read segments I need for that. How do I know what the full read sequence was as it came off the sequencer? Can I reconstruct it by joining the read segments that belong to the same template? If so, what is the template? It's not the read, is it? The SAM specification doesn't suggest that it is; it describes the template as a DNA/RNA fragment.
Here are some questions:
- What is the difference between the read I get from sequencing and the read segments I see in a SAM file?
- Intuition tells me that read segments are mapped portions of the larger read, but are they arbitrarily segmented in the SAM presentation?
- Are segments contiguous? Can they also be non-contiguous?
- Can I reconstruct the full read from the multiple read segments?
- How does the template correspond to a sequenced read?
I'm very grateful for any clarification I can get on these questions.
You can look into Picard's SamToFastq tool, which converts a SAM/BAM file back into FASTQ reads.
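If it helps to see what recovering a read involves for a single record, here is a rough Python sketch. It is not Picard's implementation, just an illustration based on the SAM format: SEQ is stored in reference orientation, so a record with FLAG bit 0x10 set has to be reverse-complemented (and its QUAL reversed) to get back what the sequencer reported. Hard-clipped records (an H in the CIGAR) and records with SEQ set to `*` do not carry the full read, so the sketch skips them. The names `original_read` and `_COMPLEMENT` are made up for this example. For paired-end data, the two reads of a template share the same QNAME and are told apart by FLAG bits 0x40 and 0x80, which this sketch does not handle.

```python
# Complement table for plain bases; other IUPAC codes are left unchanged here.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_read(sam_line):
    """Return (name, sequence, qualities) in the orientation the sequencer reported.

    Returns None when this record cannot yield the full read: SEQ is '*',
    or the CIGAR contains hard clips (H), which remove bases from SEQ.
    """
    fields = sam_line.rstrip("\n").split("\t")
    name = fields[0]
    flag = int(fields[1])
    cigar = fields[5]
    seq = fields[9]
    qual = fields[10]

    if seq == "*" or "H" in cigar:
        return None

    if flag & 0x10:
        # Aligned to the reverse strand: SEQ is stored reverse-complemented,
        # so undo that and reverse the per-base qualities to match.
        seq = seq.translate(_COMPLEMENT)[::-1]
        if qual != "*":
            qual = qual[::-1]

    return name, seq, qual


if __name__ == "__main__":
    # Hypothetical record: reverse-strand alignment with two soft-clipped bases.
    line = "r1\t16\tchr1\t100\t60\t2S4M\t*\t0\t0\tAACCGT\tABCDEF"
    print(original_read(line))
```

Soft-clipped bases (S in the CIGAR) stay in SEQ, so a primary alignment without hard clips is normally enough to recover the whole read.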
That worked for me. Thanks!