We have a reference sequence with 40Ns located in the middle of it. After aligning the reads to the reference with the 40Ns, what is the best way to view the contigs that over lap the 40Ns region? Where can I get information on how many reads and contigs are covering that region using a bam file? I have sorted and indexed the bam file, but I'm not sure where to go from here. If anybody has done something like this, your help would be appreciated.
Thanks.
If you have access to samtools I would also suggest
samtools tview
as a very simple, fast viewer.Thanks for your responses.
Maybe I'm confused. I thought a bam file has a contigs and reads? I want to know which unique ones are covering the region of interest. I've used IGV and tview to see the alignment, but I'm interested in pipe-lining the data, so manually looking at it isn't exactly a solution we are looking for. I also have access to an sff, fasta, and fastq file.
The SAM format (of which BAM is just block gzip compressed and the blocks are indexed against genomic coordinates) is really just a container for reads, quality scores, alignment information, and other optional flags and strings. There are no contiguous constructs held in this format. The header of a SAM format file does contain a sequence library (@SQ) which defines the contains that you have aligned your reads to.
samtools view can whittle down your .bam file to just the reads that cover a particular region, or it can take a list of regions in bed format. Reads with high MQ should be unique, samtools view can also filter by MQ.