8.6 years ago
deepak643
I am looking to get the offset ranges of each chromosome from a sorted SAM/BAM file. Is it possible?
From your comment: "even knowing from which line chr2/chr3/.. starts will also help".
I would do something like the following.
chr.txt
Using chr.txt, I would do
cat chr.txt | xargs -I {} grep -n {} foo.sam | sed 's/:/ /' | sort -k2 -u > chr_line_number.txt
head chr_line_number.txt
1 chr1
3 chr2
5 chr3
...
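Note that grep matches a name anywhere on the line, so chr1 also matches chr10 and the @SQ header lines. A minimal alternative sketch, assuming a tab-delimited SAM file whose third column is the reference name (RNAME): report the line number of the first alignment record seen for each reference. The function name first_line_per_ref is hypothetical, not part of samtools.

```shell
# Hypothetical helper: print "<line number> <reference name>" for the first
# alignment record of each reference in a SAM file.
# Line numbers count header lines too, like grep -n does.
# Unmapped reads with RNAME "*" are reported as their own entry.
first_line_per_ref() {
    # Skip header lines (they start with "@"); column 3 is RNAME.
    awk -F'\t' '!/^@/ && !($3 in seen) { seen[$3] = 1; print NR, $3 }' "$1"
}
```

Usage would be something like first_line_per_ref foo.sam > chr_line_number.txt. In a coordinate-sorted file, each chromosome's last line is the line just before the next chromosome's first line.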
For indexed BAM files, this is available in the .bai index file. If you take the virtual file offset of the first bin of each chromosome, you should be able to >>16 that and get the file offset to the bin.
Edit: For SAM files, I suppose you could try tabix, but I'm not sure why one would want to do that.
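To illustrate the >>16 step: a BGZF virtual file offset packs two numbers into one 64-bit value. The upper 48 bits are the byte offset of the compressed block in the BAM file, and the lower 16 bits are the offset inside that block once it is decompressed. A small sketch of the arithmetic, assuming bash; the function name split_voffset is made up for this example.

```shell
# Hypothetical helper: split a BGZF virtual file offset into
#   coffset = byte offset of the compressed block in the file (what you fseek to)
#   uoffset = byte offset within that block after decompression
split_voffset() {
    local voffset=$1
    echo "$(( voffset >> 16 )) $(( voffset & 0xFFFF ))"
}
```

Calling split_voffset with a value read from the index prints the two parts separated by a space. Only the first number is a plain file position; the second only makes sense after inflating the block.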
Is there any API or samtools command to get the virtual offset from the BAI file?
I don't think it's meant to be used (so it's not really documented), but presumably the hts_idx_t object from htslib would hold that.
What is "offset ranges"? The file index (fseek?)?
Yes, the fseek location in the file. In a sorted BAM, the chromosome records run sequentially from chr1 to chr25. By offset ranges I mean that, for example, chr1's records are on lines 1 to 10, chr2's on lines 11 to 20, and so on. Is there any way I can get those offset ranges? Even knowing from which line chr2/chr3/.. starts will also help.
What Devon said. You just need a BAI index to get specific reads at a given location.