Hi,
I am trying to figure out the total length of all the exons covered in whole-exome sequencing data that I downloaded from TCGA. In other words, I want the total number of reference-genome bases covered by the reads. The FASTQ files would probably not help, since the number of reads varies with sequencing depth. I tried using the BAM files with samtools, but could not find a samtools command that gives the total length of all the alignment coordinates. Is there a tool that reports this, or does someone know of a script for this calculation? Basically, I would need to extract the coordinates of all the unique alignments, calculate their lengths, and add them up, unless there is some subtlety I am missing. Thanks for your help.
- Pankaj
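The "extract the coordinates, calculate their lengths, and add them up" idea needs one extra step: overlapping alignments have to be merged first, otherwise bases covered by several reads are counted several times. Here is a minimal Python sketch of that merge-and-sum step. It assumes the alignments have already been reduced to (chrom, start, end) tuples in 0-based, half-open coordinates; the function name and the example intervals are made up for illustration.

```python
from collections import defaultdict


def covered_length(intervals):
    """Count reference bases covered by at least one interval.

    `intervals` is an iterable of (chrom, start, end) tuples using
    0-based, half-open coordinates (the BED convention).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current span and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical alignments, not taken from any real BAM file.
alignments = [
    ("chr1", 100, 200),
    ("chr1", 150, 250),  # overlaps the previous interval
    ("chr1", 400, 450),
    ("chr2", 0, 75),
]
print(covered_length(alignments))  # 275
```

Two subtleties to keep in mind: a spliced or clipped read does not cover every base between its start and end, so the aligned blocks should be used rather than the whole read span, and exome data usually contains some off-target reads, so the covered length will likely not match the size of the capture targets exactly.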
I don't have a BED file of the exons, but I have sent a request to the CGHub help desk for one. Assuming I can get one, could you please advise how I could use it to get the information I need? Thanks.
To size a BED file
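One way to do the sizing, sketched in Python: add up end minus start for every interval. BED coordinates are 0-based and half-open, so end minus start is the interval length with no +1 needed. This assumes the intervals do not overlap; if they might, they would need to be merged first (for instance with bedtools merge on a sorted file). The function name and the example records are hypothetical.

```python
def bed_total_length(lines):
    """Sum interval lengths over BED-formatted lines.

    Assumes non-overlapping intervals; header and comment lines
    ("track", "browser", "#") are skipped.
    """
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end = int(fields[1]), int(fields[2])
        total += end - start
    return total


# Made-up BED records for illustration only.
example = """track name=example
chr1\t100\t200\texonA
chr1\t300\t350\texonB
chr2\t0\t75\texonC
""".splitlines()
print(bed_total_length(example))  # 225

# For a real file (the path here is a placeholder):
# with open("exons.bed") as fh:
#     print(bed_total_length(fh))
```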
I used the following command to output [chr pos depth]:
However, the output file is huge: almost 48 GB for an 8 GB BAM file. It can be processed, but that will take a while. Is there a way to make bedtools output only positions with depth > 0? That would reduce the output file size a lot. Thanks.
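If the per-base output came from bedtools genomecov with -d (an assumption, since the command isn't shown here), it may be worth checking the -bg option, which as far as I know writes BedGraph intervals and leaves out zero-coverage regions. Alternatively, the large file can be avoided altogether by piping the [chr pos depth] lines straight into a small counter instead of saving them. A minimal Python sketch of such a counter follows; the function name, the min_depth parameter, and the sample lines are made up.

```python
def count_covered(lines, min_depth=1):
    """Count positions whose depth is at least `min_depth`.

    Each line is expected to hold three whitespace-separated fields:
    chromosome, position, depth. Shorter lines are ignored.
    """
    covered = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        if int(fields[2]) >= min_depth:
            covered += 1
    return covered


# Made-up per-base depth lines for illustration only.
sample = [
    "chr1\t1\t0",
    "chr1\t2\t3",
    "chr1\t3\t12",
    "chr1\t4\t0",
    "chr2\t1\t1",
]
print(count_covered(sample))  # 3

# To consume a stream instead of a list, pass sys.stdin:
# import sys
# print(count_covered(sys.stdin))
```

Because the lines are processed one at a time, memory use stays flat no matter how large the stream is, and nothing has to be written to disk.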