I have control and tumour SAM files which I will be using to detect copy number changes. I want to use only reliable loci (defined for this purpose as those producing >100 reads), so I want to exclude all loci that have <100 reads in the control SAM file before I do my comparisons. The trouble is I don't know how to see the number of reads for each locus from a SAM file. Can anyone direct me as to how to do this?
Which GATK tool do you use to filter the data with a given set of loci? It's not clear to me how to supply the filter list to the DepthOfCoverage tool. The documentation only describes a RefSeq Rod; is that the filtering list?
The idea is to create a BED file with GATK (you could also use samtools depth, though perhaps the GATK tool is more convenient; I don't know, I've never used that particular GATK tool). You can then filter the BAM files using samtools.
OK, so I'm already using samtools to convert SAM to BAM. Is there a way to include in a BAM file only those loci with a certain read depth in the SAM file? GATK seems a bit too complicated if samtools works (I'm using it anyway).
No, you can't: 'samtools view' filters the alignments, but it has no knowledge of a 'window' of depth.
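Since 'samtools view' cannot filter on depth by itself, the two-step route suggested above (depth first, then a BED file) might look roughly like the sketch below. It is only an illustration, not a tested pipeline: it assumes the three-column, tab-separated output of samtools depth (chromosome, 1-based position, depth), treats a depth of exactly 100 as passing, and the script name and file names in the comments are made up for the example.

```python
# Hypothetical workflow (file names are placeholders; BAMs must be coordinate-sorted):
#   samtools depth control.bam > control.depth.txt
#   python depth_to_bed.py control.depth.txt > reliable.bed
#   samtools view -b -L reliable.bed tumour.bam > tumour.reliable.bam
import sys
from typing import Iterable, Iterator, Tuple


def depth_to_bed(lines: Iterable[str], min_depth: int = 100) -> Iterator[Tuple[str, int, int]]:
    """Merge consecutive positions with depth >= min_depth into BED intervals.

    Input lines are 'chrom<TAB>pos<TAB>depth' with 1-based positions.
    Output intervals are (chrom, start, end), 0-based and half-open, as in BED.
    """
    cur_chrom = None
    start = end = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos_str, depth_str = line.split("\t")[:3]
        pos, depth = int(pos_str), int(depth_str)
        if depth < min_depth:
            continue  # unreliable locus: leave it out of every interval
        if chrom == cur_chrom and pos - 1 == end:
            end = pos  # directly adjacent to the open interval: extend it
        else:
            if cur_chrom is not None:
                yield (cur_chrom, start, end)
            cur_chrom, start, end = chrom, pos - 1, pos
    if cur_chrom is not None:
        yield (cur_chrom, start, end)


# Only runs when a depth file is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for chrom, start, end in depth_to_bed(handle):
            print(f"{chrom}\t{start}\t{end}")
```

Note that samtools view -L keeps every alignment that overlaps a listed region, so reads at the edge of a reliable interval will still extend into neighbouring low-depth positions. Apply the same BED file to both the control and the tumour BAM so the comparison covers the same loci.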