Hi,
I have data from RNAseq experiment with a coverage of 100 million reads. Samtools mpileup has an option of --max-depth which means 'At a position, read maximally INT reads per input file'. Its default value is 250 and can be increased upto 1000000 (samtools v1.2). There is a limit/cap/default in this to limit memory leak. How do we decide this value for a variant calling experiment?
I am aware that a combindation of 8 identical reads that cover the location of a variant will produce a strongly supported variant call. So is using default fine or based on what information should I decide its value? Furthermore, is there an easy way to visulaize/check if the coverage is consistent across the genome?
Due to very uneven coverage of the 'genome' by RNA-seq this 100M is more commonly referred to as 'number of reads' rather than coverage.
You are using RNA-seq, so per definition the coverage is not consistent across the genome. If that's something you wonder about I guess you need to read more about RNA-seq before continuing this analysis.
Thank you for your answers. I am aware that coverage won't be consistent because genes expressing higher will have higher stack of reads aligned to them. I wanted to ask how high should I choose the threshold to capture the highly expressed genes as well without landing to any technical issue. But I got my answer. Thank you very much.
Please use
ADD COMMENT/ADD REPLY
when responding to existing posts to keep threads logically organized.Another comment would be why are you using RNASeq to do variant calling? DNA reads are generally used.
I am trying to analyse allele specific expression.I have DNA sequence as well..will check for concordance/discordance between the data I will be getting from both variant calls.
There are specialized tools for allele specific expression, I would suggest to give those a spin.