Hello,
I have a pileup file like the one below:
seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
I have to find the gene coverage from this pileup file, and if the coverage of a gene is above a certain threshold, I want to treat that as a gene duplication event.
How can I go about solving this problem?
The only file that I have is the pileup file. I don't have a BAM file for this.
Thanks. Can I just use the 4th column directly to find the mean for specific regions, rather than looking at the 5th column?
Let's say I have a gene which covers the 2nd, 3rd and 4th lines of the example above (positions 273 to 275). Can I not just add 23+23+23 and divide by 3? That would mean I have 23X coverage for this gene, is that correct?
Yes, you can simply use the 4th column (the read depth at each position), and the average coverage you calculated is correct.
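If it helps, here is a minimal Python sketch of that calculation. It is only an illustration under a few assumptions: the function name `mean_coverage` and the gene coordinates are made up for the example, columns are assumed to be whitespace-separated, and the sum is divided by the gene length rather than by the number of pileup lines, since positions with zero coverage are often missing from a pileup file.

```python
def mean_coverage(pileup_lines, chrom, start, end):
    """Mean depth over chrom:start-end (1-based, inclusive).

    Hypothetical helper: sums the 4th pileup column (depth) for every
    position inside the region and divides by the region length, so
    positions absent from the pileup count as zero coverage.
    """
    total = 0
    for line in pileup_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] != chrom:
            continue  # skip blank/short lines and other sequences
        pos = int(fields[1])
        if start <= pos <= end:
            total += int(fields[3])
    return total / (end - start + 1)


# The four lines from the question, used as example input.
example = [
    "seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&",
    "seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+",
    "seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6",
    "seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<",
]

# A made-up gene spanning positions 273-275.
gene_mean = mean_coverage(example, "seq1", 273, 275)
print(gene_mean)  # 23.0
```

For a real file you could pass the open file handle instead of the list, e.g. `mean_coverage(open("my.pileup"), "seq1", 273, 275)` (the file name is a placeholder), and then compare the result against whatever threshold you choose for calling a duplication.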