I'd like to visualize the impact of duplicates in my NGS/HaloPlex data (HaloPlex produces a large number of duplicates; see "Haloplex & Allele Calling").
For each duplicate interval (chrom:start-end) and each sample (1.bam, 2.bam, ...), I've extracted the number of read pairs:
#INTERVAL MAX MEAN 1.bam 2.bam 3.bam 4.bam 5.bam 6.bam 7.bam 8.bam 9.bam 10.bam ....
I1 4059 120 0 120 4059 168 151 75 173 165 106 211 8 74 95 356 144 125 98 427 81
I2 2490 78 0 90 2490 41 28 28 129 73 45 110 65 39 45 160 56 72 40 152 43 74 96 6
I3 61 1 0 0 19 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 42 0 0 0 4 0 0 0 0 0 0 0 0 0 1
I4 2798 140 0 90 2798 94 86 60 149 97 102 152 158 65 63 225 73 93 46 261 58 76 4
I5 4405 142 0 65 2946 113 58 28 190 104 107 143 73 63 81 266 108 79 60 236 44 65
I6 10 0 0 1 10 0 0 0 2 0 0 3 0 0 0 1 1 1 0 2 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1
I7 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0
I8 1204 32 0 49 1204 18 15 9 60 21 18 79 11 18 20 70 38 27 27 75 21 9 14 2 70 9
I9 112 1 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 112 0 0 0 37 0 0 0 0 0 0 0 0 0
(...)
I've uploaded the data (6 MB) at: https://dl.dropboxusercontent.com/u/18871518/dup.tsv.gz
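A minimal sketch for loading the table, assuming the downloaded file is saved locally as dup.tsv.gz and has the layout shown above (INTERVAL, MAX, MEAN, then one column per sample). When R reads a file by name it decompresses gzip transparently, so the .gz file does not need to be unpacked first. Setting check.names = FALSE keeps headers like "#INTERVAL" and "1.bam" unchanged instead of converting them to "X.INTERVAL" and "X1.bam". The helper name read_dup_counts is hypothetical. The demo uses only the first four intervals and first four samples of the excerpt, written to a temporary file.

```r
# Load the interval x sample table of duplicate read-pair counts.
# Assumed layout (from the excerpt): INTERVAL, MAX, MEAN, then one column per BAM.
read_dup_counts <- function(path) {
  # read.delim() is read.table() with sep = "\t", header = TRUE, comment.char = "",
  # so the "#INTERVAL" header line is not treated as a comment.
  dup <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
  counts <- as.matrix(dup[, -(1:3), drop = FALSE])  # drop INTERVAL, MAX, MEAN
  rownames(counts) <- dup[[1]]                      # intervals as row names
  counts
}

# Demo: a small slice of the excerpt, written to a temporary gzip file
# to mimic dup.tsv.gz.
demo_path <- tempfile(fileext = ".tsv.gz")
con <- gzfile(demo_path, "w")
writeLines(c(
  "#INTERVAL\tMAX\tMEAN\t1.bam\t2.bam\t3.bam\t4.bam",
  "I1\t4059\t120\t0\t120\t4059\t168",
  "I2\t2490\t78\t0\t90\t2490\t41",
  "I3\t61\t1\t0\t0\t19\t0",
  "I4\t2798\t140\t0\t90\t2798\t94"
), con)
close(con)

counts <- read_dup_counts(demo_path)
print(dim(counts))
print(colnames(counts))
```

On the real file, the call would be counts <- read_dup_counts("dup.tsv.gz").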
I'd like to see whether the number of read pairs is homogeneous across samples. Could you suggest a way to visualize this?
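One common way to check whether samples are comparable is to compare each sample's distribution of counts on a log scale, using log10(count + 1) so that zeros stay finite. Simple per-sample summaries help too, such as the total number of read pairs and the fraction of intervals with no reads. This is a suggestion, not a definitive method. Samples whose boxes sit well above or below the others stand out immediately. The function names are hypothetical. The demo matrix is the first four intervals and first four samples of the excerpt.

```r
# Per-sample summaries: total read pairs, fraction of empty intervals,
# and median of log10(count + 1) across intervals.
sample_summary <- function(counts) {
  logc <- log10(counts + 1)
  data.frame(
    total      = colSums(counts),
    frac_zero  = colMeans(counts == 0),
    median_log = apply(logc, 2, median),
    row.names  = colnames(counts)
  )
}

# One box per sample (matrix column) of log10(count + 1),
# drawn on the current graphics device.
plot_sample_boxes <- function(counts) {
  boxplot(log10(counts + 1), las = 2, ylab = "log10(read pairs + 1)",
          main = "Read pairs per interval, by sample")
}

# Demo: first 4 intervals x first 4 samples of the excerpt.
counts <- matrix(c(0, 120, 4059, 168,
                   0,  90, 2490,  41,
                   0,   0,   19,   0,
                   0,  90, 2798,  94),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("I", 1:4), paste0(1:4, ".bam")))

print(sample_summary(counts))
pdf(tempfile(fileext = ".pdf"))  # or png("boxes.png", width = 1000, height = 600)
plot_sample_boxes(counts)
invisible(dev.off())
```

Totals partly reflect sequencing depth. If only the shape of each distribution matters, one option is to divide each column by its total before plotting.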
I'm not an R programmer. I tried this (a heatmap):
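# read the tab-separated table: INTERVAL, MAX, MEAN, then one column per sample
dup <- read.delim("in.tsv", header = TRUE)
# sort intervals by their mean number of read pairs
dup <- dup[order(dup$MEAN), ]
# keep only the per-sample columns (drop INTERVAL, MAX, MEAN)
dup <- dup[, 4:ncol(dup)]
M <- data.matrix(dup)
png("out.png", width = 1000, height = 2000)
H <- heatmap(M, Rowv = NA, Colv = NA, col = cm.colors(5000), scale = "column",
             margins = c(5, 10), verbose = TRUE)
dev.off()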
But the resulting image shows only one color.
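A likely cause of the single color, though it can't be confirmed without the full file: the counts are extremely skewed. Most cells hold 0 or a few reads, while a handful of intervals reach the thousands. On a linear scale, even after per-column standardization, nearly every cell therefore lands in the same color bin. With many thousands of intervals in a 2000-pixel-high image, each row also gets less than one pixel.

A minimal sketch of one workaround, applied to a count matrix such as M from the code above:

- Log-transform the counts with log10(count + 1).
- Turn off heatmap()'s own scaling (scale = "none").
- Pass a small palette with explicit breaks. heatmap() forwards extra arguments such as breaks to image().

The function name plot_log_heatmap and the palette are illustrative choices. With the full data it may also help to plot a subset of intervals so that each row stays visible.

```r
# Heatmap of log10(count + 1) with a small palette and explicit breaks.
# Draws on the current graphics device; returns the breaks invisibly.
plot_log_heatmap <- function(counts, n_colors = 20) {
  logc <- log10(counts + 1)                      # compress the 0..thousands range
  top  <- max(1, ceiling(max(logc)))             # upper break, at least 1
  brks <- seq(0, top, length.out = n_colors + 1) # one more break than colors
  pal  <- colorRampPalette(c("white", "orange", "red", "darkred"))(n_colors)
  heatmap(logc, Rowv = NA, Colv = NA, scale = "none",
          col = pal, breaks = brks,              # breaks is passed on to image()
          margins = c(5, 10))
  invisible(brks)
}

# Demo: first 4 intervals x first 4 samples of the excerpt.
counts <- matrix(c(0, 120, 4059, 168,
                   0,  90, 2490,  41,
                   0,   0,   19,   0,
                   0,  90, 2798,  94),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("I", 1:4), paste0(1:4, ".bam")))

pdf(tempfile(fileext = ".pdf"))
brks <- plot_log_heatmap(counts)
invisible(dev.off())
```

For a PNG like the original, open the device with png("out.png", width = 1000, height = 2000), call plot_log_heatmap(M), then dev.off().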