Entering edit mode
3.6 years ago
boczniak767
▴
870
Hi,
thanks to extremelly useful page gem-mappability I know how to calculate this measure for each base of my sequence. It looks nice in IGV
as transposons have zeroes.
The question is, how could I get one value (percentage) of uniquelly mappable genome?
I need this value for peak-caller (Homer
, MACS
).
Guess you could count bases that have a
non-zero
mappability value and use that number. This likely does not need to be absolutely precise.It sounds good, but I have variable step
wig
so counting of lines with simple commands likeawk
ls
won't work. Are there any program which returns statistics forwig
files?Ok, in the end I followed advice on MACS2 page
So I've determined overall genome size based on
fasta
file:grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | sed -e 's/\(.\)/&\n/g' | wc -l
Similarly I've determine the number of
N
s:grep -v '>' Zm-B73-REFERENCE-NAM-5.0.fa | sed -e 's/\(.\)/&\n/g' | grep 'N' | wc -l
Finally, I've counted the overall length of repeats as I have appropriate
gff
file. This step is somewhat arbitrary as there are many kinds of repeats. But in the end it won't affect the output, as I assume from Istvan Alberts' post somewhere at biostars that the first digit in the mappable genome size is most significant.