Hi! I am trying to analyse some bisulfite sequencing data files, but I am new to this kind of sequencing analysis and I am a bit stuck. I have followed the Bismark protocol for the alignment, deduplication and methylation extraction steps, and I am now trying to follow the methylKit protocol. I have tried loading the sorted BAM files from Bismark into R, but they are really heavy (50-60 GB), which makes it impossible to load them successfully with processBismarkAln() because it takes too long. Is there another way to do this step? Do the sorted.bam files need a conversion or something similar to solve this problem? I have read that it may be possible to use the Bismark cytosine report or the Bismark coverage file for the analysis in methylKit. But my question is: will the results be the same as if the process were done with the sorted.bam files? Or is there a loss of information when using the cytosine report or the coverage file? Sorry for my ignorance; I would appreciate any recommendations. Thanks in advance, Iraia
I know this is an old question, but for anyone reading it: the compression is better if the BAM file is sorted; I could reduce 12 GB files to 4 GB. Note that you need to sort by name with
samtools sort -n
in order to be able to use other Bismark tools (which makes the file a bit bigger than just using samtools sort).
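In case it helps, here is a minimal sketch of the two sorting modes mentioned above, wrapped in small shell functions. The function names and the file names in the usage comments are placeholders I made up for illustration, and it assumes samtools is on your PATH; adjust to your own setup.

```shell
# Sketch only: function names and file paths are hypothetical placeholders.

# Name-sort a BAM file (samtools sort -n), as needed for other Bismark tools.
sort_bam_by_name() {
    in_bam="$1"
    out_bam="$2"
    samtools sort -n -o "$out_bam" "$in_bam"
}

# Coordinate-sort a BAM file (the samtools sort default), for size comparison.
sort_bam_by_coord() {
    in_bam="$1"
    out_bam="$2"
    samtools sort -o "$out_bam" "$in_bam"
}

# Example usage (uncomment and replace the placeholder paths):
# sort_bam_by_name  sample.bam sample.name_sorted.bam
# sort_bam_by_coord sample.bam sample.coord_sorted.bam
# ls -lh sample.bam sample.name_sorted.bam sample.coord_sorted.bam
```

Comparing the sizes with ls -lh afterwards should show how much each sort order saves on your own data; the 12 GB to 4 GB figure above is just what I saw on my files.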