Is variant calling done on a per-position basis? I've read recommendations to split the BAM file by chromosome and call the chromosomes in parallel for speed.
Could I further split each chromosome into chunks and call variants on each chunk? For example, if I: 1) split a chromosome into 1 Mb segments, 2) call variants on each 1 Mb segment in parallel, and 3) concatenate the resulting VCFs.
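Step 1 only needs region strings, since many tools accept a region such as chr1:1-1000000. A minimal sketch, assuming the chromosome length is known (e.g. from the BAM header or a .fai index); the function name make_regions is hypothetical:

```python
def make_regions(chrom, length, chunk_size=1_000_000):
    """Split one chromosome into consecutive, non-overlapping regions.

    Coordinates are 1-based and inclusive (the samtools/bcftools region
    convention), so no two chunks share a position and the per-chunk
    results can be concatenated in list order afterwards.
    """
    regions = []
    start = 1
    while start <= length:
        end = min(start + chunk_size - 1, length)
        regions.append(f"{chrom}:{start}-{end}")
        start = end + 1
    return regions


if __name__ == "__main__":
    # A 2.5 Mb chromosome yields three chunks; the last one is shorter.
    for region in make_regions("chr1", 2_500_000):
        print(region)
```

Each region string can go to a separate caller job; keeping the list in order means the per-chunk VCFs can be concatenated in that same order (e.g. with bcftools concat).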
Would the resulting concatenated file be correct? Or would I lose information that variant callers draw from other sites on the same chromosome?
If you plan on calling SNPs only, then I don't see a problem. However, if you are looking for structural variants (SVs), there would be missing data at the edges of the chunks, especially for SVs spanning multiple chunks. Additionally, how do you plan to keep track of each chunk's size and exact position? With additional liftUp files?
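One common way to soften the edge effect for small variants is to pad each chunk on both sides, call on the padded window, and then keep only records whose POS falls in the chunk's unpadded core, so nothing is reported twice. If each job reads its region directly from the indexed BAM (rather than from an extracted sub-sequence), positions are already in chromosome coordinates. A minimal sketch; the 1 kb padding, the function names, and the example calls are all hypothetical, and padding does not rescue SVs longer than the pad:

```python
def padded_windows(length, chunk_size=1_000_000, pad=1_000):
    """Return (core_start, core_end, call_start, call_end), 1-based inclusive.

    Core regions tile the chromosome without overlap; call windows extend
    each core by `pad` bp on both sides, clipped to the chromosome ends.
    """
    windows = []
    core_start = 1
    while core_start <= length:
        core_end = min(core_start + chunk_size - 1, length)
        call_start = max(1, core_start - pad)
        call_end = min(length, core_end + pad)
        windows.append((core_start, core_end, call_start, call_end))
        core_start = core_end + 1
    return windows


def keep_core_records(records, core_start, core_end):
    """Keep only records (tuples starting with POS) inside the core region.

    A variant called in the overlap of two padded windows is kept by exactly
    one chunk: the one whose core contains its POS.
    """
    return [rec for rec in records if core_start <= rec[0] <= core_end]


if __name__ == "__main__":
    windows = padded_windows(2_500_000)
    # Hypothetical calls from the first two padded windows; POS 1_000_500
    # lies in their overlap, so both jobs report it.
    calls = {0: [(999_990, "A", "G"), (1_000_500, "C", "T")],
             1: [(1_000_500, "C", "T"), (1_500_000, "G", "GA")]}
    merged = []
    for i, recs in calls.items():
        core_start, core_end, _, _ = windows[i]
        merged.extend(keep_core_records(recs, core_start, core_end))
    print(merged)  # the shared variant appears only once
```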
Thanks for the reply. I see what you mean about the SVs, and possibly even indels. I am not interested in SVs for now, but I do want to preserve indel information if I can.
The BAM files I am working with are low coverage. I guess I'll have to write a script that chunks the BAM file based on coverage "islands", with each island at least 100 bp from the next.
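The island idea amounts to merging covered positions whose uncovered gap is shorter than 100 bp; wherever the gap is at least 100 bp, a chunk boundary can sit in the uncovered stretch without splitting a covered one. A minimal sketch, assuming the input is `samtools depth` text output (chrom, pos, depth per line) for a single BAM; both function names are hypothetical:

```python
def read_depth(lines, chrom):
    """Yield positions with depth > 0 for `chrom` from samtools depth output."""
    for line in lines:
        name, pos, depth = line.split("\t")[:3]
        if name == chrom and int(depth) > 0:
            yield int(pos)


def coverage_islands(covered_positions, min_gap=100):
    """Group sorted covered positions (1-based) on one chromosome into islands.

    A position joins the previous island unless at least `min_gap` uncovered
    bp separate them, so distinct islands are always >= min_gap bp apart.
    Returns (start, end) tuples, 1-based inclusive.
    """
    islands = []
    for pos in covered_positions:
        if islands and pos - islands[-1][1] - 1 < min_gap:
            islands[-1][1] = pos  # gap too small: extend the current island
        else:
            islands.append([pos, pos])  # first position or large gap
    return [tuple(island) for island in islands]


if __name__ == "__main__":
    # Hypothetical depth lines for illustration only.
    depth_lines = [
        "chr1\t1000\t2",
        "chr1\t1001\t1",
        "chr1\t1021\t3",  # 19 uncovered bp after 1001: same island
        "chr1\t1172\t1",  # 150 uncovered bp after 1021: new island
    ]
    print(coverage_islands(read_depth(depth_lines, "chr1")))
```

This prints [(1000, 1021), (1172, 1172)]. With low coverage there may be many tiny islands, so grouping neighbouring islands into jobs of similar total size is probably more practical than one job per island.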