Dear all,
My first question is whether a reference FASTA file exists for the HeLa cell line and, if so, where to find it.
I am working with several DNA sequencing samples from HeLa. So far I have used UCSC's hg19 as the reference. However, after aligning with bwa I see many R1 forward-strand alignments with heavy soft clipping at the 5' end in my regions of interest. I am starting to suspect that this is caused by structural variation between hg19 and HeLa, with breakpoints near the regions I am looking at.
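To make the soft-clipping observation concrete, here is a minimal sketch of how the 5' soft clip of an alignment can be read from its SAM CIGAR string. The function name `five_prime_softclip` is made up for illustration, and it assumes the CIGAR strings and strand flags have already been pulled out of the bwa output. For a forward-strand read the 5' end is the start of the CIGAR; for a reverse-strand read it is the end.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def five_prime_softclip(cigar, is_reverse=False):
    """Return the number of soft-clipped bases at the read's 5' end.

    Hypothetical helper for illustration. Hard clips (H) are ignored,
    since they may sit outside the soft clip.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return 0  # unmapped read, CIGAR is "*"
    # SAM stores reverse-strand reads reverse-complemented, so their
    # 5' end corresponds to the last CIGAR operation.
    length, op = ops[-1] if is_reverse else ops[0]
    return length if op == "S" else 0

# Made-up CIGAR strings, not real data.
for cigar, rev in [("35S66M", False), ("66M35S", False), ("66M35S", True)]:
    print(cigar, "reverse" if rev else "forward", five_prime_softclip(cigar, rev))
```

If many forward reads in a region share a clip that starts at the same reference coordinate, that pattern would be consistent with a breakpoint there, although it does not prove one.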
My second question is how to generate a customised reference FASTA file from a standard reference FASTA file plus variation data. Is there a publication or software that describes how to do this?
Several studies published in 2013 characterised variation in HeLa. I thought that combining hg19 with their single-nucleotide and structural variation files might give me what I need.
Any ideas?
According to the most comprehensive review of the HeLa genome I've seen, the difference between it and hg19 is not small: it's a completely rearranged cell line. https://www.ncbi.nlm.nih.gov/pubmed/23550136
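For the simpler half of the second question (SNVs and small indels), the general idea can be sketched as below. This is only an illustration under stated assumptions: the function name `apply_variants` is made up, variants are given as non-overlapping `(1-based position, ref allele, alt allele)` tuples in the style of VCF records, and only one haplotype is produced. It does not handle the large rearrangements, translocations, or copy-number changes mentioned above, which is exactly where a patched hg19 would fall short for HeLa. In practice an existing tool such as `bcftools consensus` covers the same SNV/indel case from a VCF file.

```python
def apply_variants(reference, variants):
    """Return `reference` with SNVs and small indels applied.

    Hypothetical helper for illustration. `variants` is an iterable of
    (pos, ref, alt) with 1-based positions, as in VCF. Variants must not
    overlap, and each ref allele must match the reference sequence.
    """
    seq = reference
    prev_start = len(reference)  # start index of the variant to the right
    # Apply from right to left so earlier coordinates stay valid
    # even when an indel changes the sequence length.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1
        end = start + len(ref)
        if end > prev_start:
            raise ValueError(f"variant at {pos} overlaps a later variant")
        if seq[start:end].upper() != ref.upper():
            raise ValueError(f"ref allele mismatch at position {pos}")
        seq = seq[:start] + alt + seq[end:]
        prev_start = start
    return seq

# Toy sequence and made-up variants, not real HeLa calls.
toy_reference = "ACGTACGTAC"
toy_variants = [
    (2, "C", "T"),    # SNV
    (5, "AC", "A"),   # 1 bp deletion
    (9, "A", "AGG"),  # 2 bp insertion
]
print(apply_variants(toy_reference, toy_variants))
```

Checking the ref allele before each substitution is a cheap way to catch a mismatch between the variant file's coordinates and the reference build being edited.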