Dear All,
I have a question and would appreciate any ideas or suggestions. I am analyzing exome data for normal samples, the corresponding tumors, and the IPS derived from those tumors. Before calling variants, is there a way to do a global, read-based check on my samples to see whether the IPS resembles the wild type or the tumor more closely? Is there any read-level parameter I could compare between tumor, wild type, and IPS that would show globally which of the two the IPS exome resembles more?

I am not sure how to achieve this at the read level, but I would like to give it a try. One idea: use the exome BED file provided by the company to determine which exonic target coordinates are covered by reads in each of the tumor, wild-type, and IPS samples, then compare the overlap of covered coordinates for tumor vs. wild type and tumor vs. IPS to see whether the tumor is closer to the wild type or to the IPS. Is this feasible? If so, how would I go about it, and is there a publicly available script for this?
Thanks
By IPS cell, do you mean induced pluripotent stem cells? I'd be surprised if the process used to induce pluripotency had much of an effect on the underlying sequence (that would kind of defeat the purpose, in fact).
Yes, IPS means induced pluripotent stem cells here, and the IPS are tumor-derived. I have already done the variant calling to compare the mutational landscape of the tumor and its IPS, after subtracting the normal variants common to both tumor and IPS. But I would also like to make a comparison at the read level, even before calling variants, just to understand globally whether the IPS resembles the wild-type or the tumor exome more. Can that be done with a script on the aligned BAM files, using the exome BED file for target enrichment provided by the company, and then matching the regions that have similar reads across the three samples?
If you already have variant calls, which are the key piece of information, I'm not sure what you hope to accomplish here. As I state below, you can take a close look at copy number events, but those are just another type of variant call.
Yes, looking at CNVs and indels would be another level of variant calling, just to be sure whether the variants of the tumor and the IPS are similar. I have done this at the variant level. But I am not sure whether what I asked is feasible. I want to try it, to see whether checking the reads of each sample over the corresponding exome regions can tell me whether the IPS is closer to the tumor or to the wild type. As far as I know, the tumor is polyclonal and the IPS are monoclonal, so the resemblance will not be very high, but at least I could make the point that my IPS is derived from the tumor. So I am interested in whether this can be inferred even before variant detection, as I posted, by considering the reads in each exome region and matching the regions between tumor, wild type, and IPS.
I suppose you could do a comparison of normalized read depth in the capture regions, which might give you an idea about CNVs at least (i.e., clustering the samples according to this should result in the tumor and iPS samples clustering together). You might also try to look at how the mapping quality varies between the samples (since the tumor and iPS will have more mutations, the mapping quality distribution may differ between them, though this will be aligner dependent). This last possibility has a number of problems, of course.
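To make the read-depth idea a bit more concrete, here is a minimal sketch in Python. It assumes you already have one read count per capture region for each sample (for example from something like `bedtools multicov` run with the company's BED file); the function names and the toy numbers are made up for illustration and are not from any existing script. Each sample is normalized to log2 reads per kilobase per million on-target reads, tumor and iPS are then expressed relative to the normal, and the two log-ratio profiles are correlated.

```python
import math

def log2_rpkm(counts, lengths, pseudo=0.5):
    """Log2 reads per kilobase per million counted (on-target) reads, per region."""
    total = sum(counts)  # library size, taken here as the sum over capture regions
    return [math.log2((c + pseudo) / (length / 1e3) / (total / 1e6))
            for c, length in zip(counts, lengths)]

def log2_ratio(sample, reference):
    """Per-region difference of log2 depths, i.e. log2(sample / reference)."""
    return [s - r for s, r in zip(sample, reference)]

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data, invented purely for illustration: six capture regions,
# with a gain in region 3 and a loss in region 5 of the tumor and iPS.
lengths       = [200, 150, 300, 250, 180, 220]   # region lengths in bp
normal_counts = [400, 300, 600, 500, 360, 440]
tumor_counts  = [400, 300, 1200, 500, 180, 440]
ips_counts    = [410, 290, 1150, 510, 170, 450]

normal_depth = log2_rpkm(normal_counts, lengths)
tumor_lr = log2_ratio(log2_rpkm(tumor_counts, lengths), normal_depth)
ips_lr = log2_ratio(log2_rpkm(ips_counts, lengths), normal_depth)

r_tumor_ips = pearson(tumor_lr, ips_lr)
print(f"Correlation of tumor and iPS log2 ratios vs. normal: {r_tumor_ips:.3f}")
```

Dividing by the normal sample first is a design choice to cancel region-specific capture efficiency, which otherwise dominates exome depth. The correlation is only informative if the tumor actually carries copy-number changes; if it does not, both log-ratio profiles are mostly noise around a constant.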