My ultimate goal is to identify malignant cells among all epithelial cells in scRNA-seq data using the CONICS algorithm. This method detects large-scale CNVs in each cell, using regions derived from whole-exome sequencing (WES) data. While I work a lot with scRNA-seq data, I have never analyzed WES data. I have a matching WES BAM file for each scRNA-seq sample. According to the CONICS tutorial, I should provide region information from WES, with the start position, end position, and length for each chromosome. The tutorials do not include any WES examples. In the absence of WES, they recommend using chromosome position data in the following format:
Idf Chrom Start End Length
1 1 0 248956422 122026459
2 2 0 242193529 92188145
3 3 0 198295559 90772458
4 4 0 190214555 49712061
5 5 0 181538259 46485900
6 6 0 170805979 58553888
7 7 0 159345973 58169653
8 8 0 145138636 44033744
9 9 0 138394717 43389635
10 10 0 133797422 39686682
11 11 0 135086622 51078348
12 12 0 133275309 34769407
13 13 0 114364328 16000000
14 14 0 107043718 16000000
15 15 0 101991189 17083673
16 16 0 90338345 36311158
17 17 0 83257441 22813679
18 18 0 80373285 15460899
19 19 0 58617616 24498980
20 20 0 64444167 26436232
21 21 0 46709983 10864560
22 22 0 50818468 12954788
How can I find that kind of data from WES?
Fixed it. Thank you!
You can get the chromosome sizes for the genome you are working with from the reference itself. See: chrom.sizes computed locally
Thank you. How do I find the start and end positions?
Start can be 0 (or 1 if you are using 1-based indexing), and End would be the length of the chromosome. Check what CONICS needs.
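To make that concrete, here is a minimal Python sketch that builds whole-chromosome rows from a FASTA index (the .fai file that `samtools faidx` writes next to the reference, where the first two tab-separated columns are the sequence name and its length). The function names `fai_to_regions` and `format_table` are made up for this illustration and are not part of CONICS. Setting Length to End minus Start is my assumption; the Length column in the CONICS example file clearly holds something else, so check what CONICS expects there before relying on this.

```python
from typing import Iterable, List, Optional, Set, Tuple

Row = Tuple[int, str, int, int, int]


def fai_to_regions(fai_lines: Iterable[str],
                   keep: Optional[Set[str]] = None) -> List[Row]:
    """Turn .fai lines into (Idf, Chrom, Start, End, Length) rows.

    Hypothetical helper: Start is fixed at 0 and End is the chromosome
    length. Length = End - Start is an ASSUMPTION, not CONICS's definition.
    """
    rows: List[Row] = []
    for line in fai_lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        # Drop a leading "chr" so names match the 1..22 style of the example.
        chrom = name[3:] if name.startswith("chr") else name
        if keep is not None and chrom not in keep:
            continue
        start = 0
        end = length
        rows.append((len(rows) + 1, chrom, start, end, end - start))
    return rows


def format_table(rows: List[Row]) -> str:
    """Render rows as a space-separated table with the example's header."""
    header = "Idf Chrom Start End Length"
    body = [" ".join(str(value) for value in row) for row in rows]
    return "\n".join([header] + body)


if __name__ == "__main__":
    # Three illustrative .fai lines; only the first two columns are used.
    example_fai = [
        "chr1\t248956422\t6\t60\t61",
        "chr2\t242193529\t253105708\t60\t61",
        "chrM\t16569\t3099750718\t60\t61",
    ]
    autosomes = {str(i) for i in range(1, 23)}
    print(format_table(fai_to_regions(example_fai, keep=autosomes)))
```

With a real reference you would pass the lines of the .fai file instead of the example list. Note that this only gives whole-chromosome coordinates; it does not find copy-number-altered regions.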
In their chromosome position file, which I copied into my question, the End and Length columns do not match. For CONICS I need a list of regions that have evidence of genomic copy number alterations (derived from exome-seq). How do I obtain the positions of those regions from WES BAM files?