Hello everyone,
I have a file on the hg38 build that I want to lift over to hg19. I thought of using the liftOver tool from the UCSC Genome Browser, and I realise the input file should be in BED format.
My file has only two parts, chrom and position. This is how it looks:
CHROM_POS
chr10_100009635
chr10_100187980
chr10_100229692
chr10_100267650
Or, in the more detailed file:
GENE RSID1 RSID2 VALUE
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 0.1736259917762202
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 0.09154263431207886
ENSG00000000457.13 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 0.5075352470673014
Can anyone please tell me how I should convert this format to BED format, or whether I can use some other tool for the liftover?
Use a proper title, not a list of comma-separated terms. Read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202
I tried using the Genome Browser, but I don't know how to convert this file format to BED format.
You have the content necessary for the bed file. Split each line of the first (CHROM_POS) file by _ and repeat the second element twice to get to the basic bed format.
Okay, I have one more doubt. In some cases the start and end are the same position, e.g. chr1_169894240_G_T_b38 chr1_169894240_G_T_b38. Is that right, to have the same start and end position?
I don't understand your question. Do you mean you have duplicate entries? Did you try extracting the fields you need and actually running them through liftover?
No, I don't have duplicate entries. I know that a bed file should be chr, start and end. In my file, it's the gene followed by the rsid, which is in the form chrom_pos. So if you look at one gene, there are two identical rsids.
Yes, I did. It says incorrect format. But I am still confused as to what the stop position should be.
Please read the comment chain - I've mentioned how to get the end position (when the start and end are the same)
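For what it's worth, the split step could look something like the sketch below. It assumes the IDs always begin with chrom_pos (so contig names that themselves contain an underscore would need extra handling) and that the positions are 1-based, as in a VCF. BED uses a 0-based start and an exclusive end, so the default here writes a SNP at position p as p-1, p. Setting repeat_position=True writes the position twice instead, as suggested earlier in this thread. The function names are made up for illustration.

```python
def id_to_bed(variant_id, repeat_position=False):
    """Turn 'chr10_100009635' or 'chr1_169894240_G_T_b38' into one BED line."""
    variant_id = variant_id.strip()
    fields = variant_id.split("_")
    chrom, pos = fields[0], int(fields[1])  # assumes the ID starts with chrom_pos
    # BED start is 0-based and the end is exclusive, so a 1-based SNP at pos
    # covers [pos - 1, pos). repeat_position=True writes pos twice instead.
    start = pos if repeat_position else pos - 1
    return f"{chrom}\t{start}\t{pos}\t{variant_id}"


def ids_to_bed(lines):
    """Convert an iterable of IDs to BED lines, skipping the header and repeats."""
    seen = set()
    bed = []
    for line in lines:
        vid = line.strip()
        if not vid or vid == "CHROM_POS" or vid in seen:
            continue
        seen.add(vid)
        bed.append(id_to_bed(vid))
    return bed


example = ["CHROM_POS", "chr10_100009635", "chr10_100187980", "chr10_100009635"]
bed_lines = ids_to_bed(example)
print("\n".join(bed_lines))
```

Keeping the original ID in the fourth column makes it easier to match the lifted coordinates back to the source rows afterwards.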
That means it should be chr1:169894240-169894240. Am I right?
This is the file content. I have one doubt: if I repeat the second element as the stop position, then I will only have identical start and end positions.
If I follow the above steps, I will mostly get the same start and stop.
Please explain your problem better. What do the four columns mean in your source file, and what are you trying to accomplish using the liftover?
I have two files: one is a VCF and the other is this model. I want to count the SNPs that overlap between the two, but their genome coordinates are different: one is hg19 and the other is hg38. So I am trying to do a liftover and then find the overlapping SNPs.
I was able to convert my VCF files to bed files. But when I submit them to the Genome Browser it says: "Successfully converted 147944 records: Conversion failed on 209 records". It was not able to convert 209 records.
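Once both files are on the same build, the overlap count could be sketched roughly as below. This assumes both inputs are BED-like lines with a 0-based start, so the end column equals the 1-based SNP position. The coordinates in the example are made up, and the comparison is by chromosome and position only, not by allele.

```python
def load_positions(bed_lines):
    """Collect (chrom, position) pairs from BED-like lines."""
    positions = set()
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        # With a 0-based start, the end column is the 1-based SNP position.
        positions.add((chrom, int(end)))
    return positions


# Made-up coordinates, only to show the set intersection.
lifted_model = ["chr1\t100\t101", "chr1\t200\t201", "chr2\t50\t51"]
vcf_sites = ["chr1\t100\t101", "chr2\t50\t51", "chr3\t7\t8"]

shared = load_positions(lifted_model) & load_positions(vcf_sites)
print(len(shared), sorted(shared))
```

If the alleles also need to match, the ref and alt could be added to the tuple, provided both files carry them.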
I've had that happen. Not all co-ordinates can be successfully lifted over, I think.
Right, this happens if there are "gaps" in the chain file. These gaps can happen for many reasons. For example, an insertion variant that exists in a portion of the population may be included in one reference genome (in which case the alt allele will be a deletion) and not in the other reference genome (in which case the alt allele will be the insertion). The chain file from the first to the second reference will have a gap because there is no mapping for the bases of this insertion.
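To see why particular records failed, the list of failed conversions can be tallied by reason. The sketch below assumes the failure file puts a reason line starting with # (for example "#Deleted in new") before each failed record. That layout and the sample lines are assumptions to check against your own output.

```python
from collections import Counter


def tally_unmapped(lines):
    """Count the '#'-prefixed reason lines in a list of failed records."""
    reasons = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            reasons[line.lstrip("#").strip()] += 1
    return reasons


# Made-up sample in the assumed layout: a reason line, then the record.
sample = [
    "#Deleted in new",
    "chr1\t100\t101",
    "#Partially deleted in new",
    "chr2\t5\t9",
    "#Deleted in new",
    "chr3\t7\t8",
]

counts = tally_unmapped(sample)
for reason, n in counts.most_common():
    print(n, reason)
```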
Thank you for giving an explanation