liftover using genome browser
3.3 years ago
priyanka ▴ 20

Hello everyone,

I have a file in the hg38 build that I want to lift over to hg19. I thought of using the liftOver tool from the UCSC Genome Browser, and I realise that the input file should be in BED format.

My file has only two parts, chromosome and position. This is how it looks:

CHROM_POS
chr10_100009635
chr10_100187980
chr10_100229692
chr10_100267650

The more detailed file is:

GENE RSID1 RSID2 VALUE
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 0.1736259917762202
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 0.09154263431207886
ENSG00000000457.13 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 0.5075352470673014

Can anyone please tell me how I should convert this format to BED format, or whether I can use some other tool for the liftover?

vcf liftover bed • 4.0k views

Use a proper title, not a list of comma-separated terms. Read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

I tried using the Genome Browser, but I don't know how to convert this file format to BED format.

You already have the content needed for the BED file. Split each line of the first CHROM_POS file on _ and repeat the second element twice to get the basic BED layout of chromosome, start and end.
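
The split described above could be sketched in Python as follows. This is only an illustration: the function name `chrom_pos_to_bed` is made up, and it assumes the positions are 1-based, as in a VCF. Strict BED is 0-based and half-open, so the sketch writes a single base as start = pos - 1 and end = pos instead of repeating the position, and it keeps the original ID in a fourth column so records can be matched up after the liftover.

```python
def chrom_pos_to_bed(lines):
    """Convert entries like 'chr10_100009635' into 4-column BED lines.

    Assumes 1-based positions (an assumption, not stated in the thread).
    """
    bed = []
    for raw in lines:
        entry = raw.strip()
        if not entry or entry == "CHROM_POS":  # skip blanks and the header
            continue
        # Split on the last underscore so contig names containing '_' survive.
        chrom, pos_text = entry.rsplit("_", 1)
        pos = int(pos_text)
        # BED start is 0-based, BED end is exclusive.
        bed.append(f"{chrom}\t{pos - 1}\t{pos}\t{entry}")
    return bed


example = ["CHROM_POS", "chr10_100009635", "chr10_100187980"]
for bed_line in chrom_pos_to_bed(example):
    print(bed_line)
```

Writing the returned lines to a file gives something that can be pasted or uploaded as BED input.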

Okay, I have one more question. In some cases both columns hold the same position, e.g. chr1_169894240_G_T_b38 chr1_169894240_G_T_b38. Is it right to have the same start and end position?

I don't understand your question. Do you mean you have duplicate entries? Did you try extracting the fields you need and actually running them through liftover?

No, I don't have duplicate entries. I know that a BED file should have chr, start and end. In my file, each line is a gene followed by RSIDs, which are in chrom_pos form. So for a given gene, a line can contain the same RSID twice.

Did you try extracting the fields you need and actually running them through liftover?

Yes, I did. It says the format is incorrect. But I am still confused about what the end position should be.

Please read the comment chain - I've mentioned how to get the end position (when the start and end are the same)

That means it should be chr1:169894240-169894240. Am I right?

ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 0.1736259917762202
ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 0.09154263431207886
ENSG00000000457.13 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 0.5075352470673014
ENSG00000000460.16 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 0.2107198702727749
ENSG00000000460.16 chr1_169661963_G_A_b38 chr1_169697456_A_T_b38 -0.03676569950387048
ENSG00000000460.16 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 0.3974601519919186
ENSG00000000938.12 chr1_27636786_T_C_b38 chr1_27636786_T_C_b38 0.050964267099090806
ENSG00000000971.15 chr1_196651787_C_T_b38 chr1_196651787_C_T_b38 0.4262626847615553
ENSG00000001036.13 chr6_143501715_T_C_b38 chr6_143501715_T_C_b38 0.4365424090912025
ENSG00000001036.13 chr6_143501715_T_C_b38 chr6_143511989_A_G_b38 0.38588058145595594

This is the file content. I have one question: if I repeat the second element as the end position, then I will only get identical start and end positions.

chr1 169894240 169894240
chr1 169894240 169891332 
chr1 169891332 169891332
chr1 169661963 169661963
chr1 169661963 169697456 ## in this case it also shows an error, since the start comes out larger than the end.

If I follow the above steps, I will mostly get the same start and end.
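
One possible reading of the four-column file is that RSID1 and RSID2 are two separate variant IDs, not the start and end of a single interval. Under that assumption, and assuming each ID has the form chrom_pos_ref_alt_build with a 1-based position, a hedged sketch would emit one BED record per unique variant. The helper names `variant_to_bed` and `model_to_bed` are made up for illustration.

```python
def variant_to_bed(variant_id):
    """Turn 'chr1_169894240_G_T_b38' into a 4-column BED line.

    Treats every variant as a single base at its 1-based position,
    which is a simplification for indels.
    """
    chrom, pos_text, ref, alt, build = variant_id.rsplit("_", 4)
    pos = int(pos_text)
    return f"{chrom}\t{pos - 1}\t{pos}\t{variant_id}"


def model_to_bed(rows):
    """Collect each distinct variant from the RSID1 and RSID2 columns once."""
    seen = set()
    bed = []
    for row in rows:
        fields = row.split()
        if len(fields) != 4 or fields[0] == "GENE":  # skip header and odd rows
            continue
        for variant_id in fields[1:3]:
            if variant_id not in seen:
                seen.add(variant_id)
                bed.append(variant_to_bed(variant_id))
    return bed


rows = [
    "GENE RSID1 RSID2 VALUE",
    "ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 0.1736259917762202",
    "ENSG00000000457.13 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 0.09154263431207886",
    "ENSG00000000457.13 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 0.5075352470673014",
]
for bed_line in model_to_bed(rows):
    print(bed_line)
```

With this layout no record can have a start larger than its end, because the two columns are never combined into one interval.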

Please explain your problem better. What do the four columns mean in your source file, and what are you trying to accomplish using the liftover?

I have two files: one is a VCF and the other is this model file. I want to count the SNPs that overlap between the two, but their genome builds differ: one is hg19 and the other is hg38. So I am trying to do a liftover and then find the overlapping SNPs.
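
Once both files are in the same build, the overlap count can be sketched as a set intersection on (chromosome, position). This is a minimal illustration with made-up toy coordinates and hypothetical function names; it assumes both files name chromosomes the same way (for example both use "chr1") and it matches on position only, ignoring alleles.

```python
def vcf_sites(vcf_lines):
    """Collect (chrom, 1-based pos) pairs from VCF data lines."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():  # skip headers and blanks
            continue
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def bed_sites(bed_lines):
    """Collect (chrom, 1-based pos) pairs from single-base BED records.

    For a single base, the BED end column equals the 1-based position.
    """
    sites = set()
    for line in bed_lines:
        if not line.strip():
            continue
        chrom, start, end = line.split()[:3]
        sites.add((chrom, int(end)))
    return sites


# Toy records, invented purely for illustration.
vcf = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT", "chr1\t1000\t.\tG\tT", "chr1\t2000\t.\tA\tC"]
lifted_bed = ["chr1\t999\t1000\tvariantA", "chr1\t4999\t5000\tvariantB"]

shared = vcf_sites(vcf) & bed_sites(lifted_bed)
print(len(shared), "shared positions")
```

If alleles matter for the comparison, the keys could be extended with REF and ALT taken from the VCF columns and from the variant ID.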

I was able to convert my VCF files to BED files. But when I submit them to the Genome Browser, it says: "Successfully converted 147944 records: Conversion failed on 209 records". So it was not able to convert 209 records.

I've had that happen. Not all co-ordinates can be successfully lifted over, I think.

Right, this happens if there are "gaps" in the chain file. These gaps can arise for many reasons. For example, an insertion variant that exists in part of the population may be included in one reference genome (in which case the alt allele will be a deletion) and not in the other reference genome (in which case the alt allele will be the insertion). The chain file from the first reference to the second will have a gap because there is no mapping for the bases of this insertion.
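
To see why particular records failed, the list of failed conversions can be tallied by reason. The sketch below assumes the failure list uses the layout liftOver typically writes, where a line starting with # gives the reason and the record follows on the next line; if your file looks different, the parsing needs adjusting. The function name and the toy records are made up.

```python
from collections import Counter


def tally_failures(unmapped_lines):
    """Count failed records per reason line (lines starting with '#')."""
    reasons = Counter()
    current = None
    for line in unmapped_lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith("#"):
            # Remember the most recent reason; it applies to the records after it.
            current = text.lstrip("#").strip()
        else:
            reasons[current or "unknown"] += 1
    return reasons


# Toy input, invented for illustration only.
unmapped = [
    "#Deleted in new",
    "chr1\t99\t100",
    "#Partially deleted in new",
    "chr2\t5\t10",
    "#Deleted in new",
    "chr3\t1\t2",
]
for reason, count in tally_failures(unmapped).most_common():
    print(count, reason)
```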

Thank you for the explanation.

3.3 years ago
Divon ▴ 230

Another option is to do it the other way around:

Turn your VCF into a dual-coordinate VCF (i.e. a VCF containing coordinates in both hg19 and hg38 concurrently):

genozip --chain hg19tohg38.chain.genozip myfile.vcf

To view your data in hg19:

genocat myfile.vcf.genozip

To view your data in hg38 (and compare to the other file):

genocat --luft myfile.vcf.genozip

More details: https://genozip.com/dvcf.html

Oh, that's great. I will try this.
