Conversion from genomic to gene coordinates in a large file
3.6 years ago
storm1907 ▴ 30

Hi, I need to convert chromosomal locations to gene positions in a text file with 6 columns:

1   chr1:183189 0   183189  G   C
1   chr1:609407 0   609407 G   C
1   chr1:609434 0   609434  G   C
1   chr1:609435 0   609435  G   G

to

> genename:gene_position   0   gene_position   G   C
> genename:gene_position   0   gene_position   G   C
> genename:gene_position   0   gene_position   G   C
> genename:gene_position   0   gene_position   G   G

The files are based on GRCh38. I searched this forum for a similar issue but could not find an appropriate solution. One option was bedtools, but unfortunately only BED files are suitable in that case. Another option relied on some outdated R packages, which I could not install. A similar thread is "File conversion from coordinates to genes"; however, its solution only covers conversion to gene names, not positions.

Thank you!

plink • 1.1k views
3.6 years ago

Convert your file to BED and sort it, using awk.

Convert a GFF/GTF file to BED and sort it, using awk.

Combine both files using bedtools intersect.
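The first step might look like the following for the 6-column file in the question. This is only a sketch: the function name `snps_to_bed` and the file names are made up for illustration, and it assumes the columns are chromosome code, `chr:pos` ID, genetic distance, 1-based position, and the two alleles. BED intervals are 0-based and half-open, so a SNP at position p becomes start p-1, end p.

```shell
# Hypothetical helper: turn the 6-column variant table into sorted BED-like output.
# Output columns: chrom, start (0-based), end, column 3 of the input, allele 1, allele 2.
snps_to_bed() {
    awk -v OFS='\t' '
        NF >= 6 {
            split($2, id, ":")              # "chr1:183189" -> id[1] = "chr1"
            print id[1], $4 - 1, $4, $3, $5, $6
        }' "$1" | sort -k1,1 -k2,2n
}

# Toy input taken from the question (deliberately unsorted here).
cat > snps.txt <<'EOF'
1   chr1:609407 0   609407 G   C
1   chr1:183189 0   183189  G   C
EOF

snps_to_bed snps.txt > snps.bed
cat snps.bed
```

The chromosome names in the result (`chr1` here) have to match the names used in the annotation file, otherwise bedtools intersect will not find any overlaps.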


How do I get a GFF/GTF file? From the sample VCF, or from some kind of reference?


Thanks a lot, I found the NCBI Assembly download option. This line helped to convert the GTF to BED:

 cat GCF_000001405.39_GRCh38.p13_genomic.gtf  | awk '{print $1,$4,$5,$10,$6,$7}' | sed 's/"//g' | sed 's/;//g' > out

After that I converted the PLINK file to BED as well. Then I tried to run bedtools intersect and got an error saying that spaces need to be converted to tabs in the converted GTF file. I fixed it with

awk -v OFS="\t" '$1=$1'
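One caveat with this idiom: `$1=$1` is used as a pattern, so awk prints a line only when the assigned value counts as true. A line whose first field is `0` (PLINK uses chromosome code 0 for unplaced variants) would be dropped silently. A slightly safer variant, sketched here with a made-up function name, rebuilds the record in an action and then always prints it:

```shell
# Hypothetical helper: re-join whitespace-separated fields with tabs.
# Assigning $1 to itself forces awk to rebuild the record using OFS;
# the trailing "1" is an always-true pattern, so every line is printed.
spaces_to_tabs() {
    awk -v OFS='\t' '{ $1 = $1 } 1' "$@"
}

# Toy input: the first line starts with 0 and must survive the conversion.
printf '0 100 200 geneA\n1 300 400 geneB\n' | spaces_to_tabs
```

Note that this only changes the delimiter. If the columns were already shifted by an earlier whitespace split, re-joining them with tabs does not repair that.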

After running bedtools intersect again, I got this:

***** ERROR: illegal character 't' found in integer conversion of string "transcript". Exiting...

How should I correctly convert the NCBI GTF reference to a BED file? The problematic file is here:

[screenshot of the file, not included]
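The "transcript" error suggests that a feature-type word ended up in a coordinate column. A likely cause (an assumption, since the file itself is not shown) is that the earlier awk command splits on any whitespace, so a GTF field that contains a space shifts every later column; comment lines starting with `#` are also passed through. A sketch that splits on tabs only, keeps `gene` records, and converts the 1-based GTF start to a 0-based BED start could look like this. The function name, file names and toy records are invented for illustration.

```shell
# Hypothetical helper: convert the gene records of a GTF file to sorted BED6.
gtf_genes_to_bed() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                       # skip header/comment lines
        $3 != "gene" { next }               # keep one record per gene
        {
            name = "."
            # pull the value of gene_id "..." out of the attribute column
            if (match($9, /gene_id "[^"]+"/))
                name = substr($9, RSTART + 9, RLENGTH - 10)
            # GTF is 1-based inclusive, BED is 0-based half-open
            print $1, $4 - 1, $5, name, ".", $7
        }' "$1" | sort -k1,1 -k2,2n
}

# Toy GTF: a comment line, a gene whose source field contains a space,
# a transcript record (filtered out), and a second gene.
{
    printf '#toy header line\n'
    printf 'chr1\tCurated Genomic\tgene\t1000\t2000\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n'
    printf 'chr1\tCurated Genomic\ttranscript\t1000\t2000\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_1";\n'
    printf 'chr1\tToySource\tgene\t500\t800\t.\t-\t.\tgene_id "GENE_B";\n'
} > toy.gtf

gtf_genes_to_bed toy.gtf > genes.bed
cat genes.bed
```

Because the fields are split on tabs, the space inside the source column of the toy records does not disturb the coordinates, and the output is already tab-delimited.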


Hello, I ran the following as the last command:

  bedtools intersect -wb -a reference.gtf  -b myfile.bed   > intersected  

and now I have output like this:

1       BestRefSeq      gene    69270   69271   .       +       .       gene_id "OR4F5"; transcript_id ""; db_xref "GeneID:79501"; db_xref "HGNC:HGNC:14825"; description "olfactory receptor family 4 subfamily F member 5"; gbkey "Gene"; gene "OR4F5"; gene_biotype "protein_coding";     1       69270   69270   0       G       G
1       BestRefSeq      gene    69511   69512   .       +       .       gene_id "OR4F5"; transcript_id ""; db_xref "GeneID:79501"; db_xref "HGNC:HGNC:14825"; description "olfactory receptor family 4 subfamily F member 5"; gbkey "Gene"; gene "OR4F5"; gene_biotype "protein_coding";     1       69511   69511   0       G       G
1       BestRefSeq      gene    69761   69762   .       +       .       gene_id "OR4F5"; transcript_id ""; db_xref "GeneID:79501"; db_xref "HGNC:HGNC:14825"; description "olfactory receptor family 4 subfamily F member 5"; gbkey "Gene"; gene "OR4F5"; gene_biotype "protein_coding";     1       69761   69761   0       A       T

I wonder how I can turn this output into the required format:

> genename:snp_position   0   snp_position   G   C
> genename:snp_position   0   snp_position   G   C

Thank you!
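One way to get from the intersect output above to the requested layout is a final awk pass. This is a sketch under assumptions: the output is tab-delimited, the first nine columns are the GTF record with all attributes in column 9, columns 10 to 15 are the BED record (chromosome, start, end, 0, allele 1, allele 2), and the BED end column holds the SNP position. `intersect_to_table` and the file name are made up.

```shell
# Hypothetical helper: reshape `bedtools intersect -wb` output
# (GTF in A, 6-column BED in B) into "gene:pos  0  pos  allele1  allele2".
intersect_to_table() {
    awk -F'\t' -v OFS='\t' '
        match($9, /gene_id "[^"]+"/) {
            gene = substr($9, RSTART + 9, RLENGTH - 10)   # value inside the quotes
            pos = $12                                      # BED end = SNP position
            print gene ":" pos, $13, pos, $14, $15
        }' "$1"
}

# Toy input: two lines from the output shown above, attributes shortened.
{
    printf '1\tBestRefSeq\tgene\t69270\t69271\t.\t+\t.\tgene_id "OR4F5"; transcript_id ""; gene "OR4F5";\t1\t69270\t69270\t0\tG\tG\n'
    printf '1\tBestRefSeq\tgene\t69761\t69762\t.\t+\t.\tgene_id "OR4F5"; transcript_id ""; gene "OR4F5";\t1\t69761\t69761\t0\tA\tT\n'
} > intersected.tsv

intersect_to_table intersected.tsv
```

If a position relative to the gene start is wanted instead, note that with only `-wb` the A columns hold the overlapping portion rather than the whole gene. Adding `-wa` keeps the original gene interval, from which an offset could then be computed.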
