What is the best way to retrieve gene name from variant mutation?
1
0
Entering edit mode
19 months ago
Lila M ★ 1.3k

Hi all! I was wondering if any of you may suggest a clean a fast way to retrieve gene name from a list of variants that look like this:

CHR_POS_REF_ALT > chr1_33333_A_C

Of course I can generate a file with POS, CHR, REF, ALT and as I am missing some information for a standard VCF file, I was wondering if "VariantAnnotation" from Bioconductor is the correct tool to use.

mutation name gene variant • 1.9k views
ADD COMMENT
0
Entering edit mode

You only really need POS, CHR, REF, and ALT for a VCF file. 8 columns are needed but everything else can be a dot (i.e., .).

ADD REPLY
0
Entering edit mode
19 months ago
tabix tabix_indexed.gtf.gz ` echo "chr1_2077466_A_C" | awk -F '_' '{printf("%s:%s-%s\n",$1,$2,$2);}'` |\
awk -F '\t' '($3=="gene")' | cut -f 9 | tr ";" "\n" |\
grep -w gene_name | cut -d '"' -f 2 |\
sort | uniq | paste -sd,
ADD COMMENT
0
Entering edit mode

Thank you for your feedback! I've never used tabix before, am I right it I assume tabix_indexed.gtf.gz cames with the software?

ADD REPLY
0
Entering edit mode

No, index your GTF file with tabix which is part of htslib. https://github.com/samtools/htslib

ADD REPLY
0
Entering edit mode

ah but I don't have the gtf file

ADD REPLY
0
Entering edit mode

If this is for human (or another model organism) you should be able to get a file based on genome build.

ADD REPLY
0
Entering edit mode

Thank you both for your feedback! what I am doing so far is:

gunzip  Homo_sapiens.GRCh38.109.gtf.gz | awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k4,4n -k5,5n"}' | bgzip > Homo_sapiens.GRCh38.109_sorted.gtf.gz

tabix Homo_sapiens.GRCh38.109_sorted.gtf.gz

tabix Homo_sapiens.GRCh38.109_sorted.gtf.gz  ` echo "chr1_1000000_G_A" | awk -F '_' '{printf("%s:%s-%s\n",$1,$2,$2);}'` |awk -F '\t' '($3=="gene")' | cut -f 9 | tr ";" "\n" |grep -w gene_name | cut -d '"' -f 2 |sort | uniq | paste -sd,

But unfortunately it raises this

usage: paste [-s] [-d delimiters] file ...

Any idea what is causing it?

ADD REPLY
0
Entering edit mode

strange. try (hyphen at the end)

paste -sd, -
ADD REPLY
0
Entering edit mode

Does not work but does not produce any error either. I am not understanding tabix followed by the echo. It is possible the biostar formatting may have eliminated something in your command let me check.

Update: I got a copy of the original command from your answer and it produced no output.

ADD REPLY
0
Entering edit mode

Pierre Lindenbaum Is the command you posted as answer correct/complete? Can you please check? I can't get it to work. What is the expected output?

ADD REPLY
0
Entering edit mode

Hello again, did we reach any conclusion about this? I really appreciate it :) Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1665 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6