How to find SNP positions that fall within gene locations
3.6 years ago
Kumar ▴ 170

Hi Guys,

I have a question regarding the annotation of SNPs. File 1 is a list of candidate SNPs with chromosome IDs and positions; file 2 is a GTF file with chromosome IDs, gene positions (start and stop), and gene names. I want to find the SNPs (file 1) whose positions fall within the genes (file 2). I tried bedtools, but I could not find the right command. Please advise on any options for this analysis.

File 1:
chr1    271353586   T
chr1    897272822   C
chr1    913363908   T

File 2:
chr1    271353222   371353586   Gene1
chr1    897272522   897272822   Gene2
chr1    583821554   583821710   Gene3
3.6 years ago

The problem you may have is that the SNP file contains only positions, whereas bedtools expects regions. If you convert your position file into a BED file, you can use bedtools intersect to get the desired result:

$ cat file1.txt
chr1    271353586       T
chr1    897272822       C
chr1    913363908       T

$ cat file2.bed
chr1    271353222       371353586       Gene1
chr1    897272522       897272822       Gene2
chr1    583821554       583821710       Gene3

$ awk 'OFS="\t"{print $1, $2-1, $2, $3}' file1.txt \
| bedtools intersect -a file2.bed -b - -wao
chr1    271353222       371353586       Gene1   chr1    271353585       271353586       T       1
chr1    897272522       897272822       Gene2   chr1    897272821       897272822       C       1
chr1    583821554       583821710       Gene3   .       -1      -1      .       0

The -wao option writes the original A and B entries plus the number of base pairs of overlap between the two features (which may be useful to you), so the output keeps all the information from the input and you can transform it as needed.
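To see what the awk step hands to bedtools, here is a minimal sketch that runs only the conversion, without bedtools. It assumes the SNP positions are 1-based (as in a VCF), so each one becomes a 0-based, half-open BED interval with start = position - 1 and end = position. The variable names snps and snps_bed are just for illustration, and the sample data mirrors file1.txt above.

```shell
# Sample input mirroring file1.txt (tab-separated: chrom, 1-based position, allele).
snps=$(printf 'chr1\t271353586\tT\nchr1\t897272822\tC\nchr1\t913363908\tT\n')

# Convert each position into a 1 bp BED interval: start = pos - 1, end = pos.
snps_bed=$(printf '%s\n' "$snps" |
  awk 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $2, $3 }')

# Print the resulting BED lines (chrom, start, end, allele).
printf '%s\n' "$snps_bed"
```

Each SNP then covers exactly one base, which is why the overlap count in the last column of the -wao output is 1 for every hit.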


Hi, the SNP file was generated from a VCF file and I have only the positions. Is there any way to annotate these SNPs with the GTF file?

Thank you,

Manoj


Your question is not clear enough. Do you just want to annotate the SNPs in file1 with the information in file2, and is the GTF file the file2 in your example? If that's the case, file1 and file2 should be the first and second inputs (-a and -b), respectively, in the bedtools command:

$ awk 'OFS="\t"{print $1, $2-1, $2, $3}' file1.txt \
| bedtools intersect -a - -b file2.bed -loj | cut -f1,3,4,8
chr1    271353586       T       Gene1
chr1    897272822       C       Gene2
chr1    913363908       T       .

The -loj option performs a "left outer join": for each feature in A it reports each overlap with B, and a NULL feature for B if no overlaps are found. It's similar to the -wao option but without the unneeded overlap base count.
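If file2 starts out as a real GTF rather than the four-column layout shown above, it may need converting first. The following is a rough sketch under some assumptions: the GTF has "gene" feature rows, the ninth column carries a gene_name attribute (some GTFs only provide gene_id), and the coordinates are 1-based and inclusive as the GTF format specifies, so the start is shifted by one to match BED. The two sample records and the variable names gtf and genes_bed are made up for illustration.

```shell
# Hypothetical two-line GTF excerpt; real files carry more attributes.
gtf=$(printf 'chr1\tsrc\tgene\t271353223\t371353586\t.\t+\t.\tgene_id "G1"; gene_name "Gene1";\nchr1\tsrc\texon\t271353223\t271353400\t.\t+\t.\tgene_id "G1"; gene_name "Gene1";\n')

# Keep only gene rows and emit BED4: chrom, start - 1, end, gene name.
genes_bed=$(printf '%s\n' "$gtf" | awk -F '\t' 'BEGIN { OFS = "\t" }
  $3 == "gene" && match($9, /gene_name "[^"]+"/) {
    # Skip the 11-character prefix (gene_name, space, quote) and drop the closing quote.
    name = substr($9, RSTART + 11, RLENGTH - 12)
    print $1, $4 - 1, $5, name
  }')

printf '%s\n' "$genes_bed"
```

The resulting lines can be saved as file2.bed and passed to -b in the bedtools intersect command above.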


Thank you very much for your help! It works for me.
