SNP Location Annotations
13.6 years ago
J.F.Jiang ▴ 930

Hello, Biostar members:

I have collected about 2,000 SNPs for my SNP study. Recently I came across a paper in which the SNPs were separated into several categories: coding region, 3'UTR, 5'UTR, intergenic, promoter, TFBS, miRNA, enhancer, and so on.

So I want to know how I can apply this kind of classification to my own SNP set.

If anyone knows how to do this, could you point me to a database or scripts for the job?

Thank you!

13.6 years ago
Travis ★ 2.8k

If I understand correctly, this could well be what you are after:

http://snpeff.sourceforge.net/

Also this:

http://www.openbioinformatics.org/annovar/

I'm a newbie and plan to try both but haven't gotten around to it yet.


I have looked into the two tools, and it seems that I did not state the problem clearly. The 2,000 SNPs I collected do not come from an NGS platform or an array; I gathered them from papers and databases. So my file only contains each SNP's rs number, chromosome, position and alleles, nothing else.

So I just want to find out quickly which region each SNP falls in. It seems that the first tool may be what I want; I will check it carefully to confirm.

13.4 years ago

Considering human genome annotation (pay attention to the human genome build your coordinates refer to, so that you select the matching annotations), I would go for either of these two:

  1. If you are looking for a local tool that annotates your variants by downloading each needed database locally and then processing it, I would go for ANNOVAR. It is reusable, so it is the best option if you plan to annotate often or to include it in your own variant-detection pipeline. It is also the most complete option we have found so far, and the one we currently use in our lab.
  2. If you are willing to send your variants to an online web service and simply retrieve the annotated results, I would go for SeattleSeq Annotation. It is fast and simple to use, yet the annotation it provides is quite detailed.

Both are valid options for the thousands of variants coming out of an NGS experiment, so I'm fairly sure that if you format your SNP list into an input format either program accepts, you will be able to annotate it easily.
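Since the SNP list described in the question holds only rs numbers, chromosomes, positions and alleles, one way to get it into a format these tools accept is to write a minimal VCF. The sketch below is hypothetical (snp_list_to_vcf is an illustrative name, not an existing tool): it assumes a tab-separated file with columns rsID, chromosome, 1-based position, reference allele and alternate allele, and that the reference allele is on the forward strand of the genome build you annotate against. Alleles copied from papers are not always reported that way, so check them first.

```shell
# Hypothetical helper (name and input layout are assumptions, not a standard tool):
# convert a tab-separated SNP list into a minimal VCF.
# ASSUMED input columns: rsID, chromosome, 1-based position, REF allele, ALT allele.
# ASSUMED: REF is the forward-strand reference base for the build you annotate against.
snp_list_to_vcf() {
  local input tab
  input="$1"
  tab=$(printf '\t')
  # Minimal header: the fileformat line plus the mandatory column header line
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  # Reorder columns into VCF order, write "." (missing) for QUAL/FILTER/INFO,
  # skip lines with fewer than 5 fields, then sort by chromosome and position
  awk -F'\t' -v OFS='\t' 'NF >= 5 { print $2, $3, $1, $4, $5, ".", ".", "." }' "$input" \
    | sort -t "$tab" -k1,1 -k2,2n
}
```

Usage would be something like `snp_list_to_vcf my_snps.tsv > variants.vcf`. A header this sparse should be enough for simple converters such as vcf2bed; stricter tools may expect additional meta-information lines, so check the documentation of whichever annotator you choose.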

9.9 years ago

One might start with GFF3-formatted GENCODE annotations:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz | gunzip --stdout - > gencode.v21.gff

Using the Sequence Ontology feature annotation (SOFA) terms, one can segregate GFF annotations by feature type (see: http://song.cvs.sourceforge.net/viewvc/song/ontology/sofa.obo?revision=1.217). Feature types include keywords like three_prime_UTR, promoter, etc. We can grab a sorted listing of the feature type names to automate this process. For example:

$ wget -qO- http://song.cvs.sourceforge.net/viewvc/song/ontology/sofa.obo?revision=1.217 | grep '^name:' | sed 's/name: //' | sort > gff_feature_types.txt

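We can then segregate the GENCODE annotations by feature type. Matching the type column (column 3) exactly, rather than grepping whole lines, avoids picking up records whose attributes or longer type names merely contain the term:

$ while IFS= read -r feature_type; do awk -F'\t' -v t="${feature_type}" '$3 == t' gencode.v21.gff > "feature.${feature_type}.gff"; done < gff_feature_types.txt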
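Many SOFA terms never occur in a given annotation file, so most of the feature.*.gff files produced this way will be empty. A quick way to see which types are actually present, and how many records each has, is to tally column 3 directly. This is a small sketch; count_feature_types is a hypothetical helper name.

```shell
# Hypothetical helper: tally the feature types (GFF3 column 3) that actually
# occur in an annotation file, most frequent first.
count_feature_types() {
  # Drop "#"-prefixed header/directive lines, keep column 3, count each type
  grep -v '^#' "$1" | cut -f3 | sort | uniq -c | sort -k1,1nr
}
```

Running `count_feature_types gencode.v21.gff` before splitting shows which of the SOFA names are worth keeping in gff_feature_types.txt.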

Let's assume your variants are in a VCF-formatted file called variants.vcf, and convert it to BED with vcf2bed:

$ vcf2bed < variants.vcf > variants.bed

For each smaller annotation file that is of non-zero size, we can convert its annotations to BED elements with gff2bed. We then perform set operations against the variants, separating them into per-feature-type categories based on one or more bases of overlap with the annotation subset:

$ for f in feature.*.gff; do [ -s "$f" ] || continue; t=${f#feature.}; t=${t%.gff}; bedops --element-of 1 variants.bed <(gff2bed < "$f") > "variants.${t}.bed"; done

Each non-empty variants.*.bed file contains the variants that overlap at least one GENCODE v21 feature of that type.
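The "one or more bases of overlap" test depends on coordinate conventions that are easy to get wrong when mixing formats: VCF positions and GFF start/end are 1-based and inclusive, while BED (what vcf2bed and gff2bed emit) is 0-based and half-open. The helpers below are hypothetical illustration-only names, not BEDOPS commands; they just spell out the arithmetic. bedops does the real work efficiently on sorted files, so this sketch explains the test rather than replacing it.

```shell
# Illustration only: these helper names are hypothetical, not BEDOPS commands.
# VCF POS and GFF start/end are 1-based and inclusive; BED is 0-based, half-open.

# chrom pos -> chrom, pos-1, pos  (a single base as a BED interval)
vcf_pos_to_bed() {
  printf '%s\t%d\t%d\n' "$1" "$(( $2 - 1 ))" "$2"
}

# chrom start end -> chrom, start-1, end
gff_to_bed() {
  printf '%s\t%d\t%d\n' "$1" "$(( $2 - 1 ))" "$3"
}

# Half-open intervals [s1, e1) and [s2, e2) on the same chromosome share at
# least one base exactly when s1 < e2 and s2 < e1. Args: s1 e1 s2 e2.
overlaps_1bp() {
  [ "$1" -lt "$4" ] && [ "$3" -lt "$2" ]
}
```

For example, a SNP at VCF POS 100 becomes BED [99, 100) and a GFF feature spanning 100..250 becomes BED [99, 250), so `overlaps_1bp 99 100 99 250` succeeds. A SNP at POS 251, BED [250, 251), does not overlap that feature, because 250 < 250 is false.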


I am trying to use the above code to annotate a few sites I have. I am able to get all the feature.${feature_type}.gff files, but the last line gives the error below.

find . -name feature.*.gff ! -size 0 -exec bedops --element-of 1 sample1.bed <(gff2bed < {}) > variants.{}.bed \;

-bash: {}: No such file or directory
find: paths must precede expression: feature.coding_region_of_exon.gff

Any suggestions on how I can modify the code? Thanks.


When the -exec option gets complicated, I find it easier to build a for loop. In fact, I always try to write code as readably as possible so that I can review it quickly when needed.

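for file in $(find . -name 'feature.*.gff' ! -size 0); do
  type=$(basename "$file" .gff)   # ./feature.exon.gff -> feature.exon
  # "-" tells bedops to read the converted BED from stdin
  gff2bed < "$file" \
  | bedops --element-of 1 sample1.bed - \
  > "variants.${type#feature.}.bed"
done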
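If the goal is the per-SNP classification asked about in the question, the per-type files can be folded back into one line per variant. Below is a minimal sketch, assuming the files are named variants.<type>.bed in the current directory and that column 4 holds the variant ID (which is where vcf2bed places the VCF ID field; a hand-made BED such as sample1.bed may differ). summarize_by_variant is a hypothetical helper name.

```shell
# Hypothetical helper: fold per-type files (variants.<type>.bed) into one line
# per variant, "<ID><TAB><type1>,<type2>,...".
# ASSUMES the files are in the current directory and column 4 holds the variant ID.
summarize_by_variant() {
  local f t
  for f in variants.*.bed; do
    [ -s "$f" ] || continue           # skip empty files (and an unmatched glob)
    t=${f#variants.}; t=${t%.bed}     # variants.exon.bed -> exon
    awk -F'\t' -v OFS='\t' -v t="$t" '{ print $4, t }' "$f"
  done \
    | sort -u \
    | awk -F'\t' -v OFS='\t' '
        { if ($1 in types) types[$1] = types[$1] "," $2; else types[$1] = $2 }
        END { for (id in types) print id, types[id] }' \
    | sort
}
```

SNPs that never appear in the output did not overlap any of the feature types you split on, so they need separate handling (for instance, as candidates for the intergenic category, depending on which types you included).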

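Perhaps try wrapping the command in single quotes and handing it to bash -c, so that the process substitution and output redirection run once per file instead of being evaluated by your interactive shell before find starts (quoting the -name pattern also stops the shell from expanding it):

find . -name 'feature.*.gff' ! -size 0 -exec bash -c 'bedops --element-of 1 sample1.bed <(gff2bed < "$1") > "variants.$(basename "$1" .gff).bed"' _ {} \;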
13.4 years ago

I think PLINK is well suited to your problem; look at the SNP annotation section of its documentation.
