Question

Get FASTA Sequence for comparison from SNPs

10.8 years ago

Edga • 0

Hi Everybody,

I spent some time over the past few weeks getting my head around my DNA Sequencing results. I checked and read lots of different websites.

So far I have been able to map all of my SNPs to gene names and to each gene's FASTA sequence.

So far so good. Now let's say I have a gene AGRN, the sequence is 7343 in length. How do I compare the sequence from my results to the Human genome?

I only have 10 SNPs (1 with only -- genotype) that will amount to a sequence of 20 bases. Am I missing something? (well obviously!)

Even if I look up just one SNP in dbSNP, rs6657048, I get a FASTA sequence like this:

TGGTGGCCCG GGAGAGCCTG CTGGA
Y
GGCGGCAACA AGGTGGTGAT CAGCG

but in my results I have CC

I know it's probably basics but I would appreciate some explanation on the matter.

Thank you

SNP Fasta • 4.6k views

updated 4.3 years ago by Ram 45k • written 10.8 years ago by Edga • 0

At this point, your question isn't clear to me. When you say "How do I compare the sequence from my results to the Human genome?", what kind of comparison do you want to perform? There are many types of comparison, and multiple ways to pursue any of them.

For example, you could take your sequence for AGRN and BLAST it against the human genome to see specific mismatches.

In what format do you have your results? Bonus points if you paste some of that output. That can greatly help us assist you.

updated 4.3 years ago by Ram 45k • written 10.8 years ago by Dan D 7.4k

Thank you Deedee and Jorge Amigo for responding.

My apologies. I had a suspicion I may not have been very clear. I will try again.

If I wanted to use BLAST, how do I get the whole sequence of that AGRN gene from my genome mapping?

So my SNPs for AGRN look like below:

rs6657048   957640  CC  NM_198576
rs2710888   959842  CC  NM_198576
rs3128126   962210  AA  NM_198576
rs13303147  963661  CC  NM_198576
i6019314    977485  CC  NM_198576
rs2710875   977780  TT  NM_198576
i6019317    979487  AA  NM_198576
i6019318    980659  --  NM_198576
rs13303307  988310  AA  NM_198576
rs2465136   990417  TT  NM_198576

Let's say I'm very interested in this gene and I would like to find out whether I have any mutations in its sequence. Papers published on the gene could then tell me what those mutations might mean.

I know it's a long shot, and at the moment I am not looking for anything specific, but I would like to learn how to read such data properly.

updated 4.3 years ago by Ram 45k • written 10.8 years ago by Edga • 0

I still don't understand why you would want the gene's FASTA sequence, but you can get it from the gene's NCBI entry, where the entire gene is thoroughly described.

Note that if you only want the sequence around each SNP, you can get it from dbSNP directly.

updated 4.3 years ago by Ram 45k • written 10.8 years ago by Jorge Amigo 14k

Okay, let me rephrase the question; hopefully it will make more sense:

How do I create a gene sequence from the SNPs that I have, so that I can find irregularities and variations in bases compared to the model?

updated 4.3 years ago by Ram 45k • written 10.8 years ago by Edga • 0

Ram · Answer 1 · 2014-10-15

The Y code means that the base can be a C or a T, so your homozygous CC is expected. Take a look at the IUPAC nucleotide codes for the full set of ambiguity symbols.
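To make the IUPAC point concrete, here is a minimal Python sketch that checks whether a two-letter genotype is compatible with an ambiguity code. The function name `genotype_is_consistent` is made up for illustration, and the sketch assumes the genotype and the dbSNP flanking sequence are reported on the same strand, which is worth verifying for real data.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def genotype_is_consistent(code, genotype):
    """Return True if every allele in the genotype is allowed by the IUPAC code.

    Hypothetical helper; assumes both inputs refer to the same strand.
    """
    allowed = IUPAC[code.upper()]
    return all(allele in allowed for allele in genotype.upper())


# rs6657048 is shown as Y (C or T) in the dbSNP flanking sequence.
print(genotype_is_consistent("Y", "CC"))  # True
print(genotype_is_consistent("Y", "AA"))  # False
```

A no-call such as `--` comes back as inconsistent, since `-` is not one of the allowed bases.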

It isn't clear to me what data you are using or what exactly you are trying to do with it. If you want to compare each SNP's bases with known human genome variation, and you know the rs number for each SNP, comparing your data against dbSNP is enough. If you want to compare FASTA sequences (what for? just to know whether your genotypes match the human genome reference? if so, do the previous), you can always use BLAST, or even a genome viewer such as IGV (a more visual approach).

Ram · Answer 2 · 2014-10-15

10.8 years ago

Dan D 7.4k

"If I wanted to use BLAST, how do I get the whole sequence of that AGRN gene from my genome mapping?"

It's easy to do programmatically, but I don't know of any tool that can do it automatically. Do you know which reference assembly was used for the alignment? I can code up something for you once I have that info.
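As a rough illustration of what "programmatically" could look like, the Python sketch below substitutes homozygous genotypes into a reference sequence and reports which positions differ. Everything in it is an assumption for illustration: the function name `apply_homozygous_snps` is hypothetical, the toy reference and SNP IDs are made up, and it assumes the SNP positions are 1-based coordinates on the same assembly and strand as the reference you downloaded. No-calls (`--`) and heterozygous calls are skipped because a single sequence cannot represent them.

```python
def apply_homozygous_snps(reference, region_start, snps):
    """Return (sequence, changes) with homozygous SNP genotypes substituted in.

    reference    : reference sequence of the region (string, forward strand)
    region_start : 1-based chromosome coordinate of reference[0]
    snps         : iterable of (snp_id, position, genotype) tuples
    """
    seq = list(reference)
    changes = []
    for snp_id, pos, genotype in snps:
        # Skip no-calls ("--") and heterozygous calls.
        if len(genotype) != 2 or genotype[0] != genotype[1] or genotype[0] not in "ACGT":
            continue
        idx = pos - region_start
        # Skip SNPs that fall outside the supplied region.
        if not 0 <= idx < len(seq):
            continue
        if seq[idx].upper() != genotype[0]:
            changes.append((snp_id, pos, seq[idx], genotype[0]))
            seq[idx] = genotype[0]
    return "".join(seq), changes


# Toy data, invented for this example (not real AGRN sequence or real SNPs).
toy_reference = "ACGTACGTAC"
toy_snps = [
    ("snpA", 102, "TT"),  # differs from the reference base G
    ("snpB", 105, "CC"),  # matches the reference base C
    ("snpC", 107, "--"),  # no-call, skipped
    ("snpD", 109, "AG"),  # heterozygous, skipped
]
personal, diffs = apply_homozygous_snps(toy_reference, 100, toy_snps)
print(personal)  # ACTTACGTAC
print(diffs)     # [('snpA', 102, 'G', 'T')]
```

Keep in mind that positions not covered by a SNP in your table simply keep the reference base, so the result is the reference with your known genotypes patched in, not a sequenced copy of your gene.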

The first column in each of your tables is a dbSNP identifier. You can quickly see these SNPs overlaid on the reference by using NCBI's GeneView on dbSNP. The link should take you to a view which shows the first SNP in the table. You can then overlay your other SNPs onto that view using the "Find" box at the top of the visualizer.

For each of those SNPs, dbSNP will have a wealth of information. Just use that identifier as your search query.

updated 4.3 years ago by Ram 45k • written 10.8 years ago by Dan D 7.4k

That's how far I've gone: I managed to map every SNP to a gene name (if known) in my database. Checking each SNP individually would be quite time-consuming (I know gene-by-gene checking is not a two-minute task either), so I thought it would be easier to build the sequence of the gene and then spot the differences from the model.

Is what I'm trying to achieve even possible? I keep reading about comparison and genome data analysis, but I must be missing something.

It doesn't have to be automatic. As long as I'm able to put together the whole sequence for a gene and check it against a model, that would be fine. (I can figure out how to do it programmatically later on.)

updated 4.3 years ago by Ram 45k • written 10.8 years ago by Edga • 0