SNP FASTA CDS sequence acquisition help
8.1 years ago

I am fairly new to bioinformatics and am trying to teach myself, but I have been having trouble acquiring the sequences I want. What I am trying to do is acquire all variants of a gene's CDS across all mammalian species.

First I tried simply BLASTing to acquire my sequences, but as I expanded my search I got too much overlap between paralogs and orthologs, and I need to separate them from each other. I do intend to look at the paralogs, but I need them kept separate.

Next I began learning biomaRt to write an R script that does what I want, but the sequences biomaRt retrieves are reference sequences, and I want all variants from all mammalian species. As for the SNP data, it does not come as a FASTA of the CDS; it only gives me a chromosome position, a mutation type and a refsnp ID. I thought I could pass the refsnp ID to the getSequence function, but it does not accept those IDs.

I am still learning what the biomaRt package can do and looking for other ways to get what I need. Can anyone offer some advice? Is there an easier way to go about this, or can biomaRt do something I am unaware of? Any advice or solutions would be helpful, especially R advice, as I need to work with large amounts of data and export them to Excel files. Any advice on teaching myself bioinformatics is also appreciated.
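To make the desired output concrete, here is a minimal base-R sketch of turning one reference CDS plus a table of SNPs into one FASTA record per variant. It is only an illustration under stated assumptions: the helper `apply_snp`, the column names `cds_pos`, `ref` and `alt`, the IDs and the toy sequence are all made up, the positions are assumed to be 1-based CDS coordinates already, and the alleles are assumed to be given on the coding strand. It does not call biomaRt and says nothing about how to retrieve such a table.

```r
# Hypothetical helper: substitute a single base in a CDS string.
# `pos` is assumed to be a 1-based position within the CDS (not the chromosome).
apply_snp <- function(cds, pos, ref, alt) {
  stopifnot(pos >= 1, pos <= nchar(cds))
  observed <- substr(cds, pos, pos)
  if (observed != ref) {
    stop(sprintf("Reference mismatch at CDS position %d: expected %s, found %s",
                 pos, ref, observed))
  }
  substr(cds, pos, pos) <- alt  # modifies the local copy only
  cds
}

# Toy reference CDS and SNP table; made-up values, not real data.
ref_cds <- "ATGGCCAAGTAA"
snps <- data.frame(
  refsnp_id = c("rs_example1", "rs_example2"),
  cds_pos   = c(4L, 9L),
  ref       = c("G", "G"),
  alt       = c("A", "C"),
  stringsAsFactors = FALSE
)

# One variant CDS per SNP, each applied independently to the reference.
variant_seqs <- mapply(
  apply_snp,
  pos = snps$cds_pos, ref = snps$ref, alt = snps$alt,
  MoreArgs = list(cds = ref_cds)
)

# Interleave header and sequence lines to form FASTA records.
fasta_lines <- as.vector(rbind(paste0(">", snps$refsnp_id), variant_seqs))
writeLines(fasta_lines)  # pass a file name as second argument to save instead
```

The reference check in `apply_snp` is there because a mismatch usually means the position is in the wrong coordinate system or the allele is reported on the opposite strand. The same `snps` data frame with `variant_seqs` added as a column could be written out with `write.csv` for opening in Excel.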

SNP FASTA BiomaRt R gene • 1.3k views