Entering edit mode
5.2 years ago
Shixiang
▴
100
I use r biomaRt
package to obtain protein sequence from a gene, however, it will return multiple sequence. How to get the most used sequence of them? Is there another method to get a unique sequence for a gene?
library(biomaRt)
listEnsemblArchives()
listMarts(host = 'http://grch37.ensembl.org')
ensembl = useMart('http://grch37.ensembl.org',
biomart = "ENSEMBL_MART_ENSEMBL",
dataset = "hsapiens_gene_ensembl")
protein = getSequence(id=c("ABCG1"),
type="hgnc_symbol",
seqType="peptide",
mart=ensembl,
verbose = T)
Peptide
1 MRISLPRAPERDGGVSASSLLDTVTNASSYSAEMTEPKSVCVSVDEVVSSNMEATETDLLNGHLKKVDNNLTEAQRFSSLPRRAAVNIEFRDLSYSVPEGPWWRKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGRREMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFLWHRPSEEDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLYLDFIVLGIFFISLRLIAYFVLRYKIRAER*
2 MACLMAAFSVGTAMNASSYSAEMTEPKSVCVSVDEVVSSNMEATETDLLNGHLKKVDNNLTEAQRFSSLPRRAAVNIEFRDLSYSVPEGPWWRKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGRREMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFLWHRPSEEDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLYLDFIVLGIFFISLRLIAYFVLRYKIRAER*
3 MACLMAAFSVGTAMNASSYSAEMTEPKSVCVSVDEVVSSNMEATETDLLNGHLKKVDNNLTEAQRFSSLPRRAAVNIEFRDLSYSVPEGPWWRKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGRREMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFLWHRPSEEVKQTKRLKGLRKDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLYLDFIVLGIFFISLRLIAYFVLRYKIRAER*
4 Sequence unavailable
5 MACLMAAFSVGTAMNASSYSAEMTEPKSVCVSVDEVVSSNMEATETDLLNGHLKKVDNNLTEAQRFSSLPRRAAVNIEFRDLSYSVPEGPWWRKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIH
6 MIMRLPQPHGTNASSYSAEMTEPKSVCVSVDEVVSSNMEATETDLLNGHLKKVDNNLTEAQRFSSLPRRAAVNIEFRDLSYSVPEGPWWRKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGRREMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFLWHRPSEEDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLYLDFIVLGIFFISLRLIAYFVLRYKIRAER*
7 MVRRGWSVCTAILLARLWCLVPTHTFLSEYPEAAEYPHPGWVYWLQMAVAPGHLRAWVMRNNVTTNIPSAFSGTLTHEEKAVLTVFTGTATAVHVQVAALASAKLESSVFVTDCVSCKIENVCDSALQGKRVPMSGLQGSSIVIMPPSNRPLKASAASCTWSVQVQGGPHHLGVVAISGKVLSAAHGAGRAYGWGFPGDPMEEGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGRREMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFLWHRPSEEVKQTKRLKGLRKDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLYLDFIVLGIFFISLRLIAYFVLRYKIRAER*
8 MLAVQQTEHLPACPPARRWSSNFCPESTEGGPSLLGLRDMVRRGWSVCTAILLARLWCLVPTHTFLSEYPEAAEYPHPGWVYWLQMAVAPGHLRAWVMRNNVTTNIPSAFSGTLTHEEKAVLTVFTGTATAVHVQVAALASAKLESSVFVTDCVSCKIENVCDSALQGKRVPMSGLQGSSIVIMPPSNRPLASAASCTWSVQVQGGPHHLGVVAISGKVLSAAHGAGRAYGWGFPGDPMEEGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGRREMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFLWHRPSEEVKQTKRLKGLRKDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLYLDFIVLGIFFISLRLIAYFVLRYKIRAER*
9 MLGTQGWTKQRKPCPQNASSYSAEMTEPKSVCVSVDEVVSSNMEATETDLLNGHLKKVDNNLTEAQRFSSLPRRAAVNIEFRDLSYSVPEGPWWRKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGRREMVKEILTALGLLSCANTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFLWHRPSEEDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYISYVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLYLDFIVLGIFFISLRLIAYFVLRYKIRAER*
HGNC symbol
1 ABCG1
2 ABCG1
3 ABCG1
4 ABCG1
5 ABCG1
6 ABCG1
7 ABCG1
8 ABCG1
9 ABCG1