retrieving attributes of a list generated by read.fasta()
7.2 years ago

Hi all, I am new to R and am struggling with the following. I generated a list with

MusTr <- read.fasta(file = "Mus_musculus.GRCm38.cdna.all.fa", as.string = TRUE)

which looks like this:

$ENSMUST00000177564.1
[1] "atcggagggatacgag"
attr(,"name")
[1] "ENSMUST00000177564.1"
attr(,"Annot")
[1] ">ENSMUST00000177564.1 cdna chromosome:GRCm38:14:54122226:54122241:1 gene:ENSMUSG00000096176.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:Trdd2 description:T cell receptor delta diversity 2 [Source:MGI Symbol;Acc:MGI:4439546]"
attr(,"class")
[1] "SeqFastadna"

$ENSMUST00000196221.1
[1] "atggcatat"
attr(,"name")
[1] "ENSMUST00000196221.1"
attr(,"Annot")
[1] ">ENSMUST00000196221.1 cdna chromosome:GRCm38:14:54113468:54113476:1 gene:ENSMUSG00000096749.2 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:Trdd1 description:T cell receptor delta diversity 1 [Source:MGI Symbol;Acc:MGI:4439547]"
attr(,"class")
[1] "SeqFastadna"

$ENSMUST00000179664.1
[1] "atggcatatca"
attr(,"name")
[1] "ENSMUST00000179664.1"
attr(,"Annot")
[1] ">ENSMUST00000179664.1 cdna chromosome:GRCm38:14:54113468:54113478:1 gene:ENSMUSG00000096749.2 gene_biotype:TR_D_gene transcript_biotype:processed_transcript gene_symbol:Trdd1 description:T cell receptor delta diversity 1 [Source:MGI Symbol;Acc:MGI:4439547]"
attr(,"class")
[1] "SeqFastadna"

$ENSMUST00000178537.1
[1] "gggacagggggc"
attr(,"name")
[1] "ENSMUST00000178537.1"
attr(,"Annot")
[1] ">ENSMUST00000178537.1 cdna chromosome:GRCm38:6:41533201:41533212:1 gene:ENSMUSG00000095668.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:Trbd1 description:T cell receptor beta, D region 1 [Source:MGI Symbol;Acc:MGI:4439571]"
attr(,"class")
[1] "SeqFastadna"

I would like to retrieve the attributes of each element (e.g. as a vector). For a single element of the list, attr() works (here retrieving the "Annot" attribute):

attr(MusTr$ENSMUST00000196221.1, "Annot", exact = FALSE)
[1] ">ENSMUST00000196221.1 cdna chromosome:GRCm38:14:54113468:54113476:1 gene:ENSMUSG00000096749.2 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:Trdd1 description:T cell receptor delta diversity 1 [Source:MGI Symbol;Acc:MGI:4439547]"

How do I achieve the same for multiple/all elements of the list? Thanks in advance for any suggestions.

RNA-Seq R • 3.8k views

The Ensembl Perl API is the tool to use for this kind of job. Is there any particular reason you need to use R for this? If you have to, check the biomaRt package or the mygene package. They would give you access to some annotations, but they are not as flexible or comprehensive as the Ensembl API.


I would (potentially) find R helpful for manipulating FASTA files: converting them to data frames and then exporting them easily with write.fasta. Otherwise, no particular reason. Thanks for your suggestion.

7.1 years ago

A little bit late, but I guess you are using seqinR to read the FASTA file. Try the following in R:

lapply(MusTr, function(x) attr(x, "Annot", exact = FALSE))

or

getAnnot(MusTr)
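Since the question asks for a vector and lapply() returns a list, here is a minimal base-R sketch of one way to get a named character vector instead, and to collect the pieces into a data frame. It does not load seqinR: MusTr_demo and make_entry() are made-up stand-ins that only mimic the structure printed in the question, so swap in the real MusTr when trying it.

```r
# Hypothetical stand-in for the list returned by read.fasta(as.string = TRUE):
# each element is a string carrying "name", "Annot" and "class" attributes.
# IDs, sequences and headers below are invented for illustration only.
make_entry <- function(id, seq, annot) {
  structure(seq, name = id, Annot = annot, class = "SeqFastadna")
}
MusTr_demo <- list(
  tx1 = make_entry("tx1", "atcg", ">tx1 cdna gene_symbol:GeneA"),
  tx2 = make_entry("tx2", "ggca", ">tx2 cdna gene_symbol:GeneB")
)

# vapply() applies attr() to every element and returns a character vector
# (named by the list names) rather than a list.
annot <- vapply(MusTr_demo,
                function(x) attr(x, "Annot", exact = TRUE),
                character(1))

# The same idea for the sequences; as.character() drops the attributes.
sequences <- vapply(MusTr_demo, function(x) as.character(x), character(1))

# Collect everything into a data frame for further manipulation.
fasta_df <- data.frame(
  id       = names(MusTr_demo),
  annot    = unname(annot),
  sequence = unname(sequences),
  stringsAsFactors = FALSE
)
```

With seqinR loaded, unlist(getAnnot(MusTr)) should give a similar flat vector, assuming every entry has exactly one annotation line.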
