Total number of isoforms in fasta file
2.2 years ago
pentro • 0

I downloaded the protein FASTA file for the human genome from NCBI and used the R code below to keep the longest isoform for each gene.

library(Biostrings)

# homo.faa holds protein sequences, so read it as an AAStringSet
fasta.s <- readAAStringSet("homo.faa")

names.fasta <- names(fasta.s)

gene.iso <- sapply(names.fasta,function(j)
cbind(unlist(strsplit(j,'\\.'))[2:3]))

gene.iso.df <- data.frame(t(gene.iso))
colnames(gene.iso.df) <- c('gene','isoform')

gene.iso.df$width <- width(fasta.s)

gene.iso.df.split <- split(gene.iso.df,gene.iso.df$gene)

# which.max() picks the row with the largest width, i.e. the longest isoform
best.id <- sapply(gene.iso.df.split,function(x)
row.names(x)[which.max(x$width)])

fasta.s.best <- fasta.s[best.id]

# Write only the selected isoforms, not the full input set
writeXStringSet(fasta.s.best, filepath='homo_isoform.faa')

But how can I find out how many isoforms there are in total?

R fasta • 597 views

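You can use sapply, as you did to get the longest isoform. Here you want the number of rows in each list element (one row per isoform), so something like:

 n.of.iso <- sapply(gene.iso.df.split, nrow)

should work. Note that length() on a data frame returns the number of columns, not rows. For the total across all genes, sum(n.of.iso) should give it.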
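A minimal sketch of the counting step on a toy data frame, assuming the same gene / isoform / width layout as gene.iso.df in the question. The gene names and widths here are made up for illustration. It counts rows per gene, because length() on a data frame would return the number of columns instead.

```r
# Toy stand-in for gene.iso.df; gene names and widths are hypothetical.
gene.iso.df <- data.frame(
  gene    = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC"),
  isoform = c("1", "2", "1", "1", "2", "3"),
  width   = c(120, 340, 95, 210, 180, 400)
)

gene.iso.df.split <- split(gene.iso.df, gene.iso.df$gene)

# Isoforms per gene: each row is one isoform, so count rows.
n.of.iso <- sapply(gene.iso.df.split, nrow)

# table() gives the same per-gene counts without splitting first.
n.of.iso.tab <- table(gene.iso.df$gene)

# Total number of isoforms across all genes.
total.iso <- sum(n.of.iso)

# length() on a data frame counts columns, so every gene gets the same value.
n.cols <- sapply(gene.iso.df.split, length)

print(n.of.iso)
print(total.iso)
```

On the real data, the total should also match length(fasta.s), as long as every sequence in the file was assigned to a gene.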
