Partial CDS Extraction From Incomplete Transcripts
12.9 years ago
Raghul ▴ 200

Hi all, I have 454 RNA transcripts assembled into isotigs using Newbler. Some are complete, some are missing the 5' end, some are missing the 3' end, and some are missing both (as revealed by blastx). I want only the CDS (coding sequence), because I want to check the GC content (in the CDS only) and the proportion of hydrophilic amino acids (after translation) in my transcriptome. Does anyone know a tool that does this? All suggestions are welcome!

Thank you, Raghul

cds extraction rna • 5.1k views

Just to note that by definition, an incomplete transcript cannot yield a CDS. I suppose it can yield a "partial CDS".


I have the assembled transcripts.


Thanks to all for answering. For the protein sequence analysis, I think I can use OrfPredictor. It takes the sequences and the blastx output as input and gives predicted peptide sequences in FASTA format. I found this suggestion in an answer to a different question on the Biostar forum. Now I am looking for ways to identify the (partial) CDS in the RNA.

12.9 years ago

You'll probably want to find the longest ORF and take that as the CDS. You can do a six frame translation and find the longest ORF. Since you might be missing 5' or 3' or both, remember to account for missing stop or start codons.
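As a rough illustration of the idea above, here is a minimal Python sketch. It assumes that "ORF" means the longest stop-free stretch in any of the six frames, with no start codon required, since the 5' end may be missing. A stretch that runs to the end of the sequence without hitting a stop is treated as a possibly 3'-partial ORF. The function names are made up for this example, and coordinates are 0-based, end-exclusive, on the strand that was scanned.

```python
# Sketch only: longest stop-free stretch over six frames (partial ORFs allowed).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return (strand, frame, start, end, nt_seq) for the longest stop-free stretch."""
    seq = seq.upper()
    best = ("+", 0, 0, 0, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            # Index just past the last complete codon in this frame.
            last = frame + ((len(s) - frame) // 3) * 3
            for i in range(frame, last, 3):
                if s[i:i + 3] in STOPS:
                    if i - start > len(best[4]):
                        best = (strand, frame, start, i, s[start:i])
                    start = i + 3  # next candidate begins after the stop
            # Stretch that reaches the end of the sequence (no stop seen).
            if last - start > len(best[4]):
                best = (strand, frame, start, last, s[start:last])
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTGGGTAACC"))
```

If the stretch does not begin at the very start of the scanned strand, you may want to trim it to the first ATG. That step is left out here because it depends on whether you believe the 5' end is present.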


The problem here is that even after assembly into contigs, many are missing the 3' or 5' end. Still, this kind of analysis has to be done on them, because no other information is available for this protozoan.


Raghul, I don't think there is much more you can do about ORFs. Use getorf (EMBOSS) to find the longest ORF. If the stop codon is missing, you will not get the right frame in any case using getorf.

12.9 years ago
Michael 55k

The standard approach for this would be to run gene prediction software such as Glimmer, GeneMark, or Critica on your contigs. Also, try running a tBLASTx search.

12.9 years ago
Jack Min ▴ 10

You can write a Perl script to extract the CDS region from your input sequences using the information produced by OrfPredictor. The definition lines of its output contain the frame value and the start and end positions of the CDS.
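A short sketch of that extraction step, written in Python rather than Perl. The definition-line layout used here (`>id frame start end`, whitespace-separated, 1-based inclusive positions on the input sequence, negative frame meaning the reverse strand) is an assumption made for illustration. Check it against your actual OrfPredictor output before relying on it. The function names are hypothetical.

```python
# Sketch only: cut the CDS out of a transcript using frame/start/end values.
# ASSUMED header layout: ">id frame start end" (not verified against OrfPredictor).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_defline(line):
    """Split an assumed '>id frame start end' line into typed fields."""
    fields = line.lstrip(">").split()
    return fields[0], int(fields[1]), int(fields[2]), int(fields[3])


def extract_cds(transcript, frame, start, end):
    """Return the CDS; positions are assumed 1-based and inclusive."""
    lo, hi = sorted((start, end))
    region = transcript[lo - 1:hi]
    # Assumption: a negative frame means the CDS lies on the reverse strand.
    return reverse_complement(region) if frame < 0 else region


if __name__ == "__main__":
    seq_id, frame, start, end = parse_defline(">isotig00001 +2 2 10")
    print(seq_id, extract_cds("AATGAAACCCGG", frame, start, end))
```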


I added a function to the OrfPredictor server to generate the CDS file. Please try it out.
Jack Min

12.9 years ago

Are these already aligned? If so, you could use bedtools to intersect a BED file containing your coding regions with your BAM file:

intersectBed -abam yourbamfile.bam -b cdsfile.bed

Yes, it is already assembled into contigs, and it is a protozoan. No complete genome sequence is available for it.
