How to remove introns in a specific nucleotide sequence

3.2 years ago

LeeLee ▴ 10

I am using the ribotricer software to search for ORFs in Ribo-seq data. It returns the start and end positions of each ORF on the genome, as below:

ENST00000327044.7_944697_959240_2247 chr1 -

The fields in the line above are, in order: gene information, gene start position, gene end position, ORF length, chromosome, and strand (positive or negative). I want to use this to predict the amino acid sequence, but the nucleotide sequence I get from these coordinates contains introns. How can I get around this? Can I use an annotation file to remove the introns from such a sequence?
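For reference, the line above can be split into its fields with a few lines of Python. This is only a sketch that assumes the exact layout shown in this post (ID, start, end and length joined by underscores, followed by chromosome and strand); `parse_orf_line` is a made-up helper name, not part of ribotricer.

```python
def parse_orf_line(line):
    """Split a line like 'ENST..._944697_959240_2247 chr1 -' into named fields."""
    orf_id, chrom, strand = line.split()
    # Split from the right so any underscore inside the ID itself is left alone
    transcript_id, start, end, length = orf_id.rsplit("_", 3)
    return {
        "transcript_id": transcript_id,
        "start": int(start),
        "end": int(end),
        "orf_length": int(length),
        "chrom": chrom,
        "strand": strand,
    }

orf = parse_orf_line("ENST00000327044.7_944697_959240_2247 chr1 -")

# Inclusive genomic span covered by the start/end coordinates
genomic_span = orf["end"] - orf["start"] + 1

# A span longer than the reported ORF length means the interval contains introns
has_introns = genomic_span > orf["orf_length"]
```

Here the span is far larger than 2247, which is why the sequence cut straight from the genome includes intronic bases.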

nucleotide • 3.7k views

ADD COMMENT • link updated 3.2 years ago by Emily 24k • written 3.2 years ago by LeeLee ▴ 10

That's a transcript ID and the transcript start and end, not the gene.

ADD REPLY • link 3.2 years ago by Emily 24k

Yes, this is the transcript ID. But I understand the start and end to be positions on the chromosome, because 959240 minus 944697 (14543) is much greater than the ORF length of 2247.

ADD REPLY • link 3.2 years ago by LeeLee ▴ 10

I think a better approach is to take the CDS coordinates from the annotation file, extract the sequence for each of those intervals (there may be several CDS segments per transcript), and join the pieces together to build the complete CDS for the transcript.
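A minimal sketch of that idea, using a toy chromosome string instead of a real genome. The function names and the toy coordinates are made up for illustration, and the intervals are assumed to be 1-based and inclusive, as in a GTF file.

```python
# Translation table for complementing DNA, preserving case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def spliced_sequence(chrom_seq, blocks, strand):
    """Join the given (start, end) blocks and orient them by strand.

    blocks are 1-based inclusive genomic intervals (GTF style).
    """
    # Sort by genomic position, cut each block out, and concatenate
    pieces = [chrom_seq[start - 1:end] for start, end in sorted(blocks)]
    seq = "".join(pieces)
    # Minus-strand features are read from the opposite strand
    return revcomp(seq) if strand == "-" else seq

# Toy chromosome: two coding blocks (upper case) separated by an intron (lower case)
chrom = "cc" + "ATGGCC" + "gtaagtcag" + "AAATAA" + "tt"
cds_blocks = [(3, 8), (18, 23)]

plus_seq = spliced_sequence(chrom, cds_blocks, "+")
minus_seq = spliced_sequence(chrom, cds_blocks, "-")
```

For a minus-strand transcript such as the one in the question, the blocks are still joined in genomic order and the reverse complement is taken once at the end.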

ADD REPLY • link 3.2 years ago by manaswwm ▴ 550

Yes, this is a good idea, but it is still difficult for me to implement. Because the start of the ORF may not be in the first exon, I don't know how to process the ORFs in batches.

ADD REPLY • link 3.2 years ago by LeeLee ▴ 10

From what I understand, the ORF starts in the first CDS segment (not in the first exon, which may contain UTR). A simple check is to confirm that the start codon sits at the beginning of the first CDS.
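If only exon coordinates are available, one way to handle an ORF that starts inside a later exon is to clip the exons to the ORF interval first. The sketch below assumes the start and end in the ORF ID are the genomic bounds of the ORF (which is not settled in this thread); both function names are hypothetical.

```python
def clip_blocks(exons, orf_start, orf_end):
    """Keep only the parts of each exon inside [orf_start, orf_end].

    All coordinates are 1-based and inclusive.
    """
    clipped = []
    for start, end in sorted(exons):
        new_start, new_end = max(start, orf_start), min(end, orf_end)
        # Exons entirely outside the ORF interval are dropped
        if new_start <= new_end:
            clipped.append((new_start, new_end))
    return clipped

def starts_with_atg(orf_seq):
    """Check for a canonical ATG at the 5' end of an already oriented ORF."""
    return orf_seq[:3].upper() == "ATG"

# Toy transcript with three exons; the ORF begins inside the second exon
exons = [(1, 10), (21, 30), (41, 50)]
blocks = clip_blocks(exons, 25, 45)

# The clipped block lengths should add up to the reported ORF length
orf_length = sum(end - start + 1 for start, end in blocks)
```

The start-codon check should be run on the sequence after it has been oriented for strand. ORFs called from Ribo-seq data may use non-ATG start codons depending on how they were defined, so a failed check is not necessarily an error.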

ADD REPLY • link 3.2 years ago by manaswwm ▴ 550

Are you looking to get the sequence from your sequencing data or the reference sequence for this ORF?

ADD REPLY • link 3.2 years ago by Emily 24k

To be more specific, what I want is the 2247-nucleotide sequence of the exonic regions between 944697 and 959240 on the minus strand of chromosome 1. The difficulty is doing this for a large number of ORFs, which is very hard for me, so I am looking for a suitable tool.
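For batch processing, the exon intervals of every transcript can be collected from the annotation in one pass and then looked up by transcript ID. A rough sketch, assuming a standard 9-column GTF with a `transcript_id` attribute; the toy records and the `exons_by_transcript` name are invented for illustration.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in the 9th GTF column
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def exons_by_transcript(gtf_lines, feature="exon"):
    """Map transcript_id -> (chrom, strand, sorted list of (start, end))."""
    grouped = defaultdict(list)
    info = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue
        tx = match.group(1)
        grouped[tx].append((int(cols[3]), int(cols[4])))
        info[tx] = (cols[0], cols[6])
    return {tx: (info[tx][0], info[tx][1], sorted(blocks))
            for tx, blocks in grouped.items()}

# Toy GTF records (hypothetical transcript TX1 on the minus strand)
toy_gtf = [
    "#!toy annotation",
    "\t".join(["chrT", "toy", "exon", "18", "23", ".", "-", ".",
               'gene_id "G1"; transcript_id "TX1";']),
    "\t".join(["chrT", "toy", "exon", "3", "8", ".", "-", ".",
               'gene_id "G1"; transcript_id "TX1";']),
    "\t".join(["chrT", "toy", "gene", "3", "23", ".", "-", ".",
               'gene_id "G1";']),
]

exon_table = exons_by_transcript(toy_gtf)
```

Note that whether the ID in the annotation carries a version suffix (as in `ENST00000327044.7`) depends on the annotation source, so the IDs may need to be normalised before the lookup.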