How to get the nucleotide sequence through ORF information?
3.2 years ago
LeeLee ▴ 10

I have a file with ORF information, including the start and end positions of each ORF on the chromosome. My first idea was to create a BED file and then use getFastaFromBed from bedtools to get the sequences.

But I found that the sequences obtained this way include introns, so I cannot use them to predict the amino acid sequences later. I have GTF files containing exon information, but I find it difficult to process these ORFs in batches. Is there a tool or some code that can help me get the nucleotide sequences with the introns removed?

nucleotide r • 1.8k views

Thank you for your reply, but it does not solve my problem. I can get the same result with a BED12 file. However, I don't need the sequences of all exons in the transcriptome, only the sequences of the ORFs I am interested in, and these ORFs are usually not in the annotation file (for example, ORFs in lncRNAs).
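
One possible way to limit the BED12 route to ORFs of interest is to clip the exons of the host transcript to each ORF's genomic start and end, and write one BED12 line per ORF. The following is only a minimal sketch under several assumptions: the file names (annotation.gtf, orfs.tsv, orfs.bed12) and the ORF table layout (ORF ID, transcript ID, chromosome, 1-based start, 1-based end, strand) are hypothetical, the toy data stands in for real files, and both ORF ends are assumed to fall inside exons of the named transcript.

```shell
#!/bin/sh
# Toy inputs so the sketch runs on its own; replace them with real files.
tab=$(printf '\t')

# Minimal GTF with three exons of a hypothetical transcript "tx1",
# deliberately not sorted by position.
printf 'chr1\tdemo\texon\t%s\t%s\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";\n' \
    501 600 101 200 301 400 > annotation.gtf

# Hypothetical ORF table: orf_id, transcript_id, chrom, start, end, strand
# (1-based, inclusive coordinates, as in GTF).
printf 'orfA\ttx1\tchr1\t151\t551\t+\n' > orfs.tsv

# BED12 blocks must be listed in ascending genomic order.
sort -t "$tab" -k1,1 -k4,4n annotation.gtf > exons.sorted.gtf

awk 'BEGIN { FS = OFS = "\t" }
# First file: remember the exons of every transcript, in ascending order.
NR == FNR {
    if ($3 == "exon" && match($9, /transcript_id "[^"]+"/)) {
        tx = substr($9, RSTART + 15, RLENGTH - 16)
        n[tx]++
        exon_start[tx, n[tx]] = $4 + 0
        exon_end[tx, n[tx]] = $5 + 0
    }
    next
}
# Second file: clip the exons to each ORF and print one BED12 line.
{
    id = $1; tx = $2; s = $4 + 0; e = $5 + 0
    count = 0; sizes = ""; starts = ""
    for (i = 1; i <= n[tx]; i++) {
        bs = (exon_start[tx, i] > s) ? exon_start[tx, i] : s
        be = (exon_end[tx, i] < e) ? exon_end[tx, i] : e
        if (bs > be) continue          # exon lies outside the ORF
        count++
        sizes = sizes (be - bs + 1) ","
        starts = starts (bs - s) ","   # block start relative to chromStart
    }
    if (count > 0)
        print $3, s - 1, e, id, 0, $6, s - 1, e, 0, count, sizes, starts
}' exons.sorted.gtf orfs.tsv > orfs.bed12

cat orfs.bed12

# The spliced sequences could then be extracted with something like the
# following (genome.fa is a placeholder; not run here):
# bedtools getfasta -fi genome.fa -bed orfs.bed12 -split -s -name -fo orf_sequences.fa
```

In the toy example the three clipped blocks add up to 201 nt, so the ORF has no introns left and its length is a multiple of three. For minus-strand ORFs it is worth checking a few sequences by hand, since the behaviour of -split together with -s may depend on the bedtools version.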

You can extract whatever feature you want from your file with AGAT. If you are interested in the ORF feature, just add the option -t ORF and -p to translate it into protein. Don't forget to use the --table parameter to select the proper codon table (1 by default).

3.2 years ago
xmLiu ▴ 20

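If you have the genome FASTA file and a genome annotation file (GTF or GFF3), you can use gffread to generate the CDS and protein sequences. The -y option writes the translated protein sequences and -x writes the spliced CDS nucleotide sequences.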

gffread genome.gff3 -g genome.fa -y gene.pep.fa

gffread genome.gff3 -g genome.fa -x gene.cds.fa
