I have a file with ORF information, including each ORF's start and end position on the chromosome. At first I wanted to create a BED file and then use getFastaFromBed from bedtools to get the sequences.
But I found that sequences obtained this way include introns, which makes it impossible for me to predict the amino acid sequence later. I have GTF files containing exon information, but it is difficult for me to process these ORFs in batches. Is there any tool or code that can help me get the nucleotide sequences with the introns removed?
AGAT should fit the bill: https://agat.readthedocs.io/en/latest/tools/agat_sp_extract_sequences.html
Thank you for your reply, but it still does not solve my problem. I can get the same result with a BED12 file. What I need is not the sequence of every exon in the transcriptome, but the sequence of the ORFs I am interested in, and these ORFs are usually not in the annotation file (for example, ORFs in lncRNAs).
You can extract whatever feature you want from your file with AGAT. If you are interested in the ORF feature, just add the option `-t ORF`, and `-p` to translate it into protein. Don't forget to use the `--table` parameter to select the proper codon table (1 by default).
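If the ORFs exist only as genomic start/end coordinates and are not features in the annotation, one possible workaround is to clip the exons of the host transcript to the ORF interval and join the remaining pieces. The Python sketch below illustrates that idea only; it is not how AGAT or bedtools work internally. The function names `clip_exons_to_orf` and `spliced_orf_sequence` are hypothetical, coordinates are assumed to be 1-based and inclusive as in GTF, and the toy chromosome string is made up for illustration.

```python
# Complement table for reverse-complementing minus-strand ORFs.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def clip_exons_to_orf(exons, orf_start, orf_end):
    """Return the exon pieces that fall inside the ORF interval.

    exons: list of (start, end) tuples, 1-based inclusive (GTF style).
    orf_start, orf_end: genomic ORF bounds, 1-based inclusive.
    """
    blocks = []
    for ex_start, ex_end in sorted(exons):
        start = max(ex_start, orf_start)
        end = min(ex_end, orf_end)
        if start <= end:  # keep only exons that overlap the ORF
            blocks.append((start, end))
    return blocks


def spliced_orf_sequence(chrom_seq, exons, orf_start, orf_end, strand="+"):
    """Join the exonic parts of an ORF; reverse-complement on the minus strand."""
    blocks = clip_exons_to_orf(exons, orf_start, orf_end)
    # Convert 1-based inclusive coordinates to Python slices.
    seq = "".join(chrom_seq[s - 1:e] for s, e in blocks)
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


# Toy example (made-up sequence): two exons separated by an 11 nt intron.
chrom = "CC" + "ATGGCCG" + "GTAAGTTTTAG" + "AATAA" + "CC"
exons = [(1, 9), (21, 27)]  # exon coordinates of the host transcript
orf_seq = spliced_orf_sequence(chrom, exons, orf_start=3, orf_end=25)
print(orf_seq)
```

The clipped blocks could also be written out as BED12 or as extra GFF features (for example with type `ORF`) so that existing tools can extract and translate them in batch.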