Retrieve the sequence based on the start and end position in the cuffmerged.gtf
8.8 years ago
Chao.wang2 ▴ 50

Hi guys,

Does anyone know how to retrieve gene sequences based on the start and end positions in the cuffmerged.gtf file? Some genes have only tracking IDs and start and end positions available. I want to retrieve these sequences and annotate them. I would really appreciate your help.

Thanks a lot

RNA-Seq • 3.0k views

Thanks very much

Sounds helpful.

I will try it tomorrow.

8.8 years ago
igor 13k

A one-line solution:

bedtools getfasta -fi genome.fa -bed cuffmerged.gtf -fo out.fa

Yes, the -bed parameter accepts BED, GFF, and VCF files, so a GTF file works here. See the bedtools getfasta documentation for details.
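To make the coordinate handling concrete, here is a minimal Python sketch of what extracting one GTF interval from a chromosome sequence amounts to. It is only an illustration, not how bedtools is implemented. The function name `extract_interval` and its arguments are hypothetical. It assumes GTF coordinates are 1-based and inclusive and that the chromosome is already loaded as a plain string.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_interval(chrom_seq, start, end, strand="+"):
    """Return the bases for a GTF-style interval (1-based, inclusive).

    chrom_seq is assumed to be the full chromosome sequence as a string.
    For strand "-", the reverse complement is returned.
    """
    if start < 1 or end < start or end > len(chrom_seq):
        raise ValueError("interval out of range")
    # 1-based inclusive [start, end] becomes the 0-based slice [start-1, end).
    sub = chrom_seq[start - 1:end]
    if strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub
```

Note that bedtools getfasta reports the forward-strand sequence unless the -s option is given, so features on the minus strand need that option if you want them reverse-complemented.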


yes, this is brilliant


Hi igor,

Thanks for your solution. However, I want to extract the sequence corresponding to one Cufflinks tracking ID, and bedtools getfasta returns several exon sequences for each tracking ID. Do you think there is a way to get around that? I also checked the Cufflinks website and found the gffread utility, which was designed to handle Cufflinks output. However, it extracts transcript sequences based on the transcript ID in cuffmerged.gtf, not the gene ID. Do you think there is a way to change that?

Thanks very much
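One possible way to get a single interval per gene is to collapse the exon records to one span per gene_id before running getfasta. The sketch below is a rough illustration under several assumptions: the function names `gene_spans` and `to_bed_lines` are made up for this example, the GTF is tab-separated with a `gene_id "..."` attribute in the ninth column, and each gene lies on a single chromosome and strand. The resulting span covers introns as well as exons, so it is the genomic locus, not a spliced transcript.

```python
import re

# Matches the gene_id attribute in the ninth GTF column.
_GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gene_spans(gtf_lines):
    """Map each gene_id to (chrom, min_start, max_end, strand) over its exons.

    Coordinates are kept as in the GTF: 1-based and inclusive.
    """
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _GENE_ID.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        if gene_id in spans:
            # Widen the span recorded so far for this gene.
            _, old_start, old_end, _ = spans[gene_id]
            start, end = min(start, old_start), max(end, old_end)
        spans[gene_id] = (chrom, start, end, strand)
    return spans


def to_bed_lines(spans):
    """Format spans as BED6 lines (0-based start, end exclusive)."""
    return [
        f"{chrom}\t{start - 1}\t{end}\t{gene_id}\t0\t{strand}"
        for gene_id, (chrom, start, end, strand) in sorted(spans.items())
    ]
```

Writing these lines to a file such as genes.bed and running `bedtools getfasta -fi genome.fa -bed genes.bed -name -fo genes.fa` should then give one sequence per gene ID. Add -s if the output should respect strand.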
