Create a fasta and gtf file
4 days ago
ATRX ★ 1.1k

Hi,

Is there a tool to create a fasta file containing the full-length gene sequence (i.e., from TSS to TES), including all genomic elements (5' UTR, exons, introns, and 3' UTR), from hg38? And can I then create a new gtf file for that fasta subset using the information in the original gtf file?

Thank you!

gtf human genome fasta • 583 views
4 days ago
Juke34 8.9k

You may use AGAT.

See the AGAT documentation for agat_sp_extract_sequences.pl.

Command: agat_sp_extract_sequences.pl -g infile.gff -f infile.fasta -t gene


This is awesome. Thank you and it worked like a charm!


Juke34 I have another question. Is there a way to extract the longest gene sequence without the introns? For example, I could run:

agat_sp_extract_sequences.pl -g infile.gff -f infile.fasta -t exon

but I want the sequence at the gene level, not at the transcript level. Thanks a lot!


Transcript level and gene level are the same when you have only one isoform. If you have several isoforms, you may filter them to keep the longest one with AGAT (agat_sp_keep_longest_isoform.pl), or merge the exon locations with bedtools merge.
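
To make the "merge locations" idea concrete, here is a minimal Python sketch of what a gene-level exon union looks like when several isoforms overlap. It is only an illustration of the interval logic, not how AGAT or bedtools implement it; the function name `merge_exons` and the coordinates are made up for the example, and the intervals are assumed to be 1-based inclusive (gtf style) on a single contig and strand.

```python
def merge_exons(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals.

    Coordinates are assumed to be 1-based and inclusive, as in a gtf file.
    """
    merged = []
    for start, end in sorted(intervals):
        # Overlapping or touching the previous block: extend it.
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(block) for block in merged]


# Hypothetical exons from two isoforms of the same gene.
exons = [(100, 200), (150, 300), (400, 500), (450, 480)]
gene_level_exons = merge_exons(exons)
print(gene_level_exons)
```

The merged blocks could then be written out as a bed or gtf file and passed to a sequence extraction tool; note that the union of exons is not necessarily a real transcript.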

4 days ago
michael.ante ★ 3.9k

Hi, you can use bedtools' getfasta function to extract a sequence from a fasta file, creating a new fasta. Since you are working with human data, you should have the complete genome fasta and the gene annotation at hand. Just create a bed file (I guess gtf works as well) containing only the start and end of your long gene.

bedtools getfasta -fi hg38.fasta -bed myGene.bed -fo myGene.fasta
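
If you build the bed file from the gene line of your gtf, the start has to be converted from 1-based to 0-based. A rough Python sketch of that conversion follows; the function name `gtf_gene_to_bed`, the gene ID `GENE1`, and the coordinates are placeholders for illustration, not real annotation values.

```python
import re


def gtf_gene_to_bed(gtf_line):
    """Turn one gtf feature line into a 6-column bed line."""
    fields = gtf_line.rstrip("\n").split("\t")
    chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
    # Use the gene_id attribute as the bed name if it is present.
    match = re.search(r'gene_id "([^"]+)"', fields[8])
    name = match.group(1) if match else "."
    # gtf is 1-based inclusive; bed is 0-based with an exclusive end,
    # so only the start changes.
    return "\t".join([chrom, str(start - 1), str(end), name, "0", strand])


# Hypothetical gene record, made up for this example.
gene_line = 'chrX\tHAVANA\tgene\t1000\t5000\t.\t-\t.\tgene_id "GENE1";'
bed_line = gtf_gene_to_bed(gene_line)
print(bed_line)
```

The printed line can be saved as myGene.bed. This sketch assumes you extract the forward-strand sequence; if you run getfasta with -s to get the reverse complement of a minus-strand gene, the coordinate conversion of the gtf becomes more involved.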

Then take the original gtf and subtract the start coordinate in your bed file from all coordinates (mind the 0-based bed vs. 1-based gtf coordinates). Adjust the contig name if necessary, then load everything into IGV and check that it fits.
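
The coordinate shift can be sketched in a few lines of Python. This is an untested-on-real-data illustration under some assumptions: the gtf lines have already been filtered to the gene of interest, the sequence was extracted on the forward strand (no -s), and the new contig name matches the header in myGene.fasta (by default getfasta names it chrom:start-end). The function name `shift_gtf_line` and all values are hypothetical.

```python
def shift_gtf_line(line, bed_start, new_contig=None):
    """Shift one gtf line so it refers to the extracted sub-sequence."""
    if line.startswith("#"):
        return line  # leave header/comment lines untouched
    fields = line.rstrip("\n").split("\t")
    # bed_start is 0-based, so subtracting it maps the first extracted
    # base (gtf position bed_start + 1) to position 1 in the new fasta.
    fields[3] = str(int(fields[3]) - bed_start)
    fields[4] = str(int(fields[4]) - bed_start)
    if new_contig is not None:
        fields[0] = new_contig
    return "\t".join(fields)


# Hypothetical exon inside a gene extracted with bed interval chrX 999 5000.
exon_line = 'chrX\tHAVANA\texon\t1200\t1300\t.\t-\t.\tgene_id "GENE1";'
shifted = shift_gtf_line(exon_line, bed_start=999, new_contig="chrX:999-5000")
print(shifted)
```

Features that fall outside the extracted interval would end up with coordinates below 1 or beyond the sequence length, so it is worth checking for those before loading the result into IGV.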


Thank you! This is very helpful.
