How to convert from FASTA to GTF format?
3.6 years ago
debitboro ▴ 270

Dear Biostars community,

I have a FASTA file containing the sequences of a set of genes, with their coordinates in the headers. A former colleague produced it from a de novo assembly. I want to test whether these genes are differentially expressed between two samples, but since I don't have a GTF file for them, I don't know how to run the DE analysis.

What I did:

I aligned the reads from both samples against the reference genome (the same genome that was used to obtain the gene FASTA file) and obtained .bam files. To build the matrix of read counts per gene per sample with htseq-count, I need to provide a GTF file. How can I convert the FASTA file into a GTF file?

Any help?


I have a fasta file containing the sequences of a set of genes (with coordinates in the header).

If your reference contained only the gene sequences (i.e. you aligned the reads to the gene FASTA instead of the whole genome), you could simply use samtools idxstats to get read counts for each entry.
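A minimal sketch of that route, assuming the reads were realigned to the gene FASTA so that each gene is its own reference entry: samtools idxstats prints one tab-separated line per reference (name, length, mapped read segments, unmapped read segments) plus a final `*` line for unplaced reads, so keeping columns 1 and 3 gives a two-column count table per sample. The helper name `idxstats_to_counts` and the file names are hypothetical.

```shell
# Sketch only: assumes reads were aligned to the gene FASTA (one reference per gene).
# idxstats expects a coordinate-sorted, indexed BAM, e.g.:
#   samtools sort -o sample1.sorted.bam sample1.bam
#   samtools index sample1.sorted.bam
idxstats_to_counts() {
  # Input columns: name, length, mapped, unmapped.
  # Drop the "*" row (unplaced reads) and keep name + mapped count.
  awk -F'\t' 'BEGIN { OFS = "\t" } $1 != "*" { print $1, $3 }'
}

# Usage (hypothetical file names):
#   samtools idxstats sample1.sorted.bam | idxstats_to_counts > sample1.counts
```

Running it once per sample gives one count file per BAM, which can then be merged into a gene-by-sample matrix.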


I have a fasta file containing the sequences of a set of genes (with coordinates in the header).

If you show an example header, we can suggest how to convert them to GFF, if possible.


I'm going to revive this post.

Suppose I have a FASTA file called Homo_sapiens.GRCh38.dna_rm.toplevel.fa with positional information in each sequence header (FYI, it is the repeat-masked genome file from Ensembl):

head -n 10000000 Homo_sapiens.GRCh38.dna_rm.toplevel.fa | sed -n '1,3p; 4149276,4149278p; 8185836,8185838p'
>1 dna_rm:chromosome chromosome:GRCh38:1:1:248956422:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>2 dna_rm:chromosome chromosome:GRCh38:2:1:242193529:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
>3 dna_rm:chromosome chromosome:GRCh38:3:1:198295559:1 REF
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

How do I convert it to GTF/GFF format?
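A minimal sketch, assuming the headers follow the Ensembl layout shown above (`ID type coord_system:assembly:name:start:end:strand ...`): pull out the header lines and let awk write one GTF line per sequence. Ensembl header coordinates are already 1-based and inclusive, like GTF, so they are copied unchanged. The function name `fasta_headers_to_gtf` and the source label `fasta_header` are hypothetical; the feature type `exon` and the `gene_id` attribute were chosen only because they match htseq-count's default `--type` and `--idattr` values. Each header becomes one interval, so a spliced gene would be treated as a single block spanning its introns, and for a whole-genome file like this one every chromosome becomes one feature. The original poster's headers were never shown, so the awk pattern would likely need adapting to them.

```shell
# Sketch only: converts Ensembl-style FASTA headers into one GTF line each.
# ASSUMPTION: some header field looks like coord_system:assembly:name:start:end:strand,
# e.g. ">1 dna_rm:chromosome chromosome:GRCh38:1:1:248956422:1 REF".
# Headers without such a field are silently skipped.
fasta_headers_to_gtf() {
  grep '^>' "$1" | awk 'BEGIN { OFS = "\t" }
  {
    id = substr($1, 2)                    # sequence ID without the leading ">"
    for (i = 2; i <= NF; i++) {
      n = split($i, f, ":")
      # Need exactly 6 colon-separated parts with numeric start/end
      if (n == 6 && f[4] ~ /^[0-9]+$/ && f[5] ~ /^[0-9]+$/) {
        strand = (f[6] == "-1") ? "-" : "+"   # Ensembl uses 1 / -1
        attrs = "gene_id \"" id "\"; transcript_id \"" id "\";"
        # seqname source feature start end score strand frame attributes
        print f[3], "fasta_header", "exon", f[4], f[5], ".", strand, ".", attrs
        break
      }
    }
  }'
}
```

Usage: fasta_headers_to_gtf Homo_sapiens.GRCh38.dna_rm.toplevel.fa > regions.gtf (the output file name is arbitrary). The resulting file can then be passed to htseq-count together with the genome-aligned BAM files.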
