Dear Biostars community,
I have a fasta file containing the sequences of a set of genes (with coordinates in the header). The fasta file was produced by a former colleague after a de novo assembly. I want to see whether those genes are differentially expressed between two samples. Since I don't have a .gtf file for those genes, I don't know how to perform the DE analysis.
What I did:
I aligned the two samples against the reference genome (the same genome that was used to obtain the fasta file containing the genes) and obtained .bam files. To get the matrix of read counts per gene per sample, I need to provide a gtf file to htseq-count, but I don't know how to convert the fasta file to a gtf file.
Any help?
If your reference contained only the gene fasta, you could simply use samtools idxstats to get read counts for each entry. If you show some example headers, we can suggest how to convert them to GFF, if possible.
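For reference, samtools idxstats prints one tab-separated line per reference sequence: name, sequence length, number of mapped reads, number of unmapped reads. Below is a minimal Python sketch of how that output could be merged into a count table for two samples. It assumes the reads were aligned to the gene fasta itself and that the BAM files are sorted and indexed; the function names and the sample file names are made up for illustration.

```python
import subprocess

def parse_idxstats(text):
    """Parse `samtools idxstats` output into {reference_name: mapped_reads}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # final line: reads not assigned to any reference
            continue
        counts[name] = int(mapped)
    return counts

def idxstats_counts(bam_path):
    """Run samtools idxstats on an indexed BAM (samtools must be on PATH)."""
    result = subprocess.run(
        ["samtools", "idxstats", bam_path],
        check=True, capture_output=True, text=True,
    )
    return parse_idxstats(result.stdout)

def count_matrix(per_sample):
    """Turn {sample: {gene: count}} into rows of (gene, count_1, count_2, ...)."""
    samples = list(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    return [(g, *(per_sample[s].get(g, 0) for s in samples)) for g in genes]

# Hypothetical usage (file names are placeholders):
# rows = count_matrix({
#     "sample1": idxstats_counts("sample1.bam"),
#     "sample2": idxstats_counts("sample2.bam"),
# })
```

The resulting rows could then be written out as a tab-separated table and fed to a DE tool. Note that idxstats counts every mapped read, so any filtering (mapping quality, duplicates) would have to be done on the BAM beforehand.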
I'm going to revive this post.
Assuming that you have a fasta file called Homo_sapiens.GRCh38.dna_rm.toplevel.fa with positional information in the header of each sequence (it's a masked sequence file from Ensembl, fyi): how do I convert it to gtf/gff format?
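One possible approach, sketched in Python: read only the header lines and write one GFF3 feature per sequence. The sketch assumes the headers follow the usual Ensembl layout, e.g. ">1 dna_rm:chromosome chromosome:GRCh38:1:1:248956422:1 REF", where the colon-separated location field is coord_system:assembly:name:start:end:strand. The function names, the "fasta_header" source label and the "region" feature type are choices made for this example, not anything Ensembl prescribes.

```python
def header_to_gff3(header, source="fasta_header"):
    """Convert one Ensembl-style fasta header into a single GFF3 line.

    Assumed header layout (an assumption, check your own file):
    >SEQID TYPE coord_system:assembly:name:start:end:strand [REF]
    """
    fields = header.lstrip(">").split()
    seq_id = fields[0]
    # The location field is the one with exactly six colon-separated parts.
    location = next((f for f in fields[1:] if f.count(":") == 5), None)
    if location is None:
        raise ValueError(f"no location field found in header: {header!r}")
    coord_system, _assembly, _name, start, end, strand = location.split(":")
    strand_symbol = {"1": "+", "-1": "-"}.get(strand, ".")
    attributes = f"ID={coord_system}:{seq_id}"
    return "\t".join(
        [seq_id, source, "region", start, end, ".", strand_symbol, ".", attributes]
    )

def fasta_to_gff3(fasta_path, gff_path):
    """Write one GFF3 line per fasta header; sequence lines are skipped."""
    with open(fasta_path) as fasta, open(gff_path, "w") as gff:
        gff.write("##gff-version 3\n")
        for line in fasta:
            if line.startswith(">"):
                gff.write(header_to_gff3(line.strip()) + "\n")

# Hypothetical usage (output file name is a placeholder):
# fasta_to_gff3("Homo_sapiens.GRCh38.dna_rm.toplevel.fa", "toplevel_regions.gff3")
```

Keep in mind that the headers of a toplevel file describe whole chromosomes and scaffolds, so the output has one feature per sequence rather than gene models; for gene-level counting you would still need a real annotation.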