Question

How do I correctly add a plasmid sequence with multiple genes to a gtf as well as fasta file?

Asked 3.5 years ago by a.wa

Hi everyone,

I would like to add the sequence of a plasmid to both a FASTA and a GTF file, but I'm not sure how to do this correctly. The plasmid is integrated into a cell line that I sequenced, and it carries more than one relevant gene.

If I were only interested in a single transcript, say GFP, as in the example at this link (https://groups.google.com/forum/#!msg/rna-star/FGQRotrCB1Q/oQ2annphCQAJ), I would add something like this:

fasta:

">eGFP eGFP sequence"

gtf:

eGFP	AddedGenes	exon	1	720	.	+	0	gene_id "eGFP"; transcript_id "eGFP";

But what would this look like if I wanted to annotate, for example, two genes from my plasmid (e.g. a gene of interest and a selection marker) as well as their UTRs? Would I treat the whole plasmid sequence as the "gene" in the GTF and the genes I'm interested in as its transcripts, or would I handle the genes (selection marker and GOI) separately? In the latter case I'm not sure what to add to the FASTA file: would I just add the sequences of my genes plus their UTRs? And then I'm not sure what the coordinates of the respective elements would be. Could someone sketch out how this should be set up correctly?

Thanks a lot!

Tags: gtf, fasta, plasmid

You can do it as in your example (except that, obviously, the sequence goes on the line(s) below the header in the FASTA file, not on the header line itself).
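For a plasmid with several genes, one way to set this up is to add the whole plasmid as a single FASTA record and annotate each gene on it as its own GTF gene, with coordinates counted from the first base of the plasmid (1-based, inclusive). Below is a minimal Python sketch of that approach. The contig name `myPlasmid`, the gene names `GOI` and `NeoR`, the placeholder sequence and all coordinates are hypothetical; substitute your real plasmid sequence and the feature positions from its map. The attribute column follows the `gene_id`/`transcript_id` style of the example above.

```python
import sys

# Hypothetical plasmid: replace with your real sequence (one FASTA record).
PLASMID_ID = "myPlasmid"
PLASMID_SEQ = "ACGT" * 1500  # placeholder, 6000 bp

# Hypothetical features: (gene_id, start, end, strand).
# Coordinates are 1-based, inclusive, counted from the first base of PLASMID_SEQ.
GENES = [
    ("GOI", 1001, 2500, "+"),
    ("NeoR", 3501, 4295, "-"),
]


def fasta_record(name, seq, width=60):
    """Header on its own line; sequence wrapped on the lines below it."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def gtf_exon_line(contig, gene_id, start, end, strand, source="AddedGenes"):
    """One tab-separated GTF exon line; gene and transcript share one ID."""
    attrs = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
    # Column 8 (frame) is "." here, since frame is only meaningful for CDS lines.
    fields = [contig, source, "exon", str(start), str(end), ".", strand, ".", attrs]
    return "\t".join(fields)


def build(contig, seq, genes):
    """Return (fasta_text, gtf_text) after checking that every feature fits."""
    gtf_lines = []
    for gene_id, start, end, strand in genes:
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"{gene_id}: {start}-{end} is outside 1-{len(seq)}")
        if strand not in ("+", "-"):
            raise ValueError(f"{gene_id}: strand must be '+' or '-'")
        gtf_lines.append(gtf_exon_line(contig, gene_id, start, end, strand))
    return fasta_record(contig, seq), "\n".join(gtf_lines) + "\n"


if __name__ == "__main__":
    fasta_text, gtf_text = build(PLASMID_ID, PLASMID_SEQ, GENES)
    # Hypothetical output file names.
    with open("plasmid.fa", "w") as fa:
        fa.write(fasta_text)
    with open("plasmid.gtf", "w") as gtf:
        gtf.write(gtf_text)
    sys.stdout.write(gtf_text)
```

Appending these two files to your genome FASTA and GTF (e.g. with `cat`) before building the index keeps the plasmid as its own contig, so its coordinates never interact with the genome's. You could instead add each gene as a separate FASTA record; then every record starts at position 1, and each GTF line uses that record's name in column 1.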

As for UTRs, there is no standard for whether to include them. Some GFF/GTF files contain many feature types (UTRs, exons, lncRNAs, pseudogenes, etc.), while others contain only region/gene features. This also varies with the database and the species.
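If you do want UTRs, one common layout (used, for example, in Ensembl GTFs) is an exon line spanning the whole transcript plus separate UTR and CDS lines inside it. Below is a minimal Python sketch for a single-exon transcript on the plus strand only. The feature names `five_prime_utr`/`three_prime_utr` follow Ensembl's convention (GTF2.2 uses `5UTR`/`3UTR`), and the gene name and coordinates are hypothetical. The sketch assumes the CDS range you pass already follows your tools' convention; GTF2.2, for instance, leaves the stop codon out of the CDS.

```python
def transcript_lines(contig, gene_id, tx_start, tx_end, cds_start, cds_end,
                     source="AddedGenes"):
    """GTF lines for a single-exon, plus-strand transcript with UTRs.

    All coordinates are 1-based, inclusive, relative to the contig.
    """
    if not tx_start <= cds_start <= cds_end <= tx_end:
        raise ValueError("CDS must lie within the transcript")
    attrs = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
    # (feature type, start, end, frame); frame is "." except on CDS lines.
    features = [("exon", tx_start, tx_end, ".")]
    if cds_start > tx_start:  # bases before the CDS form the 5' UTR
        features.append(("five_prime_utr", tx_start, cds_start - 1, "."))
    features.append(("CDS", cds_start, cds_end, "0"))
    if cds_end < tx_end:  # bases after the CDS form the 3' UTR
        features.append(("three_prime_utr", cds_end + 1, tx_end, "."))
    return [
        "\t".join([contig, source, ftype, str(s), str(e), ".", "+", frame, attrs])
        for ftype, s, e, frame in features
    ]


if __name__ == "__main__":
    # Hypothetical GOI on the plasmid: transcript 901-2800, CDS 1001-2497.
    for line in transcript_lines("myPlasmid", "GOI", 901, 2800, 1001, 2497):
        print(line)
```

A minus-strand gene needs the UTR labels swapped (its 5' UTR lies at the higher coordinates), which this sketch does not handle.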

I think the best thing you can do is look at the annotation files from well-documented databases such as GenBank, Ensembl and UCSC for well-studied model organisms (E. coli, Saccharomyces cerevisiae, Homo sapiens, Arabidopsis thaliana, Caenorhabditis elegans, etc.) and see how they are structured.
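A quick way to see which feature types a reference annotation uses (for example, whether it has UTR lines at all) is to tally its third column. A minimal Python sketch; pass the path of whatever GTF/GFF you downloaded on the command line, and files ending in `.gz` are read with `gzip`.

```python
import gzip
import sys
from collections import Counter


def feature_types(lines):
    """Count GTF/GFF feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            for ftype, n in feature_types(handle).most_common():
                print(f"{path}\t{ftype}\t{n}")
```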

Reply by Arsenal, 3.5 years ago