I have a human genome assembly, and I'd like to add a new CDS (coding sequence) as a small scaffold at the end of the genome. I then want to generate new FASTA, GFF, and transcriptome files that include this update.
I've tried using Geneious, but I have been struggling with the GFF formatting. I'm not sure it's the best approach, so I'm looking for guidance on the best way to do this.
Some key questions I have:
- What is the best way to add a new CDS sequence as a scaffold to an existing human genome assembly?
- How can I then regenerate the FASTA, GFF, and transcriptome files to incorporate this new scaffold?
- Are there any particular tools or workflows you would recommend for this type of genome editing and file generation?
Any advice or suggestions would be greatly appreciated.
Thank you in advance for your help!
Thank you, Michael. I think I need to add it to the genome: it's a plasmid that I've transfected into the cells, and I want to check its transcription together with the human genes. I need to create a genome.fasta, a GFF, and a transcriptome.fasta file to run Kallisto, since I cannot use multiple references in Kallisto. Am I right?
If it's a plasmid in a transfected cell, I would indeed add the whole plasmid sequence, including the insert. Most likely your provider gave you a sequence and annotation file for the construct; if not, I would request one. Once you have that file, export it to GFF and FASTA; which tool to use depends on the format. FASTA and GFF files are plain text, so you can simply use
cat reference.fasta plasmid.fasta > ref_plasmid.fasta
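(and the same for the GFF files), and it will likely work, provided the sequence ID in the plasmid FASTA header matches the seqid in the first column of the plasmid GFF.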
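Putting that together, here is a minimal sketch that assumes the provider's file is a single-record GenBank file. Other formats, such as SnapGene, need their own exporter, and the GenBank features still need a proper converter to become GFF. The script extracts the plasmid FASTA, concatenates the genome FASTA and GFF with a few sanity checks (trailing newlines, a single `##gff-version` line, GFF seqids that match FASTA headers), and appends the whole plasmid as one extra target to the transcriptome FASTA. All file names, including `plasmid.gb` and `transcripts.fasta`, are hypothetical. Toy inputs are written first so the script runs on its own.

```shell
#!/bin/sh
# Minimal sketch: plasmid GenBank -> FASTA, then combined genome FASTA, GFF,
# and transcriptome FASTA. All file names are hypothetical; tiny toy inputs
# are written first so the script runs standalone. Replace them with real files.
set -eu

# --- Toy inputs (replace with the real files) ---
cat > plasmid.gb <<'EOF'
LOCUS       pMyConstruct   24 bp    DNA     circular SYN 01-JAN-2024
FEATURES             Location/Qualifiers
     CDS             1..24
                     /gene="myInsert"
ORIGIN
        1 atgaaacccg ggtttaaacc ctag
//
EOF
printf '>chr1\nACGTACGTAC\n' > reference.fasta
printf '##gff-version 3\nchr1\ttoy\tgene\t1\t10\t.\t+\t.\tID=gene1\n' > reference.gff
printf '##gff-version 3\npMyConstruct\ttoy\tCDS\t1\t24\t.\t+\t0\tID=cds_insert\n' > plasmid.gff
printf '>ENST_toy\nACGTACGTAC\n' > transcripts.fasta

# --- 1. Plasmid FASTA from a single-record GenBank file (assumed format) ---
# The header comes from the LOCUS name; position numbers and spaces are stripped.
awk '
  /^LOCUS/  { name = $2 }
  /^ORIGIN/ { in_seq = 1; print ">" name; next }
  /^\/\//   { in_seq = 0 }
  in_seq    { line = $0; gsub(/[0-9 ]/, "", line); print toupper(line) }
' plasmid.gb > plasmid.fasta

# cat does not insert newlines between files, so make sure every input ends with one.
for f in reference.fasta reference.gff plasmid.fasta plasmid.gff transcripts.fasta; do
  if [ -s "$f" ] && [ "$(tail -c 1 "$f")" != "" ]; then echo >> "$f"; fi
done

# --- 2. Genome FASTA: the plasmid becomes one extra record at the end ---
cat reference.fasta plasmid.fasta > ref_plasmid.fasta

# --- 3. GFF: concatenate, keeping only the first "##gff-version" line ---
{ cat reference.gff; grep -v '^##gff-version' plasmid.gff || true; } > ref_plasmid.gff

# --- 4. Every GFF seqid (column 1) must match a FASTA header ID ---
grep '^>' ref_plasmid.fasta | sed 's/^>//; s/[[:space:]].*//' | sort -u > fasta_ids.txt
awk -F'\t' '!/^#/ && NF > 0 { print $1 }' ref_plasmid.gff | sort -u > gff_ids.txt
missing=$(comm -13 fasta_ids.txt gff_ids.txt)
if [ -n "$missing" ]; then
  echo "GFF seqids not found in FASTA: $missing" >&2
  exit 1
fi

# --- 5. Transcriptome for kallisto: human transcripts plus the whole plasmid ---
cat transcripts.fasta plasmid.fasta > transcripts_plasmid.fasta
```

The combined transcriptome can then be indexed with `kallisto index -i transcripts_plasmid.idx transcripts_plasmid.fasta`. Treating the whole plasmid as a single target is the simplest option. If you would rather count only the insert's transcript, a tool such as gffread (`gffread -w insert.fasta -g plasmid.fasta plasmid.gff`) can extract transcript sequences from a FASTA plus GFF. Check that it actually picks up the insert, because that depends on how the insert is annotated.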