22 months ago
iankeetkumar
I have a FASTA file with the CDS of a viral genome. The sequences are in order, and I would like to use the IDs (e.g. "fig|11292.9703.CDS.1") to do two things.
The two problems are separate.
1. I want to merge these genes to form a whole genome.
That is, I first want to merge all the corresponding CDS into one big genome, so that the resulting FASTA file looks like this:
>Genome_1
ALL THE CDS COMBINED
>Genome_2
ALL THE CDS COMBINED
2. I want to replace their names with the names stored in a separate text file.
Please help me!
The fasta file looks like this.
>fig|11292.9703.CDS.1|
atgagcaagatttttgtcaacccgagtgctatcagagccggtctggccgatctagagatg
gctgaagagactgttgatctgatcaatagaaacatagaagataatcaagctcatctccag
ggggaacccatagaagtggacaatctccctgaggacatgaggagacttcacttggatgac
ggaaaatcgtctaaccttgatgagatggccagagcgggggaaggcaagtatcgggaagac
>fig|11292.9703.CDS.2|
atgagcaagatttttgtcaacccgagtgctatcagagccggtctggccgatctagagatg
gctgaagagactgttgatctgatcaatagaaacatagaagataatcaagctcatctccag
ggggaacccatagaagtggacaatctccctgaggacatgaggagacttcacttggatgac
ggaaaatcgtctaaccttgatgagatggccagagcgggggaaggcaagtatcgggaagac
The text file looks like this.
fig|11292.9703.CDS.1| Name_of_organism
fig|11292.9703.CDS.2| Name_of_organism
If you require any additional information, I would be happy to provide it.
1) https://man7.org/linux/man-pages/man1/cat.1.html
2) See "replace fasta headers with another name in a text file", "Renaming fasta headers according to a matching name list", etc.
There are lots of examples in the search results:
https://www.biostars.org/post/search/?query=rename+fasta+header
https://www.biostars.org/post/search/?query=merge+fasta
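For the renaming part (problem 2), here is a minimal Python sketch. It assumes the text file has one mapping per line, with the old ID first, then whitespace, then the new name, as in the example in the question. The function names `load_name_map` and `rename_headers` are made up for illustration. Headers with no entry in the mapping are left unchanged.

```python
def load_name_map(lines):
    """Parse 'old_id<whitespace>new_name' lines into a dict."""
    mapping = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            mapping[parts[0]] = parts[1].strip()
    return mapping


def rename_headers(fasta_lines, mapping):
    """Yield FASTA lines with header IDs replaced where a mapping exists."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            old_id = fields[0] if fields else ""
            # Fall back to the original ID when it is not in the mapping.
            yield ">" + mapping.get(old_id, old_id)
        else:
            yield line
```

With real files, you would pass open file handles instead of lists of strings. Note that if several IDs map to the same name, as in the example text file, the renamed headers will no longer be unique.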
You can merge the CDS from a genome with several commands, as suggested by others, but the merged CDS do not constitute the whole genome. What about intergenic regions?
Following your concept, you can make an arbitrary sequence that consists of all the CDS attached side by side.
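As a rough illustration of attaching the CDS side by side, here is a Python sketch. It assumes that everything before ".CDS." in the header (e.g. "fig|11292.9703") identifies the genome, which is a guess based on the example IDs. The helper names `genome_id`, `merge_cds`, and `format_fasta` are hypothetical. CDS are joined in file order with no spacer between them.

```python
def genome_id(header):
    """Assumed rule: the part of the header before '.CDS.' names the genome."""
    return header.lstrip(">").split(".CDS.")[0]


def merge_cds(fasta_lines):
    """Concatenate all CDS sharing a genome ID, keeping file order."""
    genomes = {}  # dicts preserve insertion order in Python 3.7+
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = genome_id(line)
            genomes.setdefault(current, [])
        elif current is not None:
            genomes[current].append(line)
    return {gid: "".join(parts) for gid, parts in genomes.items()}


def format_fasta(genomes, width=60):
    """Return FASTA lines, wrapping each sequence at `width` characters."""
    out = []
    for gid, seq in genomes.items():
        out.append(">" + gid)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out
```

With a real file, `merge_cds` can take an open file handle, and the lines returned by `format_fasta` can be written out joined by newlines. The result is only a concatenation of coding sequences; intergenic regions are not recovered.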
That's what I want. I was doing the analysis based on WGS, but the start and end positions did not match, and it seems the MSA algorithm I am using was not able to align gene to gene. I always saw a frameshift.