I would like to know whether there are tools or methods for inserting a specific sequence into a FASTA file at a specific position.
I would like to modify a genome by adding a specific sequence at a specific location in that genome.
Thanks,
Assa
I wrote this script for one of my projects some years ago. It lets you look at the bases at specific positions, and also modify your genome using a second FASTA file that describes the changes.
python lookmod_genome.py -m modify -f file.fasta -c changes_file.fasta
The changes_file.fasta should contain all the changes you want to make, using four keywords (insertion, deletion, add and remove), and should look like this:
>insertion:chr1:25:26
GCTAGCTAGC
>deletion:chr4:40:50
>add:your_chromosome_name
GTCGATCGTCATGGTT
>remove:your_chromosome_name
I have used it quite a lot, but it is self-made, so check the result with the look mode.
Use Python 2, or modify the script to make it Python 3 compatible.
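For a single insertion, the same idea can be sketched in plain Python 3 without the script. This is only a minimal sketch, not the code of lookmod_genome.py: the function names are hypothetical, the sequence ID is assumed to be the first whitespace-delimited word of the header, and the position is assumed to be 1-based with the new bases placed directly after it (position 0 means "at the very start").

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples in file order."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def insert_sequence(records, seq_id, position, insert):
    """Insert `insert` after 1-based `position` in the record named `seq_id`.

    Assumption: the record name is the first word of the header line.
    """
    out = []
    found = False
    for header, seq in records:
        name = header.split()[0] if header.strip() else ""
        if name == seq_id:
            if not 0 <= position <= len(seq):
                raise ValueError("position outside sequence: %d" % position)
            seq = seq[:position] + insert + seq[position:]
            found = True
        out.append((header, seq))
    if not found:
        raise KeyError("no record named %r" % seq_id)
    return out


def write_fasta(records, path, width=60):
    """Write records back out, wrapping sequence lines at `width` bases."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">" + header + "\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
```

With hypothetical file names, a call such as write_fasta(insert_sequence(read_fasta("file.fasta"), "chr1", 25, "GCTAGCTAGC"), "modified.fasta") would place the ten bases between positions 25 and 26 of chr1. The whole genome is held in memory, which is fine for small genomes but may be too much for very large ones.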
Thanks for the script. It does not seem to work for me, though.
I have tried three invocations of lookmod_genome.py; see below. Only the look mode works. I used a test FASTA file as input.
$ python lookmod_genome.py -m look -p GL988041:10:20 -g test -r test/test.fa
-----------------------------------------
Mode : look
Fasta file : test/test.fa
Position : GL988041:10:20
Surroundings length : 10
-----------------------------------------
----------------------
------ LOOK ------
----------------------
-> GAAAAAAAAA**GCCGTGCCGT
$ python lookmod_genome.py -m modify -g test -r test/test.fa -c test/insertion.fa
-----------------------------------------
Mode : modify
Fasta file : test/test.fa
Construction file : test/insertion.fa
Output file rename: test.fa
-----------------------------------------
Traceback (most recent call last):
File "lookmod_genome.py", line 499, in <module>
main(sys.argv[1:])
File "lookmod_genome.py", line 474, in main
for key, value in added_dict.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'
$ python lookmod_genome.py -m modify -g test -r test/test.fa -c test/insertion.fa -i test/output.txt
-----------------------------------------
Mode : modify
Fasta file : test/test.fa
Construction file : test/insertion.fa
Output file rename: test.fa
Output information File : test/output.txt
-----------------------------------------
Traceback (most recent call last):
File "lookmod_genome.py", line 499, in <module>
main(sys.argv[1:])
File "lookmod_genome.py", line 474, in main
for key, value in added_dict.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'
$ head test/test.fa
>GL988041 dna:supercontig supercontig:CTHT_3.0:GL988041:1:6909506:1 REF
GAAAAAAAAAAAAAAAAGAGCCGTGCCGTAGCCCAGTTTTGAACTCTGAAGCCAGATCAG
ACGCGGGATGCAGGAGACCTGGGTGCGGGAGGTGCGGCAGCTGGCCCAAACGGTGCTGCA
GACCTTTTTGCATTGAAGCCCATTTTCACATCCTCTTTTCGTTCTTCCTCCGTCTCCTTC
Do the chromosome names need to have a certain structure? Are spaces or symbols allowed in the names?
I can't figure out which dict it is looking for.
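The AttributeError in the tracebacks above does not depend on the chromosome names. It is the Python 2 versus Python 3 difference mentioned earlier: dict.iteritems() exists only in Python 2, and Python 3 dictionaries provide items() instead. A minimal illustration follows; the dictionary contents are made up, and only the variable name added_dict is taken from the traceback.

```python
# Hypothetical contents; only the name `added_dict` comes from the traceback.
added_dict = {"chr1": "GCTAGCTAGC"}

# Python 2 only (raises AttributeError on Python 3):
#     for key, value in added_dict.iteritems():
# Works on both Python 2 and Python 3:
pairs = []
for key, value in added_dict.items():
    pairs.append((key, value))

print(pairs)
```

Changing iteritems() to items() on line 474 of the script should clear this particular error, although other Python 2 constructs elsewhere in the script may need similar changes. Running the script with a Python 2 interpreter, as suggested above, avoids the problem without editing it.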
Not tested myself, but seqkit mutate (https://bioinf.shenwei.me/seqkit/usage/#mutate) seems to do that.
Yes, it supports this, and you can choose which chromosomes/sequences to insert.