7.5 years ago
biomagician
Dear all,
I have tried the following command:
lastz output/genome/ref/seq/chromosome.I.fa output/assembly/celegans/hgap/bristol/contig.fa --notransition --step=20 --nogapped --format=maf > output/AvsB.maf
but LASTZ complains about the dash in my fasta file:
FAILURE: bad fasta character in output/assembly/celegans/hgap/bristol/contig.fa, >contig: -
Does anybody know how to get round this problem?
Best,
C.
Can you post a couple examples of your header lines?
This will remove all dashes:
sed -i 's/-//g' myfasta.fa
What are header lines? Do you mean the header of the FASTA file? There is only one sequence in the file, so only one header: >contig
-i in sed changes the original file. I'd rather go without -i in this case, because we don't know yet what specifically needs to be done.
Why is this post in the Jobs section anyway?
Just moved it into the 'Question' section :)
That's why I asked to see some header lines, unless the dash is in the sequences. The OP can also make a copy of the file, or sub-sample to test on.
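A minimal sketch of that check, assuming a small hypothetical file sample.fa in place of the real contig.fa: it counts dashes separately in header lines and in sequence lines, then writes a sub-sample to test on so the original stays untouched. The file names are made up for illustration.

```shell
# Hypothetical stand-in for the real contig file
printf '>contig\nacgt-acgt\nttga\n--acgt\n' > sample.fa

# Count header lines that contain a dash (grep -c prints 0 when none match)
header_hits=$(grep '^>' sample.fa | grep -c -e '-' || true)

# Count sequence lines that contain a dash
seq_hits=$(grep -v '^>' sample.fa | grep -c -e '-' || true)

echo "header lines with dashes: $header_hits"
echo "sequence lines with dashes: $seq_hits"

# Keep a small sub-sample to experiment on instead of the original
head -n 2 sample.fa > subsample.fa
```

If the second count is non-zero, the dashes are in the sequence itself and not in the header.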
The output of sed is too long, is this what you want?:
and it goes on for much longer.
Well how did those '-' end up there in the first place...
It's the output of the HGAP assembler. I don't know why HGAP returns gaps in a contig.
You can use sed to replace the dashes first:
sed -e '/^[agctn]/ s/-/n/g' file1.fa >file2.fa
Additionally, you have to use the --ambiguous=n option and query.fa[unmask], since you have lower-case letters.
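As a hedged variant of the sed command above: instead of selecting lines that start with a lower-case base, this sketch skips header lines (those starting with >) and replaces dashes on every other line, so a sequence line that begins with a dash is also covered. The file names sample.fa and sample.nodash.fa are hypothetical.

```shell
# Hypothetical stand-in for the HGAP contig; the header deliberately has a dash
printf '>contig-1\nacgt-acgt\n--ttga\n' > sample.fa

# On every line that is NOT a header, replace each '-' with 'n';
# the result goes to a new file and the input is left unchanged
sed '/^>/!s/-/n/g' sample.fa > sample.nodash.fa

cat sample.nodash.fa
```

The header keeps its dash while the sequence lines become acgtnacgt and nnttga. The new file could then be given to lastz in place of the original query.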
You need to put it into a new file,
sed -i 's/-//g' myfasta.fa > nodash.fasta
otherwise it prints to stdout.
Not with -i
...I agree, this would destroy the original file. I have had horrible experiences with sed -i and with grep > (instead of grep '>').
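To make the two safe options concrete, here is a small sketch on a hypothetical myfasta.fa: either drop -i and redirect stdout to a new file, or keep -i but give it a backup suffix so the original survives as a .bak copy. Note that s/-//g deletes dashes everywhere, header lines included.

```shell
# Hypothetical input file
printf '>contig\nac-gt\n' > myfasta.fa

# Option 1: no -i, so sed prints to stdout and the redirect creates a new file
sed 's/-//g' myfasta.fa > nodash.fasta

# Option 2: edit a copy in place, keeping a backup with the .bak suffix
cp myfasta.fa copy.fa
sed -i.bak 's/-//g' copy.fa

cat nodash.fasta
```

With option 1 myfasta.fa is never modified; with option 2 the pre-edit content remains in copy.fa.bak.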