4.7 years ago
dpearton
Hello,
I have downloaded a genome assembly from GenBank (RefSeq), and it apparently contains some nucleotides other than A, C, T, G or N (according to the error file from the radinitio program).
I would like to try and find out what these are prior to fixing the file. I've tried various combinations of grep...
grep -i -v [ACTGN]+ sequence.fas
etc., but they either find everything in the file, or nothing.
I would like to do a "simple" grep that finds lines that contain any characters IN ADDITION to [ACTGN] (either case). I can get rid of the FASTA headers by piping through grep -v '>'
Thanks!
Thank you very much for this. I had tried the caret negation but I did not use the -E so it didn't work.
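For anyone landing here later, the caret-negation approach can be sketched roughly as below. This is only an illustration under some assumptions: the sample file and the variable names are made up, and the `-o` option is a GNU/BSD grep extension rather than POSIX. The original attempt fails because `-v [ACTGN]+` inverts a match on the allowed letters: without `-E` the `+` is taken literally, so nothing matches and every line is printed, while with `-E` nearly every sequence line matches and is dropped. A negated bracket expression such as `[^ACGTN]` asks for the unexpected characters directly. `-E` only matters if the pattern also uses `+`.

```shell
# Hypothetical sample FASTA for illustration; substitute your own file.
fasta=$(mktemp)
printf '>seq1 example\nACGTNNacgt\nACGTRYACGT\n>seq2\nacgtkacgt\nNNNN\n' > "$fasta"

# Drop header lines, then keep sequence lines that contain at least one
# character other than A, C, G, T or N (-i makes this case-insensitive).
offending_lines=$(grep -v '^>' "$fasta" | grep -i '[^ACGTN]')
echo "$offending_lines"

# Tally each unexpected character: -o prints every match on its own line,
# then sort | uniq -c counts how often each one occurs.
offending_chars=$(grep -v '^>' "$fasta" | grep -o -i '[^ACGTN]' | sort | uniq -c)
echo "$offending_chars"

rm -f "$fasta"
```

On a real assembly the tally is usually the more useful of the two, since it shows at a glance whether the extra characters are IUPAC ambiguity codes (R, Y, K, M and so on) or something else, such as stray carriage returns from Windows line endings.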