Hi, I have a FASTA file that contains sequences from a de novo assembly. We have identified rRNA sequences that we now want to remove from that FASTA file. I can remove lines with sed, but I have to do it one line at a time in a script, and I have about 500 sequences to remove. Is there a way to take the matching sequences listed in file B (the rRNA sequences) and remove each matching header, together with the line that follows it (the actual sequence), from file A? I have tried grep and comm, but grep gives me a byte error and comm didn't make any difference to my files.
Any guidance would be greatly appreciated.
A safer way is to use seqkit:
seqkit grep -v -f <fileb> input.fasta
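By default, seqkit grep compares each pattern against the sequence ID, meaning the first word of the header. The list is therefore expected to hold one ID per line. The result goes to standard output, so it has to be redirected to a file to be kept. If seqkit is not available, the same filtering can be sketched with plain awk. This is only a minimal sketch under stated assumptions: the function name remove_by_id and the file names in the usage line are hypothetical, the ID list is assumed to be non-empty, and IDs are assumed to contain no whitespace. A leading ">" and Windows line endings in the list are tolerated, and sequences that wrap over several lines are removed whole.

```shell
# remove_by_id IDS FASTA
# Print every record of FASTA whose ID (first word of the header, without ">")
# is NOT listed in IDS. Hypothetical helper name; assumes IDS is not empty.
remove_by_id() {
  awk '
    { sub(/\r$/, "") }          # strip a trailing carriage return, if any
    NR == FNR {                 # first file: the list of IDs to drop
      sub(/^>/, "")             # tolerate a leading ">" in the list
      if ($1 != "") drop[$1] = 1
      next
    }
    /^>/ {                      # header line in the FASTA file
      id = substr($1, 2)        # first word of the header, minus ">"
      keep = !(id in drop)
    }
    keep                        # print header and sequence lines of kept records
  ' "$1" "$2"
}
```

Usage, with hypothetical file names: `remove_by_id rrna_ids.txt assembly.fasta > assembly.no_rrna.fasta`. Comparing the number of ">" lines before and after (for example with `grep -c '>'`) is a quick check that records were actually removed.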
It runs successfully and prints the result to the screen, but nothing from fileb has been removed from the result. It's exactly the same as the input. Here is my code: