Hello all, I have been working with BLAST+ for the last couple of days. I am analysing a metagenomics dataset, which I aligned against the COG (Clusters of Orthologous Groups) database. My ultimate objective is to remove the bacterial sequences so that only the putative viral contigs remain.
In other words, the sequences that got BLAST hits are the ones I need to remove.
The steps I usually follow to get the hit sequences are:
Step 1: Remove unwanted entries from the table
sort -k2,2 -k4,4nr -k3,3n File.txt | sort -u -k1,1 --merge | awk '{print $2}'
This step gives me the accession IDs of the sequences that mapped. I then run the next command to retrieve those sequences from the sequence file, using the accession ID file as the reference.
Step 2: Retrieving sequences
sort -u accession.txt | parallel -j1 " grep -A1 {}" file.fa
Now my question is, is it possible in any way, that I use the same accession ID file as reference, but retrieve the sequences, that are not present in the accession ID file.?
Thanks, VISHAL