Retrieving the sequences that have no BLAST hit
4.7 years ago

Hello all, I have been working with BLAST+ for the last couple of days. I am analyzing a metagenomics dataset, which I aligned to the COG (Clusters of Orthologous Groups) database. My ultimate objective is to remove the bacterial sequences so that only the putative viral contigs remain.

So I need to remove the sequences that have BLAST hits.

The steps I usually follow to get the hit sequences are:

Step 1: Remove redundant hits from the BLAST table

sort -k2,2 -k4,4nr -k3,3n File.txt | sort -u -k1,1 --merge | awk '{print $2}' > accession.txt

This step gives me the accession IDs of the sequences that mapped. I then run the next command to retrieve those sequences from the sequence file, using the accession ID file as the reference.
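For the later steps to work, the IDs written to accession.txt have to be the names that appear in the FASTA headers of the contig file. In standard BLAST+ tabular output (-outfmt 6), column 1 is the query ID (qseqid) and column 2 is the subject ID (sseqid), so if the contigs were the queries, their names sit in column 1. A minimal sketch, assuming that layout, that lists every contig with at least one hit exactly once; the function name hit_ids and the demo table are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list the contigs that have at least one BLAST hit.
# ASSUMPTION: tab-separated BLAST+ output (-outfmt 6), where column 1
# (qseqid) is the contig and column 2 (sseqid) is the COG sequence it hit.
set -euo pipefail

# hit_ids FILE (hypothetical helper): print each query ID in FILE once, sorted.
hit_ids() {
    cut -f1 "$1" | sort -u
}

# Demo on a small made-up table. Real -outfmt 6 rows have 12 columns;
# cut only looks at the first one, so the rest do not matter here.
demo=$(mktemp)
printf 'contig_1\tCOG0001\t98.5\ncontig_1\tCOG0002\t91.0\ncontig_3\tCOG0005\t75.2\n' > "$demo"
hit_ids "$demo"   # prints contig_1 and contig_3, one per line
rm -f "$demo"
```

With the file from the question this would be run as hit_ids File.txt > accession.txt. Because every row for a contig is a hit, no best-hit filtering is needed just to decide which contigs to remove.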

Step 2: Retrieving sequences

sort -u accession.txt | parallel -j1 "grep -A1 -F -w {} file.fa"
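The grep -A1 approach starts one grep per ID and only returns the line right after each header, so it assumes single-line FASTA. A sketch of a one-pass alternative, assuming one ID per line in the list and that the ID in each FASTA header is the first word after the ">"; it loads the IDs into an awk array and then prints every line of a matching record, so wrapped (multi-line) sequences come out whole. The function name fasta_keep and the demo data are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: extract the FASTA records whose IDs appear in an ID list, in one pass.
# ASSUMPTIONS: one ID per line in the list; the header ID is the first
# word after '>'. Multi-line sequences are handled.
set -euo pipefail

# fasta_keep IDS FASTA (hypothetical helper): print records listed in IDS.
fasta_keep() {
    awk '
        FILENAME == ARGV[1] { if (NF) ids[$1]; next }    # 1st file: load IDs
        /^>/ { id = substr($1, 2); keep = (id in ids) }  # header: decide per record
        keep                                             # print lines of kept records
    ' "$1" "$2"
}

# Demo with made-up data.
ids=$(mktemp); fa=$(mktemp)
printf 'contig_1\ncontig_3\n' > "$ids"
printf '>contig_1 len=8\nACGT\nACGT\n>contig_2\nGGGG\n>contig_3\nTTTT\n' > "$fa"
fasta_keep "$ids" "$fa"   # prints the contig_1 and contig_3 records
rm -f "$ids" "$fa"
```

With the files from the question: fasta_keep accession.txt file.fa > hits.fa. Exact matching on the whole ID also avoids contig_1 accidentally matching contig_10.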

My question is: is it possible to use the same accession ID file as the reference, but retrieve the sequences that are not present in it?
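One way to do this, sketched under the assumptions that the ID file has one ID per line and that the FASTA header ID is the first word after ">": load the IDs into an awk array, then print only the records whose ID is not in it. The function name fasta_drop and the demo data are hypothetical; this is the inverse of a keep-by-ID filter, not a feature of BLAST+ itself:

```shell
#!/usr/bin/env bash
# Sketch: print the FASTA records whose IDs are NOT in an ID list
# (here: the contigs without a BLAST hit).
# ASSUMPTIONS: one ID per line in the list; the header ID is the first
# word after '>'. Multi-line sequences are handled.
set -euo pipefail

# fasta_drop IDS FASTA (hypothetical helper): print records not listed in IDS.
fasta_drop() {
    awk '
        FILENAME == ARGV[1] { if (NF) ids[$1]; next }     # 1st file: load IDs
        /^>/ { id = substr($1, 2); keep = !(id in ids) }  # keep unlisted records
        keep                                              # print lines of kept records
    ' "$1" "$2"
}

# Demo with made-up data: contig_1 and contig_3 had hits.
ids=$(mktemp); fa=$(mktemp)
printf 'contig_1\ncontig_3\n' > "$ids"
printf '>contig_1\nACGT\n>contig_2 len=8\nGGGG\nGGGG\n>contig_3\nTTTT\n' > "$fa"
fasta_drop "$ids" "$fa"   # prints only the contig_2 record
rm -f "$ids" "$fa"
```

With the files from the question this would be fasta_drop accession.txt file.fa > no_hits.fa. If seqkit is installed, seqkit grep -v -f accession.txt file.fa should give an equivalent result, since it matches sequence IDs by default and -v inverts the match.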

Thanks, VISHAL
