Hello,
I am trying to align a shotgun metagenomics dataset against the NCBI eukaryotic reference database with blastn, to assess diet from black bear fecal samples. I am new to this. I have the blastn output in tabular format (outfmt 6). From it, I extracted the unique queries and the unique subject sequences (subject ID plus subject start and end) to check how often different queries hit the exact same spot in the database. I used the following commands:
for f in blastn_out_nt/*; do cut -f 1 "$f" | sort -u | wc -l >> query; done
for f in blastn_out_nt/*; do cut -f 2,9,10 "$f" | sort -u | wc -l >> unique_subjects; done
The output looks like:
Does a smaller ratio of unique subjects to unique queries in the blastn results potentially indicate that the input FASTA sequences were redundant (PCR duplicates), because many of them hit the same database entry? I appreciate your kind help!
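For reference, the two counts described above can be collected per file in a single table, which makes the query/subject comparison easier to read. This is only a minimal sketch, assuming default outfmt 6 columns (column 1 = qseqid, column 2 = sseqid, columns 9 and 10 = sstart and send) and one output file per sample in the directory. The function name count_blast_uniques is made up for illustration. It runs sort -u after cut, so identical subject/start/end triples are collapsed even when they come from different queries.

```shell
#!/usr/bin/env bash
# Sketch: per-file counts of unique queries and unique subject hit positions
# from BLAST tabular output. Assumes default outfmt 6 column order.
# count_blast_uniques is a hypothetical helper name, not part of BLAST+.

count_blast_uniques() {
    # $1 = directory containing one outfmt 6 file per sample (assumption)
    local dir=$1 f queries subjects
    printf 'file\tunique_queries\tunique_subject_hits\n'
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Column 1: query IDs; sort -u keeps one copy of each.
        queries=$(( $(cut -f 1 "$f" | sort -u | wc -l) ))
        # Columns 2, 9, 10: subject ID, subject start, subject end.
        # Sorting the extracted triples makes duplicates adjacent before dedup.
        subjects=$(( $(cut -f 2,9,10 "$f" | sort -u | wc -l) ))
        printf '%s\t%d\t%d\n' "$(basename "$f")" "$queries" "$subjects"
    done
}
```

It could be called as count_blast_uniques blastn_out_nt > counts.tsv. Note that a query with several hits contributes several subject positions, so the two columns are not expected to match even without duplicated reads.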
Please do not post screenshots of text data. Use the 101010 code button to format your data as code.
Don't forget to follow up on your previous threads. You have multiple without accepted answers where you have yourself commented that the solution provided works.
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they work. This will help future users that might find this post find the right answer.
Please do not use bioinformatics as a tag unless your post is about the field itself. If you're using it because your question is related to bioinformatics, please understand that every post on this forum is related to bioinformatics.