Good day, I am trying to identify which sequences from my list do not appear in a FASTA file. I have file1.txt, which is a list of genes:
#gene1
#gene2
#gene3
#gene4
#gene5
I also have file2.fa:
#>gene1
ACTAGA
#>gene3
ACATGA
#>gene6
AGATA
I want to identify the genes listed in file1.txt that are not found in file2.fa. Sample output would be:
#gene2
#gene4
#gene5
I tried: for i in $(cat file1.txt); do perl -ne '/$i/ && print' file2.fa > output.txt; done
It gives everything that appeared in the list. I tried different iterations to get what's not on the list, but I wasn't able to.
Hope someone could help me with this.
Thanks!
I assume those # are not in your real data, since that would break the FASTA format.
Yup, sorry. It was showing as something else on my laptop before I added the #.
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. You would not need to add # in that case.
Thank you!
Thank you very much genomax. Will do that next time.
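One possible approach, as a sketch only: the loop in the question most likely prints everything because the single quotes stop the shell from expanding $i, so Perl sees its own empty $i and the empty pattern matches every line (and > overwrites output.txt on each pass). Instead of looping, you could pull the IDs out of the FASTA headers and ask grep for the lines of file1.txt that match none of them. The snippet below recreates the example files from the question without the leading #; fasta_ids.txt is just an illustrative intermediate file name, and it assumes each header ID is the first word right after > and that file1.txt has one ID per line with no trailing spaces.

```shell
# Recreate the example inputs from the question (without the leading '#').
printf '%s\n' gene1 gene2 gene3 gene4 gene5 > file1.txt
printf '%s\n' '>gene1' ACTAGA '>gene3' ACATGA '>gene6' AGATA > file2.fa

# Collect the FASTA header IDs: take lines starting with '>',
# strip the '>' and keep only the first whitespace-delimited word.
# fasta_ids.txt is a hypothetical scratch file name.
awk '/^>/ { sub(/^>/, "", $1); print $1 }' file2.fa > fasta_ids.txt

# Print the lines of file1.txt that match none of the collected IDs:
#   -f FILE  read patterns from FILE
#   -F       treat patterns as fixed strings, not regular expressions
#   -x       require the whole line to match (so gene1 does not match gene10)
#   -v       invert the match, i.e. keep the non-matching lines
grep -vxFf fasta_ids.txt file1.txt > output.txt

cat output.txt
```

With the example data this should leave gene2, gene4 and gene5 in output.txt. The -x flag is the part worth keeping if your real IDs share prefixes.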