Hello everyone,
I have a genome FASTA file that contains 16,941 sequences. Here is an example from my "genome.fasta":
>scf7180000026027
GAATGCATACTGCATCGATA
>scf7180000026028
CATAAAACGTCTCCATCGCT
>scf7180000026029
TGCCCAAGTTGTGAAGTGTC
>scf7180000026030
TGCCCAAGTTGTGAAGTGTC
I want to find identical sequences in this genome FASTA file and return their IDs. My ultimate goal is to find and remove any duplicated sequences in the file, so that each distinct sequence is present only once.
Thank you everyone for any suggestion.
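One possible approach, as a minimal sketch in plain Python with no external libraries: group the record IDs by their sequence, report every group that has more than one ID, and write out one record per distinct sequence. The function names (read_fasta, group_identical, duplicate_ids, write_unique) are made up for this illustration. The sketch assumes that "identical" means an exact string match after upper-casing, that the ID is the first word of the header line, and that reverse complements do not count as duplicates.

```python
import io
from collections import defaultdict


def read_fasta(handle):
    """Yield (id, sequence) pairs from an open FASTA handle.

    Multi-line sequences are joined; the ID is the first word of the header.
    """
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def group_identical(records):
    """Map each distinct (upper-cased) sequence to the list of IDs carrying it."""
    groups = defaultdict(list)
    for seq_id, seq in records:
        groups[seq.upper()].append(seq_id)
    return groups


def duplicate_ids(groups):
    """Return only the ID groups that share one sequence (size > 1)."""
    return [ids for ids in groups.values() if len(ids) > 1]


def write_unique(groups, handle):
    """Write one record per distinct sequence, keeping the first ID seen.

    Note: sequences are written upper-cased and on a single line.
    """
    for seq, ids in groups.items():
        handle.write(f">{ids[0]}\n{seq}\n")


# Demo on the four example records from the question.
example = """>scf7180000026027
GAATGCATACTGCATCGATA
>scf7180000026028
CATAAAACGTCTCCATCGCT
>scf7180000026029
TGCCCAAGTTGTGAAGTGTC
>scf7180000026030
TGCCCAAGTTGTGAAGTGTC
"""

groups = group_identical(read_fasta(io.StringIO(example)))
print(duplicate_ids(groups))
# [['scf7180000026029', 'scf7180000026030']]
```

For the real file, the same functions could be called with open("genome.fasta") as the input handle and a second file opened for writing as the output handle (the output file name is up to you). This keeps every distinct sequence in memory, which should be fine for 16,941 scaffolds of moderate size; for a very large genome, hashing each sequence (for example with hashlib) instead of storing it as the dictionary key would reduce memory use.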