I'm analyzing sequence features in 18S rRNA sequences, using data downloaded from the SILVA database. Unfortunately, a typical vertebrate has more than one 18S sequence. I know that most of them are considered pseudogenes, but unless I check them one by one, there is no way to tell which one is the canonical (reference) version. I have roughly 1000 sequences from 150 organisms to verify, so I'd prefer not to do it by hand.
Where can I find, or how can I identify, the canonical version of the 18S rRNA?