Dear all,
I’m analyzing GSE207093, a circular RNA microarray dataset generated on the Arraystar platform. As I could not find any way to convert the Arraystar probe IDs to circAtlas IDs, I ran blastn with the Arraystar probe sequences (60 bp in length) against the circAtlas circular RNA database (sequence length range: 31-2000 bp). However, I have a question about the blastn output.
I used the following command:
blastn -task blastn-short -query rat_seq.fa -db circ_db -out out.txt -evalue 1e-5 -max_target_seqs 1 -num_threads 4 -outfmt 6
With the above command, one of my circRNAs of interest (rno_circRNA_014621) matched rno-Ralgapa1_0048.
query subject %id alignment length mismatches gap openings query start query end subject start subject end Evalue bit score
rno_circRNA_014621 rno-Ralgapa1_0048 100 37 0 0 1 37 549 585 1.02E-13 73.8
However, when I used -max_target_seqs 10 instead of 1, I obtained 10 sequences matching rno_circRNA_014621, all with the same identity, E-value, and bit score.
query subject %id alignment length mismatches gap openings query start query end subject start subject end Evalue bit score
rno_circRNA_014621 rno-Ralgapa1_0048 100 37 0 0 1 37 549 585 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0045 100 37 0 0 1 37 549 585 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0019 100 37 0 0 1 37 549 585 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0040 100 37 0 0 1 37 197 233 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0038 100 37 0 0 1 37 204 240 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0050 100 37 0 0 1 37 120 156 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0035 100 37 0 0 1 37 204 240 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0054 100 37 0 0 1 37 549 585 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0028 100 37 0 0 1 37 549 585 1.02E-13 73.8
rno_circRNA_014621 rno-Ralgapa1_0006 100 37 0 0 1 37 549 585 1.02E-13 73.8
I used the following command to extract the best hit from the above results:
sort -k1,1 -k12,12gr -k11,11g -k3,3gr blastout.txt | sort -u -k1,1 --merge > bestHits
This returned rno-Ralgapa1_0006 as the best hit for rno_circRNA_014621. My question is: why are the other hits, such as rno-Ralgapa1_0048 or rno-Ralgapa1_0035, not the best hit? How can I ensure that the blast output, i.e. the obtained circAtlas IDs, is correct?
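To make the tie explicit, here is a minimal Python sketch that keeps every subject tied at the highest bit score for each query, rather than a single one. It is not part of my original pipeline, and the function name `top_hits` is made up for illustration. It assumes the standard 12-column -outfmt 6 layout, with the bit score in column 12, and uses three of the rows above as sample input.

```python
from collections import defaultdict


def top_hits(lines):
    """Map each query ID to all subject IDs tied at its highest bit score.

    Assumes 12-column BLAST tabular output (-outfmt 6): query, subject,
    %id, length, mismatches, gap openings, qstart, qend, sstart, send,
    E-value, bit score.
    """
    best_score = {}            # query -> highest bit score seen so far
    best = defaultdict(list)   # query -> subjects at that score
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()  # fields contain no spaces, so this handles tabs too
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best_score or bitscore > best_score[query]:
            # New top score for this query: discard lower-scoring subjects.
            best_score[query] = bitscore
            best[query] = [subject]
        elif bitscore == best_score[query]:
            # Same score as the current top: record as a tie.
            best[query].append(subject)
    return dict(best)


rows = [
    "rno_circRNA_014621 rno-Ralgapa1_0048 100 37 0 0 1 37 549 585 1.02E-13 73.8",
    "rno_circRNA_014621 rno-Ralgapa1_0045 100 37 0 0 1 37 549 585 1.02E-13 73.8",
    "rno_circRNA_014621 rno-Ralgapa1_0006 100 37 0 0 1 37 549 585 1.02E-13 73.8",
]
ties = top_hits(rows)
for query, subjects in ties.items():
    print(query, len(subjects), ",".join(subjects))
```

On the real file, the same function could be called as `top_hits(open("blastout.txt"))`. Any query that comes back with more than one subject is one where the BLAST scores alone cannot single out a circAtlas ID.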
Any suggestions for obtaining the right circAtlas ID would be highly appreciated.
Thanks