Hi all!
I'm doing alignments on sequences trying to determine if the seed is <= 65 between each other. My approach was to compare all sequences between each other and look at the csv file
for i in `ls ./ | grep .fasta$`; do echo $i;bwa mem -k 65 all-sequences.fasta $i > $i.sam;samtools view -bS $i.sam > $i.bam;samtools sort -T $i.k50.bam_sorted -o $i.bam_sorted.bam $i.bam;rm $i.bam $i.sam;samtools index $i.bam_sorted.bam;samtools idxstats $i.bam_sorted.bam > $i.csv;done
The problem is if I put a control in my sequences (sequenceX which contains more than 65 sequences in common with sequence1) I got a match only on sequence X (and not on sequence X and 1). The same if I check 'all-sequences' csv file: 1 match instead of 2 for sequence1 and sequenceX
What can be my problem??
Thank you for your help!
Here is a sample of my sequences
>sequence1
CAATACATATCAACTTCGCTATTTTTTTAATAATTGCAAATATTATCTACAGCAGCGCCAGTGCATCAACAGATATCTCT
ACTGTTGCATCTCCATTATTTGAAGGAACTGAAGGTTGTTTTTTACTTTACGATGCATCCACAAACGCTGAAATTGCTCA
ATTCAATAAAGCAAAGTGTGCAACGCAAATGGCACCAGATTCAACTTTCAAGATCGCATTATCACTTATGGCATTTGATG
CGGAAATAATAGATCAGAAAACCATATTCAAATGGGATAAAACCCCCAAAGGAATGGAGATCTGGAACAGCAATCATACA
CCAAAGACGTGGATGCAATTTTCTGTTGTTTGGGTTTCGCAAGAAATAACCCAAAAAATTGGATTAAATAAAATCAAGAA
TTATCTCAAAGATTTTGATTATGGAAATCAAGACTTCTCTGGAGATAAAGAAAGAAACAACGGATTAACAGAAGCATGGC
TCGAAAGTAGCTTAAAAATTTCACCAGAAGAACAAATTCAATTCCTGCGTAAAATTATTAATCACAATCTCCCAGTTAAA
AACTCAGCCATAGAAAACACCATAGAGAACATGTATCTACAAGATCTGGATAATAGTACAAAACTGTATGGGAAAACTGG
GCAGGATTCACAGCAAATAGAACCTTACAAAACGGATGGTTTGAAGGGTTTATTATAAGCAAATCAGGACATAAATATG
TTTTTGTGTCCGCACTTACAGGAAACTTGGGGTCGAATTTAACATCAAGCATAAAAGCCAAGAAAAATGCGATCACCATT
>sequence2
TGCGCAAGAAGGCACGCTAGAACGTTCTGACTGGAGGAAGTTTTTCAGCGAATTTCAAGCCAAAGGCACGATAGTTGTGG
CAGACGAACGCCAAGCGGATCGTGCCATGTTGGTTTTTGATCCTGTGCGATCGAAGAAACGCTACTCGCCTGCATCGACA
TTCAAGATACCTCATACACTTTTTGCACTTGATGCAGGCGCTGTTCGTGATGAGTTCCAGATTTTTCGATGGGACGGCGT
TAACAGGGGCTTTGCAGGCCACAATCAAGACCAAGATTTGCGATCAGCAATGCGGAATTCTACTGTTTGGGTGTATGAGC
TATTTGCAAAGGAAATTGGTGATGACAAAGCTCGGCGCTATTTGAAGAAAATCGACTATGGCAACGCCGATCCTTCGACA
AGTAATGGCGATTACTGGATAGAAGGCAGCCTTGCAATCTCGGCGCAGGAGCAAATTGCATTTCTCAGGAAGCTCTATCG
TAACGAGCTGCCCTTTCGGGTAGAACATCAGCGCTTGGTCAAGGATCTCATGATTGTGGAAGCCGGTCGCAACTGGATAC
TGCGTGCAAAGACGGGCTGGGAAGGCCGTATGGGTTGGTGGGTAGGATGGGTTGAGTGGCCGACTGGCTCCGTATTCTTC
GCACTGAATATTGATACGCCAAACAGAATGGATGATCTTTTCAAGAGGGAGGCAATCGTGCGGGCAATCCTTCGCTCTAT
>sequence3
ATTTCTGAAAATTTGGCGTGGAATAAAGAATTTTCTAGTGAATCCGTACATGGCGTTTTTGTACTTTGTAAAAGTAGTAG
CAATTCCTGTACTACAAATAATGCGGCACGTGCATCTACAGCCTATATTCCAGCATCAACATTCAAAATTCCTAATGCTC
TAATAGGTCTTGAAACCGGCGCCATAAAAGATGAACGGCAGGTTTTCAAATGGGACGGCAAGCCCAGAGCCATGAAGCAA
TGGGAAAAAGACTTAAAGCTAAGGGGCGCTATACAGGTTTCTGCTGTTCCGGTATTTCAACAAATTGCCAGAGAAGTTGG
CGAAATAAGAATGCAAAAATACCTTAACCTGTTTTCATACGGCAACGCCAATATAGGGGGAGGCATTGACAAATTCTGGC
TAGAAGGTCAGCTTAGAATCTCAGCATTCAATCAAGTTAAATTTTTAGAGTCGCTCTACCTGAATAATTTGCCAGCATCA
AAAGCAAACCAACTAATAGTAAAAGAGGCAATAGTTACAGAAGCAACTCCAGAATATATAGTTCATTCAAAAACTGGGTA
TTCCGGTGTTGGCACAGAATCAAGTCCTGGTGTCGCTTGGTGGGTTGGTTGGGTAGAGAAAGGAACTGAGGTTTACTTTT
TTGCTTTTAACATGGACATAGACAATGAGAGTAAATTGCCGTCAAGAAAATCCATTTCAACGAAAATCATGGCAAGTGAA
>sequenceX
ATTCAATAAAGCAAAGTGTGCAACGCAAATGGCACCAGATTCAACTTTCAAGATCGCATTATCACTTATGGCATTTGATG
CGGAAATAATAGATCAGAAAACCATATTCAAATGGGATAAAACCCCCAAAGGAATGGAGATCTGGAACAGCAATCATACA
CCAAAGACGTGGATGCAATTTTCTGTTGTTTGGGTTTCGCAAGAAATAACCCAAAAAATTGGATTAAATAAAATCAAGAA
TTATCTCAAAGATTTTGATTATGGAAATCAAGACTTCTCTGGAGATAAAGAAAGAAACAACGGATTAACAGAAGCATGGC
TCGAAAGTAGCTTAAAAATTTCACCAGAAGAACAAATTCAATTCCTGCGTAAAATTATTAATCACAATCTCCCAGTTAAA
AACTCAGCCATAGAAAACACCATAGAGAACATGTATCTACAAGATCTGGATAATAGTACAAAACTGTATGGGAAAACTGG
thanks for the answer!! Is there a smart way to highlight all the auxiliary tags if I have numerous sequences to compare?
I guess it depends on what you mean by highlighting. In general, I'd suggest a programmatic solution to something like this.