3.8 years ago
diego1530
Dear Biostars,
I have this bash script to count the ambiguous nucleotides:
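input="arabidopsis.fasta"
output="ara.fasta"
# Read the FASTA file line by line; -r keeps backslashes literal
while IFS= read -r line
do
    # Tally the ambiguity codes found on sequence lines
    echo "$line" | grep -v '>' | grep -o "[WSKMYRVHDBN]" | sort | uniq -c
    # Print header lines unchanged
    echo "$line" | grep '>'
done < "$input" >> "$output"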
However, as you can see, my script only reads one input, and in the same folder I have two more multi-FASTA files. I would like all three files to be read at the same time and run through the same process, producing three different outputs, one for each input. I would greatly appreciate any help in modifying my script to achieve this.
This is because your code supplies only one input file. Try looping over the files in the directory where the FASTA files are located.
This would create a new file for each FASTA file with the new extension ".counts.txt" (e.g. test.fa would generate test.counts.txt). You will need to format the output yourself.
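A minimal sketch of such a loop, in case it helps. The helper name count_ambiguous and the .fasta/.fa patterns are assumptions for illustration, so adjust them to your own file names. Note that the character class is case-sensitive, so lowercase (soft-masked) bases are not counted.

```shell
# Hypothetical helper: tally IUPAC ambiguity codes in one FASTA file.
count_ambiguous() {
    # Drop header lines, emit each ambiguity code on its own line, then count them.
    grep -v '^>' "$1" | grep -o '[WSKMYRVHDBN]' | sort | uniq -c
}

# Assumes the inputs end in .fasta or .fa (an assumption, not from the original post).
for f in *.fasta *.fa; do
    [ -e "$f" ] || continue                      # skip a pattern that matched nothing
    count_ambiguous "$f" > "${f%.*}.counts.txt"  # test.fa -> test.counts.txt
done
```

Each input gets its own output, so three FASTA files in the folder give three .counts.txt files.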
Suggestions for this loop:
There is another way with seqkit, but it needs a little homework.
All the values are expressed as a percentage of the total sequence length. For each base, you need to convert the percentage back into a count using the length of the sequence.
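A rough sketch of that conversion, under stated assumptions. It assumes seqkit fx2tab is called with -n (names only), -l (length) and one -B per base, and that the output is tab-separated with the length in column 4 followed by the percentage columns. Check the layout with `seqkit fx2tab --help` (or the -H header line) for your version, since it may differ. The helper name pct_to_counts is hypothetical.

```shell
# Hypothetical helper: turn percentage columns back into base counts.
# $1 is the index of the length column (assumed to be 4 here); every column
# after it is treated as a percentage of that length.
pct_to_counts() {
    awk -F'\t' -v OFS='\t' -v len="${1:-4}" '{
        for (i = len + 1; i <= NF; i++)
            $i = sprintf("%.0f", $i * $len / 100)   # count = pct * length / 100
        print
    }'
}

# Only runs when seqkit and the input file are actually present.
if command -v seqkit >/dev/null 2>&1 && [ -e arabidopsis.fasta ]; then
    seqkit fx2tab -n -l -B N -B R -B Y arabidopsis.fasta | pct_to_counts 4
fi
```

Because the percentages are rounded when printed, the recovered counts are exact for short sequences but may be off by a few bases for very long ones.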
Hi, I really appreciate your help, and your script works very well. However, the ambiguous nucleotide counts for each file are shown at the bottom without mentioning which sequence contained them. Is it possible to show the ambiguous nucleotides for each sequence? I used the first method. Thanks!
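One possible way to get per-sequence counts is to let awk collect each record and print its header followed by the tallies. This is only a sketch, not the method from the answer above. The function name per_sequence_counts is hypothetical, and it assumes uppercase ambiguity codes as in the original grep pattern.

```shell
# Hypothetical helper: print each FASTA header followed by the counts of
# ambiguity codes found in that sequence (multi-line sequences are joined).
per_sequence_counts() {
    awk '
        function report(   i, c, n) {
            if (name == "") return
            print name
            for (i = 1; i <= length(codes); i++) {
                c = substr(codes, i, 1)
                n = gsub(c, c, seq)          # gsub returns the number of matches
                if (n > 0) printf "%7d %s\n", n, c
            }
        }
        BEGIN { codes = "BDHKMNRSVWY" }
        /^>/  { report(); name = $0; seq = ""; next }   # new record: flush the previous one
              { seq = seq $0 }                          # accumulate sequence lines
        END   { report() }                              # flush the last record
    ' "$1"
}
```

It can be dropped into the same kind of loop, for example `per_sequence_counts "$f" > "${f%.*}.counts.txt"`. Sequences without ambiguous bases show up as a header line with nothing under it.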