4.9 years ago
nimishabalan131
I have a FASTA file (seq.fasta) containing multiple sequences:
>seq1
ATGCGTCTCCCCTTTAGAGAGTTCTCTCTAGCTACGTA
ATTTTTATCGCGCGGGGTGCGACGTTTTTAGGGGGGGG
>seq2
ATCTCTNNNNNNNNNNATATCCCCTTTNNNNNCTCTCT
ATTTTTTTTTCCCCCCGCGCGCGATCGACGCCCCCCCC
>seq3
ATCTCTNNNNNNNNNNATATCCCCTTCTCGGGGCCCCT
NNNNNTTTTTCTCTCTCGCGCTCGTCGAAAAATGCCCC
How can I count the frequency of 'N' and the number of positions at which this pattern (a run of consecutive 'N's) occurs? For example: ATCTCT "NNNNNNNNNN" ATATCCCCTTT "NNNNN" CTCTCT.
The result should be the number of occurrences of 'N' and the number of positions at which this pattern is seen, per sequence.
Output
seq1,0,0
seq2,15,2
seq3,15,2
($id=seq1, No_of_N's=0, frequency_pattern=0
$id=seq2, No_of_N's=15, frequency_pattern=2
$id=seq3, No_of_N's=15, frequency_pattern=2)
I have changed your post to a Question, as it is asking for help and not providing a Tutorial.
Please can you tell us what you've done so far? Also why do you need this information?
What have you tried? Which programming language do you want to use? Have you searched online for suitable tools?
We are volunteers and want to put you on the right track, but we don't want to invest a lot of our time to provide you with a ready to use solution.
With seqkit, awk and datamash, without printing sequences that have zero occurrences of the pattern:
Sounds like a homework assignment.
Use Python's count() method, or just go through all your letters one by one.
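A minimal sketch of the count() suggestion, in case it helps. It assumes a plain multi-line FASTA, treats lowercase 'n' the same as 'N', and takes the first word of each header as the ID. The helper names (read_fasta, n_stats, summarize) are made up for illustration and do not come from any library. Sequence lines are joined before counting, so a run of 'N' that continues across a line break is counted once.

```python
import re

def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def n_stats(seq):
    """Return (total number of N, number of separate runs of N)."""
    seq = seq.upper()  # assumption: soft-masked 'n' should count as well
    return seq.count("N"), len(re.findall(r"N+", seq))

def summarize(lines):
    """Return a list of (id, n_count, n_runs), one entry per sequence."""
    return [(name, *n_stats(seq)) for name, seq in read_fasta(lines)]

# The example sequences from the question, held in memory for the demo.
sample = """>seq1
ATGCGTCTCCCCTTTAGAGAGTTCTCTCTAGCTACGTA
ATTTTTATCGCGCGGGGTGCGACGTTTTTAGGGGGGGG
>seq2
ATCTCTNNNNNNNNNNATATCCCCTTTNNNNNCTCTCT
ATTTTTTTTTCCCCCCGCGCGCGATCGACGCCCCCCCC
>seq3
ATCTCTNNNNNNNNNNATATCCCCTTCTCGGGGCCCCT
NNNNNTTTTTCTCTCTCGCGCTCGTCGAAAAATGCCCC
"""

for name, n_count, n_runs in summarize(sample.splitlines()):
    print(f"{name},{n_count},{n_runs}")
```

To run it on a real file, pass an open file handle instead of the sample, e.g. summarize(open("seq.fasta")). To skip sequences without any 'N', filter on n_count > 0 before printing.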