4.5 years ago
jiseon824
Hello,
I am new to Python and bioinformatics.
I need to analyze data from a massive FASTA file.
I want to count repeated sequences using Python.
test.fasta
>1234
cagatcaccttgaagtcgtctgctcctacgctggtgaaacctacac
>456
cagatcaccttgaagtcgtctgctcctacgctggtgaaacctacac
>67
cagatcaccttgaagtcgtctgctcctacgctggtgaaacctacac
>123
cagatcaccttgaagtcgtctgctcctacgctggtgaaacctacac
>57
gccttctctgggttctcactcagcactagtggagtgggtgtgggctggatccgtaagcccccaggaaaggccctggagtggcttgcactca
>35
cagatcaccttgaagtcgtctgctcctacgctggtgaaacctacac
>123
gccttctctgggttctcactcagcactagtggagtgggtgtgggctggatccgtaagcccccaggaaaggccctggagtggcttgcactca
>222
gccttctctgggttctcactcagcactagtggagtgggtgtgggctggatccgtaagcccccaggaaaggccctggagtggcttgcactca
Because I am new to Python, I have not been able to write any code. I searched online but could not find any example code that I could copy and follow.
Can someone help me count how many times each sequence is duplicated?
If a reference is needed, I can make a reference file (CSV or FASTA).
[What I want, in a CSV file] the sequence and the number of times it is repeated:
cagatcaccttgaagtcgtctgctcctacgctggtgaaacctacac 5
gccttctctgggttctcactcagcactagtggagtgggtgtgggctggatccgtaagcccccaggaaaggccctggagtggcttgcactca 3
Or display the ID from the reference file and the number of times it is repeated:
ref#1 5
ref#2 3
.
.
Thank you in advance
Do you want to know whether these full-length sequences are repeated, or are you interested in the number of occurrences of a specific set of subsequence patterns?
Hi
I want to count the number of occurrences of each specific reference sequence listed in a reference file. For example, if I make a reference file as below,
then it counts the frequency based on the reference. The actual reference sequences are longer than in the example, usually more than 500 bp. I have a FASTA file, and I need to analyze it to count the number of sequence reads for each reference.
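One possible reading of this request, sketched with only the Python standard library: load the reference FASTA into a lookup table, then count the reads whose full sequence exactly matches a reference (ignoring case). The exact-match rule, the function names `fasta_records` and `count_reads_per_reference`, and the tiny demo records are all assumptions for illustration; partial or approximate matches would need a different approach.

```python
import io
from collections import Counter


def fasta_records(handle):
    """Yield (header, sequence) pairs from an open FASTA handle, one at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_reads_per_reference(ref_handle, reads_handle):
    """Count reads whose whole sequence equals a reference sequence (assumed rule)."""
    # Map each reference sequence to its ID for constant-time lookup.
    ref_ids = {seq.lower(): ref_id for ref_id, seq in fasta_records(ref_handle)}
    # Start every reference at zero so unmatched references still appear.
    counts = Counter({ref_id: 0 for ref_id in ref_ids.values()})
    for _, seq in fasta_records(reads_handle):
        ref_id = ref_ids.get(seq.lower())
        if ref_id is not None:
            counts[ref_id] += 1
    return counts


# Hypothetical demo data; real files would be opened with open(...) instead.
refs = io.StringIO(">ref#1\nacgt\n>ref#2\nggcc\n")
reads = io.StringIO(">1234\nacgt\n>456\nACGT\n>57\nggcc\n>9\ntttt\n")
per_ref = count_reads_per_reference(refs, reads)
```

Only the reference sequences are held in memory; the reads are streamed one record at a time, which should help with a large input file.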
Duplicates by sequence or by ID?
Hi
You should go through this link: https://stackoverflow.com/questions/55226949/how-to-get-the-count-of-duplicated-sequences-in-fasta-file-using-python
You can easily redirect the output to a CSV file or any other format you want.
Thank you so much. It is working well. :) I hope it also works well with my massive data.
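For readers who want something self-contained, here is a minimal sketch using only the Python standard library. It is not necessarily the approach taken in the linked answer. It assumes that "duplicate" means an identical full-length sequence (ignoring case); the function names and the short demo records are made up for illustration.

```python
import csv
import io
from collections import Counter


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle, one at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_sequences(handle):
    """Count how many records share each exact sequence (case-insensitive)."""
    return Counter(seq.lower() for _, seq in read_fasta(handle))


def write_counts_csv(counts, out_handle):
    """Write 'sequence,count' rows, most frequent sequence first."""
    writer = csv.writer(out_handle)
    writer.writerow(["sequence", "count"])
    for seq, n in counts.most_common():
        writer.writerow([seq, n])


# Hypothetical demo data standing in for test.fasta.
demo = io.StringIO(">1234\nacgt\n>456\nacgt\n>57\nggcc\n>35\nACGT\n")
counts = count_sequences(demo)

out = io.StringIO()
write_counts_csv(counts, out)
csv_text = out.getvalue()
```

For a real file, the `io.StringIO` objects can be swapped for `open("test.fasta")` and `open("counts.csv", "w", newline="")`. Memory use grows with the number of distinct sequences rather than the number of reads, since the file is read one record at a time.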