7.0 years ago
marija
Hi, I need your help, please. I have a sequence 1,000,000 bp long. I want to write a function that returns the number of "AT" repeats (A+T content must be greater than 30%) whose length is greater than 500 bp. I just don't know how to write the condition that the length must be greater than 500 bp. Any help? Thank you.
Does letterFrequencyInSlidingView from the Biostrings Bioconductor package do what you need to get a summary of the data?
I just know how to define that % A+T must be greater than 30%:
Please edit the original post/question to add new information.
If you are parsing a single sequence for stretches longer than 500 bp with AT content > 30% (i.e., GC content < 70%), then what window size and window overlap do you want? Or are you looking at a set of sequences?
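To make the window size and overlap question concrete, here is a rough Python sketch of the single-sequence case. It is only an illustration, not the poster's code: the function names and the `window`, `step` and `min_at` parameters are made up for this example, and the default window of 501 bp is just one way to read "greater than 500bp".

```python
def at_fraction(seq):
    """Return the fraction of A and T bases in seq (case-insensitive)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("A") + s.count("T")) / len(s)


def count_at_windows(seq, window=501, step=501, min_at=0.30):
    """Count windows of `window` bp, started every `step` bp,
    whose A+T fraction is strictly greater than `min_at`.

    step == window gives non-overlapping windows;
    step < window gives overlapping windows.
    These names and defaults are assumptions for illustration only.
    """
    count = 0
    # Only full-length windows are considered; a shorter tail is ignored.
    for start in range(0, len(seq) - window + 1, step):
        if at_fraction(seq[start:start + window]) > min_at:
            count += 1
    return count
```

Note that the answer depends on the choice of `step`: overlapping windows count the same AT-rich stretch more than once, which is why the window size and overlap need to be fixed before a count like "456" means anything.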
I want the number of sequences that are longer than 500 bp and have AT content > 30%. E.g., in chr2:1,000,000-2,000,000 there are 456 sequences with AT > 30% and length > 500 bp (just an example).
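If the input is already a set of separate sequences, both conditions can be combined in a single test. The following is a minimal Python sketch under that assumption; `count_qualifying`, `min_len` and `min_at` are hypothetical names chosen for this example, and how the sequences are obtained from the region is left open.

```python
def at_fraction(seq):
    """Return the fraction of A and T bases in seq (case-insensitive)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("A") + s.count("T")) / len(s)


def count_qualifying(seqs, min_len=500, min_at=0.30):
    """Count sequences that are longer than `min_len` bp AND
    have an A+T fraction strictly greater than `min_at`.

    `seqs` is assumed to be any iterable of plain strings.
    """
    return sum(
        1 for s in seqs
        if len(s) > min_len and at_fraction(s) > min_at
    )
```

The length requirement is simply the `len(s) > min_len` part of the condition, joined to the AT test with `and`.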