Ignoring N's on Each Side of the Chromosome
9.5 years ago

I'm trying to find the occurrences of a degenerate DNA sequence (one containing N's, R's, K's, etc.) in the human genome, using the matchPattern function from Biostrings. When I call matchPattern(pattern, subject, fixed=FALSE) so that the IUPAC extended letters are interpreted as ambiguity codes, it returns many hits that are all N's, because the beginning and end of each sequenced chromosome in the human genome contain thousands of N's. Is there a way to ignore those regions, or at least to ignore hits that are all N's? Thank you very much.
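
A likely explanation for the all-N hits: with fixed=FALSE, IUPAC codes are treated as ambiguities in the subject as well as in the pattern, and N is compatible with every base, so any window lying inside an N run matches. The sketch below uses a made-up toy sequence and a hypothetical pattern (it assumes Biostrings is installed). It contrasts that behaviour with fixed="subject", which according to the matchPattern documentation makes only the pattern's ambiguity codes act as wildcards, and it also shows the simpler option of discarding hits that consist entirely of N's.

```r
library(Biostrings)

# Made-up toy subject: 10 N's, a short stretch of sequence, then 10 N's
subject <- DNAString(paste0(strrep("N", 10), "ACGTTGCAAGGCT", strrep("N", 10)))
pattern <- DNAString("AKGCT")  # hypothetical degenerate pattern (K = G or T)

# fixed = FALSE: ambiguity codes are wildcards in pattern AND subject, so every
# 5-letter window lying inside an N run is reported as a hit
loose <- matchPattern(pattern, subject, fixed = FALSE)

# fixed = "subject": only the pattern's codes are wildcards; letters in the
# subject (including N) are compared literally
strict <- matchPattern(pattern, subject, fixed = "subject")
strict  # one hit, AGGCT at positions 19-23

# Alternative: keep fixed = FALSE but drop hits made entirely of N's
loose_no_n <- loose[!grepl("^N+$", as.character(loose))]
```

The grepl filter only removes hits that are all N. With fixed=FALSE, a hit that straddles the edge of an N run could still be reported; fixed="subject" avoids that case as well.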

Tags: alignment, genome, R

Is there a way to trim away all of the N's for all of the chromosomes in one shot?

So you want to remove all N's? Why not use gsub("N", "", genome)?

Antonio, thanks for the response. I tried using trimLRPatterns, but it seems to trim only a fixed amount: for example, if I trim with "NNNN", it removes only the first four and last four N's. Is there a way to trim away all of the N's from both ends of every chromosome in one shot, given that the number of N's on each side varies and I don't know it beforehand? Thanks again!
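
A minimal sketch of trimming flanking N runs of any length, assuming Biostrings is loaded. trim_flanking_n is a hypothetical helper written for illustration, not a Biostrings function. Unlike gsub("N", "", genome), which deletes every N including internal assembly gaps, it leaves internal N runs in place. As a result, positions in the trimmed sequence remain a fixed offset away from the original chromosome coordinates.

```r
library(Biostrings)

# Hypothetical helper (not a Biostrings function): removes N runs of any
# length from both ends of a DNAString, keeping internal N's in place.
# Returns the trimmed sequence and the 1-based position of its first letter
# in the original sequence (NA if the input has no non-N letter).
trim_flanking_n <- function(x) {
  # Position of the first non-N letter, or -1 if there is none
  first <- as.integer(regexpr("[^N]", as.character(x)))
  if (first == -1L) {
    return(list(seq = subseq(x, start = 1L, width = 0L), offset = NA_integer_))
  }
  # Searching the reversed sequence gives the length of the trailing N run
  n_trail <- as.integer(regexpr("[^N]", as.character(reverse(x)))) - 1L
  last <- length(x) - n_trail
  list(seq = subseq(x, start = first, end = last), offset = first)
}

x <- DNAString("NNNNACGTNNAGNNN")  # made-up example sequence
res <- trim_flanking_n(x)
res$seq     # sequence ACGTNNAG: flanks removed, internal "NN" kept
res$offset  # 5
```

Hit positions found in res$seq map back to the original sequence by adding res$offset - 1. Assuming genome is a BSgenome object, applying the helper to every chromosome could look like lapply(seqnames(genome), function(chr) trim_flanking_n(genome[[chr]])). Note that as.character() makes a full in-memory copy of each chromosome.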

9.5 years ago

The trimLRPatterns function in Biostrings can be used for this purpose.
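
A small illustration of the behaviour reported in this thread, using a made-up sequence and assuming Biostrings is installed. Lpattern and Rpattern are matched literally here because Lfixed and Rfixed default to TRUE. As a result, trimLRPatterns removes at most nchar(Lpattern) letters from the left end and nchar(Rpattern) letters from the right end, and a longer N run is only partly trimmed.

```r
library(Biostrings)

x <- DNAString("NNNNNNACGTNNNNNN")  # made-up sequence: 6 N's on each side

# Lpattern and Rpattern are matched literally here (Lfixed/Rfixed default to
# TRUE), so each side loses at most nchar("NNNN") = 4 letters
trimmed <- trimLRPatterns(Lpattern = "NNNN", Rpattern = "NNNN", subject = x)
trimmed  # NNACGTNN: two N's remain on each side
```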
