Question

Ignoring N's on Each Side of the Chromosome

0

9.5 years ago

msmithmailbox • 0

I'm trying to find instances of a degenerate DNA sequence (one containing N's, R's, K's, etc.) in the human genome, using the matchPattern function provided in Biostrings. However, when I call matchPattern(pattern, subject, fixed=FALSE) to force the IUPAC extended letters to be interpreted as ambiguities, it returns many matches that are all N's, because the beginning and end of each sequenced chromosome in the human genome contain thousands of N's. Is there any way to ignore those regions, or to ignore matches that are all N's? Thank you very much.
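One way to handle the second option (ignoring matches that are all N's) is to filter the hits after the search. The following is only a minimal base-R sketch, assuming the hits have already been converted to a character vector, for example with as.character() on the matchPattern() result. The `hits` values and the helper name `drop_all_n` are made up for illustration and are not part of Biostrings.

```r
# Hypothetical helper (not a Biostrings function):
# keep only the hits that contain at least one letter other than N.
drop_all_n <- function(hits) {
  hits[!grepl("^N+$", hits)]
}

# Made-up example hits, standing in for as.character(matchPattern(...)).
hits <- c("NNNNNN", "ACGTAG", "NNNNNN", "ANGTNC")

kept <- drop_all_n(hits)
print(kept)
```

Hits that merely contain some N's (such as "ANGTNC" above) are kept; only hits made up entirely of N's are dropped.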

alignment genome R • 1.7k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.5 years ago by msmithmailbox • 0

1

"Is there a way to trim away all of the N's for all of the chromosomes in one shot"

So you want to remove all N's? Why not use gsub("N", "", genome)?

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.5 years ago by PoGibas 5.1k

0

Antonio, thanks for the response. I tried using trimLRPatterns, but it seems to trim only a fixed number of letters. For example, if I trim away "NNNN", it removes only the first four and last four N's. Is there a way to trim away all of the N's for all of the chromosomes in one shot, given that the number of N's varies and I don't know beforehand how many there are on either side? Thanks again!
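For flanking runs of unknown length, one possible approach is an anchored regular expression rather than a fixed-width pattern. The sketch below is a base-R illustration on toy strings, not a tested recipe for full chromosomes. The helper name `trim_flanking_n` and the `chroms` sequences are hypothetical, and it assumes each chromosome is available as a plain character string.

```r
# Hypothetical helper: strip the leading and trailing runs of N from one
# sequence, whatever their length, and record how many bases were removed
# on the left so positions can be mapped back to the original coordinates.
trim_flanking_n <- function(seq) {
  no_left <- sub("^N+", "", seq)          # drop the leading run of N's
  offset  <- nchar(seq) - nchar(no_left)  # number of N's removed on the left
  trimmed <- sub("N+$", "", no_left)      # drop the trailing run of N's
  list(seq = trimmed, offset = offset)
}

# Made-up toy "chromosomes" with flanking N runs of different lengths.
chroms <- c(chr1 = "NNNNACGTNNACGTNNN", chr2 = "NNACGTN")

# Apply the trimming to every chromosome in one call.
trimmed <- lapply(chroms, trim_flanking_n)
print(trimmed$chr1)
```

Unlike gsub("N", "", genome), this leaves internal N's in place, so a match position in the trimmed sequence differs from the original chromosome coordinate only by the recorded offset.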

ADD REPLY • link 9.5 years ago by msmithmailbox • 0

score 0 · Answer 1 · 2015-06-15

0

9.5 years ago

Antonio R. Franco ★ 5.2k

The trimLRPatterns function in this package (Biostrings) can be used for this purpose.

ADD COMMENT • link 9.5 years ago by Antonio R. Franco ★ 5.2k