Nucleotide Match With Unix Tools
14.2 years ago

Today one assignment in a course I'm taking was to find all (CCG)*4 repeats in the q arm of human chromosome 11. Since I had a little too much time on my hands after doing it with EMBOSS's fuzznuc, I wanted to try a bash-only version and came up with

cat 11q.fa | sed '1d' | tr -d '\n' | tr -d '\r' | egrep -io '(CCG){4}' | wc -l

fuzznuc came up with 18 matches, the bash version with only 11. Neither takes reverse-complement matches into account by default.

I suppose fuzznuc is correct, but can anyone spot an error in the bash version?

(edit: it's called fuzznuc, not fuzzynuc)
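
On the reverse-complement point: the reverse complement of CCGCCGCCGCCG is CGGCGGCGGCGG, so one way to cover the other strand with the same tools is to search the forward sequence for (CGG){4} as well. A minimal sketch, assuming a single-record FASTA file as in the pipeline above; `count_motif` is a made-up helper name, and like the original command it counts non-overlapping matches only.

```shell
# Hypothetical helper: count non-overlapping, case-insensitive matches
# of an extended regular expression ($2) in a single-record FASTA file ($1).
count_motif() {
  sed '1d' "$1" |      # drop the FASTA header line
    tr -d '\r\n' |     # join the sequence into one line
    grep -ioE "$2" |   # print each match on its own line
    wc -l              # count the matches
}

# Plus strand:  count_motif 11q.fa '(CCG){4}'
# Minus strand: count_motif 11q.fa '(CGG){4}'
```

The two counts can simply be added, since a CCG run and a CGG run are different stretches of sequence.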


Give us the data, the parameters you passed to fuzznuc, and its output; then maybe, if somebody has way too much time, we'll figure something out ;)

14.2 years ago
brentp 24k

I'm not familiar with fuzznuc, but it probably reports overlapping matches, whereas most Linux tools do not. If you have CCGCCGCCGCCGCCG, that's 5 triplets, so there are actually 2 distinct (CCG){4} matches (starting at offsets 0 and 3), but the RE engine will only find the first.

You could sort of check this by counting the matches for '(CCG){5,}' as the regular expression, though that will similarly underestimate wherever 6 or more CCG triplets occur together.
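
To get the overlapping count directly with the same kind of pipeline, one option is to match each maximal run of 4 or more CCG triplets and note that a run of n triplets contains n - 3 overlapping (CCG){4} windows. This is only a sketch under my assumption about what fuzznuc counts; `count_overlapping_ccg4` is a made-up function name, and it expects a single-record FASTA file like the one in the question.

```shell
# Hypothetical helper: count overlapping (CCG){4} matches in a
# single-record FASTA file ($1), case-insensitively.
count_overlapping_ccg4() {
  sed '1d' "$1" |               # drop the FASTA header line
    tr -d '\r\n' |              # join the sequence into one line
    grep -ioE '(CCG){4,}' |     # one maximal CCG run per output line
    awk '{ total += length($0) / 3 - 3 }   # n triplets -> n - 3 windows
         END { print total + 0 }'          # prints 0 if there were no runs
}

# Usage: count_overlapping_ccg4 11q.fa
```

For CCGCCGCCGCCGCCG this gives 5 - 3 = 2, matching the two distinct matches described above, where plain '(CCG){4}' with grep -o reports only one.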


I just checked, and that was indeed the case. Thanks!


Sounds good! But then it's kind of fuzzy which result is correct.


Yes it is. But it is good to know how the program behaves in case I need it again :-)

14.2 years ago
Michael 55k

Well, I don't know fuzznuc, so I don't have the slightest clue, though I'll dare to make a guess anyway. fuzznuc, that rings a bell. Could it be that it does fuzzy matching, while grep does exact matching? Too lazy to figure it out myself though, because I don't have your data.


fuzznuc also prints out a match table, which shows only exact matches, not approximate ones.
