Tool:Excel add-in for DNA/RNA string manipulations
0
4
Entering edit mode
8.7 years ago
jgbaum ▴ 140

I recently received some positive feedback for an Excel add-in that I pushed to github some time ago. that performs simple string manipulations (complement, reverse complement, etc.) of DNA & RNA sequences. I know I have the need to do this quite often and I imagine many others do as well. In any case, here is the link:

https://github.com/LJI-Bioinformatics/Excel-Reverse-Complement/blob/master/revcom.vba

sequence DNA RNA Excel • 11k views
ADD COMMENT
3
Entering edit mode

Github often implies source code is available. I was curious to see your reversing algorithm. This file is binary-only, so we can't see what kind of code you're asking us to run. The Excel security prompts also warn about running unsigned code from unknown developers. How can we know it's not a virus?

ADD REPLY
1
Entering edit mode

I agree with Karl, and I apologize to OP in case of him having honest intentions. Unfortunately, the risk nowadays in the presence of modern crypto-malware is too high traded for a trivial implementation. I highly encourage OP to present his works in the form of VBA source code. Unfortunately, as you are a new user, we don't trust you distributing binaries, and I will have to delete the link you provided.

ADD REPLY
3
Entering edit mode

I check the code and it looks fine. It wasn't password protected or anything suspicious.

Function reverse(input_str As String)
    ' reverse a string
    xLen = VBA.Len(input_str)
    rev_str = ""
    For i = 1 To xLen
        getChar = VBA.Right(input_str, 1)
        input_str = VBA.Left(input_str, xLen - i)
        rev_str = rev_str & getChar
    Next

    reverse = rev_str

End Function

Function revcom(input_str As String, Optional ByVal isRNA = 0)
    ' calculate the reverse complement of a DNA/RNA sequence
    revcom = complement(reverse(input_str), isRNA)

End Function

Function complement(input_str As String, Optional ByVal isRNA = 0)

    ' calculate the complement of a DNA/RNA sequence
    input_str = Replace(input_str, "A", "1")
    If isRNA = 1 Then
        input_str = Replace(input_str, "U", "A")
        input_str = Replace(input_str, "1", "U")
    Else
        input_str = Replace(input_str, "T", "A")
        input_str = Replace(input_str, "1", "T")
    End If
    input_str = Replace(input_str, "C", "1")
    input_str = Replace(input_str, "G", "C")
    input_str = Replace(input_str, "1", "G")

    ' now deal with lowercase letters
    ' this could be more elegant
    input_str = Replace(input_str, "a", "1")
    If isRNA = 1 Then
        input_str = Replace(input_str, "u", "a")
        input_str = Replace(input_str, "1", "u")
    Else
        input_str = Replace(input_str, "t", "a")
        input_str = Replace(input_str, "1", "t")
    End If
    input_str = Replace(input_str, "c", "1")
    input_str = Replace(input_str, "g", "c")
    input_str = Replace(input_str, "1", "g")


    complement = input_str

End Function

EDIT: Oop, a lot happened in the 10 min it took to check the binary :P I got the above from inside the binary, so i'm sure it's ok :)

ADD REPLY
3
Entering edit mode

Thanks for inspecting my masterpiece!

ADD REPLY
2
Entering edit mode

Is there any way to see that code without running the binary? I only downloaded it and looked at the file with UNIX 'head' and saw that "xlam" was not something I could safely read.

ADD REPLY
2
Entering edit mode

Unfortunately, I don't think there is a way of looking inside an xlam without opening in Excel. I could be wrong about this, as I'm certainly no expert in MS file formats.

ADD REPLY
2
Entering edit mode

You could run Excel in a VM with no access to the file host system. Thanks for checking anyway, how does one decompile the code by the way? I was trying to check for some strings in a hex editor, but the code looks pretty scrambled.

ADD REPLY
2
Entering edit mode

xlam is just a zip file. If you extract it, it becomes a file structure (varies on version of Excel/Word/etc) with some XML files, and usually a binary "OLE" files which are what MS use to store their junk. OLE file can be obfuscated and password protected (that wasn't the case here), but they can be read in tools like oletools (a python package often used to look for malware in these sorts of office documents) or just by using unix strings / hexdump. As with any binary file, its difficult to know exactly what every bit is doing, on top of the fact it's also difficult to prove a negative (this file has nothing dangerous). But in this instance i think it's ok.

ADD REPLY
1
Entering edit mode

Hello jgbaum!

We believe that this post does not fit the main topic of this site.

Please don't distribute links to unsigned binaries.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

ADD REPLY
2
Entering edit mode

Thanks for the feedback. I've added the source code to the github repo. Let me know if anything else is required to reopen this thread.

ADD REPLY
1
Entering edit mode

I think you should also remove the binary.

ADD REPLY
1
Entering edit mode

Unfortunately, I think that would make the code useless (or overly cumbersome) for most users. I'm also not aware of any other reasonable way to distribute an Excel add-in, other than as an xlam. FWIW, once the file is loaded the source code is readily viewable in the VBA editor.

One possible workaround would be for me to post the MD5sum of the file and have a trusted user, such as yourself inspect the code. You can of course do this in a sandbox VM. Let me know if that will work or if you have any alternative proposals that don't complicate matters for the end users.

ADD REPLY
3
Entering edit mode

I literally just did all of that. I'm not "trusted" or anything, but i'm pretty confident its OK. There's nothing embedded or anything, just this one script pasted above. If anything, jgbaum should paste the sourcecode, because his binary packs abit too much personal information into a VBA file. It has your name, the fact you're on a mac and your username on the mac, and some otherstuff.

Good old Microsoft.

ADD REPLY
2
Entering edit mode

Thanks for the heads up!

ADD REPLY
2
Entering edit mode

Here is documentation on how to create VBA extensions for MS office from source: https://msdn.microsoft.com/en-us/library/office/gg597509(v=office.14).aspx

ADD REPLY
2
Entering edit mode

Thanks for understanding our paranoia. I think you've done a good job with this tool, because the real problem is not for us programmers to do a revcomp, but for excel users who wouldn't be familiar with VBA. Maybe you can add a little tutorial to how one can create the xlam and import the given VBA. I've also made a VBA snippet for revcomp for some of my co-workers, but I didn't think to use Excel Add-In, and subsequently, the end-users have difficulty getting the code into their projects. If I knew about XLAM maybe we could have upgraded more users.

ADD REPLY
2
Entering edit mode

My apologies, but we need to be vigilant. Indeed I would still recommend turning of all macros in Excel and not use any unsigned binary, not that we are fully protected with a signed binary either... http://researchcenter.paloaltonetworks.com/2016/03/new-os-x-ransomware-keranger-infected-transmission-bittorrent-client-installer/

ADD REPLY
0
Entering edit mode

I wanted to see if anyone had a javascript version of this code so that it could be used in google sheets. I don't know javascript, and the examples I've seen online elsewhere are very context dependent (eg to a particular cell), or were written for a starting input string (eg AATGCTCTG), but aren't flexible for any string.

ADD REPLY
0
Entering edit mode
ADD REPLY

Login before adding your answer.

Traffic: 2501 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6