C program to find the complement of a DNA sequence
9.1 years ago
Arindam Ghosh ▴ 540

I wrote this C code to find the complement of a DNA sequence. Please verify it and help me improve it.

#include<stdio.h>

main()
{
    int n;

    printf("\n\nEnter length of sequence:");
    scanf("%d",&n);

    /* one extra byte in each buffer for the terminating '\0' */
    char seq[n+1], com[n+1];
    int i;

    printf("\nENTER THE SEQUENCE:");
    /* seq already decays to char*; %s stops at whitespace and has no width limit */
    scanf("%s",seq);
    /* start at 0 and stop at the end of the input or after n bases */
    for(i=0;i<n && seq[i]!='\0';i++)
    {
        if(seq[i]=='A')
        com[i]='T';
        else if(seq[i]=='T')
        com[i]='A';
        else if(seq[i]=='G')
        com[i]='C';
        else if(seq[i]=='C')
        com[i]='G';
        else if(seq[i]==' ')
        com[i]='_';
        else
        com[i]='*';
    }
    com[i]='\0';   /* terminate the output before printing it with %s */
    printf("\n\n%s\n\n",com);
    printf("\n\n* is non-DNA bases\n\n");
}
dna c • 32k views

Inputting a DNA sequence by hand... very convenient.


We can at least copy and paste it.


Just for fun, I made a Python version of the script (outputs reverse complement):

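# Python 3: read one sequence and print its reverse complement
s = input('Enter sequence:\n')

sup = s.upper()

c = ''
for char in sup:
    if char == 'A':
        c += 'T'
    elif char == 'T':
        c += 'A'
    elif char == 'C':
        c += 'G'
    elif char == 'G':
        c += 'C'
    else:
        # leave anything that is not A, C, G or T unchanged
        c += char

print(c[::-1])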

A one-liner (Python 3) might look like:

print(input('DNA: ').translate(str.maketrans('AaCcGgTt', 'TtGgCcAa'))[::-1])
9.1 years ago
Declare 'int main(int argc, char** argv)'.

Don't print messages like "ENTER SEQUENCE:".

Read from stdin or from one or more files.

Read FASTA.

Use switch/case instead of if/else or, even faster, a conversion array: char compl[UCHAR_MAX + 1];

A C program returns 0 on success.
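
The switch/case suggestion above might look something like the sketch below. The function names complement_base and complement_in_place are made up for illustration, and marking unknown characters with '*' simply mirrors the original program; neither is prescribed by this answer.

```c
#include <stddef.h>

/* Hypothetical helper: returns the complement of one base, or '*'
   for anything that is not A, C, G or T (upper or lower case). */
static char complement_base(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    case 'a': return 't';
    case 't': return 'a';
    case 'c': return 'g';
    case 'g': return 'c';
    default:  return '*';
    }
}

/* Hypothetical helper: complements a NUL-terminated sequence in place. */
static void complement_in_place(char *seq)
{
    for (size_t i = 0; seq[i] != '\0'; ++i) {
        seq[i] = complement_base(seq[i]);
    }
}
```

Called from a main that reads lines from stdin and returns 0, this would cover most of the points in the list; FASTA header lines (those starting with '>') would still need to be passed through untouched.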

9.1 years ago

If you want it to be really fast, avoid conditionals and instead declare a static array that uses each character's ASCII decimal value as the index of its complementary base:

#include <stdlib.h>
#include <stdio.h>
/*
basemap[] works by storing a very small array that maps a base to
its complement, by dereferencing the array with the ASCII char's
decimal value as the index
(int) 'A' = 65;
(int) 'C' = 67;
(int) 'G' = 71;
(int) 'T' = 84;
(int) 'a' = 97;
(int) 'c' = 99;
(int) 'g' = 103;
(int) 't' = 116;
(int) 'N' = 78;
(int) 'U' = 85;
(int) 'u' = 117;
etc.
for example: basemap['A'] => basemap[65] => 'T' etc.
*/
static const unsigned char basemap[256] = {
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 'T', 'V', 'G', 'H', 'E', 'F', 'C', 'D', 'I', 'J', 'M', 'L', 'K', 'N', 'O',
'P', 'Q', 'Y', 'S', 'A', 'A', 'B', 'W', 'X', 'R', 'Z', 91, 92, 93, 94, 95,
96, 't', 'v', 'g', 'h', 'e', 'f', 'c', 'd', 'i', 'j', 'm', 'l', 'k', 'n', 'o',
'p', 'q', 'y', 's', 'a', 'a', 'b', 'w', 'x', 'r', 'z', 123, 124, 125, 126, 127,
128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255
};
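int
main(int argc, const char** argv)
{
    char* line = NULL;
    size_t len = 0;
    ssize_t read;
    while ((read = getline(&line, &len, stdin)) != -1) {
        /* drop the trailing newline, if present, so that a final line
           without one does not lose its last base */
        if (read > 0 && line[read - 1] == '\n') {
            --read;
        }
        /* walk the line backwards and print each mapped base */
        for (ssize_t idx = read - 1; idx >= 0; --idx) {
            /* cast through unsigned char so bytes >= 128 index the table safely */
            fprintf(stdout, "%c", basemap[(unsigned char)line[idx]]);
        }
        fprintf(stdout, "\n");
    }
    if (line) {
        free(line);
        line = NULL;
    }
    return EXIT_SUCCESS;
}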

This takes the reverse complement of a string made up of bases {A, C, T, G, U, a, c, t, g, u, N} by reading the input string backwards and printing mapped bases to standard output.

To compile:

$ gcc -Wall rc.c -o rc

To run, as an example:

$ echo 'AGGTCCA' | ./rc
TGGACCT
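
If typing out a 256-entry literal feels error-prone, the same kind of table could be filled in once at start-up. This is only a sketch: basemap_rt and init_basemap_rt are hypothetical names, and the pairs below follow the standard IUPAC nucleotide complements (S, W and N are their own complements, so the identity default already covers them).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical run-time counterpart of the static basemap[] table. */
static unsigned char basemap_rt[256];

static void init_basemap_rt(void)
{
    /* Each character in `from` maps to the character at the same
       position in `to`; U is treated like T, as in the table above. */
    static const char from[] = "ACGTURYKMBVDH";
    static const char to[]   = "TGCAAYRMKVBHD";

    /* Start from the identity mapping so unknown bytes pass through. */
    for (int i = 0; i < 256; ++i) {
        basemap_rt[i] = (unsigned char)i;
    }
    /* Fill in the upper-case pairs and their lower-case equivalents. */
    for (size_t k = 0; from[k] != '\0'; ++k) {
        unsigned char f = (unsigned char)from[k];
        unsigned char t = (unsigned char)to[k];
        basemap_rt[f] = t;
        basemap_rt[tolower(f)] = (unsigned char)tolower(t);
    }
}
```

The lookup itself stays a single array access, so the cost is one short initialisation call before the first line is read.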

FYI: seqtk/bioawk has a more complete complement table. It includes R/Y/D/etc:

https://github.com/lh3/seqtk/blob/master/seqtk.c#L159


Thanks Alex, getline is new to me. Is it just a GNU extension, or is it part of a recent standard?


memory leak: line should be freed


Fixed, thanks. Valgrind suggests the heap is clean:

$ echo -e 'ATTCG\nTTCCA\nGGGAT\nNNaTT' | valgrind -v --track-origins=yes --leak-check=full ./rc
==26625== Memcheck, a memory error detector
==26625== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26625== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26625== Command: ./rc
==26625== 
...
CGAAT
TGGAA
ATCCC
AAtNN
==26625== 
==26625== HEAP SUMMARY:
==26625==     in use at exit: 0 bytes in 0 blocks
==26625==   total heap usage: 1 allocs, 1 frees, 120 bytes allocated
==26625== 
==26625== All heap blocks were freed -- no leaks are possible
==26625== 
==26625== For counts of detected and suppressed errors, rerun with: -v
==26625== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

Answering my own question: http://stackoverflow.com/questions/13112784 "First, getline() is not in the C standard library, but is a POSIX 2008 extension. Normally, it will be available with a POSIX-compatible compiler, as the macros _POSIX_C_SOURCE will be defined with the appropriate values. You possibly have an older compiler from before getline() was standardized, in which case this is a GNU extension, and you must #define _GNU_SOURCE before #include <stdio.h> to enable it, and must be using a GNU-compatible compiler, such as gcc."


I didn't have to add an extension to compile under the version of clang that ships in OS X 10.11, so I guess OS X supports that POSIX standard, but on an older Linux box running gcc 4.8.2, I did have to add -std=gnu99 to the build statement. I did not have to add #define _GNU_SOURCE.
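
For anyone who prefers a strict -std=c99 build over -std=gnu99, one option that should work with glibc is to request the POSIX.1-2008 interfaces explicitly before the first include. A minimal sketch, in which longest_line is a hypothetical helper used only to exercise getline():

```c
#define _POSIX_C_SOURCE 200809L  /* must appear before any #include */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns the length of the longest line read
   from fp, not counting the trailing newline. */
static size_t longest_line(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    size_t best = 0;
    ssize_t n;

    while ((n = getline(&line, &cap, fp)) != -1) {
        if (n > 0 && line[n - 1] == '\n') {
            --n;         /* ignore the newline when measuring */
        }
        if ((size_t)n > best) {
            best = (size_t)n;
        }
    }
    free(line);          /* free(NULL) is a no-op, so no check is needed */
    return best;
}
```

Other C libraries may expose getline() under different conditions, so treat the macro as something to try rather than a guarantee.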


Don't forget N and U! :)


Done and done!

9.1 years ago
Alternative ▴ 290

Unless it is an assignment, I do not see why you wouldn't use existing tools. Here is faRc from the Kent utils, and much more:

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

or

https://github.com/ENCODE-DCC/kentUtils
