Question

How Can I Obtain GC Content And Strand (Sense) Information For A Bunch Of Sequences

10.9 years ago

roll ▴ 350

Hi, I have two questions. I have thousands of short sequences (around 20 bp each).

For each of them, I would like to extract the following two pieces of information:

Find whether the sequence is on the positive or negative strand
Calculate the GC content

Which tool can I use for this purpose?

Thanks

gc short sequence • 4.7k views

updated 4.4 years ago by Biostar 20 • written 10.9 years ago by roll ▴ 350

With Biopython, calculating the GC% takes a single line. About the first question: how do you want to determine the strand without a reference genome? If you run BLAST against a reference, the sequences that match it directly are on the + strand and the rest are on the - strand (if they really come from this reference genome).

10.9 years ago by Lluís R. ★ 1.2k

I am using Bowtie to get the positions. Is it possible to get the strand information using Bowtie? I have never used Biopython. Is it simply counting the GC content and dividing it by the total number of reads?

10.9 years ago by roll ▴ 350

I don't know this program, but according to its manual there is a reporting mode that tells you which strand each sequence aligns to (see http://bowtie-bio.sourceforge.net/manual.shtml#reporting-modes). These modes also report how many of each nucleotide you have, but there is no GC% for each read or for all the reads. Biopython is a Python library aimed specifically at bioinformatics. PS: If you could mention in the question that you use Bowtie, it would be easier to answer.

10.9 years ago by Lluís R. ★ 1.2k
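
To make the GC% discussed in these comments concrete: it is usually computed per sequence, as the number of G and C bases divided by the length of that sequence, not by the number of reads. The following is a minimal sketch in plain Python, not taken from the thread. The function name `gc_content` and the example reads are hypothetical, and it assumes that ambiguous bases such as N count toward the length.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string).

    Assumption: every character, including N, counts toward the length.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq)


# Hypothetical 20 bp reads, used only for illustration.
reads = {
    "read1": "ACGTACGTACGTACGTACGT",
    "read2": "GGGCCCGGGCCCATATATAT",
}

for name, seq in reads.items():
    # Print the read name and its GC content as a percentage.
    print(f"{name}\t{gc_content(seq):.1%}")
```

For thousands of reads the same function can be applied to each sequence while iterating over a FASTA or FASTQ file.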

score 2 · Answer 1 · 2014-01-28

Since you mentioned aligning with Bowtie, just have it output SAM format (the standard these days). The orientation of each read will then be encoded in the FLAG field, which is the second column: the 0x10 bit indicates whether the read is reverse complemented, i.e., on the "-" strand (though whether this has any meaning depends on how the reads were generated). For calculating GC content, Biopython or even plain Python (or any other language) can do that. The better question is why you would want to do so.
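
As a rough illustration of reading the strand from the FLAG field, here is a minimal sketch in plain Python. The helper names `strand_from_flag` and `read_strands` and the example SAM records are hypothetical and do not come from real Bowtie output. The sketch also checks the 0x4 bit, which the SAM format uses to mark unmapped reads, since the 0x10 bit is not meaningful for those.

```python
def strand_from_flag(flag: int) -> str:
    """Map a SAM FLAG value to '+', '-', or '*' (unmapped)."""
    if flag & 0x4:  # 0x4: read is unmapped, so strand is undefined
        return "*"
    return "-" if flag & 0x10 else "+"  # 0x10: read is reverse complemented


def read_strands(sam_lines):
    """Yield (read_name, strand) for each alignment line of SAM text."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM has 11 mandatory columns
            continue
        yield fields[0], strand_from_flag(int(fields[1]))


# Made-up records for illustration only.
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t255\t20M\t*\t0\t0\tACGTACGTACGTACGTACGT\t" + "I" * 20,
    "read2\t16\tchr1\t200\t255\t20M\t*\t0\t0\tGGGCCCGGGCCCATATATAT\t" + "I" * 20,
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTAAAATTTTAAAATTTT\t" + "I" * 20,
]

for name, strand in read_strands(example_sam):
    print(f"{name}\t{strand}")
```

Note that the GC content of a read is the same whether or not it is reverse complemented, so the strand does not affect that calculation.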

Since you seem completely new to dealing with sequencing data, you might just outline the experiment and its goals; we'll be able to give you vastly better advice then.