Hi, I have two questions. I have thousands of short sequences (around 20 bp each).
For each of them I would like to extract the following two pieces of information:
Find whether the sequence is on the positive or negative strand
Calculate the GC content
Which tool can I use for this purpose?
Thanks
With Biopython, calculating the GC% is a one-liner. About the first question: how do you want to determine the strand without a reference genome? If you run BLAST against a reference, each hit is reported as being on the plus or minus strand (provided the sequences really come from that reference genome).
I am using Bowtie to get the positions. Is it possible to get the strand information using Bowtie? I have never used Biopython. Is the GC content simply the number of G and C bases divided by the total number of bases in the sequence?
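For a single sequence, GC content is usually taken as the number of G and C bases divided by the sequence length. A minimal sketch in plain Python (no Biopython needed) follows; the function name `gc_content` and the example reads are made up for illustration, and ambiguous bases such as N are simply counted as non-GC here, which may or may not be what you want.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage (0-100) of a nucleotide sequence.

    Assumption: any character other than G or C (including N) counts
    as non-GC, and an empty sequence is reported as 0.0.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Hypothetical example reads, for illustration only.
reads = {
    "read1": "ATGCGCTAGCTAGGCTAACG",
    "read2": "ATATATATATATATATATAT",
}

for name, seq in reads.items():
    print(f"{name}\t{gc_content(seq):.1f}")
```

Because a sequence and its reverse complement contain the same number of G plus C bases, the result does not depend on which strand the read maps to.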
I don't know this program, but looking at its manual there is a reporting mode that can tell you which strand each sequence aligns to: http://bowtie-bio.sourceforge.net/manual.shtml#reporting-modes. These modes also report how many of each nucleotide you have, but there is no GC% for each read or for all the reads. Biopython is a Python library aimed specifically at bioinformatics. PS: If you mention in the question that you use Bowtie, it will be easier to answer.
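As a rough sketch of how the two pieces of information could be pulled from Bowtie's results, the snippet below assumes the default (non-SAM) Bowtie 1 output: tab-separated lines whose first five fields are read name, strand (`+` or `-`), reference name, 0-based offset, and read sequence. The function names and the example lines are hypothetical. If you run Bowtie with `-S` to get SAM output instead, the strand is encoded in the FLAG field (bit 0x10 set means reverse strand) and this parser would not apply.

```python
import io


def gc_percent(seq: str) -> float:
    """GC percentage (0-100); non-G/C characters count as non-GC."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def read_bowtie_hits(handle):
    """Yield one dict per alignment line.

    Assumes Bowtie 1 default output: name, strand, reference,
    0-based offset, sequence, then further columns that are ignored.
    """
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, strand, ref, offset, seq = fields[:5]
        yield {
            "read": name,
            "strand": strand,
            "ref": ref,
            "start": int(offset),
            "gc": gc_percent(seq),
        }


# Made-up alignment lines in the assumed format, for illustration only.
example = (
    "read1\t+\tchr1\t1042\tATGCGCTAGCTAGGCTAACG\tIIIIIIIIIIIIIIIIIIII\t0\t\n"
    "read2\t-\tchr2\t77\tGGCCGGCCGGCCGGCCGGCC\tIIIIIIIIIIIIIIIIIIII\t0\t\n"
)

for hit in read_bowtie_hits(io.StringIO(example)):
    print(hit["read"], hit["strand"], hit["ref"], hit["start"], f"{hit['gc']:.1f}")
```

With a real run you would pass an open file handle for the Bowtie output instead of the `io.StringIO` example.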