I have a bunch of short sequences (60 bp) and for each sequence I would like to extract the position of this sequence, the strand information (whether it is on positive or negative strand) and whether this sequence is transcriptionally active or not.
Is there a way to extract this information?
Thanks Pavel. What about bowtie? Does it make any difference whether I use bwa or bowtie for this purpose? I am more familiar with bowtie than with bwa.
Well, I guess it really doesn't matter which way you get the bed files.
I think I am making some progress. I am running the bwa index now. While waiting for that, I went ahead and ran bowtie, converted the sam to bam, and then converted the bam to bed files (using bedtools). My question at this stage is: is there a way to keep the original sequence in the bed file as extra information? I am going to do this for many sequences and it would be nice to know which one is which.
Do you mean the FASTA-formatted sequence? Personally, I don't think so, but you can always extract it from the FASTA file. I guess you are almost there by now. I am not sure how you will run the alignment, and you may already know this, but bwa can report secondary alignments as well; you may want to remove those before making the bed files.
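To illustrate the point about removing secondary alignments, here is a minimal sketch in Python that keeps only mapped, primary records from SAM text. The FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification; the function name and the toy records are made up for illustration. The same filter can be applied with samtools view -F 0x904.

```python
# FLAG bits as defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def primary_alignments(sam_lines):
    """Yield header lines plus mapped, primary alignment lines only.

    Hypothetical helper: the name is for illustration, not from any library.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged
            yield line
            continue
        flag = int(line.split("\t")[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        yield line


# Toy records, invented for this example
toy = [
    "@SQ\tSN:chr1\tLN:1000",
    "seq1\t0\tchr1\t124\t255\t60M\t*\t0\t0\t*\t*",     # primary, forward
    "seq1\t256\tchr1\t500\t0\t60M\t*\t0\t0\t*\t*",     # secondary
    "seq2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",              # unmapped
    "seq3\t2064\tchr1\t300\t255\t60M\t*\t0\t0\t*\t*",  # supplementary, reverse
]
kept = list(primary_alignments(toy))
```

Only the header and the first seq1 record survive the filter in this toy input.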
Hi Pavel, I am not sure I understand you correctly. I managed to run bowtie and extracted the chr:start-end and the strand (positive or negative) as a bed file (from the sam file). But I do not know which of my sequences corresponds to which entry in the bed file. That is what I meant to ask. I now have the sequences in one file and the information about them in another file, but no connection between the two. Do you know how to do it?
Wait a second, I thought that the fourth column of the bed file actually is the name of your sequence. Did you generate that bed by yourself? Then you can fix that, I guess. I just went to see what the documentation says about bam2bed and it seems like that column is populated (and fasta is there too?).
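If the fourth column does carry the sequence name, the two files can be joined on it. Below is a rough sketch in Python, assuming the bed name column matches the first word of each FASTA header; the function names and the toy data are hypothetical.

```python
def read_fasta(lines):
    """Map each record name (first word after '>') to its sequence."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            # Sequences may be wrapped over several lines
            seqs[name] += line
    return seqs


def annotate_bed(bed_lines, seqs):
    """Append the sequence as an extra column, or '.' if the name is unknown."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields + [seqs.get(fields[3], ".")])


# Toy data, invented for this example
fasta = [">seq1 some description", "ACGT", "TT", ">seq2", "GGCC"]
bed = ["chr1\t123\t456\tseq1\t255\t+", "chr2\t10\t70\tseq9\t255\t-"]
annotated = list(annotate_bed(bed, read_fasta(fasta)))
```

With real files, the two lists would be replaced by open file handles. Note that the appended column is the sequence as written in the FASTA file, whatever strand it aligned to.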
No, I did not create the bed files myself, as I don't know how to extract the strand information from sam files. First I converted the sam into bam using samtools, and then the bam files to bed using bedtools. I ended up getting bed files with the following format: chr1 123 456 0 255 +
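For reference, the strand can be read straight from the SAM FLAG field (bit 0x10 means the read aligned to the reverse strand), and the fourth bed column is just the read name as it appears in the SAM file. If the reads were supplied without names, a value like 0 may simply be a running index assigned by the aligner; that is an assumption worth checking against the input. The sketch below, in Python with hypothetical function names, builds a bed line directly from a SAM line and appends the read sequence as a seventh column, reverse-complementing it back for minus-strand hits because SAM stores the sequence as aligned to the forward strand.

```python
import re

UNMAPPED = 0x4
REVERSE = 0x10
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reference_length(cigar):
    """Reference bases covered by the alignment: M, D, N, = and X consume the reference."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")


def sam_to_bed_row(line):
    """Return a bed6 line plus the read as originally supplied, or None if unmapped."""
    f = line.rstrip("\n").split("\t")
    name, flag, chrom, pos, mapq, cigar, seq = (
        f[0], int(f[1]), f[2], int(f[3]), f[4], f[5], f[9])
    if flag & UNMAPPED:
        return None
    start = pos - 1  # SAM positions are 1-based; bed starts are 0-based
    end = start + reference_length(cigar)
    strand = "-" if flag & REVERSE else "+"
    if strand == "-":
        # Undo the reverse complement applied to minus-strand reads in SAM
        seq = seq.translate(COMPLEMENT)[::-1]
    return "\t".join([chrom, str(start), str(end), name, mapq, strand, seq])


# Toy record, invented for this example: a 4 bp read on the reverse strand
row = sam_to_bed_row("0\t16\tchr1\t124\t255\t4M\t*\t0\t0\tAACG\t*")
```

Here row has start 123, end 127, strand - and the restored sequence CGTT in the last column.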
I am sorry, I haven't done this particular task myself, so I can't provide something like a shrink-wrapped script. I hope everything is resolved by now. If not, let's see some of your data and get things working.