Well, as far as I know, some aligners first align the forward read using a regular alignment algorithm, like @13en mentioned. They then look within a window (your estimated insert size plus some extra, just to be sure) and try to align the reverse read in that region.
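To make the window idea concrete, here is a minimal sketch in Python. It is not how any particular aligner implements it: the ungapped mismatch scan stands in for a real alignment algorithm, and the names `revcomp`, `hamming_hits`, `align_pair_with_rescue` and the `slack` parameter are made up for illustration. It also assumes the second read comes from the opposite strand, so it is reverse-complemented before searching.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming_hits(read, ref, start, end, max_mismatches):
    """Yield (position, mismatches) for ungapped placements of read
    that lie entirely inside ref[start:end]."""
    start = max(start, 0)
    end = min(end, len(ref))
    for pos in range(start, end - len(read) + 1):
        mm = sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))
        if mm <= max_mismatches:
            yield pos, mm


def align_pair_with_rescue(read1, read2, ref, insert_size,
                           slack=50, max_mismatches=1):
    """Place read1 anywhere on ref, then search for the reverse complement
    of read2 only inside a window of insert_size + slack bases that starts
    at read1's position. Returns a list of (pos1, pos2, total_mismatches)."""
    mate = revcomp(read2)
    results = []
    for pos1, mm1 in hamming_hits(read1, ref, 0, len(ref), max_mismatches):
        window_end = pos1 + insert_size + slack
        for pos2, mm2 in hamming_hits(mate, ref, pos1, window_end,
                                      max_mismatches):
            results.append((pos1, pos2, mm1 + mm2))
    return results
```

The point of the window is that the second search only covers a few hundred bases instead of the whole reference, which is why this is cheaper than aligning both reads everywhere.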
What I have also seen in some tools is that they align all forward and all reverse reads to the reference separately, then match up the two reads of each pair and compare their coordinates. The pairs are filtered on your parameters (for example, an insert size of 200 should eliminate all pairs whose reads are more than 200 - (read lengths of both reads) apart from each other), and only the ones considered proper are kept and output in SAM format or something similar.
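As a rough illustration of that filtering step, assuming you already have lists of hit positions for the forward and the reverse read of one pair: with an insert size of 200 and two 75-base reads, the gap between the reads can be at most 200 - 75 - 75 = 50 bases. The function names and the `tolerance` parameter below are hypothetical, positions are 0-based leftmost coordinates, and for simplicity both reads are assumed to have the same length and not to overlap.

```python
def inner_distance(fwd_pos, read_len, rev_pos):
    """Gap between the end of the forward read and the start of the reverse read."""
    return rev_pos - (fwd_pos + read_len)


def pair_hits(fwd_hits, rev_hits, read_len, insert_size, tolerance=0):
    """Combine separately computed hit positions and keep the combinations
    whose inner gap is between 0 and insert_size - 2 * read_len + tolerance."""
    max_gap = insert_size - 2 * read_len + tolerance
    proper = []
    for f in fwd_hits:
        for r in rev_hits:
            gap = inner_distance(f, read_len, r)
            if 0 <= gap <= max_gap:
                proper.append((f, r, gap))
    return proper


# Forward read hits at 100 and 5000, reverse read hits at 225 and 900.
print(pair_hits([100, 5000], [225, 900], read_len=75, insert_size=200))
```

Only the combination (100, 225) survives here; the other three are either too far apart or in the wrong order.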
If you take the PEAR approach: take a read pair, look for an overlap between the two reads, output the merged PEAR contigs, then align those contigs to your reference sequence.
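The overlap step can be sketched naively like this. To be clear, this is not PEAR's actual algorithm (PEAR uses base qualities and a statistical test to decide on the overlap); it is just a simple suffix/prefix comparison to show the idea, and `merge_pair`, `min_overlap` and `max_mismatch_frac` are names I made up.

```python
def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.0):
    """Merge read1 with the reverse complement of read2 at the longest
    suffix/prefix overlap that has few enough mismatches.
    Returns the merged sequence, or None if no acceptable overlap exists."""
    mate = revcomp(read2)
    longest = min(len(read1), len(mate))
    for olen in range(longest, min_overlap - 1, -1):
        tail, head = read1[-olen:], mate[:olen]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * olen:
            # Where the reads disagree, this sketch simply keeps read1's base.
            return read1 + mate[olen:]
    return None
```

The merged sequence is then a single read, so the alignment to the reference becomes an ordinary two-sequence alignment.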
But it all still depends on what you actually want to accomplish with your software. I would suggest reading up on what other paired-end mappers/aligners do and then deciding what you want to improve. Or, if you have a specific research question, first find out what output you need to answer it, and then work out the best way to get there.
Best advice I can give is, don't try to reinvent the wheel. Some of these tools have years of research put into them so it won't be easy to just write one on the go unless you really want to put some time into it.
I want to write a program that takes as input a pair of short reads (paired end) and a DNA sequence, and aligns the reads to the DNA sequence. I do not know how to do this alignment, because an alignment is made between two sequences and I have three (the two reads of the pair and the DNA sequence).
Do you have any ideas?
Thank you
As tsr640 said, the example reads you gave are reverse complements of each other, not paired-end reads. If that's correct, you could just align them to the reference using something like the Smith-Waterman or Needleman-Wunsch algorithms. Both of those wiki pages give good examples of how sequence alignments work computationally, and then you could just try it out yourself, either by coding it in your favourite language or by using one of the implementations from the EBI.
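If you do want to code it yourself, a bare-bones Smith-Waterman looks something like the following. This is a teaching sketch with a linear gap penalty and scoring values I picked arbitrarily (`match=2`, `mismatch=-1`, `gap=-2`); real implementations use affine gaps and are heavily optimised.

```python
def smith_waterman(query, ref, match=2, mismatch=-1, gap=-2):
    """Local alignment of query against ref with a linear gap penalty.
    Returns (score, ref_start, aligned_query, aligned_ref), where
    ref_start is the 0-based position in ref where the alignment begins."""
    rows, cols = len(query) + 1, len(ref) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below 0 (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    aligned_q, aligned_r = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if query[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aligned_q.append(query[i - 1]); aligned_r.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_q.append(query[i - 1]); aligned_r.append("-")
            i -= 1
        else:
            aligned_q.append("-"); aligned_r.append(ref[j - 1])
            j -= 1
    return best, j, "".join(reversed(aligned_q)), "".join(reversed(aligned_r))


print(smith_waterman("ACGT", "TTACGTTT"))  # (8, 2, 'ACGT', 'ACGT')
```

Needleman-Wunsch is the same recurrence without the floor at 0 and with the traceback starting from the bottom-right corner, which makes it a global alignment instead of a local one.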
For paired-end reads, the actual alignment will probably be similar, but there may be checks for distance between pairs, orientation of reads, that sort of thing, because you are expecting the pairs to be coming from the same short fragment of the sample DNA.
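Those checks might look roughly like this once each read of the pair has been aligned on its own. The `Hit` class, `is_proper_pair` and the insert-size bounds are hypothetical names for illustration, and the orientation rule assumes the common forward/reverse layout where the two reads point towards each other; other library types need a different rule.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pos: int      # 0-based leftmost reference coordinate of the alignment
    length: int   # number of reference bases the alignment covers
    strand: str   # "+" if the read aligned as given, "-" if reverse-complemented


def is_proper_pair(a, b, max_insert, min_insert=0):
    """True if the two hits look like they come from one short fragment:
    opposite strands, pointing towards each other, and spanning a distance
    within [min_insert, max_insert]."""
    if a.strand == b.strand:
        return False
    fwd, rev = (a, b) if a.strand == "+" else (b, a)
    if fwd.pos > rev.pos:
        # The reads point away from each other.
        return False
    span = rev.pos + rev.length - fwd.pos  # outer distance covered by the pair
    return min_insert <= span <= max_insert
```

For example, `is_proper_pair(Hit(100, 75, "+"), Hit(225, 75, "-"), max_insert=250)` accepts the pair, since the reads face each other and span 200 bases.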
Thank you for your explanation, but in reality they are paired-end reads.
So there are pairs of reads, and my data is in pairs. Do you have an idea for how to align paired-end reads?
Thank you
Here's a naive approach that may help you understand what others have already mentioned. I think there are more issues that you need to be aware of when looking at this problem. Suppose you have one pair of paired-end reads (two different sequences); then you can align each sequence separately to your reference DNA sequence. After both sequences have been aligned, check whether the distance between their alignments is less than some threshold value that you specify (usually based on the expected insert size). For example, if the two aligned sequences are <100 bases apart from each other, then that is a good alignment. There are smarter ways to do this (as explained by tsr640), but my approach is probably the most basic if you have to implement something for a class. Otherwise, look around the internet and read about various read aligners (Bowtie, BWA, TopHat, STAR, to name a few). Pick your favorite and learn how to use it well.
I can do an alignment between two sequences; that is simple.
But my problem is how to align a read (paired end) and a sequence, for example:
Here I have a sequence and a read (paired end).
I do not understand how to align a sequence and a pair of reads.
Thank you
What do you want to know specifically? What type of algorithm is used for alignment? How to write a tool using that algorithm? What tools can do it? What the difference is between aligning paired-end and single-end reads? How that translates to alignment?
I can maybe explain a little better, or add some sort of figure. No one will give you a custom-made script or tool, and asking the same question again doesn't really help us understand what you want to know. Please explain some more: what are your hurdles, what do you really want to accomplish, and why do you not want to use existing tools?
I think you have not understood my question.
I do not need a script, and I know there are alignment tools, but I'm looking for how to do a simple alignment of two sequences when one of the two is a paired-end read.
Do I take the first strand of the read (paired end), or something else, to make the alignment?
Because an alignment is made between two sequences.
Thanks
If you want to know about algorithms used for alignment, read the wiki links I posted previously and Google "sequence alignment tutorial" or something.
You might create less confusion over the question if you presented your reads differently. This:
looks like you're representing an alignment or complementary base pairing (particularly since, you know, they are reverse complements...). Perhaps something more like
would have made it more obvious?