Hi all,
this question is addressed to those of you that have worked with paired-end amplicon sequencing data from Illumina. So, what I have is a construct of say 400bp that has been sequencing from both sides, with reads of 250 bp each. My question is, when I analyze the data I get, do I regard each pair of reads as one? For example, in the following mapping (created using STAR aligner with the paired-end option):
TEST_ID 163 reference 1 255 1S249M = 178
TEST_ID 83 reference 178 255 250M = 1
you can see that the first read's left-most mapping position was 1 (the "left" side) and the other read of the pair starts mapping at position 178. Obviously these reads overlap in some regions. So, for example, if I wanted to retrieve the sequence of the read TEST_ID for positions 200-220, should I consider both of them separately (i.e. treat them as belonging to distinct reads) if they differ or only use the whole pair whatsoever if the two sequences I get agree with each other?
it's hard to answer if we don't know the type of analysis you're trying to do...
I wonder if STAR is an appropriate choice for aligning amplicons. Unless you are doing cDNA sequencing or something non-standard I guess
bwa mem
would be the most suitable aligner for this job.As Pierre remarked you haven't told us what you want to do with those reads - so we can't answer your question.
Hi to both and thanks for answering!
Basically, what I want your opinion about is the following scenario:
Here, I have aligned both my reads to my reference and, as expected, one of the reads matches the left-most part and one the right-most part. However, since they are 250nts each, there is a part where they overlap (since my reference constract is ~430nts). So my question is, if the sequence in the overlapping region is not identical, as in this example:
I guess we can't be sure if in this position we have T or A. So, should I use both sequences/possibilities separately although the belong to the same read?
Again, it may depend on what is your goal. Variant Calling? A variant caller should consider that these reads are a pair and vote it in some way that there is a mismatch between them. Or the way I prefer is, to merge overlapping paired reads before alignment using bbmerge. This produces longer reads, which is better for mapping, and a base quality/error correction is made in the overlapping part.
fin swimmer
Thank you! I was not aware of bbmerge, I will give it a try ASAP!
Your example on discusses one read pair, but in reality (I hope) your region of interest will be covered by more reads and then it doesn't really matter that for one of the pairs there is a disagreement. The variant caller will figure it out.