10.5 years ago
Daniel
I have just received an amplicon dataset, but my sequences all follow the same pattern:
>M02538:3:000000000-A6UM4:1:1101:10565:1083 1:N:0:0
TGGGGAATCTTGCACAANGGAGGAAACTCTGATGCAGCGACGCCGCGNGAGTGATGAA---------GCGTNGGGAGCAAACAGG
_________________^-----------------------------^-----------------------^
I find this highly irregular. I have aligned it against a reference sequence, and each N sits exactly where a base is expected, so they're not insertions.
Does anyone have any ideas?
Thanks
It could also be a bubble moving through the lane.
It could have been a failed cycle due to a temporary hardware failure (like a camera communication issue). Not unheard of. Looks like it's a MiSeq run so we can't query other lanes on that flowcell.
Can you check a few reads and tell us the quality score you see at the problematic locations?
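One quick way to do that check, as a minimal sketch: it assumes plain (uncompressed) 4-line FASTQ records and Phred+33 quality encoding, which is what recent Illumina software writes. The function names `parse_fastq` and `quality_at` are made up for illustration, and the toy record is invented, not taken from the run in question.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from plain 4-line FASTQ records."""
    rows = [line.rstrip("\n") for line in lines]
    # Step through complete 4-line records; a trailing partial record is ignored.
    for i in range(0, len(rows) - 3, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        yield header, seq, qual


def quality_at(qual: str, positions: List[int], offset: int = 33) -> List[int]:
    """Return Phred scores at the given 1-based positions (Phred+33 assumed)."""
    return [ord(qual[p - 1]) - offset for p in positions if 1 <= p <= len(qual)]


# Toy record, made up for illustration: 'I' encodes Q40 and '#' encodes Q2.
toy = ["@read1\n", "ACGTN\n", "+\n", "IIII#\n"]
for header, seq, qual in parse_fastq(toy):
    print(header, quality_at(qual, [1, 5]))
```

For a real file you would pass an open file handle instead of the toy list and ask for the positions of interest (18 and 48 here).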
Sorry, yes, it's a MiSeq.
Here are some FastQC quality charts.
The quality does drop at the Ns at positions 18 and -14 (reverse), but not at the one at 48 bp, as far as I can see.
No need to apologize--just thinking out loud :)
I originally thought the cycle was a total wipeout due to a temporary sequencer hardware problem, but those FastQC graphs just imply a crappy cycle. Weird that the scores recover so quickly after the first blip.
Are you positive that every base on every read at those cycle positions is an "N"? This is very important. Because if not every base is an N at that position, then just a portion of the flowcell could have had a problem (a bubble, for example). It's worth loading up the flowcell in Illumina's Sequencing Analysis Viewer and looking at the images for that cycle.
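If you want a rough answer before opening the viewer, the tile number is embedded in the read name, so you can tally the fraction of Ns per tile at a given cycle. This is only a sketch: it assumes the usual Illumina read-name layout (instrument:run:flowcell:lane:tile:x:y, as in the header posted above), and `tile_of` and `n_fraction_by_tile` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tile_of(header: str) -> str:
    """Extract the tile field from an Illumina-style read name.

    Assumed layout: instrument:run:flowcell:lane:tile:x:y, optionally
    followed by a space and the read/filter/index block.
    """
    name = header.split()[0]
    return name.split(":")[4]


def n_fraction_by_tile(
    records: Iterable[Tuple[str, str]], position: int
) -> Dict[str, float]:
    """Fraction of reads per tile that have 'N' at a 1-based position."""
    n_count: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for header, seq in records:
        if len(seq) < position:
            continue  # read too short to cover this cycle
        tile = tile_of(header)
        total[tile] += 1
        if seq[position - 1].upper() == "N":
            n_count[tile] += 1
    return {tile: n_count[tile] / total[tile] for tile in total}
```

If every tile comes back at 1.0 for positions 18 and 48, the whole cycle was affected; if only some tiles do, that points to a localized problem on the flowcell.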
The 48th base is merged into the 45-49 bin in your FastQC plot, so the drop in quality is not as obvious.
Did you ever solve this problem? Devon and I both suspected a bubble, but I'd like to know for certain.
From talking to the guys who ran the machine, we think this is probably what happened, but I can't think of any way to confirm it without doing another run. I should check with whoever gets the next dataset to see if they get the same thing... (This was one of their first times running the MiSeq.)
I am just doing the best I can with the data. Because it's so consistent, I can just factor it in.
Did you look at Sequencing Analysis Viewer? Using it, you can very easily see if there's a certain region of the flowcell that was problematic. From there you can look at the corresponding thumbnail images. Believe me, if there's a bubble you will see it in the thumbnails!