Question

Multiple N bases in RNAseq

5.8 years ago

garcesj ▴ 50

Hi there,

I recently sequenced some samples following a 3'-end RNA-seq protocol. When I looked at the FASTQ files, I noticed that many reads contain Ns (in almost all processed samples). Initially I suspected sample quality (I'm using a very low number of cells as input) or low diversity caused by too little PhiX. Curiously, though, every read with Ns has an N at base 32, and reads with more Ns have them at the surrounding bases. Here is an example (quality strings only; Ns appear as #):

AA6A/AEA/EEEEEAEEEEEEAE#/####E################/#A##E/##E###EEAAE/6E/
AAAAAEEEEEEEEEAEAEEEEEE#E####A################E#/##EE##EE##E/AEEEEE/
/AAAAEEEEEEEEEEEEEEAAAE#EE###E################/#E##EE##/<##EAEE<A/EA
AAAAAEEEEEEEEEEEEEEEEEE#EE###E############A#<#E#E##EEE#EE##<EEEEEEE/
AAAAAEEEEEEEEEEEEEEE/EE#EE###E###E########E#E#E#E##EEEEEE##EEAEEEEAE
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##EE##E#EE#EE#E#E#E##EEEEEEEEEEE<AEEEE
AAAAAEEAEEEEEEEEEEEEEEEEAEE#EEE#A/E#EEEE#E<#E#EEE##EEEE<AEE/EEEAEEEE
AAAAAEEEEEEEEEEEEEEE6EEEEEE#EEE#EEE#EEEE#6A#E#A/E##EEEA/EEA/EAEA//A/
AA6AAEE/AAEE/A/EAEEEEEEEEEE#EEE#EE/#EAEA#</#/#A<E##EE/E/E/EEEEEE/</<
AAAAAEEEAEEEEEEEEEEEAEEAEEA/EEE#EAEEEAEAEEEEAEEE/EAA6AAAEA<EEEEA<AEE
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEE#EEEEE##EEEEEEA<<EEEAEE/EEEE<AAEAEEEE
AAAAAEEEAEEEEEEEEEEEEEEEAEEEEEE#EEEEE##EEEEEEEEEEEEEEEEEEEEEEEEEEAEE
6AAAAE/EEEEEEEEEEEEEEEE#EE###E############E#A#A#E##EEE#/E##EEEEEEEE/
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##AE##AEAE#AE#E#AEE##EAEEEEEAEEEEAEEE/
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEE#EEE#EEEE<EA#E/AA/##AEEEAEAEE/EEEEEEE
AAAAAAEEEEEEEEEEAE6EEEEEEEEEEEE#EEEAE/EE6/EEAEEEEEAE6AAAEAAEEEEEE6EE
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##EE##E#EE#EE#E#E#E##EE/EEEAEEEEEEE/EE
AAAAAEEEEEEEEEEEEEEEEEEEEEAEEEE#EEE#EEEEEEE#E#EE<##EAEEEEEEEEEE<EEEE
AAAAAAEEAEE/EAAEEEEEEEAEEAEEEEE#EEE#AEE/EE<#A/E/6/6EE6EEEE//EEEEEE/<
AAAAAEEEE6AEEAEEEAEAEEE/EEAEE<E#<<<#AEEE///#/E/E/EEA<A/E/AEA/6AAA<<6
AAAAAEEEEAEEEEEEEEEEEEEEEEEEEAE#E<EEEA/EEEEEEEEA<EE/EEEA6EEAEEEEEEEE

Does anyone have an idea what could explain this? And is there a possible solution that doesn't require discarding these samples (resequencing, increasing the amount of PhiX, aligning with a more permissive mismatch setting...)?

Thanks in advance!

RNA-Seq Illumina • 1.4k views

score 1 · Answer 1 · 2020-01-31

I guess you are posting the Q scores and not the actual sequences of the reads?

It sounds like there was an issue (likely hardware-related, e.g. a bubble in the lane) at cycle 32, which led to the Ns in that cycle. Your sequencing facility should not have released the data if that was the case. You could check with them to see if they would be willing to re-run the samples. Illumina generally gives credit for lanes that fail because of hardware errors if the facility has a maintenance contract (which they should; if they don't, look for a different sequencing provider next time).
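
If you want to confirm that the Ns really pile up at one cycle before contacting the facility, a per-cycle tally is a quick check. Here is a minimal Python sketch, assuming a standard 4-line-per-record FASTQ with unwrapped sequence lines; the function name `n_counts_per_cycle` and the tiny in-memory demo records are made up for illustration and are not from your data.

```python
from collections import Counter

def n_counts_per_cycle(fastq_lines):
    """Count N base calls at each 1-based cycle across all reads.

    Assumes 4-line FASTQ records (header, sequence, '+', qualities).
    """
    counts = Counter()
    n_reads = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        n_reads += 1
        for cycle, base in enumerate(line.strip(), start=1):
            if base in "Nn":
                counts[cycle] += 1
    return counts, n_reads

# Hypothetical demo records (not real data): Ns at cycles 5 and 6.
demo = [
    "@r1", "ACGTNACG", "+", "EEEE#EEE",
    "@r2", "ACGTNNCG", "+", "EEEE##EE",
    "@r3", "ACGTAACG", "+", "EEEEEEEE",
]
counts, n_reads = n_counts_per_cycle(demo)
for cycle in sorted(counts):
    # cycle, number of reads with N there, fraction of all reads
    print(cycle, counts[cycle], f"{counts[cycle] / n_reads:.2f}")
```

For a real file you could pass an open handle instead of the list, e.g. `gzip.open(path, "rt")` for a compressed FASTQ, where `path` is whatever your file is called. A sharp spike at cycle 32 with much smaller counts elsewhere would be consistent with a cycle-specific problem rather than generally poor libraries.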

If you were making some kind of unusual library in which the sequence becomes low in nucleotide diversity after cycle 32, then increasing the amount of PhiX (coupled with deliberately underloading the libraries) would be one option.
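
To judge whether low diversity is plausible at all, you could look at how dominated each cycle is by a single base. The following is a rough Python sketch under my own assumptions: it takes plain read sequences (not a FASTQ parser), and `dominant_base_fraction` and the demo reads are hypothetical names and data for illustration only.

```python
from collections import Counter

def dominant_base_fraction(seqs):
    """For each cycle, return the fraction of reads carrying the most common base.

    Values near 0.25 suggest balanced composition; values near 1.0
    mean nearly every read has the same base at that cycle.
    """
    per_cycle = []
    for seq in seqs:
        for i, base in enumerate(seq.upper()):
            if i == len(per_cycle):
                per_cycle.append(Counter())  # first read reaching this cycle
            per_cycle[i][base] += 1
    return [max(c.values()) / sum(c.values()) for c in per_cycle]

# Hypothetical demo reads: balanced for three cycles, then all A.
demo_reads = ["ACGA", "CGTA", "GTAA", "TACA"]
fractions = dominant_base_fraction(demo_reads)
for cycle, frac in enumerate(fractions, start=1):
    print(cycle, f"{frac:.2f}")
```

If the fraction stays moderate around cycle 32 in the reads that do have base calls there, low diversity looks like a less likely explanation than a hardware problem at that cycle. Note that Ns count as a "base" in this sketch, so you may want to filter them out first.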