Hi there,
I've recently sequenced some samples following a 3'-end RNA-seq protocol. When I looked at the FASTQs, I noticed that many reads contain Ns (in almost all processed samples). Initially I suspected the sample quality (I'm using a very low number of cells as input) or low library diversity due to an insufficient PhiX concentration. But it's curious that every read with Ns has an N at base 32 (and reads with more Ns have them at the surrounding bases). Here is an example (quality strings only; Ns appear as #):
AA6A/AEA/EEEEEAEEEEEEAE#/####E################/#A##E/##E###EEAAE/6E/
AAAAAEEEEEEEEEAEAEEEEEE#E####A################E#/##EE##EE##E/AEEEEE/
/AAAAEEEEEEEEEEEEEEAAAE#EE###E################/#E##EE##/<##EAEE<A/EA
AAAAAEEEEEEEEEEEEEEEEEE#EE###E############A#<#E#E##EEE#EE##<EEEEEEE/
AAAAAEEEEEEEEEEEEEEE/EE#EE###E###E########E#E#E#E##EEEEEE##EEAEEEEAE
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##EE##E#EE#EE#E#E#E##EEEEEEEEEEE<AEEEE
AAAAAEEAEEEEEEEEEEEEEEEEAEE#EEE#A/E#EEEE#E<#E#EEE##EEEE<AEE/EEEAEEEE
AAAAAEEEEEEEEEEEEEEE6EEEEEE#EEE#EEE#EEEE#6A#E#A/E##EEEA/EEA/EAEA//A/
AA6AAEE/AAEE/A/EAEEEEEEEEEE#EEE#EE/#EAEA#</#/#A<E##EE/E/E/EEEEEE/</<
AAAAAEEEAEEEEEEEEEEEAEEAEEA/EEE#EAEEEAEAEEEEAEEE/EAA6AAAEA<EEEEA<AEE
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEE#EEEEE##EEEEEEA<<EEEAEE/EEEE<AAEAEEEE
AAAAAEEEAEEEEEEEEEEEEEEEAEEEEEE#EEEEE##EEEEEEEEEEEEEEEEEEEEEEEEEEAEE
6AAAAE/EEEEEEEEEEEEEEEE#EE###E############E#A#A#E##EEE#/E##EEEEEEEE/
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##AE##AEAE#AE#E#AEE##EAEEEEEAEEEEAEEE/
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEE#EEE#EEEE<EA#E/AA/##AEEEAEAEE/EEEEEEE
AAAAAAEEEEEEEEEEAE6EEEEEEEEEEEE#EEEAE/EE6/EEAEEEEEAE6AAAEAAEEEEEE6EE
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##EE##E#EE#EE#E#E#E##EE/EEEAEEEEEEE/EE
AAAAAEEEEEEEEEEEEEEEEEEEEEAEEEE#EEE#EEEEEEE#E#EE<##EAEEEEEEEEEE<EEEE
AAAAAAEEAEE/EAAEEEEEEEAEEAEEEEE#EEE#AEE/EE<#A/E/6/6EE6EEEE//EEEEEE/<
AAAAAEEEE6AEEAEEEAEAEEE/EEAEE<E#<<<#AEEE///#/E/E/EEA<A/E/AEA/6AAA<<6
AAAAAEEEEAEEEEEEEEEEEEEEEEEEEAE#E<EEEA/EEEEEEEEA<EE/EEEA6EEAEEEEEEEE
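To check whether the Ns really cluster at specific cycles across a whole file, one option is to tally the `#` characters by position. The following is only a minimal sketch in Python, assuming an uncompressed four-line-per-record FASTQ and that `#` in the quality string marks an N call, as in the example above. The function names and the file name in the usage note are hypothetical.

```python
from collections import Counter


def count_n_by_cycle(quality_strings, n_char="#"):
    """Count how often n_char occurs at each 1-based position (cycle)."""
    counts = Counter()
    for qual in quality_strings:
        for pos, ch in enumerate(qual, start=1):
            if ch == n_char:
                counts[pos] += 1
    return counts


def read_fastq_qualities(path):
    """Yield the quality line (every 4th line) of an uncompressed FASTQ.

    Assumes strict 4-line records with no wrapped sequence lines.
    """
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 3:
                yield line.rstrip("\n")
```

For instance, `count_n_by_cycle(read_fastq_qualities("sample.fastq")).most_common(10)` would list the ten positions with the most Ns. If the counts peak at the same cycles in every sample, that points to a cycle-specific problem during the run rather than to individual libraries.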
Does anyone have an idea what could explain this? And is there a possible solution that doesn't require discarding these samples (resequencing, increasing the PhiX amount, aligning with a more permissive mismatch threshold...)?
Thanks in advance!
Thanks for your quick reply, @genomax. (Yes, indeed, I meant to say QC in the code I showed; I've already edited it.) I've reviewed all the Illumina QC metrics on the sequencer, as well as "Per tile sequence quality", and nothing there appears to be related to the run's quality... I forgot to mention that this error is widespread across all my samples, not just an isolated case.