Multiple N bases in RNAseq
4.8 years ago
garcesj ▴ 50

Hi there,

I've recently sequenced some samples following a 3'-end RNA-seq protocol. When I looked at the FASTQ files, I noticed that many reads contain Ns (in almost all processed samples). Initially, I suspected the sample quality (I'm using a very low number of cells as input) or low diversity caused by an insufficient PhiX concentration. Curiously, though, every read with Ns has one at base 32, and reads with more Ns have them at the surrounding bases. Here is an example (quality strings only; Ns appear as #):

AA6A/AEA/EEEEEAEEEEEEAE#/####E################/#A##E/##E###EEAAE/6E/
AAAAAEEEEEEEEEAEAEEEEEE#E####A################E#/##EE##EE##E/AEEEEE/
/AAAAEEEEEEEEEEEEEEAAAE#EE###E################/#E##EE##/<##EAEE<A/EA
AAAAAEEEEEEEEEEEEEEEEEE#EE###E############A#<#E#E##EEE#EE##<EEEEEEE/
AAAAAEEEEEEEEEEEEEEE/EE#EE###E###E########E#E#E#E##EEEEEE##EEAEEEEAE
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##EE##E#EE#EE#E#E#E##EEEEEEEEEEE<AEEEE
AAAAAEEAEEEEEEEEEEEEEEEEAEE#EEE#A/E#EEEE#E<#E#EEE##EEEE<AEE/EEEAEEEE
AAAAAEEEEEEEEEEEEEEE6EEEEEE#EEE#EEE#EEEE#6A#E#A/E##EEEA/EEA/EAEA//A/
AA6AAEE/AAEE/A/EAEEEEEEEEEE#EEE#EE/#EAEA#</#/#A<E##EE/E/E/EEEEEE/</<
AAAAAEEEAEEEEEEEEEEEAEEAEEA/EEE#EAEEEAEAEEEEAEEE/EAA6AAAEA<EEEEA<AEE
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEE#EEEEE##EEEEEEA<<EEEAEE/EEEE<AAEAEEEE
AAAAAEEEAEEEEEEEEEEEEEEEAEEEEEE#EEEEE##EEEEEEEEEEEEEEEEEEEEEEEEEEAEE
6AAAAE/EEEEEEEEEEEEEEEE#EE###E############E#A#A#E##EEE#/E##EEEEEEEE/
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##AE##AEAE#AE#E#AEE##EAEEEEEAEEEEAEEE/
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEE#EEE#EEEE<EA#E/AA/##AEEEAEAEE/EEEEEEE
AAAAAAEEEEEEEEEEAE6EEEEEEEEEEEE#EEEAE/EE6/EEAEEEEEAE6AAAEAAEEEEEE6EE
AAAAAEEEEEEEEEEEEEEEEEEEEE##EE##EE##E#EE#EE#E#E#E##EE/EEEAEEEEEEE/EE
AAAAAEEEEEEEEEEEEEEEEEEEEEAEEEE#EEE#EEEEEEE#E#EE<##EAEEEEEEEEEE<EEEE
AAAAAAEEAEE/EAAEEEEEEEAEEAEEEEE#EEE#AEE/EE<#A/E/6/6EE6EEEE//EEEEEE/<
AAAAAEEEE6AEEAEEEAEAEEE/EEAEE<E#<<<#AEEE///#/E/E/EEA<A/E/AEA/6AAA<<6
AAAAAEEEEAEEEEEEEEEEEEEEEEEEEAE#E<EEEA/EEEEEEEEA<EE/EEEA6EEAEEEEEEEE

Does anyone have an idea what could explain this? And is there a possible solution that doesn't involve discarding these samples (resequencing, increasing the amount of PhiX, aligning with a more permissive mismatch setting...)?

Thanks in advance!

RNA-Seq Illumina • 954 views
4.8 years ago
GenoMax 147k

I guess you are posting the Q scores and not the actual sequences of the reads?

It sounds like there was an issue (likely hardware-related, e.g. a bubble in the lane) in cycle 32, which led to the Ns in that cycle. Your sequencing facility should not have released the data if that was the case. You could check with them to see if they would be willing to re-run the samples. Illumina generally gives credit for lanes that fail because of hardware errors if the facility has a maintenance contract (which they should; if they don't, look for a different sequencing provider next time).
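
If you want to confirm that the Ns really pile up at one cycle across all of your files, a per-cycle tally is easy to script. This is only a rough sketch, assuming plain four-line FASTQ records; the function name `n_fraction_per_cycle` is made up for illustration and is not part of any existing tool.

```python
from collections import Counter
from itertools import islice


def n_fraction_per_cycle(fastq_lines):
    """Return {cycle: fraction of reads with an N at that cycle} (1-based).

    Assumes standard four-line FASTQ records (header, sequence, '+', quality).
    """
    n_counts = Counter()  # reads with N at each cycle
    depth = Counter()     # reads long enough to reach each cycle
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        seq = record[1].strip()
        for cycle, base in enumerate(seq, start=1):
            depth[cycle] += 1
            if base in "Nn":
                n_counts[cycle] += 1
    return {cycle: n_counts[cycle] / depth[cycle] for cycle in sorted(depth)}


# Usage sketch (hypothetical file name):
# import gzip
# with gzip.open("sample_R1.fastq.gz", "rt") as handle:
#     fractions = n_fraction_per_cycle(handle)
#     print({c: f for c, f in fractions.items() if f > 0.01})
```

A sharp spike at cycle 32 in every sample from the run would be consistent with a problem in that cycle rather than with the libraries themselves.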

If you were making some kind of unusual libraries in which the sequence became low in nucleotide diversity after cycle 32, then increasing the amount of PhiX (coupled with deliberate underloading of the libraries) would be one option.
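
To get a feel for whether low diversity is plausible, you could look at how dominant the most common base is at each cycle. Again, this is just a sketch under assumptions: the helper `dominant_base_fraction` is hypothetical, it takes the read sequences as plain strings, and what counts as "too low" diversity is something to check against Illumina's guidance for your instrument rather than a threshold I am stating here.

```python
from collections import Counter


def dominant_base_fraction(sequences):
    """For each cycle (0-based list index), return the fraction of reads
    that carry the most common base at that cycle.

    Values close to 1.0 mean nearly every read has the same base there,
    i.e. low nucleotide diversity at that cycle.
    """
    per_cycle = []  # one Counter of base counts per cycle
    for seq in sequences:
        for i, base in enumerate(seq.upper()):
            if i == len(per_cycle):
                per_cycle.append(Counter())
            per_cycle[i][base] += 1
    return [max(c.values()) / sum(c.values()) for c in per_cycle]


# Toy example: cycles 1 and 3 are identical in every read, cycle 2 is balanced.
example = ["ACGT", "AGGT", "ATGA", "AAGC"]
print(dominant_base_fraction(example))
```

If the fraction stays moderate before cycle 32 and jumps towards 1.0 afterwards, that would point towards the diversity explanation; if it looks flat, the hardware explanation seems more likely.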

Thanks for your quick reply, @genomax. (Yes, indeed, I meant to say Q scores for the block I showed; I've already edited the post.) I've reviewed all the Illumina QC metrics on the sequencer, as well as "Per tile sequence quality", and nothing there appears to be related to the run's quality... I forgot to mention that this error affects all my samples, not just one isolated case.
