Question

How to get rid of mis-assigned reads without having bcl files or index information?

0

2.8 years ago

Apex92 ▴ 320

Dear all,

I have noticed that a small proportion of the reads in my FASTQ files appear to be mis-assigned. I no longer have the BCL files or the index information. Is it still possible to fix this mis-assignment?

The library protocol is QIAseq low miRNA input.

The header of my fastq files:

@VH00203:12:AAANKJFM5:1:1101:61404:1038 1:N:0:TCCTCGGA
TAGCCGGCTGAACTGTAGGCACCATCAAGTCCACCCCGGACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGAATCTCGTTTTTTGTTGTG
+
CCCCCCCCCCCCCCCCCCCCC-CCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCC-C-------C--CC
@VH00203:12:AAANKJFM5:1:1101:61480:1038 1:N:0:TCCTCGGA
ACCCGTAAACTGTAGGCACCATCAATACCCACCTAAGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGATCTCGTATTCCCTCTGTTGCG
+
CCCCCCCC;CCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCC-C--CC-C-;-C;-CC-C
@VH00203:12:AAANKJFM5:1:1101:61897:1038 1:N:0:TCCTCGGA
CCCTCCGAACTGTAGGCACCATCAATCGGCCCTGTTTAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGAATCTCGGTTTTCGTCGTTGTG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCC;CCCC-----;C;--;-
@VH00203:12:AAANKJFM5:1:1101:62162:1038 1:N:0:TCATCGGA
TCCGTGAACTGTAGGCACCATCAATTTGTGAAAGTCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGAATCTCGTTTTTCGTTTGTTGTG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CC--C;-----C;C;-;;
@VH00203:12:AAANKJFM5:1:1101:62692:1038 1:N:0:TCATCGGA
GAGGAACTGTAGGCACCATCAATGTATAACGGGCTAGATCGGAAGAGCACACGTCTAAACTCCAGTCACTCATCGGCATCTCGGTTTTCCTCTTTGTTGTG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCC--;---------;---CC

Many thanks.

alignment fastq RNA-seq • 1.0k views

updated 2.8 years ago by swbarnes2 14k • written 2.8 years ago by Apex92 ▴ 320

0

I suppose you are reading the - character as a sign of misalignment.

That is not what the - character means; it is simply a quality value in the FASTQ format.

I agree it looks odd, and it can be confusing that most of the qualities are either a C or a -.

Just remember that those are not bases; they are quality values, as described here:

https://en.wikipedia.org/wiki/FASTQ_format
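To make the point about quality characters concrete, here is a minimal sketch that decodes them into Phred scores. It assumes the Phred+33 encoding, which is what FASTQ files with this header style normally use; the function names are made up for illustration.

```python
def phred33_scores(quality: str) -> list[int]:
    """Convert a FASTQ quality string to integer Phred scores.

    Assumes Phred+33 encoding: score = ASCII code of the character - 33.
    """
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # The three characters that appear in the quality lines of the question.
    for ch, q in zip("C;-", phred33_scores("C;-")):
        print(f"{ch!r}: Q{q}, error probability ~ {error_probability(q):.4f}")
```

Under that assumption, C decodes to Q34, ; to Q26 and - to Q12. The small set of distinct characters is consistent with binned quality scores and says nothing about which sample a read belongs to.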

2.8 years ago by Istvan Albert 102k

0

Not quite. I pasted a few lines of my FASTQ files only to give an impression of how they look, and in case the index information can be retrieved from the headers. I am aware that the - sign belongs to the quality scores. However, after running a DEG analysis I found contamination (aligned reads) in samples that should not be contaminated, because they are control samples. Based on this DEG result, I suspect read mis-assignment: some reads from the contaminated samples ended up in the control samples. I hope this clarifies the problem.

2.8 years ago by Apex92 ▴ 320

2

OK, it is important to state the problem in a way that cannot be misinterpreted.

Technically there is no such thing as a misalignment: the read always aligns to the most similar region. What we usually mean by misalignment is that the most similar region in the genome is not the one the read originated from, for whatever reason.

Here the solution is to look at the actual alignments (the BAM fields). Post some of the alignments for the reads that you believe are not correctly assigned, then state why you think they are misalignments.

Few people, if any, can identify a potential misalignment from a FASTQ sequence alone.

2.8 years ago by Istvan Albert 102k

0

You should be extremely cautious about throwing away data just because you don't think it belongs. Do you think there is leakage in the demultiplexing? Or that the people at the bench cross-contaminated samples? (Obviously your sequencing index is retrievable from the FASTQ; it's right there in the header.)
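As a rough illustration of pulling the index out of the headers, the sketch below tallies the index field per read and compares indices by Hamming distance. It assumes the header layout seen in the question (`@<read id> <read>:<filtered>:<control>:<index>`) and plain four-line FASTQ records; the function names and the placeholder sequence and quality strings in the demo are hypothetical.

```python
from collections import Counter
from typing import Iterable


def index_from_header(header: str) -> str:
    """Return the index field from an Illumina-style FASTQ header line."""
    # Assumed layout: '@<read id> <read>:<filtered>:<control>:<index>'
    comment = header.rstrip().split(" ", 1)[1]
    return comment.split(":")[3]


def count_indices(lines: Iterable[str]) -> Counter:
    """Tally index sequences, assuming strict four-line FASTQ records."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # every fourth line, starting at 0, is a header
            counts[index_from_header(line)] += 1
    return counts


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length indices."""
    if len(a) != len(b):
        raise ValueError("indices must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Two headers copied from the question; sequence/quality are placeholders.
    demo = [
        "@VH00203:12:AAANKJFM5:1:1101:61404:1038 1:N:0:TCCTCGGA", "ACGT", "+", "CCCC",
        "@VH00203:12:AAANKJFM5:1:1101:62162:1038 1:N:0:TCATCGGA", "ACGT", "+", "CCCC",
    ]
    counts = count_indices(demo)
    print(counts)
    print(hamming("TCCTCGGA", "TCATCGGA"))
```

In the excerpt posted above, the two index strings TCCTCGGA and TCATCGGA differ at a single position. A tally like this only reports what the headers record; it cannot by itself tell demultiplexing leakage apart from contamination at the bench.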

2.8 years ago by swbarnes2 14k