2.8 years ago
Apex92
Dear all,
I have noticed that a small proportion of the reads in my FASTQ files are misassigned. I no longer have the BCL files or the index information. Is it still possible to fix this misassignment?
The library protocol is QIAseq low miRNA input.
The header of my fastq files:
@VH00203:12:AAANKJFM5:1:1101:61404:1038 1:N:0:TCCTCGGA
TAGCCGGCTGAACTGTAGGCACCATCAAGTCCACCCCGGACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGAATCTCGTTTTTTGTTGTG
+
CCCCCCCCCCCCCCCCCCCCC-CCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCC-C-------C--CC
@VH00203:12:AAANKJFM5:1:1101:61480:1038 1:N:0:TCCTCGGA
ACCCGTAAACTGTAGGCACCATCAATACCCACCTAAGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGATCTCGTATTCCCTCTGTTGCG
+
CCCCCCCC;CCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCC-C--CC-C-;-C;-CC-C
@VH00203:12:AAANKJFM5:1:1101:61897:1038 1:N:0:TCCTCGGA
CCCTCCGAACTGTAGGCACCATCAATCGGCCCTGTTTAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGAATCTCGGTTTTCGTCGTTGTG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCC;CCCC-----;C;--;-
@VH00203:12:AAANKJFM5:1:1101:62162:1038 1:N:0:TCATCGGA
TCCGTGAACTGTAGGCACCATCAATTTGTGAAAGTCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCATCGGAATCTCGTTTTTCGTTTGTTGTG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CC--C;-----C;C;-;;
@VH00203:12:AAANKJFM5:1:1101:62692:1038 1:N:0:TCATCGGA
GAGGAACTGTAGGCACCATCAATGTATAACGGGCTAGATCGGAAGAGCACACGTCTAAACTCCAGTCACTCATCGGCATCTCGGTTTTCCTCTTTGTTGTG
+
CCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCC--;---------;---CC
Many thanks.
I suppose you are referring to the `-` character as representing a misalignment. That is not what the `-` character means; it is simply a quality value in the FASTQ format. I agree it is odd, and can look confusing, that most of the qualities are either `C` or `-`. Just remember that those are not bases, they are quality values, as described here:
https://en.wikipedia.org/wiki/FASTQ_format
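To make the point about quality characters concrete, here is a minimal sketch that decodes them into Phred scores. It assumes the Phred+33 offset (the usual encoding for recent Illumina output, but an assumption about these particular files), and the function names are made up for illustration.

```python
def phred33_scores(quality):
    """Decode a FASTQ quality string into Phred scores, assuming a +33 offset."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q):
    """Convert a Phred score Q into the estimated base-call error probability."""
    return 10 ** (-q / 10)


# The three characters that appear in the quality lines above.
for ch, q in zip("C;-", phred33_scores("C;-")):
    print(ch, q, round(error_probability(q), 4))
```

Under that assumption `C` decodes to Q34, `;` to Q26 and `-` to Q12, so seeing only a handful of distinct characters reflects a small set of quality values, not anything about the bases.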
Not actually. By pasting some lines of my FASTQ files I just wanted to give an impression of how they look, and to ask whether I could perhaps retrieve index information from the headers. I am aware that the `-` sign belongs to the quality scores. However, after doing DEG analysis I found contamination (read alignments) in samples that should not be contaminated, because they are control samples. Given this DEG result, I suspect read misassignment: some reads from the contaminated samples ended up in the control samples. I hope this clarifies the problem.
OK, it is important to state the problem in a way that cannot be misinterpreted. Technically there is no such thing as misalignment: the read always aligns to the most similar region. What we usually mean by misalignment is that the most similar region in the genome is not the one the read originated from, for whatever reason.
Here the solution is to look at the actual alignments (the BAM fields). Post some of the alignments for the reads that you believe are not correctly assigned, then state why you think they are misalignments.
Few people, if any, can identify a potential misalignment from a FASTQ sequence alone.
You should be extremely cautious about throwing away data just because you don't think it belongs. Do you think there is leakage in the demultiplexing? Or that the people at the bench cross-contaminated samples? (Your sequencing index is retrievable from the FASTQ; it is right there in the header.)
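As a rough illustration of reading the index back out of the headers, the sketch below tallies the index string at the end of each header line. It assumes four-line FASTQ records and the header layout shown in the question (`@<read id> <read>:<filtered>:<control>:<index>`); the function names are hypothetical, and this only summarises what the headers record, it does not reassign reads.

```python
from collections import Counter


def index_from_header(header):
    """Return the index field of a header like '@id 1:N:0:TCCTCGGA', or None."""
    parts = header.strip().split(" ")
    if len(parts) < 2:
        return None
    fields = parts[1].split(":", 3)
    return fields[3] if len(fields) == 4 else None


def count_indices(lines):
    """Tally index strings, assuming every record spans exactly four lines."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each record
            idx = index_from_header(line)
            if idx is not None:
                counts[idx] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length index strings."""
    if len(a) != len(b):
        raise ValueError("index strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two headers taken from the question; sequence and quality lines are shortened.
records = [
    "@VH00203:12:AAANKJFM5:1:1101:61404:1038 1:N:0:TCCTCGGA", "TAGC", "+", "CCCC",
    "@VH00203:12:AAANKJFM5:1:1101:62162:1038 1:N:0:TCATCGGA", "TCCG", "+", "CCCC",
]
print(count_indices(records))
print(hamming("TCCTCGGA", "TCATCGGA"))
```

To run it on a real file, pass an open file handle (for example from `gzip.open(path, "rt")`) to `count_indices`. In the excerpt above the two index strings differ at a single position, so a per-file tally like this shows how many reads carry each observed index before deciding whether anything looks misassigned.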