4.1 years ago
tanukitries
I'm new to the world of RNA sequencing analysis. I'm attempting to run QC on a couple of RNA sequencing runs. These were performed on an Illumina platform (NextSeq500). The files that were returned to us were BAM files.
When I attempted to run FastQC (0.11.5) I immediately got an error:
Failed to process bamfile.bam
java.lang.IllegalArgumentException: Cannot encode phred score: 239
at net.sf.samtools.SAMUtils.phredToFastq(SAMUtils.java:369)
at net.sf.samtools.SAMUtils.phredToFastq(SAMUtils.java:357)
at net.sf.samtools.SAMUtils.phredToFastq(SAMUtils.java:343)
at net.sf.samtools.SAMRecord.getBaseQualityString(SAMRecord.java:248)
at uk.ac.babraham.FastQC.Sequence.BAMFile.readNext(BAMFile.java:144)
at uk.ac.babraham.FastQC.Sequence.BAMFile.<init>(BAMFile.java:65)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:100)
at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:129)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:102)
at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)
So there are some out-of-range Phred scores. I don't know how to fix this. Any suggestions as to what I should do next would be welcome, thank you!
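For context on the error above: in the Sanger / Illumina 1.8+ convention, a FASTQ quality character is the Phred score plus 33, which limits encodable scores to 0 through 93. A score of 239 falls outside that range, which is consistent with the exception message. Below is a minimal Python sketch of that encoding rule. The function name and constants are illustrative only and are not taken from the samtools or FastQC source.

```python
# Illustrative sketch of Phred+33 (Sanger) encoding; all names are hypothetical.
PHRED_OFFSET = 33   # ASCII offset used by Sanger / Illumina 1.8+ FASTQ
MAX_PHRED = 93      # 93 + 33 = 126 ('~'), the last printable ASCII character


def phred_to_fastq_char(score: int) -> str:
    """Return the FASTQ quality character for a Phred score, or raise if out of range."""
    if not 0 <= score <= MAX_PHRED:
        raise ValueError(f"Cannot encode phred score: {score}")
    return chr(score + PHRED_OFFSET)


if __name__ == "__main__":
    print(phred_to_fastq_char(2))    # '#'
    print(phred_to_fastq_char(40))   # 'I'
    try:
        phred_to_fastq_char(239)
    except ValueError as err:
        print(err)                   # Cannot encode phred score: 239
```

If the BAM file really stores a quality byte of 239, any tool that converts it to FASTQ text has to either fail or emit a non-printable character, so the problem is likely in the BAM itself rather than in FastQC.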
Are these unaligned BAM files?
These are unaligned!**
**Actually, just kidding? I used samtools to try to get some information about the bam file:
This is not an unaligned file. It seems to be aligned against phiX, so the reads of interest should be unaligned. BTW: does unmapped_barcodes in the file name have any meaning?
Can you convert the BAM file back to fastq (use samtools fastq, check its options carefully) and then run FastQC on the reads?
Thank you. I heard back from the core and was told that unmapped_barcodes represents the reads from the Illumina control that basically don't match the barcodes of my actual samples. I have two BAM files; these were generated from a paired-end sequencing run.
I sorted the bam files (was told they were complete) and converted them back to fastq files independently and attempted to run fastqc again on each fastq file.
I'm getting a new error this time:
Same error for the other fastq file.
There are many different types of BAM files. Could you tell us whether yours is sorted, aligned, or something else?
Personally, I prefer to QC fastq files rather than BAM files because fastq can be manipulated if needed. So I would recommend converting the BAM to fastq and then running FastQC to see the results.
I sorted the BAM files (paired-end run), converted them to fastq, and attempted to run FastQC again.
I'm getting another error that looks related to the phred scores:
I looked at the first few lines of the fastq file:
That is odd. These are very poor Q scores (< 5) but they are valid Sanger-format quality scores.
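To make the "poor but valid" observation concrete, here is a small Python sketch that decodes a quality string under the Phred+33 assumption and checks that every score is in the Sanger range. The example string is made up for illustration and is not taken from the poster's file.

```python
# Sketch: decode a FASTQ quality string, assuming Phred+33 (Sanger) encoding.
from typing import List


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Convert each quality character to its Phred score."""
    return [ord(ch) - offset for ch in qual]


def is_valid_sanger(qual: str) -> bool:
    """True if every character maps to a Phred score between 0 and 93."""
    return all(0 <= score <= 93 for score in decode_quality(qual))


if __name__ == "__main__":
    example = "##$%#"                 # hypothetical low-quality string
    print(decode_quality(example))    # [2, 2, 3, 4, 2]
    print(is_valid_sanger(example))   # True
```

Characters such as `#`, `$` and `%` decode to scores of 2 to 4, which is low quality but still well-formed.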
Can you run the following program from the BBMap suite and post the result here? Normally you should see something like what is noted below.
This is what I get
How did you convert the files to fastq? Like this?
I sorted the bam file like this:
I converted the resulting sorted bam using bedtools:
Thank you for your time and help!
You need to name-sort your BAM files before converting them to fastq. Can you try my command above? It may not make a difference, but it is worth a try. There may be some strange characters in your file. Are you using a non-English system locale?
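One way to look for "strange characters" directly is to scan the fastq file for bytes outside printable ASCII. The Python sketch below is only an illustration of that idea. The function name is hypothetical, and the demo uses an in-memory stand-in rather than a real file.

```python
# Sketch: report lines containing bytes outside printable ASCII (tab is allowed).
import io
from typing import BinaryIO, List, Tuple


def find_odd_bytes(handle: BinaryIO, limit: int = 10) -> List[Tuple[int, int]]:
    """Return up to `limit` (line_number, byte_value) pairs, one per offending line."""
    hits: List[Tuple[int, int]] = []
    for line_no, line in enumerate(handle, start=1):
        for byte in line.rstrip(b"\r\n"):
            if byte != 9 and (byte < 32 or byte > 126):
                hits.append((line_no, byte))
                break  # the first odd byte on a line is enough to flag it
        if len(hits) >= limit:
            break
    return hits


if __name__ == "__main__":
    # In-memory stand-in for a FASTQ file whose quality line holds a stray byte.
    demo = io.BytesIO(b"@r1\nACGT\n+\n##\xc3#\n")
    print(find_odd_bytes(demo))   # [(4, 195)]
```

For a real file, the handle would come from something like `open(path, "rb")`, where `path` is whatever your fastq file is called. Opening in binary mode avoids any dependence on the system locale.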
I am not. I used your command above, sorting with samtools and using that to convert the resulting file to a fastq file. I have two BAM files for this paired-end sequencing run, and I performed this sort and conversion on both files separately. I ran FastQC on them and I am getting different errors for the two files:
At this point you can try a fastq validation program like fq lint (LINK). There is also validateFiles from Jim Kent (C: Viewing and editing FASTQ files).
Both programs should identify the records that are corrupt. You will likely need to delete those. Curiously, this error must already be present in your BAM files, since it is being carried forward. You can check on that.
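As a rough idea of what such validators look for, here is a minimal Python sketch of basic per-record checks. It is not the logic of fq lint or validateFiles. It assumes plain four-line FASTQ records (no wrapped sequences) and Phred+33 qualities, and the function name is hypothetical.

```python
# Sketch of basic FASTQ record checks; NOT the implementation of fq lint or validateFiles.
from itertools import zip_longest
from typing import Iterable, List, Tuple


def find_bad_records(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (record_number, reason) for each four-line record that fails a check."""
    problems: List[Tuple[int, str]] = []
    stripped = (ln.rstrip("\r\n") for ln in lines)
    # Passing the same iterator four times groups the stream into four-line records.
    groups = zip_longest(stripped, stripped, stripped, stripped)
    for rec, (header, seq, plus, qual) in enumerate(groups, start=1):
        if qual is None:
            problems.append((rec, "truncated record"))
            break
        if not header.startswith("@"):
            problems.append((rec, "header does not start with '@'"))
        elif not plus.startswith("+"):
            problems.append((rec, "separator does not start with '+'"))
        elif len(seq) != len(qual):
            problems.append((rec, "sequence and quality lengths differ"))
        elif any(not 33 <= ord(ch) <= 126 for ch in qual):
            problems.append((rec, "quality character outside Phred+33 range"))
    return problems


if __name__ == "__main__":
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "II"]
    print(find_bad_records(demo))   # [(2, 'sequence and quality lengths differ')]
```

The function reads lazily, so it can be pointed at an open file handle without loading the whole file into memory. Record numbers in the output are 1-based.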
Thank you for all your help. I'll try this.