Hi,
I am trying to repair a .fq file using the bbMap repair.sh command and I am running into an error. My .fq file is paired-end but is not sorted or interleaved, hence the need to repair it before moving on to my next analysis. When I run repair.sh it starts and then always errors out at the same spot: "Line 87: 23354 Killed". I have changed the amount of memory allocated (-Xmx) as I thought I might be running into memory issues, but this is not fixing the problem.
I am not sure what this error pertains to. I looked at line 87 of my .fq and saw no errors. Pasted below are lines 85-88 from my .fq.
@38_1_1101_31566_1016/1
CTGCAGGATCCTTCTCTGGGTTTCCCACCCCGTCCTCCTGGAATTTCACCACTTTCCTCCTGCCCAGCTGATGAGCTGATCCCACAGCAGATTCGGGGCTCCCTGATCCTGGAATTTTCTCCTTGTTTTTCTCAGGTTTT
+
FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFF::FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFF,FFF,,FFFFFFFFFFFF
I do not see any errors with the file, so I am thinking this may be an error somewhere with how I am running the command.
Any help would be appreciated in figuring out what the error means and how I can get bbmap to repair this file.
Thank you!
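A quick way to see what state a file like this is in is to look only at the read names. The following is a minimal sketch, not part of BBMap: the function names are hypothetical, it assumes old-style headers ending in /1 and /2 as in the record above, and the /2 mate in the sample data is made up for illustration.

```python
import io
from itertools import islice


def read_names(handle):
    """Yield the read name (first word of the header, without '@') per FASTQ record."""
    while True:
        record = list(islice(handle, 4))  # a FASTQ record is 4 lines
        if len(record) < 4:
            return
        header = record[0].rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        yield header[1:].split()[0]


def split_mate(name):
    """Return (base, mate); mate is '1', '2', or None when there is no /1 or /2 suffix."""
    if name.endswith(("/1", "/2")):
        return name[:-2], name[-1]
    return name, None


def looks_interleaved(handle, max_pairs=1000):
    """True if the first max_pairs record pairs are name/1 followed by name/2."""
    names = read_names(handle)
    checked = 0
    for first in names:
        second = next(names, None)
        if second is None:
            return False  # odd number of records
        base1, mate1 = split_mate(first)
        base2, mate2 = split_mate(second)
        if (base1, mate1, mate2) != (base2, "1", "2"):
            return False
        checked += 1
        if checked >= max_pairs:
            break
    return checked > 0


# Hypothetical two-record sample; sequences shortened for illustration.
sample = io.StringIO(
    "@38_1_1101_31566_1016/1\nCTGCAG\n+\nFFFFFF\n"
    "@38_1_1101_31566_1016/2\nAAACCT\n+\nFFFFFF\n"
)
print(looks_interleaved(sample))
```

If this returns False on a file that is supposed to hold both mates, the reads are either out of order or the file contains only one mate per name, which are different problems and need different fixes.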
I want to see how you are running repair.sh. Can you post the entire command?
Hi, I am running on a local machine. Here is the command I am entering:
I suggest you add fint=t to the command and try. You may need more than 16G of RAM depending on how large your input file is.
If your reads don't have a corresponding /2 designation indicating read 2 then you may also need to use ain=t.
Where did you get this file from? Show us four reads from the file. If this file does not contain interleaved data then you can't use repair.sh.
Update to my problem: interleaving was broken, I think. I now have bbMap working, or so it appears. I use this command instead:
Our output files looked like they were repaired; here are a few lines from the sorted R1 and R2.
R1 file:
R2 file:
Our new problem is that when trying to align against the reference genome using bwa mem, none of the reads are recognized as paired-end. I am not sure if this is an issue with the bbMap output or something new with running bwa. Do you think this is a bbMap output issue and that something did not work when repairing the .fastq?
This likely has nothing to do with bbmap.
Your reads have old-format Illumina FASTQ identifiers (note the /1 and /2 in the headers; that was the reason I asked you where you got this data from), which bwa mem may not be recognizing.
I think the OP needs to post the exact error. I believe bwa mem does recognize and ignore the /1 and /2 at the ends. Even the simulator wgsim used to evaluate bwa has those markers.
What is the exact error that you get?
Let me get the error from bwa and will repost for you. Thank you
Here is a subset of the output from bwa. I only copied several of the lines because there are over 9000 lines of the same thing. When I use samtools to check the stats of the .bam it shows there are no paired reads.
And I did not add the error, so sorry.
Those are not errors. It is normal to see those messages.
Exactly, these are progress reports on how the alignment is progressing. It even gives you estimates of the fragment length mean and standard deviation, so evidently it recognizes the data as paired.
I have never had every batch of reads come back with the FF, RR, and RF orientations skipped because not enough pairs were found, and then had zero matched pairs by the end of the alignment. I realize that output is just the log of the alignment progressing, but generally my log does not report too few pairs for everything. To me that looks like an error, given that my input .fqs all seem to be matched and the stats of the .bam show no paired reads.
If the input were not paired, bwa would stop and print an error explicitly stating which read pair did not match.
The messages that state something about pairs being skipped are not errors; those are also log messages that tell you that a specific orientation is missing, not that the reads are not paired. Usually, you would not want FF or RF orientation to be present unless the data was of that type.
It is true that the messages are highly confusing, as they seem to suggest there aren't enough pairs, when what they say is that there are no FF pairs. In fact there shouldn't be any FF or RF pairs if the data is FR.
If it does not stop with that error, it means the data is all paired.
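The name check described above can be approximated outside of bwa. This is a rough sketch only, not bwa's own code: the function names are hypothetical, and trimming a trailing /1 or /2 before comparing is an assumption based on the earlier comment that bwa mem ignores those markers.

```python
import io
from itertools import islice, zip_longest


def read_names(handle):
    """Yield the read name (first word of the header, without '@') per FASTQ record."""
    while True:
        record = list(islice(handle, 4))  # a FASTQ record is 4 lines
        if len(record) < 4:
            return
        yield record[0].rstrip("\n")[1:].split()[0]


def trim_mate_suffix(name):
    """Drop a trailing /1 or /2 so that both mates compare equal."""
    return name[:-2] if name.endswith(("/1", "/2")) else name


def first_name_mismatch(r1_handle, r2_handle):
    """Return (index, name1, name2) for the first pair whose names differ, else None."""
    pairs = zip_longest(read_names(r1_handle), read_names(r2_handle))
    for index, (name1, name2) in enumerate(pairs):
        if name1 is None or name2 is None:
            return index, name1, name2  # one file ran out of reads first
        if trim_mate_suffix(name1) != trim_mate_suffix(name2):
            return index, name1, name2
    return None


# Hypothetical R1/R2 contents; the second pair is deliberately mismatched.
r1 = io.StringIO("@a/1\nAC\n+\nFF\n@b/1\nAC\n+\nFF\n")
r2 = io.StringIO("@a/2\nGT\n+\nFF\n@c/2\nGT\n+\nFF\n")
print(first_name_mismatch(r1, r2))
```

A result of None on the real R1 and R2 files would agree with the point made above: the names line up, so the pairing itself is probably not what is going wrong.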
What does your samtools flagstat bamfile say about the alignments?
Thank you. This is NovaSeq data that we just got back from the sequencing facility. We had no problem with older HiSeq data, so I just used the same methods as we used with the HiSeq data, and now we have all these issues. At least I now have bbmap figured out and will move on to trying to figure out my bwa problem.
That is interesting. It looks like the facility is actively changing the headers produced by the Illumina software.
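On the flagstat question raised earlier in the thread: whether a read counts as paired is recorded in the FLAG column of each alignment, so it can be checked directly. The sketch below is illustrative and does not reproduce samtools flagstat; the function name is hypothetical, the bit values are the ones defined in the SAM specification, and the input is assumed to be SAM text such as the output of samtools view -h.

```python
import io

# FLAG bits from the SAM specification
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # both mates aligned as a proper pair
UNMAPPED = 0x4        # read itself is unmapped
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary alignment


def pairing_summary(sam_handle):
    """Count primary records by pairing status from SAM text."""
    counts = {"primary": 0, "mapped": 0, "paired": 0, "properly_paired": 0}
    for line in sam_handle:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        counts["primary"] += 1
        if not (flag & UNMAPPED):
            counts["mapped"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
            if (flag & PROPER_PAIR) and not (flag & UNMAPPED):
                counts["properly_paired"] += 1
    return counts


# Hypothetical records: a proper pair (flags 99 and 147) and one unpaired read (flag 0).
sam_text = (
    "@HD\tVN:1.6\n"
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\n"
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tACGT\tFFFF\n"
    "r2\t0\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tFFFF\n"
)
print(pairing_summary(io.StringIO(sam_text)))
```

If "paired" comes out as zero for a real alignment, the reads were most likely given to the aligner as single-end input, which points at how the command was run rather than at the read headers.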