4.1 years ago
dp
It seems that the read simulator from BBTools (randomreads.sh) creates reads with names that are too long for samtools. Has anyone else run into this? How can I get around it?
Can you post your command and example of read headers?
Simple reads should be like
Here's an example:
@SYN_0_24445_24594_23974_+339279274_1.NZ{CP029979.1$Escherichia$coli$strain$99-3165$plasmid$unnamed1&0_621_770_23974-339255450_1._NZ{CP029979.1$Escherichia$coli$strain$99-3165$plasmid$unnamed1 1:
This one isn't over the length limit (252 I believe), but there seems to be one that is and that causes samtools to error out in the middle.
I'm not sure what command was used. I'm trying to help out a user and realised that this was the problem. Is there an option in the simulator to suppress these long names?
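To pin down which read actually trips samtools, one option is to scan the FASTQ for names over the limit before aligning. Below is a minimal sketch, not a definitive tool. It assumes a plain 4-line-per-record FASTQ and a cap of 254 characters, which is the QNAME limit as I read the SAM specification (the 252 quoted above may be closer to what a given samtools version enforces, so treat `MAX_QNAME_LEN` as an adjustable assumption). The function name `long_read_names` is made up for this example.

```python
import io

# Assumption: 254 is the QNAME length cap from the SAM specification.
# Lower it if your samtools version complains at a shorter length.
MAX_QNAME_LEN = 254


def long_read_names(fastq_handle, max_len=MAX_QNAME_LEN):
    """Yield (record_number, name_length, name) for over-long read names.

    Assumes a plain 4-line-per-record FASTQ (no wrapped sequence lines).
    """
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 0:
            continue  # only header lines matter here
        # The read name is the header without '@', up to the first whitespace.
        fields = line[1:].split()
        name = fields[0] if fields else ""
        if len(name) > max_len:
            yield line_no // 4 + 1, len(name), name


if __name__ == "__main__":
    # Tiny in-memory demo: one short name and one 300-character name.
    demo = io.StringIO(
        "@short_read 1:\nACGT\n+\nIIII\n"
        "@" + "x" * 300 + " 1:\nACGT\n+\nIIII\n"
    )
    for record, length, name in long_read_names(demo):
        print(record, length, name[:20] + "...")
```

For a real file, pass an open handle instead of the demo `StringIO`, e.g. `open("reads.fq")` (the file name is a placeholder).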
This is likely because the fasta input file had an overly long header. An additional issue must be the `$` and `{` in the names (not sure why they are there), which samtools likely does not like. I think you are best off regenerating these reads after modifying the fasta header to something acceptable. If that is not possible then you could chop the remainder of the fastq header off after `@SYN_0_24445_24594_23974_+339279274_1`. I think the first part should be unique for all reads but you can confirm.
OK thanks. If I understood correctly this is coming from the header of the reference sequence that the reads are simulated from? Do you know what comes after the `&` sign? How would you suggest cutting the headers - just cut every header after the first `.`? Is there a way to get the header to just be a number as in your example above - is there a specific flag to pass to the simulator to get this behaviour? I assume the current format with all the numbers etc. is to keep track of where each read came from?
Thanks again
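The truncation discussed above (keeping only the part of each header before the first `.`) could be sketched roughly as follows. This is an illustration under assumptions, not a tested pipeline step. It assumes a plain 4-line-per-record FASTQ, and the helper name `truncate_headers` is invented here. It also returns a count per shortened name so you can confirm the names stay unique. Note that anything after the cut is dropped, including a trailing ` 1:` style comment.

```python
import io
from collections import Counter


def truncate_headers(fastq_in, fastq_out, sep="."):
    """Cut every FASTQ header at the first `sep` and count the resulting names.

    Assumes a plain 4-line-per-record FASTQ. Returns a Counter so the caller
    can check that the shortened names are still unique.
    """
    counts = Counter()
    for line_no, line in enumerate(fastq_in):
        if line_no % 4 == 0:
            # Keep '@' plus everything before the first separator.
            name = line.rstrip("\n").split(sep, 1)[0]
            counts[name] += 1
            line = name + "\n"
        fastq_out.write(line)
    return counts


if __name__ == "__main__":
    src = io.StringIO(
        "@SYN_0_24445_24594_23974_+339279274_1.NZ{CP029979.1$Escherichia 1:\n"
        "ACGT\n+\nIIII\n"
    )
    dst = io.StringIO()
    counts = truncate_headers(src, dst)
    print(dst.getvalue(), end="")
    # Any name listed here appeared more than once after truncation.
    print("duplicated names:", [n for n, c in counts.items() if c > 1])
```

Headers are identified by position (every fourth line) rather than by a leading `@`, because quality lines can also start with `@`.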
If you just want them to be numbers you can simply use `illuminanames=t`.
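For the other route suggested earlier in the thread, regenerating the reads from a reference with shorter fasta headers, here is a minimal sketch that keeps only the first whitespace-delimited token of each header. It assumes that token (typically the accession) is unique per sequence, and that the long read names really do come from the reference headers, as suggested above. The function name `shorten_fasta_headers` and the demo record are hypothetical.

```python
import io


def shorten_fasta_headers(fasta_in, fasta_out):
    """Rewrite each FASTA header to its first whitespace-delimited token.

    Returns a dict mapping each short name to the original full header,
    so the dropped description can still be looked up later.
    """
    mapping = {}
    for line in fasta_in:
        if line.startswith(">"):
            full = line[1:].rstrip("\n")
            short = full.split(None, 1)[0] if full.strip() else ""
            if short in mapping:
                # Shortened names must stay unique or reads become ambiguous.
                raise ValueError(f"duplicate short name: {short!r}")
            mapping[short] = full
            line = ">" + short + "\n"
        fasta_out.write(line)
    return mapping


if __name__ == "__main__":
    # Made-up record purely for illustration.
    src = io.StringIO(">seq1 some long free-text description\nACGTACGT\n")
    dst = io.StringIO()
    mapping = shorten_fasta_headers(src, dst)
    print(dst.getvalue(), end="")
    print(mapping)
```

Keeping the mapping around means the original descriptions are not lost even though the simulated read names no longer carry them.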