4.9 years ago
User 4014
Hi all,
I have a problem with header formats. Trinity needs each header to be a single unbroken string, but some of mine contain whitespace and look like this:
@A00700:80:HHHNGDRXX:1:2101:30481:11663#GTGCACCAGGAATCAC 0:N: 00 /1
Can anyone help me with some tips on how to fix these with a bash script or something?
Thank you very much in advance.
That looks like a hybrid of old and new Illumina fastq headers. How did you end up with those?
Actually, it is from a NovaSeq. The original headers end at #GTGCACCAGGAATCAC; I guess the 0:N: 00 was added by STAR and the /1 by reformat.sh.
Neither makes sense. Can you tell us what you used reformat.sh for?
Yes, I used it to add the /1 and /2 flags. I have mixed RNA-Seq data, and somehow I lost the flags during rRNA clean-up (with both bowtie2 and bbduk) and while binning the reads with STAR to separate plant from fungus. Do you have a suggestion?
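For the header question itself, here is a minimal sketch of one possible clean-up, not a tested pipeline step. It assumes plain four-line FASTQ records (no wrapped sequences), that the middle fields such as 0:N: 00 can be dropped, and that a trailing /1 or /2 should be glued straight onto the read name. The function names fix_header and fix_fastq are made up for illustration.

```python
import re

# A mate flag that has been split off into its own field: exactly "/1" or "/2".
MATE_FLAG = re.compile(r"/[12]")


def fix_header(header):
    """Reduce a FASTQ header to its first token plus the mate flag, no whitespace.

    Assumption: everything between the read name and the trailing /1 or /2
    (e.g. "0:N: 00") is not needed downstream and can be discarded.
    """
    fields = header.split()
    if not fields:
        # Blank line: return it unchanged rather than fail.
        return header
    name, last = fields[0], fields[-1]
    already_flagged = name.endswith(("/1", "/2"))
    if len(fields) > 1 and MATE_FLAG.fullmatch(last) and not already_flagged:
        return name + last
    return name


def fix_fastq(lines):
    """Yield FASTQ lines with every header (first line of each 4-line record) fixed."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield fix_header(line.rstrip("\n")) + "\n"
        else:
            yield line
```

To run it over a file, you could add `import sys` and a final line `sys.stdout.writelines(fix_fastq(sys.stdin))`, save it under a name of your choice (say fix_headers.py), and call `python fix_headers.py < in.fastq > out.fastq`. It would be worth checking the first few output headers by eye, and confirming that both mates of a pair still share the same name, before passing the files to Trinity.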