5.3 years ago
noodle
I have a WGS dataset I'd like to align with bwa, but the fastq files are in DNBseq format (shown below) and I'm stuck on the syntax. Does anyone know how to change the read group line "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE"
part of bwa mem -aMp -t #ofCPUs ref.fa -R "@RG\tID:**\tLB:**\tPL:ILLUMINA\tSM:**\tPU:BARCODE" > output.sam
to make these work? Thanks.
Here's the top line of a fastq file in DNBseq format:
@V300020594L1C001R0010000024/1
*Edited for clarity.
You are referring to the -R option, which is for read groups. If you don't have multiple samples (or don't plan to use a program that requires read groups), you should be able to omit that option. Note: if your downstream analysis requires read groups, then you will need to supply them.
Can you show us the original fastq headers in your data?
I have multiple paired-end samples, which is why I was hoping to use -R. This is the header I was provided from the NGS company (BGI), which used DNBseq machines. The only documentation I could find from them shows the below.
FASTQ file sequenced by DNBseq.
I'll try to parse in some colon delimiters and see if it helps (like this).
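A rough sketch of the colon-delimiter idea, in Python. It assumes the DNBseq read name is laid out as flowcell, `L` + lane, `C` + 3-digit column, `R` + 3-digit row, read number, and an optional `/1` or `/2` mate suffix. That layout is inferred from the single header shown in this thread, so check it against your own files. The function names are made up for illustration, and the output is only colon-separated, not a genuine Illumina header.

```python
import re

# Assumed layout (inferred from one example header, not from a spec):
#   <flowcell> L<lane> C<col:3 digits> R<row:3 digits> [_]<read number> [/1 or /2]
DNBSEQ_NAME = re.compile(
    r"^@?(?P<flowcell>[A-Za-z0-9]+?)L(?P<lane>\d+)"
    r"C(?P<col>\d{3})R(?P<row>\d{3})_?(?P<read>\d+)"
    r"(?:/(?P<mate>[12]))?$"
)


def parse_dnbseq_name(name):
    """Split a DNBseq-style read name into its assumed fields."""
    match = DNBSEQ_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised DNBseq read name: {name!r}")
    return match.groupdict()


def to_colon_delimited(name):
    """Rewrite the read name with ':' between the assumed fields."""
    f = parse_dnbseq_name(name)
    core = ":".join([f["flowcell"], f["lane"], f["col"], f["row"], f["read"]])
    suffix = f"/{f['mate']}" if f["mate"] else ""
    return f"@{core}{suffix}"


fields = parse_dnbseq_name("@V300020594L1C001R0010000024/1")
print(fields["flowcell"], fields["lane"])
print(to_colon_delimited("@V300020594L1C001R0010000024/1"))
```

The flowcell and lane fields are the parts most likely to be useful for a read group. bwa itself does not need the read names rewritten.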
Thanks.
Take a look at this page to see if you can use some of the examples to construct an appropriate -R line for your samples. Something like
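As an illustration only, here is a small Python helper that assembles an `@RG` string in the same field order as the question (ID, LB, PL, SM, PU). The sample name `sampleA`, the ID/PU naming scheme, and the function name are assumptions rather than anything BGI prescribes. `DNBSEQ` is used for PL on the understanding that recent versions of the SAM specification list it, but some older tools may only accept `ILLUMINA`, so it is left as a parameter. The fields are joined with a literal backslash-t because `bwa mem -R` expects escaped tabs, not real tab characters.

```python
def read_group(sample, flowcell, lane, library=None, platform="DNBSEQ"):
    """Build an @RG string for `bwa mem -R` (hypothetical naming scheme)."""
    fields = [
        "@RG",
        f"ID:{sample}.{flowcell}.{lane}",   # unique per sample + flowcell + lane
        f"LB:{library or sample}",          # fall back to the sample name
        f"PL:{platform}",
        f"SM:{sample}",
        f"PU:{flowcell}.{lane}",            # no barcode is known from the header
    ]
    # Raw string: the two characters '\' and 't', which bwa converts to tabs.
    return r"\t".join(fields)


# 'sampleA' is a placeholder; flowcell and lane come from the header in this thread.
print(read_group("sampleA", "V300020594", "1"))
```

Each sample (and each lane, if a sample was split across lanes) would get its own call, so that every read group ID stays unique.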
This bioRxiv paper says that an accompanying repo has code for converting BGI-formatted fastq headers to Illumina format, but I am not able to find a link to the GitHub repo. Take a look around to see if you can find it.
@joe, I got Nebula (BGI/DNBseq) WGS results. I'm new to this. Can you please share your BWA MEM command parameters for alignment with DNBseq data that you used?
DNBseq data should be no different from any other sequencing data: the reads are FASTQ with Sanger-encoded (Phred+33) quality scores. Start the analysis with standard options.
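To make "standard options" concrete, here is a minimal sketch that assembles an ordinary paired-end `bwa mem` command in Python. The file names, thread count, and read group values are placeholders, and the helper name is made up. The `-p` flag from the original question is left out because it tells bwa the first file holds interleaved pairs, whereas this sketch passes two separate files. The command is only printed, not executed.

```python
import shlex


def bwa_mem_argv(ref, fq1, fq2, rg_line=None, threads=4):
    """Return the argument list for a plain paired-end `bwa mem` run."""
    argv = ["bwa", "mem", "-t", str(threads)]
    if rg_line:
        # Optional read group; must use escaped tabs (backslash-t), not real tabs.
        argv += ["-R", rg_line]
    # Options go first, then the reference index prefix and the two FASTQ files.
    argv += [ref, fq1, fq2]
    return argv


cmd = bwa_mem_argv(
    "ref.fa",
    "sample_1.fq.gz",   # placeholder file names
    "sample_2.fq.gz",
    rg_line=r"@RG\tID:s1\tSM:s1\tPL:DNBSEQ",
)
# shlex.join quotes the -R value so it can be pasted into a shell.
print(shlex.join(cmd) + " > output.sam")
```

The same list can be handed to `subprocess.run` with `stdout` redirected to a file if you would rather launch the alignment from Python.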