Question

Output whole header line in fasta file using bwa

0

9.1 years ago

erikras1223 ▴ 10

Hello, I am trying to keep the complete header lines from my FASTA file when using BWA. Once I've mapped the reads to the reference genome, I want to extract the ones that mapped and write them to a FASTA file. I need the reads to keep the complete header they originally had.

After looking for a while, I found the option: bwa mem -R '@RG\tID:foo\tSM:bar'. The problem is that I don't understand the string I need to supply, and I get an error every time I try to use it. I know the string above is just an example, but I would be very grateful if someone could explain it or propose a different way to output the complete header line for the reads from bwa. Thanks.

header BWA Complete Whole Line mem • 3.3k views

ADD COMMENT • link 9.1 years ago by erikras1223 ▴ 10

0

I'm a bit confused about what you're trying to do and why. Are you starting with a fasta file, and do you want to end up with a fasta file containing only the reads that map to the reference? What are you using read groups for? Are the read headers important to keep unchanged, or are you just trying to use them for extracting reads?

ADD REPLY • link 9.1 years ago by Brian Bushnell 20k

0

I assume you want to save the entire fasta header (which has spaces in it)? If that is the case, you would need to convert those spaces to "_" so the header becomes one long string. By convention, the sequence ID in a fasta header ends at the first space, and many tools drop whatever follows it (my guess is that this is how bwa is treating it).

ADD REPLY • link 9.1 years ago by GenoMax 152k
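
The space-to-underscore conversion suggested above could be done before mapping with a one-line sed command. This is a minimal sketch, not a tested pipeline: the file names reads.fasta and reads.renamed.fasta and the demo records are made up for illustration.

```shell
# Demo input: headers contain spaces (hypothetical file name and contents).
printf '>read1 sample=A lane=3\nACGTACGT\n>read2 sample=B lane=1\nTTGGCCAA\n' > reads.fasta

# On header lines only (those starting with ">"), replace each run of
# whitespace with a single underscore; sequence lines are left untouched.
sed '/^>/ s/[[:space:]][[:space:]]*/_/g' reads.fasta > reads.renamed.fasta

cat reads.renamed.fasta
```

The renamed file would then be the one passed to bwa. Any later lookup has to use the underscore form of the header as well, so the same conversion should be applied to the headers being searched for.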

0

Yes, this is exactly my question. I just want to be able to save the whole header line, but bwa is chopping some of the info off. Later I search the reads produced by bwa using the original header lines, and they don't match.

ADD REPLY • link 9.1 years ago by erikras1223 ▴ 10
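
If the mapped reads have already been written out with truncated headers, another option is to restore the full headers afterwards by matching on the part before the first space. The awk sketch below assumes that this first token is unique per read and that the truncated headers consist of exactly that token. The file names original.fasta, mapped.fasta and mapped.fullheader.fasta are hypothetical.

```shell
# Demo files (hypothetical names): the original reads, and a mapper output
# whose headers were cut at the first space.
printf '>read1 sample=A lane=3\nACGTACGT\n>read2 sample=B lane=1\nTTGGCCAA\n' > original.fasta
printf '>read2\nTTGGCCAA\n' > mapped.fasta

# First pass (original.fasta): remember the full header for each ID, where
# the ID is the first whitespace-delimited token, including the ">".
# Second pass (mapped.fasta): swap each truncated header for the full one
# if it is known; print every other line unchanged.
awk '
    NR == FNR { if (/^>/) full[$1] = $0; next }
    /^>/ && ($1 in full) { print full[$1]; next }
    { print }
' original.fasta mapped.fasta > mapped.fullheader.fasta

cat mapped.fullheader.fasta
```

Headers that have no counterpart in the original file are passed through unchanged, so nothing is silently dropped.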

1

Note that the default behavior of BBMap is to NOT chop off the header after the first whitespace, and it can output directly to fasta, like this:

bbmap.sh in=sequences.fasta outm=mapped.fasta outu=unmapped.fasta ref=reference.fasta

ADD REPLY • link 9.1 years ago by Brian Bushnell 20k

1

Either use BBMap or convert the spaces in the names to "_" like I said before, if you want to keep using bwa.

ADD REPLY • link 9.1 years ago by GenoMax 152k