Hello everyone,
I am trying to run a workflow that goes from a whole-genome .bam file all the way to a consensus FASTA sequence built from the .vcf file generated along the way.
After sorting, indexing, and retrieving a .bam file with the genic strand I wish to work with, I need to run
samtools mpileup
When I run the following command:
samtools mpileup -g -f MyFile.bam > MyFile.bcf
I get the following error:
[group_smpl] Read group SRC-5-alex4__MA167snp used in file MyFile.bam but absent from the header or an alignment missing read group.
Could anyone help me figure out how to output a .bcf file successfully? I have run this command on dozens of other genomes, and this is the first time I have gotten an error. Any help would be much appreciated.
Hey there Devon.
That makes sense (of course, since you know what you are talking about). I visually checked the BAM files I used, and the difference is in the fields after the alignment section: all of the files that work well have 10 information columns there. These are the columns in a trouble-free file:
When I checked the problem file, I realized it is lacking the column with the "XP:i:1" tag. As you said, it is strange, since I downloaded the file directly from the source and only sorted it, indexed it, and retrieved a strand. I really don't think the problem came from my processing protocol, but it is still worth keeping in mind which preprocessing step could have caused this.
I'll run the code you suggest. I just have one question: does the reheader option automatically fix the missing information in the BAM file, or do I somehow have to add the column manually? If so, I am thinking of something with
or maybe
, which would be a bit more challenging for me.
Thank you for the feedback
If my guess is correct then the error is just a missing line in the header. You'll have to manually add a line into the header in a text editor, so no awk or sed needed. This is just my expectation as to the fix though. If this is correct, then the reheadering process should resolve all issues.
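To test that guess before editing anything, you could compare the read group IDs declared in the header with the ones the reads actually carry. This is only a sketch: the helper names `header_rg_ids` and `read_rg_ids` and the scratch files `header_ids.txt` and `read_ids.txt` are made up for illustration, and `MyFile.bam` is the file name from the question.

```shell
# Print the read group IDs declared on @RG lines of a SAM header read from stdin.
header_rg_ids() {
    awk -F'\t' '$1 == "@RG" { for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) print substr($i, 4) }' | sort -u
}

# Print the read group IDs carried by alignment records read from stdin (RG:Z: tag).
# Optional tags start at column 12 of a SAM record.
read_rg_ids() {
    awk -F'\t' '$1 !~ /^@/ { for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) print substr($i, 6) }' | sort -u
}

bam=MyFile.bam   # file name taken from the question
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    samtools view -H "$bam" | header_rg_ids > header_ids.txt   # hypothetical scratch file
    samtools view "$bam" | read_rg_ids > read_ids.txt          # hypothetical scratch file
    # IDs used by reads but not declared in the header
    comm -13 header_ids.txt read_ids.txt
fi
```

If the last command prints SRC-5-alex4__MA167snp, the header really is missing that @RG line. If it prints nothing, the other half of the error message (an alignment with no read group at all) would be the more likely cause.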
Hey, I have tried the code you provided, and I see what you mean. In the middle of those commands, I need to add the information "SRC-5-alex4__MA167snp" to the header before reheadering the file, so that all the read groups have the tag and can be piled up by the "mpileup" command.
Any suggestion for a command to add that information to the header? I have been browsing, and some people suggest using Picard with the "AddOrReplaceReadGroups" option. Is that correct? It is the first time I will be formatting a .bam header, and I wish to be sure that my analysis will be correct after adding info to the file's header.
cheers,
Ricardo
Open it with your preferred text editor and type the line in by hand. That's the quickest and easiest method.
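If you would rather script it than type it, the same edit can be sketched in shell. Treat this as an illustration under assumptions, not a tested recipe for your data: the function name `add_rg_line` and the output names `header.sam` and `MyFile.reheadered.bam` are made up, and the SM (sample) value is a guess, since I don't know the real sample name. Here I simply reuse the read group ID for it.

```shell
bam=MyFile.bam                    # file name taken from the question
rg_id='SRC-5-alex4__MA167snp'     # read group named in the error message
sample="$rg_id"                   # ASSUMPTION: real sample name unknown, ID reused

# Append an @RG line to a SAM header file unless that ID is already declared.
add_rg_line() {
    header_file=$1
    id=$2
    sm=$3
    if ! awk -F'\t' -v id="ID:$id" \
        '$1 == "@RG" { for (i = 2; i <= NF; i++) if ($i == id) found = 1 } END { exit !found }' \
        "$header_file"; then
        printf '@RG\tID:%s\tSM:%s\n' "$id" "$sm" >> "$header_file"
    fi
}

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    samtools view -H "$bam" > header.sam                        # dump the current header
    add_rg_line header.sam "$rg_id" "$sample"                   # same as typing the line by hand
    samtools reheader header.sam "$bam" > MyFile.reheadered.bam # swap in the edited header
    samtools index MyFile.reheadered.bam
fi
```

The fields on the @RG line have to be separated by tabs, which is the usual thing to get wrong when typing it in an editor. As for Picard's AddOrReplaceReadGroups: it assigns every read in the file to one new read group, so it is a heavier change than adding a single header line.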