Does anybody have an idea why redundant @SQ lines are present in a BAM file header?
I created the bam file by the following procedure:
bowtie-build the genome
create sam file using samtools by aligning fastq files to the bowtie-build output
convert sam to bam using samtools
Those redundant lines cause the error "Cannot add sequence that already exists in SAMSequenceDictionary" when I try to add read groups using Picard AddOrReplaceReadGroups.
What you write does not quite make sense: SAM files are not created by samtools, and bowtie-build does not align data. Edit your post and add the commands that you used, and perhaps a sample of what you call redundant @SQ lines.
The editor deleted the line breaks between the steps. I did not notice that.
The steps are:
Build the bowtie index for the genome using bowtie-build.
Create the SAM file using bowtie (not samtools) by aligning the fastq files to the bowtie-build output.
Convert the SAM file to BAM using samtools.
The generated BAM files contain duplicate @SQ lines in the header. I think I found the reason: one of the files used to build the bowtie index is a subset of another file.
ChrY.fa is a part of ChrU.fa.
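For anyone hitting the same error, here is a minimal sketch of how the duplicates could be confirmed from the header text, for example the output of samtools view -H on the BAM file. The function name duplicate_sq_names and the example header (sequence names and lengths) are made up for illustration and are not taken from the actual files.

```python
from collections import Counter


def duplicate_sq_names(header_text):
    """Return the sequence names (SN) that occur in more than one @SQ line."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs; SN holds the name.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical header: chrY is listed twice, as it would be if the same
# sequence were present in two of the FASTA files used for the index.
example_header = "\n".join([
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chrU\tLN:1000",
    "@SQ\tSN:chrY\tLN:200",
    "@SQ\tSN:chrY\tLN:200",
])

print(duplicate_sq_names(example_header))
```

Any name it reports is a sequence that appears more than once in the header, which is the condition Picard complains about.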