Hi everybody, I'm trying to carry out some VC analysis, but I'm not very experienced with it, so I'm struggling a bit... For my RNA-seq samples, I've run the following commands:
#1 Mark duplicates
ls *bam | xargs -I {} -n 1 java -jar picard.jar MarkDuplicates I={} O={}.dedup M={}.metrix.txt
#2 Index file
ls *bam | xargs -I {} -n 1 java -jar picard.jar BuildBamIndex I={} O={}.bam.bai
#3 Split'N'Trim and reassign mapping qualities
ls *bam | xargs -I {} -n 1 java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R GRCh38.p10.genome.fa -I {} -o {}.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS
But then, I get the following error:
##### ERROR MESSAGE: SAM/BAM/CRAM dedup.bam is malformed. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1317for more information. Error details: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
##### ERROR ------------------------------------------------------------------------------------------
Any ideas about how to solve it? Thanks!
Have a look at one of the files, do you see anything obviously wrong with it?
Not at all with
but nothing comes up with:
Does that mean my BAM file doesn't have read group and sample information?
yes
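In case it helps, here is a rough sketch of how one might check for read groups and then add them with Picard's AddOrReplaceReadGroups. The helper names `has_read_group` and `rg_command` are made up for illustration, and the read group values (`lib1`, `illumina`, `unit1`, and using the file name as the sample name) are placeholders that should be replaced with your real library and sequencing information. The second helper only prints the command so you can inspect it before running anything.

```shell
# Hypothetical helper: succeeds if the SAM header on stdin has an @RG line.
has_read_group() {
    grep -q '^@RG'
}

# Hypothetical helper: print (do not run) a Picard AddOrReplaceReadGroups
# command for one BAM file. The RG values are placeholders, not real metadata.
rg_command() {
    bam="$1"
    sample="${bam%.bam}"   # strip the .bam suffix to get a sample label
    printf 'java -jar picard.jar AddOrReplaceReadGroups I=%s O=%s.rg.bam RGID=%s RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=%s\n' \
        "$bam" "$sample" "$sample" "$sample"
}

# Example usage, assuming samtools is installed and sample.bam exists:
#   samtools view -H sample.bam | has_read_group || rg_command sample.bam
```

If the printed command looks right, run it and then continue with the `.rg.bam` output. Alternatively, read groups can usually be set at alignment time so that this extra step is not needed.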
PAA* because they make it harder to read your post. In this case, you are probably talking about "variant calling". While VC may be a CUA** for you, this is not necessarily the case for the rest of the users here.
* Please Avoid Abbreviations
** Commonly Used Abbreviation