Problems running variant calling (VC) GATK: SAM/BAM/CRAM file dedup.bam is malformed
1
0
Entering edit mode
6.2 years ago
Lila M ★ 1.3k

Hi everybody, I'm trying to carry out some VC analysis, but I'm not very advanced at it, so I am a bit struggle... For my RNA-seq samples, I've run the following command lines:

#1 Mark duplicates
ls *bam | xargs -I {} -n 1 java -jar picard.jar MarkDuplicates I={} O={}.dedup  M={}.metrix.txt

#2 Index file
ls *bam | xargs -I {} -n 1 java -jar picard.jar BuildBamIndex  I={} O={}.bam.bai

#3  Split'N'Trim and reassign mapping qualities
ls *bam | xargs -I {} -n 1 java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R GRCh38.p10.genome.fa -I {} -o {}.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

But then, I get the following error:

##### ERROR MESSAGE: SAM/BAM/CRAM dedup.bam is malformed. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1317for more information. Error details: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups
##### ERROR ------------------------------------------------------------------------------------------

Any ideas about how to solve it? Thanks!

GATK VC RNA-Seq • 3.6k views
ADD COMMENT
1
Entering edit mode

Have a look at one of the files, do you see anything obviously wrong with it?

ADD REPLY
0
Entering edit mode

Not at all with

samtools view -H file.bam
@HD VN:1.4  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555

but nothing comes up with:

samtools view -H file.bam | grep '^@RG'

does it means that my BAM file doesn't have read group and sample information?

ADD REPLY
1
Entering edit mode

does it means that my BAM file doesn't have read group and sample information?

yes

ADD REPLY
0
Entering edit mode

PAA* because they make it harder to read your post. In this case, it's probably "variant calling" what you are talking about. While VC may be a CUA** for you, this is not not necessarily the case for the rest of the users here.

* Please Avoid Abbreviations
** Commonly Used Abbreviation

ADD REPLY
3
Entering edit mode
6.2 years ago

Error details: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups

Your sam file doesn't have read groups. This is a common issue with GATK, as the program now wants the reads to be grouped for analysis purposes.

You can assign them manually, or you can use Picard's tool AddOrReplaceReadGroups. Third possibility, if this tool of Picard doesn't work, is to make your own script to perform this task, as I had to do (this is mine: https://github.com/MatteoSchiavinato/Utilities/blob/master/add-read-groups.py ).

ADD COMMENT
0
Entering edit mode

And how can I get this info?

ADD REPLY
2
Entering edit mode

Make stuff up, the IDs and such are arbitrary.

ADD REPLY
2
Entering edit mode

Or ask whoever prepared/ran the samples for real info.This information will be embedded in your files. If you make stuff up it will be difficult to make sense of it in a few days or if someone else needs to look at this data.

ADD REPLY
0
Entering edit mode

Both Devon and Genomax are right, because - at least my personal experience - it depends on your analysis.

Are you running an analysis composed of different replicates/accessions/genomes/species, or are the data coming from one source and one replicate?

  • Many sources: add read groups that make sense (in that case my script might not work as you desire)
  • One source: make stuff up, like using the sample name and that's it (my script will likely do the job - not 100% guaranteed, cause it worked with my data but haven't tested it extensively)
ADD REPLY

Login before adding your answer.

Traffic: 2192 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6