Hello, I am having something of a nightmare trying to format my BAM files as GATK requires, and I have pretty much run out of ideas, so I would appreciate some help.
I have followed the instructions suggested to me here (http://www.broadinstitute.org/gatk/guide/article?id=1204). My BAM files are sorted and have read groups added, using Picard-Tools. I have also tried indexing them with Picard-Tools, SAMTools and even BAMTools.
One of the problems I am facing is that, although GATK is said to accept only indexed BAM files, it gives me the following error every time I pass it a *.bai file:
<< Invalid command line: The GATK reads argument (-I, --input_file) supports only BAM files with the .bam extension and lists of BAM files with the .list extension, but the file /home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bai has neither extension. Please ensure that your BAM file or list of BAM files is in the correct format, update the extension, and try again.
>
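Reading the message literally, -I wants the .bam file itself rather than the index. A minimal sketch of that reading, assuming the index only needs to sit next to the BAM under one of the two usual names (sample.bam.bai, as written by samtools index, or sample.bai, as written by Picard BuildBamIndex). The helper find_bam_index is hypothetical and not part of GATK:

```python
from pathlib import Path


def find_bam_index(bam_path):
    """Return the first existing index file next to a .bam, or None.

    Assumption: the tool locates the index by file name alone, so the
    command line should receive the .bam path, never the .bai path.
    """
    bam = Path(bam_path)
    if bam.suffix != ".bam":
        raise ValueError(f"expected a .bam file, got {bam.name}")
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam.bai (samtools naming)
        bam.with_suffix(".bai"),           # sample.bai (Picard naming)
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None
```

If this returns a path for the .bam, the index is presumably where it needs to be and only the .bam path should go on the command line.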
I have checked my read groups and headers to make sure they look like the ones specified on the GATK website (http://www.broadinstitute.org/gatk/guide/article?id=1204). Using a non-indexed but sorted BAM file with read groups added, I ran RealignerTargetCreator and got the following error:
<< ERROR MESSAGE: SAM/BAM file SAMFileReader{/home/gp53/tophat2-eber-2nd-R1-readgroups-reorder.bam} is malformed: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK
>
My header looks like this:
@VN:1.0 SO:coordinate
@SQ SN:chrM LN:16571 UR:file:/home/gp53/bwa/genome.fa M5:d2ed829b8a1628d16cbeee88e88e39eb
@SQ SN:chr1 LN:249250621 UR:file:/home/gp53/bwa/genome.fa M5:1b22b98cdeb4a9304cb5d48026a85128
@SQ SN:chr2 LN:243199373 UR:file:/home/gp53/bwa/genome.fa M5:a0d9851da00400dec1098a9255ac712e
@SQ SN:chr3 LN:198022430 UR:file:/home/gp53/bwa/genome.fa M5:641e4338fa8d52a5b781bd2a2c08d3c3
@SQ SN:chr4 LN:191154276 UR:file:/home/gp53/bwa/genome.fa M5:23dccd106897542ad87d2765d28a19a1
@SQ SN:chr5 LN:180915260 UR:file:/home/gp53/bwa/genome.fa M5:0740173db9ffd264d728f32784845cd7
@SQ SN:chr6 LN:171115067 UR:file:/home/gp53/bwa/genome.fa M5:1d3a93a248d92a729ee764823acbbc6b
@SQ SN:chr7 LN:159138663 UR:file:/home/gp53/bwa/genome.fa M5:618366e953d6aaad97dbe4777c29375e
@SQ SN:chr8 LN:146364022 UR:file:/home/gp53/bwa/genome.fa M5:96f514a9929e410c6651697bded59aec
@SQ SN:chr9 LN:141213431 UR:file:/home/gp53/bwa/genome.fa M5:3e273117f15e0a400f01055d9f393768
@SQ SN:chr10 LN:135534747 UR:file:/home/gp53/bwa/genome.fa M5:988c28e000e84c26d552359af1ea2e1d
@SQ SN:chr11 LN:135006516 UR:file:/home/gp53/bwa/genome.fa M5:98c59049a2df285c76ffb1c6db8f8b96
@SQ SN:chr12 LN:133851895 UR:file:/home/gp53/bwa/genome.fa M5:51851ac0e1a115847ad36449b0015864
@SQ SN:chr13 LN:115169878 UR:file:/home/gp53/bwa/genome.fa M5:283f8d7892baa81b510a015719ca7b0b
@SQ SN:chr14 LN:107349540 UR:file:/home/gp53/bwa/genome.fa M5:98f3cae32b2a2e9524bc19813927542e
@SQ SN:chr15 LN:102531392 UR:file:/home/gp53/bwa/genome.fa M5:e5645a794a8238215b2cd77acb95a078
@SQ SN:chr16 LN:90354753 UR:file:/home/gp53/bwa/genome.fa M5:fc9b1a7b42b97a864f56b348b06095e6
@SQ SN:chr17 LN:81195210 UR:file:/home/gp53/bwa/genome.fa M5:351f64d4f4f9ddd45b35336ad97aa6de
@SQ SN:chr18 LN:78077248 UR:file:/home/gp53/bwa/genome.fa M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c
@SQ SN:chr19 LN:59128983 UR:file:/home/gp53/bwa/genome.fa M5:1aacd71f30db8e561810913e0b72636d
@SQ SN:chr20 LN:63025520 UR:file:/home/gp53/bwa/genome.fa M5:0dec9660ec1efaaf33281c0d5ea2560f
@SQ SN:chr21 LN:48129895 UR:file:/home/gp53/bwa/genome.fa M5:2979a6085bfe28e3ad6f552f361ed74d
@SQ SN:chr22 LN:51304566 UR:file:/home/gp53/bwa/genome.fa M5:a718acaa6135fdca8357d5bfe94211dd
@SQ SN:chrX LN:155270560 UR:file:/home/gp53/bwa/genome.fa M5:7e0e2e580297b7764e31dbc80c2540dd
@SQ SN:chrY LN:59373566 UR:file:/home/gp53/bwa/genome.fa M5:1e86411d73e6f00a10590f976be01623
@RG ID:null PL:illumina PU:single_lane LB:unstranded SM:tophat-eber-2nd-R1
@PG ID:TopHat VN:2.0.5 CL:/usr/local/bin/tophat2 -p 16 -g 1 -z pigz -G /home/gp53/tophat/genes.gtf --no-novel-juncs -o tophat-eber-2nd-R1 /home/administrator/Bowtie2Index/genome /media/Elements/Genaro/input/eber-2nd-R1.fastq
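One thing that can be checked by hand is whether the RG tag on each read matches an ID declared on an @RG header line (ID:null in the header above). A rough sketch of that check, assuming the header text and one alignment line have been dumped to text first (for example with samtools view -H and samtools view). The function names are made up for illustration, and header fields are split on any whitespace because the header is shown here with spaces rather than tabs:

```python
def header_read_group_ids(header_text):
    """Collect the ID value of every @RG line in a SAM header."""
    ids = set()
    for line in header_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "@RG":
            continue
        for field in fields[1:]:
            if field.startswith("ID:"):
                ids.add(field[3:])
    return ids


def read_group_of(sam_record):
    """Return the RG:Z: tag of one tab-separated SAM alignment line, or None."""
    # The first 11 columns are mandatory; optional tags start at column 12.
    for field in sam_record.rstrip("\n").split("\t")[11:]:
        if field.startswith("RG:Z:"):
            return field[5:]
    return None


def is_read_group_defined(sam_record, header_text):
    """True only if the read carries an RG tag that the header declares."""
    rg = read_group_of(sam_record)
    return rg is not None and rg in header_read_group_ids(header_text)
```

A read for which is_read_group_defined returns False is the kind of read the error message describes: either it has no RG tag at all, or its tag names a group the header does not list.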
I would appreciate your help on this. G.
Thanks so much. I now have RealignerTargetCreator running on both the BWA and TopHat2 alignments. The one thing I changed was to leave the ID=string option at its default (1) in AddOrReplaceReadGroups.jar.
That pretty much eliminated the recurring error: Read HWI-ST830:129:D1459ACXX:8:1208:6666:45578 is either missing the read group or its read group is not defined in the BAM header, both of which are required by the GATK