Hi
This isn't the first time I've run the NEXT-seq pipeline on FASTQ files. It has worked for me several times without a problem. But one sample is driving me crazy: during indel realignment with GATK, the process ends on chromosome 9 without reporting any error. Here is the tail of the log:
INFO 10:57:36,310 ProgressMeter - 7:57136589 3.7736356E7 16.5 m 26.0 s 41.7% 39.6 m 23.1 m
INFO 10:58:06,312 ProgressMeter - 7:94166979 3.8936404E7 17.0 m 26.0 s 42.9% 39.6 m 22.6 m
INFO 10:58:36,313 ProgressMeter - 7:116838457 3.9836497E7 17.5 m 26.0 s 43.6% 40.1 m 22.6 m
INFO 10:59:06,315 ProgressMeter - 7:152055822 4.1036547E7 18.0 m 26.0 s 44.8% 40.2 m 22.2 m
INFO 10:59:36,316 ProgressMeter - 8:31716097 4.2233863E7 18.5 m 26.0 s 46.0% 40.2 m 21.7 m
INFO 11:00:06,423 ProgressMeter - 8:85353783 4.3434018E7 19.0 m 26.0 s 47.7% 39.8 m 20.8 m
INFO 11:00:36,425 ProgressMeter - 8:104075325 4.4434162E7 19.5 m 26.0 s 48.4% 40.3 m 20.8 m
INFO 11:01:06,426 ProgressMeter - 9:6600457 4.5663137E7 20.0 m 26.0 s 49.9% 40.1 m 20.1 m
INFO 11:01:36,428 ProgressMeter - 9:67965301 4.6863186E7 20.5 m 26.0 s 51.9% 39.5 m 19.0 m
INFO 11:02:04,323 GATKRunReport - Uploaded run statistics report to AWS S3
Of course, the output BAM file is truncated at chromosome 9, but the sorted input BAM file isn't. The indel target interval file created in the previous step with GATK RealignerTargetCreator also includes all chromosomes, not just the first 9. Has anyone faced this problem? I tried re-aligning the reads and reprocessing the sample several times, but nothing changes; I keep getting a truncated file.
Thanks
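For anyone who wants to pull the stopping point out of a log like this automatically, here is a minimal Python sketch. It assumes the ProgressMeter lines look exactly like the ones pasted above (contig:position right after "ProgressMeter - "); the function name and the regular expression are illustrative, not part of GATK. Since the meter only reports every 30 seconds, the result is the last locus that was reported, not necessarily the exact place where the run stopped.

```python
import re

# Assumed line shape, taken from the log tail in the post:
#   INFO 11:01:36,428 ProgressMeter - 9:67965301 4.6863186E7 20.5 m ...
PROGRESS_RE = re.compile(r"ProgressMeter\s+-\s+(\S+):(\d+)\s")


def last_reported_locus(log_lines):
    """Return (contig, position) from the last ProgressMeter line, or None."""
    last = None
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last


# The last three lines of the log quoted in the question.
log_tail = [
    "INFO 11:01:06,426 ProgressMeter - 9:6600457 4.5663137E7 20.0 m 26.0 s 49.9% 40.1 m 20.1 m",
    "INFO 11:01:36,428 ProgressMeter - 9:67965301 4.6863186E7 20.5 m 26.0 s 51.9% 39.5 m 19.0 m",
    "INFO 11:02:04,323 GATKRunReport - Uploaded run statistics report to AWS S3",
]
print(last_reported_locus(log_tail))
```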
Unfortunately, this isn't all that uncommon :(
Fortunately, this is happening at the TargetCreator step, which will speed things up. I'd start by chopping up your BAM file to make it as small as possible while still reproducing the error. Look at the regions in the output target file to see where it got up to on chromosome 9 before it quit, then take everything from one region before that point (to capture the last good region) through the whole of chromosome 10. Then, as you nibble your way through the BAM file 1 million bp at a time, you'll eventually stop getting the error and start seeing chromosome 10 targets. That's when you know you've found the offending region of your BAM file, which you can send to GATK for help. :)
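The nibbling loop could be scripted roughly like this. It is only a sketch under some assumptions: `still_fails` is a hypothetical callback you would write yourself (for example, extract the region from the BAM, rerun the GATK step on it, and return True if the output is still truncated), regions are written as 1-based `contig:start-end` strings, and for simplicity only the chromosome 9 part is trimmed; your own callback would keep chromosome 10 in every subset.

```python
def find_offending_window(contig, start, end, still_fails, step=1_000_000):
    """Trim `step` bp off the front of [start, end] until the error disappears.

    still_fails(region) is a caller-supplied, hypothetical check that reruns
    the failing step on `region` and returns True if the error still occurs.
    Returns the window whose removal made the error vanish, or None if the
    error does not reproduce on the starting region at all.
    """
    if not still_fails(f"{contig}:{start}-{end}"):
        return None
    pos = start
    while pos + step <= end:
        trimmed = f"{contig}:{pos + step}-{end}"
        if not still_fails(trimmed):
            # Dropping [pos, pos + step - 1] made the error go away.
            return f"{contig}:{pos}-{pos + step - 1}"
        pos += step
    # The error persisted down to the final (possibly shorter) chunk.
    return f"{contig}:{pos}-{end}"


# Demo with a stand-in for the real pipeline run. BAD_POS is a made-up
# coordinate for illustration only; it is not taken from the post.
BAD_POS = 70_500_000


def fake_still_fails(region):
    span = region.split(":")[1]
    lo, hi = (int(x) for x in span.split("-"))
    return lo <= BAD_POS <= hi


print(find_offending_window("9", 67_000_000, 75_000_000, fake_still_fails))
```

Each call to the real `still_fails` is a full rerun on the subset, so it pays to start from the smallest subset that still reproduces the truncation before stepping through it.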