I'm trying to extract the FASTQs from a 1000 Genomes BAM using Picard SamToFastq,
but it raises an error:
net.sf.picard.PicardException: Found 7657 unpaired mates
at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:185)
The Makefile:
.PHONY: NA12878.fastqs
NA12878_1.fastq.gz NA12878_2.fastq.gz: NA12878.fastqs
NA12878.fastqs: NA12878.bam
	java -jar /path/to/picard-tools-1.87/SamToFastq.jar I=$< \
		VALIDATION_STRINGENCY=SILENT \
		FASTQ=$(basename $@)_1.fastq SECOND_END_FASTQ=$(basename $@)_2.fastq
	gzip --best $(basename $@)_1.fastq $(basename $@)_2.fastq
NA12878.bam:
	curl -o $@ "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/phase2b_alignment/data/NA12878/exome_alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.exome.20120522_p2b.bam"
I also tried FixMateInformation, but it raised the same error. How can I fix this?
Thanks.
The mates are missing, and FixMateInformation does not work as you expect, because the chrom20 BAM files contain only reads that mapped to chromosome 20. If a read's mate mapped to another chromosome, that mate is not in this file, so no tool can restore it from this BAM alone. All mate pairs should be present in the "mapped" BAM files distributed by the project.
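If you want to see which reads are affected, here is a minimal sketch that lists paired reads whose mate is absent from a SAM stream. Note that `list_orphans` is a hypothetical helper written for this answer, not part of Picard or samtools. It assumes tab-separated SAM records on stdin and skips secondary and supplementary alignments so that only primary records are counted.

```shell
#!/bin/sh
# list_orphans (hypothetical helper): read SAM records on stdin and print the
# names of paired reads that do not have exactly two primary records, i.e.
# reads whose mate is missing from the input.
list_orphans() {
    awk -F '\t' '
        /^@/ { next }                            # skip SAM header lines
        {
            flag = $2
            if (flag % 2 == 0) next              # 0x1 unset: read is not paired
            if (int(flag / 256) % 2 == 1) next   # 0x100 set: secondary alignment
            if (int(flag / 2048) % 2 == 1) next  # 0x800 set: supplementary alignment
            seen[$1]++                           # count primary records per read name
        }
        END {
            for (name in seen)
                if (seen[name] != 2) print name
        }
    ' | sort
}
```

Assuming samtools is installed, something like `samtools view NA12878.bam | list_orphans | wc -l` should give the number of orphaned read names. I have not checked that it matches the count Picard reports.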