I am trying to take the PE reads (100 bp) aligned to chr2 of one species and then convert it to PE fastq again. Then I want to align those reads to chr2 of another species to see the alignment percentage and mismatch ratios in different regions.
this is what I did:
# extract alignment at chr2. The chromosome has no prefixes so only 2
samtools view -h -b sorted_2ms04h.bam 2 > chr2.2ms04h.bam
# then bamtofastq
bamToFastq -i chr2.sorted.2ms04h.bam -fq chr2.2ms04h.R1 -fq2 chr2.2ms04h.R2
# I am getting lots of warning message I got only about 20 mb of fastq (R1 and R2 each) from almost 500 mb BAM file
*****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7097:5645 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7103:24283 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7109:36458 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7111:45560 is marked as paired, but its mate does not occur next to it in your BAM file. Skipping.
# I also tired Picard
java -jar picard2.9.jar SamToFastq I=chr2.sorted.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ = R2.fq UNPAIRED_FASTQ = unPaired.fq
# Error message
[Mon Jun 12 08:04:10 EDT 2017] picard.sam.SamToFastq INPUT=chr2.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ=R2.fq UNPAIRED_FASTQ=unPaired.fq OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INTERLEAVE=false INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Mon Jun 12 08:04:10 EDT 2017] Executing as everestial007@everestial007-Inspiron-3647 on Linux 4.4.0-57-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11; Picard version: 2.9.2-SNAPSHOT
[Mon Jun 12 08:04:11 EDT 2017] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=189267968
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Illegal mate state: HWI-D00123R2:155:C4AM3ACXX:2:1101:20700:91575
at picard.sam.SamToFastq.assertPairedMates(SamToFastq.java:385)
at picard.sam.SamToFastq.doWork(SamToFastq.java:194)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
But, samtools flagstat chr2.sorted.2ms04h.bam > chr2.stats
# shows there are good amount of paired end/mate paired reads.
7549722 + 0 in total (QC-passed reads + QC-failed reads)
1421823 + 0 duplicates
7549722 + 0 mapped (100.00%:-nan%)
7549722 + 0 paired in sequencing
3771662 + 0 read1
3778060 + 0 read2
7548507 + 0 properly paired (99.98%:-nan%)
7548963 + 0 with itself and mate mapped
759 + 0 singletons (0.01%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
So, what is wrong with my process?
FYI, there is a
fastq
command in new version ofsamtools
. Check hereyour bam is corrupted: some mates are missing from the BAM.
But, 99.98 % are properly paired. Shouldn't it work ?
You'll need 100%, since the difference aren't marked as singletons. Run
samtools fixmate
, that might resolve the issue.I am having issues even after fixmate.
What is really going on? grr