MarkDuplicates: Mates are missing
5.5 years ago

Hello, I am stuck on an error reported by Picard's 'ValidateSamFile'. I have searched for this problem on different forums, but I didn't find any solution there.

I used HISAT2 to align the paired-end FASTQ files (trimmed with Trimmomatic) against an Ensembl reference index:

hisat2 -p 8 --dta --summary-file summary -x 'path/to/Ensembl_ref/indexfile' -1 '/path/to/sample/S1_1p.fastq.gz' -2 '/path/to/sample/S1_2p.fastq.gz' -U '/path/to/sample/S1_1u.fastq.gz' -U '/path/to/sample/S2_2u.fastq.gz' -S S1.sam

After that, I tried to validate the SAM file using Picard's 'ValidateSamFile':

java -jar picard.jar ValidateSamFile I=S.sam IGNORE_WARNINGS=true MODE=VERBOSE

which gave me the following errors:

> [Mon May 13 12:56:42 IST 2019] Executing as genomics@genomics-Precision-3630-Tower on Linux 4.13.0-1028-oem amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.0-SNAPSHOT
> WARNING 2019-05-13 12:56:42 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
> INFO 2019-05-13 12:57:16 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:34s. Time for last 10,000,000: 34s. Last read position: 8:115,115,834
> INFO 2019-05-13 12:57:57 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:01:15s. Time for last 10,000,000: 40s. Last read position: 7:43,608,287
> ERROR: Read name S1.916145.1, Mate not found for paired read
> ERROR: Read name S1.916145.2, Mate not found for paired read
> ERROR: Read name S1.9977032.1, Mate not found for paired read
> ERROR: Read name S1.9977032.2, Mate not found for paired read
> ERROR: Read name S1.4916847.1, Mate not found for paired read
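
One thing that may be worth checking here: the read names in the errors above end in `.1` and `.2`, while the SAM format expects both mates of a pair to share one read name (QNAME). The following is only a rough diagnostic sketch under the assumption that this naming is related to the error. `count_suffixed_pairs` is a hypothetical helper name, not part of Picard or samtools. It reads SAM text on stdin and counts the records flagged as paired whose read name ends in `.1` or `.2`.

```shell
# Hypothetical helper (not a Picard/samtools command): reads SAM text on stdin.
# Assumption: mate names that differ only by a trailing ".1"/".2" are what
# prevents the mates from being matched; this only counts them, it fixes nothing.
count_suffixed_pairs() {
  awk -F'\t' '
    /^@/ { next }                        # skip header lines
    ($2 % 2) == 1 {                      # FLAG bit 0x1 set: read is paired
      paired++
      if ($1 ~ /\.[12]$/) suffixed++     # QNAME ends in .1 or .2
    }
    END { printf "paired=%d suffixed=%d\n", paired, suffixed }
  '
}
```

It could be run as `count_suffixed_pairs < S1.sam`. If nearly all paired records are counted as suffixed, the mates probably do not share a name in the SAM file.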

Following the Picard documentation, I ran FixMateInformation to fix the above error, using the following command:

java -jar picard.jar FixMateInformation I=S1.sam O=new_fixed_S1.sam

The tool ran without reporting any error, so the problem seemed to be fixed:

> [Mon May 13 12:52:03 IST 2019] Executing as genomics@genomics-Precision-3630-Tower on Linux 4.13.0-1028-oem amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.0-SNAPSHOT
> INFO 2019-05-13 12:52:03 FixMateInformation Sorting input into queryname order.
> INFO 2019-05-13 12:53:34 SortingCollection Creating merging iterator from 43 files
> INFO 2019-05-13 12:53:34 FixMateInformation Sorting by queryname complete.
> INFO 2019-05-13 12:53:34 FixMateInformation Output will be sorted by unsorted
> INFO 2019-05-13 12:53:34 FixMateInformation Traversing query name sorted records and fixing up mate pair information.
> INFO 2019-05-13 12:53:36 FixMateInformation Processed 1,000,000 records. Elapsed time: 00:00:02s. Time for last 1,000,000: 2s. Last read position: */*
> INFO 2019-05-13 12:53:39 FixMateInformation Processed 2,000,000 records. Elapsed time: 00:00:05s. Time for last 1,000,000: 2s. Last read position: 16:173,485
> INFO 2019-05-13 12:53:41 FixMateInformation Processed 3,000,000 records. Elapsed time: 00:00:07s. Time for last 1,000,000: 2s. Last read position: MT:2,103
> INFO 2019-05-13 12:53:44 FixMateInformation Processed 4,000,000 records. Elapsed time: 00:00:10s. Time for last 1,000,000: 2s. Last read position: 2:101,004,226

I then revalidated the processed SAM file with ValidateSamFile:

java -jar picard.jar ValidateSamFile I=new_fixed_S1.sam IGNORE_WARNINGS=true MODE=SUMMARY IGNORE=MISSING_TAG_NM

which resulted in:

> ## HISTOGRAM java.lang.String
> Error Type Count
> ERROR:MATE_NOT_FOUND 18412234

That means the error is not actually getting fixed. I repeated the whole process, assuming that the error would be fixed after several attempts, but I am just going around in a loop with no progress.
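
A possible workaround to try, offered only as a sketch: if the mates really differ only by a trailing `.1`/`.2` in the read name (an assumption based on the names in the ValidateSamFile errors), giving both mates the same name might let the validation find them. `strip_mate_suffix` is a hypothetical helper name, and the filter below only touches records flagged as paired, leaving headers and unpaired reads unchanged.

```shell
# Hypothetical filter (not a Picard/samtools command): SAM text in, SAM text out.
# Assumption: paired mates are named "<name>.1" and "<name>.2"; the suffix is
# removed so that both mates carry the same QNAME.
strip_mate_suffix() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                        # pass header lines through
    ($2 % 2) == 1 { sub(/\.[12]$/, "", $1) }    # paired (FLAG bit 0x1): drop .1/.2
    { print }
  '
}
```

For example, `strip_mate_suffix < S1.sam > S1.renamed.sam` (the output name is just a placeholder), followed by ValidateSamFile on the renamed file to see whether the MATE_NOT_FOUND count changes.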

Finally, I ignored the error and ran Picard's 'MarkDuplicates' tool using the following command:

java -jar picard.jar MarkDuplicates I=new_fixed_S1.sam O=new_S1.sam M=marked_dup_metrics.txt REMOVE_DUPLICATES=true READ_NAME_REGEX=null

It produced the following output:

[Mon May 13 13:13:13 IST 2019] Executing as genomics@genomics-Precision-3630-Tower on Linux 4.13.0-1028-oem amd64; OpenJDK 64-Bit Server VM 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.0-SNAPSHOT
INFO 2019-05-13 13:13:13    MarkDuplicates Start of doWork freeMemory: 996413816; totalMemory: 1011351552; maxMemory: 14974713856
INFO 2019-05-13 13:13:13    MarkDuplicates Reading input file and constructing read end information.
INFO 2019-05-13 13:13:13    MarkDuplicates Will retain up to 54256209 data points before spilling to disk.

and the program has been running for 2 hours. I don't know what the problem is: whether the SAM file generated by HISAT2 is faulty, my commands are wrong, or I am missing an error-fixing tool.
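
Something that could be ruled out quickly is the sort order of the input, since the FixMateInformation log above reports "Output will be sorted by unsorted". As far as I understand, MarkDuplicates expects coordinate-sorted or queryname-sorted input, but treat that as an assumption to verify against the Picard documentation. The sketch below (with the hypothetical helper name `sam_sort_order`) just prints the `SO:` value from the `@HD` header line of SAM text read on stdin.

```shell
# Hypothetical helper (not a Picard/samtools command): prints the sort order
# declared in the @HD header line, or "unknown" if none is declared.
sam_sort_order() {
  awk -F'\t' '
    $1 == "@HD" {                              # header line carrying SO:<order>
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SO:/) so = substr($i, 4)
    }
    !/^@/ { exit }                             # stop at the first alignment record
    END { print (so == "" ? "unknown" : so) }
  '
}
```

It could be run as `sam_sort_order < new_fixed_S1.sam`.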

I followed the post on Biostars, but I didn't get any clue from there either. Any help in this regard is deeply appreciated. Thank you.

rna-seq trimmomatic Hisat2 picardtools samtools • 3.0k views

Can you show the Trimmomatic command?

java -jar trimmomatic-0.38.jar PE -threads 4 -phred33 S1_1.fastq.gz S1_2.fastq.gz S1_1p.fastq.gz S1_1u.fastq.gz S1_2p.fastq.gz S1_2u.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:1:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

@ATpoint, any idea how to solve this issue?


What are all these fastq files? You typically have one pair in a paired-end experiment.


Two of the FASTQ files are paired (-1/-2), and the remaining two are unpaired (-U). This is one of my posts.


I hope this thread can help you and anyone else having the same issue: https://www.biostars.org/p/18137/. It reports a similar problem, and the solution there worked for me.
