Converting TCGA BAM files to FASTQ: Picard does not work!
7.3 years ago
jonessara770 ▴ 240

Hello,

I am trying to convert BAM files from TCGA to FASTQ. Picard gives the following error:

picard.sam.SamToFastq done. Elapsed time: 0.78 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Illegal mate state: H090WADXX130325:1:1106:10520:95300
    at picard.sam.SamToFastq.assertPairedMates(SamToFastq.java:342)
    at picard.sam.SamToFastq.doWork(SamToFastq.java:164)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:185)
    at picard.sam.SamToFastq.main(SamToFastq.java:137)

The error is due to more than one pair of reads having the same query name.
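One way to confirm this is to count primary records per query name. The sketch below is a hypothetical helper (`find_duplicate_qnames` is not part of Picard or samtools). It reads SAM text, for example from `samtools view -h in.bam`, skips secondary (0x100) and supplementary (0x800) records, and reports names that have more than one first-of-pair (0x40) or second-of-pair (0x80) record. It assumes standard tab-separated SAM lines.

```python
from collections import Counter
from typing import Dict, Iterable

# SAM FLAG bits, as defined in the SAM specification
FLAG_READ1 = 0x40           # first segment in the template
FLAG_READ2 = 0x80           # last segment in the template
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def find_duplicate_qnames(sam_lines: Iterable[str]) -> Dict[str, Counter]:
    """Return {qname: Counter} for names with more than one primary R1 or R2 record.

    Hypothetical helper: expects SAM text lines, e.g. from `samtools view -h in.bam`.
    """
    per_name: Dict[str, Counter] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # only count primary records
        if flag & FLAG_READ1:
            mate = "R1"
        elif flag & FLAG_READ2:
            mate = "R2"
        else:
            mate = "unpaired"
        per_name.setdefault(qname, Counter())[mate] += 1
    # A clean paired-end BAM has one primary R1 and one primary R2 per name
    return {q: c for q, c in per_name.items() if max(c.values()) > 1}
```

Passing an open SAM file handle works too, since it iterates line by line. Any name in the result is one that a pairing step keyed on read name cannot resolve unambiguously.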

It has been suggested that I use bedtools bamtofastq. This produces the FASTQ files, but they contain duplicated read names that make my pipeline crash in downstream steps.

I also tested the “resolvepair” script, but it does not produce any output.

I would like to either remove these duplicate read names or rename them. Is there a way to solve this?

Thanks
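If the goal is only to make the names unique, a sketch like the following could post-process the two FASTQ files written by bedtools bamtofastq. It assumes plain (uncompressed) 4-line FASTQ records and that R1 and R2 list mates in the same order; for paired output from bedtools bamtofastq this normally means name-sorting the BAM first, e.g. with `samtools sort -n`. The helper names and the `_dupN` suffix are hypothetical choices, not part of any tool, so pick a suffix that cannot collide with existing read names. Both mates of a repeated pair get the same new name, which keeps the two files in sync.

```python
from collections import Counter
from itertools import zip_longest
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from a plain 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, discarded
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def base_name(header: str) -> str:
    """Read name without '@', any trailing comment, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def dedup_pairs(r1_in: TextIO, r2_in: TextIO, r1_out: TextIO, r2_out: TextIO) -> int:
    """Copy paired FASTQ records, renaming repeated names; return pairs renamed."""
    seen: Counter = Counter()
    renamed = 0
    for rec1, rec2 in zip_longest(read_fastq(r1_in), read_fastq(r2_in)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of records")
        name = base_name(rec1[0])
        if name != base_name(rec2[0]):
            raise ValueError(f"mates out of sync: {name} vs {base_name(rec2[0])}")
        seen[name] += 1
        if seen[name] > 1:
            # hypothetical naming scheme: 2nd copy -> name_dup1, 3rd -> name_dup2, ...
            name = f"{name}_dup{seen[name] - 1}"
            renamed += 1
        # writes /1 and /2 suffixes; drop them if your pipeline expects bare names
        r1_out.write(f"@{name}/1\n{rec1[1]}\n+\n{rec1[2]}\n")
        r2_out.write(f"@{name}/2\n{rec2[1]}\n+\n{rec2[2]}\n")
    return renamed
```

To use it, open the four files with `open(...)` (or `gzip.open(path, "rt")` / `gzip.open(path, "wt")` for compressed files) and call `dedup_pairs(r1, r2, out1, out2)`. Removing the duplicates instead of renaming them would only mean skipping pairs where `seen[name] > 1`.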

Did you use the latest version of Picard? Did you use VALIDATION_STRINGENCY=LENIENT?

Thanks for your reply! I ran it again with this version (picard-2.9.0/picard.jar SamToFastq VALIDATION_STRINGENCY=LENIENT), but I get the same error.

Do you know which aligner created the BAM? I have found BAM-to-FASTQ conversion almost impossible for some BAMs originating from RNA-seq. I believe it's related to conflicting interpretations of mates and pairs.

Yes, these were aligned with BWA-MEM.

On the BWA website I see this Q&A:

With BWA-MEM/BWA-SW, my tools are complaining about multiple primary alignments. Is it a bug? It is not. Multi-part alignments are possible in the presence of structural variations, gene fusion or reference misassembly. However, representing multi-part alignments in SAM has not been finalized. To make BWA work with your tools, please use option `-M' to flag extra hits as secondary.

I believe the fact that the SAM representation of multi-part alignments "has not been finalized" is basically the same thing as the "conflicting interpretations of mates and pairs" mentioned above.
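To see what the FAQ's `-M` advice changes, it helps to look at the FLAG bits. As far as I know, current BWA-MEM marks the shorter parts of a split (chimeric) alignment as supplementary (0x800) by default, while `-M` marks them as secondary (0x100) instead; the bwa manual describes `-M` as being for Picard compatibility. The helpers below are hypothetical (not from BWA or Picard): they classify a FLAG and keep only primary records from SAM text. Note that this alone would not fix names shared by two distinct primary pairs.

```python
from typing import Iterable, Iterator

# SAM FLAG bits, as defined in the SAM specification
FLAG_SECONDARY = 0x100      # used by BWA-MEM -M for shorter split hits
FLAG_SUPPLEMENTARY = 0x800  # used by BWA-MEM for them without -M


def alignment_kind(flag: int) -> str:
    """Classify a SAM FLAG as 'primary', 'secondary' or 'supplementary'."""
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    if flag & FLAG_SECONDARY:
        return "secondary"
    return "primary"


def primary_only(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and primary alignment records, dropping the rest."""
    for line in sam_lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            yield line  # keep the SAM header unchanged
            continue
        flag = int(line.split("\t", 2)[1])
        if alignment_kind(flag) == "primary":
            yield line
```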

6.1 years ago
rmh1995 • 0

I realize this is very late, but I believe UNC has provided some code to solve this issue: UBU. GenoMax discusses the problem in a little more detail here. I hope this helps anyone currently looking for a solution!
