Bam to fastq conversion is giving error?
2
3
Entering edit mode
7.6 years ago
kirannbishwa01 ★ 1.6k

I am trying to take the PE reads (100 bp) aligned to chr2 of one species and then convert it to PE fastq again. Then I want to align those reads to chr2 of another species to see the alignment percentage and mismatch ratios in different regions.

this is what I did:

# extract alignment at chr2. The chromosome has no prefixes so only 2
    samtools view -h -b sorted_2ms04h.bam 2 > chr2.2ms04h.bam  

# then bamtofastq
    bamToFastq -i chr2.sorted.2ms04h.bam -fq chr2.2ms04h.R1 -fq2 chr2.2ms04h.R2

# I am getting lots of warning message I got only about 20 mb of fastq (R1 and R2 each) from almost 500 mb BAM file
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7097:5645 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7103:24283 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7109:36458 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
    *****WARNING: Query HWI-D00123R2:155:C4AM3ACXX:2:1101:7111:45560 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 

# I also tired Picard
    java -jar picard2.9.jar SamToFastq I=chr2.sorted.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ = R2.fq UNPAIRED_FASTQ = unPaired.fq

   # Error message
   [Mon Jun 12 08:04:10 EDT 2017] picard.sam.SamToFastq INPUT=chr2.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ=R2.fq UNPAIRED_FASTQ=unPaired.fq    OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INTERLEAVE=false INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 INCLUDE_NON_PRIMARY_ALIGNMENTS=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
   [Mon Jun 12 08:04:10 EDT 2017] Executing as everestial007@everestial007-Inspiron-3647 on Linux 4.4.0-57-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11; Picard version: 2.9.2-SNAPSHOT
   [Mon Jun 12 08:04:11 EDT 2017] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
   Runtime.totalMemory()=189267968
   To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
   Exception in thread "main" picard.PicardException: Illegal mate state: HWI-D00123R2:155:C4AM3ACXX:2:1101:20700:91575
at picard.sam.SamToFastq.assertPairedMates(SamToFastq.java:385)
at picard.sam.SamToFastq.doWork(SamToFastq.java:194)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)

But, samtools flagstat chr2.sorted.2ms04h.bam > chr2.stats

   # shows there are good amount of paired end/mate paired reads.
       7549722 + 0 in total (QC-passed reads + QC-failed reads)
       1421823 + 0 duplicates
       7549722 + 0 mapped (100.00%:-nan%)
       7549722 + 0 paired in sequencing
       3771662 + 0 read1
       3778060 + 0 read2
       7548507 + 0 properly paired (99.98%:-nan%)
       7548963 + 0 with itself and mate mapped
       759 + 0 singletons (0.01%:-nan%)
       0 + 0 with mate mapped to a different chr
       0 + 0 with mate mapped to a different chr (mapQ>=5)

So, what is wrong with my process?

bam fastq picard software error genome • 6.6k views
ADD COMMENT
0
Entering edit mode

FYI, there is a fastq command in new version of samtools. Check here

ADD REPLY
0
Entering edit mode

your bam is corrupted: some mates are missing from the BAM.

   3771662 + 0 read1
   3778060 + 0 read2
ADD REPLY
0
Entering edit mode

But, 99.98 % are properly paired. Shouldn't it work ?

ADD REPLY
0
Entering edit mode

You'll need 100%, since the difference aren't marked as singletons. Run samtools fixmate, that might resolve the issue.

ADD REPLY
0
Entering edit mode

I am having issues even after fixmate.

What is really going on? grr

ADD REPLY
4
Entering edit mode
7.6 years ago
kirannbishwa01 ★ 1.6k

I wan't able to work out a way to convert BAM file to FASTQ using bam2fastq, picard and not even a very clean output using samtools.

I think this answer might work as a tutorial if people want to extract the exact amount of reads designated by samtools flagstat.

Say, I have a problem where I want to take the reads that are aligned to chromosome 2 of one species. And, I want to see how much (fraction) of these same reads align to chr 2 of another species or population and with how much mismatches.

Step 01: Select reads from Chromosome 2
         $ samtools view -h -b realigned_2ms04h.bam 2 > chr2.2ms04h.bam

       # sort the bam alignment
         $ samtools sort -n chr2.2ms04h.bam chr2.sorted.2ms04h.bam

       # fix mate pairs/info
         $ samtools fixmate chr2.sorted.2ms04h.bam chr2.fixmate.2ms04h.bam

       # Let's check the simple statistics of the bam file
         $ samtools flagstat chr2.fixmate.2ms04h.bam > chr2.fixmate.stats

          # Stats output from "chr2.fixmate.stats" file:
          7549722 + 0 in total (QC-passed reads + QC-failed reads)
          1421823 + 0 duplicates
          7549722 + 0 mapped (100.00%:-nan%)
          7549722 + 0 paired in sequencing
          3771662 + 0 read1
          3778060 + 0 read2
          6936646 + 0 properly paired (91.88%:-nan%)
          6937102 + 0 with itself and mate mapped
          612620 + 0 singletons (8.11%:-nan%)
          0 + 0 with mate mapped to a different chr
          0 + 0 with mate mapped to a different chr (mapQ>=5)

Step 02: Convert the mate-fixed bam to fastq using one of the below methods

       # A) using bam2fastq 
         $ bamToFastq -i chr2.fixmate.2ms04h.bam -fq chr2.2ms04h.R1 -fq2 chr2.2ms04h.R2

       # B) using Picard
         $ java -jar picard2.9.jar SamToFastq I=chr2.fixmate.2ms04h.bam FASTQ=R1.fq SECOND_END_FASTQ = R2.fq UNPAIRED_FASTQ = unPaired.fq

But, bam2fastq and picard didn't work for me.

So, contd... Step 02 - C

    If, you want to select only the properly paired reads use `bitwise flag -f 2`. if you want to remove duplicated reads use `bitwise flag -F 1024`
        # 2-C (optional extra step) : Select properly paired (bitwise flag: -f 2) and remove duplicates (flag: -F 1024) 
        $ samtools view -b -f 2 -F 1024 chr2.fixmate.2ms04h.bam > chr2.Paired_DeDup.bam
        $ samtools flagstat chr2.Paired_DeDup.bam > chr2.Paired_DeDup.stats 

        # Stats output from "chr2.Paired_DeDup.stats" file:
        5629430 + 0 in total (QC-passed reads + QC-failed reads)
        0 + 0 duplicates
        5629430 + 0 mapped (100.00%:-nan%)
        5629430 + 0 paired in sequencing
        3061503 + 0 read1
        2567927 + 0 read2
        5629430 + 0 properly paired (100.00%:-nan%)
        5629430 + 0 with itself and mate mapped
        0 + 0 singletons (0.00%:-nan%)
        0 + 0 with mate mapped to a different chr
        0 + 0 with mate mapped to a different chr (mapQ>=5)

         # 2-C:  Now, convert the bam files into Read1/2 matepairs, using bitwise flag
        $ samtools view -uf64 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R1.fq.gz
        $ samtools view -uf128 chr2.Paired_DeDup.bam |samtools bam2fq - |gzip > chr2.2ms04h_R2.fq.gz

         # Unzip the files and count the number of reads in R1, R2 read pairs
         # Count
        $ cat chr2.2ms04h_R1.fq | echo $((`wc -l`/4))
          3061503
        $ cat chr2.2ms04h_R2.fq | echo $((`wc -l`/4))
          2567927
  • You can see that the number of reads reported by samtools flagstat and reads (R1 and R2) obtained using bitwise method are the same.
  • If you didn't remove the the duplicates, singletons reads, those can also be extracted using bitwise flag and will be equal to the numbers reported in samtools flagstat file chr2.fixmate.stats.
ADD COMMENT
1
Entering edit mode
7.6 years ago
Tom_L ▴ 360

No big deal, you just have to sort your bam by read names before converting data as recommended by bedtools (http://bedtools.readthedocs.io/en/latest/content/tools/bamtofastq.html).

BAM should be sorted by query name (samtools sort -n aln.bam aln.qsort) if creating paired FASTQ with this option.

Samtools sort does the job.

samtools sort [options] <in.bam> <out.prefix>
     -n        sort by read name
     -f        use <out.prefix> as full file name instead of prefix
     -o        final output to stdout
     -l INT    compression level, from 0 to 9 [-1]
     -@ INT    number of sorting and compression threads [1]
     -m INT    max memory per thread; suffix K/M/G recognized [768M]

Cheers.

ADD COMMENT
0
Entering edit mode

I actually did sort the bam using ' samtools sort -n '. I just don't put the code there, but the name of the bam suggests that.

ADD REPLY

Login before adding your answer.

Traffic: 2710 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6