Do I have adapters in my reads?
4.6 years ago
ntsopoul ▴ 60

Hi,

I downloaded some FASTQ files from SRA that I wanted to analyze. I checked them in FastQC (see https://ibb.co/vd0CxH3) and could not see any adapter contamination, so I assumed the adapters had been removed. I proceeded with trimming (Trimmomatic) and then aligned the reads to the mm9 genome with the STAR aligner (output as sorted BAM). To check the quality of my reads I used Picard CollectAlignmentSummaryMetrics.

However, I did not get an output file, and I saw that some adapter sequences were printed to the screen (below). I googled those sequences and they seem to be Illumina TruSeq adapters.

**ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG]**

My questions are the following:

  1. Why was no output file generated by Picard?
  2. Do I have contamination with adapter sequences (I guess so)?
  3. Why did FastQC not detect the adapters, and what can I do to make FastQC detect them?
  4. Are there any other strange things in the output below?

Thanks a lot!

Here is the complete picard output on screen.

21:45:50.682 INFO  NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/tsopoulidis/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Sun May 03 21:45:50 EDT 2020] CollectAlignmentSummaryMetrics INPUT=/Users/tsopoulidis/bam_files/Finnley_et_al_mm9_UCSC/SRR4423430.fastqAligned.sortedByCoord.out.bam OUTPUT=output_picard.txt REFERENCE_SEQUENCE=/Users/tsopoulidis/NCBIM37.genome.fa    MAX_INSERT_SIZE=100000 EXPECTED_PAIR_ORIENTATIONS=[FR] ADAPTER_SEQUENCE=[AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG] METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false COLLECT_ALIGNMENT_INFORMATION=true ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun May 03 21:45:50 EDT 2020] Executing as tsopoulidis@Hochedlinger-TsopMBP2018.local on Mac OS X 10.15.3 x86_64; Java HotSpot(TM) 64-Bit Server VM 14.0.1+7; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.22.4
INFO    2020-05-03 21:45:56 SinglePassSamProgram    Processed     1,000,000 records.  Elapsed time: 00:00:05s.  Time for last 1,000,000:    4s.  Last read position: MT:7,139
INFO    2020-05-03 21:45:58 SinglePassSamProgram    Processed     2,000,000 records.  Elapsed time: 00:00:07s.  Time for last 1,000,000:    1s.  Last read position: MT:14,174
INFO    2020-05-03 21:46:09 SinglePassSamProgram    Processed     3,000,000 records.  Elapsed time: 00:00:18s.  Time for last 1,000,000:   11s.  Last read position: 19:6,058,451
INFO    2020-05-03 21:46:10 SinglePassSamProgram    Processed     4,000,000 records.  Elapsed time: 00:00:19s.  Time for last 1,000,000:    1s.  Last read position: 19:16,238,686
INFO    2020-05-03 21:46:12 SinglePassSamProgram    Processed     5,000,000 records.  Elapsed time: 00:00:21s.  Time for last 1,000,000:    1s.  Last read position: 19:41,992,181
INFO    2020-05-03 21:46:13 SinglePassSamProgram    Processed     6,000,000 records.  Elapsed time: 00:00:22s.  Time for last 1,000,000:    1s.  Last read position: 18:11,880,447
INFO    2020-05-03 21:46:14 SinglePassSamProgram    Processed     7,000,000 records.  Elapsed time: 00:00:23s.  Time for last 1,000,000:    1s.  Last read position: 18:48,206,916
INFO    2020-05-03 21:46:16 SinglePassSamProgram    Processed     8,000,000 records.  Elapsed time: 00:00:25s.  Time for last 1,000,000:    1s.  Last read position: 18:82,508,559
INFO    2020-05-03 21:46:17 SinglePassSamProgram    Processed     9,000,000 records.  Elapsed time: 00:00:26s.  Time for last 1,000,000:    1s.  Last read position: 17:24,646,933
INFO    2020-05-03 21:46:18 SinglePassSamProgram    Processed    10,000,000 records.  Elapsed time: 00:00:27s.  Time for last 1,000,000:    1s.  Last read position: 17:33,961,267
INFO    2020-05-03 21:46:20 SinglePassSamProgram    Processed    11,000,000 records.  Elapsed time: 00:00:29s.  Time for last 1,000,000:    1s.  Last read position: 17:45,707,209
INFO    2020-05-03 21:46:21 SinglePassSamProgram    Processed    12,000,000 records.  Elapsed time: 00:00:30s.  Time for last 1,000,000:    1s.  Last read position: 17:81,017,680
INFO    2020-05-03 21:46:22 SinglePassSamProgram    Processed    13,000,000 records.  Elapsed time: 00:00:31s.  Time for last 1,000,000:    1s.  Last read position: 16:20,219,705
INFO    2020-05-03 21:46:24 SinglePassSamProgram    Processed    14,000,000 records.  Elapsed time: 00:00:33s.  Time for last 1,000,000:    1s.  Last read position: 16:55,969,722
INFO    2020-05-03 21:46:25 SinglePassSamProgram    Processed    15,000,000 records.  Elapsed time: 00:00:34s.  Time for last 1,000,000:    1s.  Last read position: 15:25,712,740
INFO    2020-05-03 21:46:26 SinglePassSamProgram    Processed    16,000,000 records.  Elapsed time: 00:00:35s.  Time for last 1,000,000:    1s.  Last read position: 15:75,726,842
INFO    2020-05-03 21:46:27 SinglePassSamProgram    Processed    17,000,000 records.  Elapsed time: 00:00:36s.  Time for last 1,000,000:    1s.  Last read position: 15:83,417,292
INFO    2020-05-03 21:46:29 SinglePassSamProgram    Processed    18,000,000 records.  Elapsed time: 00:00:38s.  Time for last 1,000,000:    1s.  Last read position: 15:101,914,783
INFO    2020-05-03 21:46:30 SinglePassSamProgram    Processed    19,000,000 records.  Elapsed time: 00:00:39s.  Time for last 1,000,000:    1s.  Last read position: 13:30,077,476
INFO    2020-05-03 21:46:31 SinglePassSamProgram    Processed    20,000,000 records.  Elapsed time: 00:00:40s.  Time for last 1,000,000:    1s.  Last read position: 13:67,283,295
INFO    2020-05-03 21:46:32 SinglePassSamProgram    Processed    21,000,000 records.  Elapsed time: 00:00:41s.  Time for last 1,000,000:    1s.  Last read position: 13:101,581,365
INFO    2020-05-03 21:46:34 SinglePassSamProgram    Processed    22,000,000 records.  Elapsed time: 00:00:43s.  Time for last 1,000,000:    1s.  Last read position: 12:19,991,021
INFO    2020-05-03 21:46:35 SinglePassSamProgram    Processed    23,000,000 records.  Elapsed time: 00:00:44s.  Time for last 1,000,000:    1s.  Last read position: 12:72,089,458
INFO    2020-05-03 21:46:36 SinglePassSamProgram    Processed    24,000,000 records.  Elapsed time: 00:00:45s.  Time for last 1,000,000:    1s.  Last read position: 12:111,892,434
INFO    2020-05-03 21:46:38 SinglePassSamProgram    Processed    25,000,000 records.  Elapsed time: 00:00:47s.  Time for last 1,000,000:    1s.  Last read position: 11:6,319,644
INFO    2020-05-03 21:46:39 SinglePassSamProgram    Processed    26,000,000 records.  Elapsed time: 00:00:48s.  Time for last 1,000,000:    1s.  Last read position: 11:40,523,478
INFO    2020-05-03 21:46:40 SinglePassSamProgram    Processed    27,000,000 records.  Elapsed time: 00:00:49s.  Time for last 1,000,000:    1s.  Last read position: 11:58,206,297
INFO    2020-05-03 21:46:41 SinglePassSamProgram    Processed    28,000,000 records.  Elapsed time: 00:00:50s.  Time for last 1,000,000:    1s.  Last read position: 11:70,796,276
INFO    2020-05-03 21:46:43 SinglePassSamProgram    Processed    29,000,000 records.  Elapsed time: 00:00:52s.  Time for last 1,000,000:    1s.  Last read position: 11:86,014,974
INFO    2020-05-03 21:46:44 SinglePassSamProgram    Processed    30,000,000 records.  Elapsed time: 00:00:53s.  Time for last 1,000,000:    1s.  Last read position: 11:99,794,025
INFO    2020-05-03 21:46:45 SinglePassSamProgram    Processed    31,000,000 records.  Elapsed time: 00:00:54s.  Time for last 1,000,000:    1s.  Last read position: 11:116,711,934
INFO    2020-05-03 21:46:46 SinglePassSamProgram    Processed    32,000,000 records.  Elapsed time: 00:00:55s.  Time for last 1,000,000:    1s.  Last read position: 9:21,552,765
INFO    2020-05-03 21:46:47 SinglePassSamProgram    Processed    33,000,000 records.  Elapsed time: 00:00:56s.  Time for last 1,000,000:    1s.  Last read position: 9:48,297,510
INFO    2020-05-03 21:46:49 SinglePassSamProgram    Processed    34,000,000 records.  Elapsed time: 00:00:58s.  Time for last 1,000,000:    1s.  Last read position: 9:64,023,963
INFO    2020-05-03 21:46:50 SinglePassSamProgram    Processed    35,000,000 records.  Elapsed time: 00:00:59s.  Time for last 1,000,000:    1s.  Last read position: 9:78,326,860
INFO    2020-05-03 21:46:51 SinglePassSamProgram    Processed    36,000,000 records.  Elapsed time: 00:01:00s.  Time for last 1,000,000:    1s.  Last read position: 9:95,357,520
INFO    2020-05-03 21:46:52 SinglePassSamProgram    Processed    37,000,000 records.  Elapsed time: 00:01:01s.  Time for last 1,000,000:    1s.  Last read position: 9:115,949,908
INFO    2020-05-03 21:46:54 SinglePassSamProgram    Processed    38,000,000 records.  Elapsed time: 00:01:03s.  Time for last 1,000,000:    1s.  Last read position: 14:20,641,844
INFO    2020-05-03 21:46:55 SinglePassSamProgram    Processed    39,000,000 records.  Elapsed time: 00:01:04s.  Time for last 1,000,000:    1s.  Last read position: 14:46,703,883
INFO    2020-05-03 21:46:56 SinglePassSamProgram    Processed    40,000,000 records.  Elapsed time: 00:01:05s.  Time for last 1,000,000:    1s.  Last read position: 14:64,114,566
INFO    2020-05-03 21:46:57 SinglePassSamProgram    Processed    41,000,000 records.  Elapsed time: 00:01:06s.  Time for last 1,000,000:    1s.  Last read position: 14:112,765,876
INFO    2020-05-03 21:46:59 SinglePassSamProgram    Processed    42,000,000 records.  Elapsed time: 00:01:08s.  Time for last 1,000,000:    1s.  Last read position: 10:39,959,560
INFO    2020-05-03 21:47:00 SinglePassSamProgram    Processed    43,000,000 records.  Elapsed time: 00:01:09s.  Time for last 1,000,000:    1s.  Last read position: 10:79,325,780
INFO    2020-05-03 21:47:01 SinglePassSamProgram    Processed    44,000,000 records.  Elapsed time: 00:01:10s.  Time for last 1,000,000:    1s.  Last read position: 10:93,352,283
INFO    2020-05-03 21:47:03 SinglePassSamProgram    Processed    45,000,000 records.  Elapsed time: 00:01:12s.  Time for last 1,000,000:    1s.  Last read position: 10:127,985,372
INFO    2020-05-03 21:47:04 SinglePassSamProgram    Processed    46,000,000 records.  Elapsed time: 00:01:13s.  Time for last 1,000,000:    1s.  Last read position: 8:44,215,935
INFO    2020-05-03 21:47:05 SinglePassSamProgram    Processed    47,000,000 records.  Elapsed time: 00:01:14s.  Time for last 1,000,000:    1s.  Last read position: 8:86,455,511
[Sun May 03 21:47:06 EDT 2020] picard.analysis.CollectAlignmentSummaryMetrics done. Elapsed time: 1.27 minutes.
Runtime.totalMemory()=995098624
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.FileTruncatedException: Premature end of file: /Users/tsopoulidis/bam_files/Finnley_et_al_mm9_UCSC/SRR4423430.fastqAligned.sortedByCoord.out.bam
    at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
    at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
    at java.base/java.io.DataInputStream.read(DataInputStream.java:148)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:421)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:282)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:866)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:840)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:834)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:802)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:574)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:553)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:149)
    at picard.analysis.SinglePassSamProgram.doWork(SinglePassSamProgram.java:94)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

rna-seq alignment Assembly • 1.1k views
4.6 years ago

It looks like your alignment file (BAM) is truncated. That is what the exception at the end of the Picard output says: Premature end of file: /Users/tsopoulidis/bam_files/Finnley_et_al_mm9_UCSC/SRR4423430.fastqAligned.sortedByCoord.out.bam

Can you run samtools index on it?
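As a quick sanity check for truncation, here is a minimal sketch that tests whether a file ends with the 28-byte empty BGZF block that the SAM/BAM specification puts at the end of every BAM file. The function name has_bgzf_eof is my own, not part of any tool, and a missing marker only tells you the file is incomplete, not why.

```python
import os

# The 28-byte empty BGZF block that terminates a complete BAM file
# (taken from the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff0600 42430200 1b000300 00000000 00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If this returns False for the STAR output, the BAM was cut off while being written and needs to be regenerated. As far as I know, samtools quickcheck performs a similar end-of-file check from the command line.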

The adapter sequences Picard is showing are the defaults that it will check your data against; they were not detected in your data.
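If you want to check the reads yourself, independently of FastQC, here is a rough sketch that counts how many reads in a FASTQ file contain the 13-base prefix shared by the three AGATCGGAAGAGC... sequences in the list above. The function name adapter_fraction and the max_reads cap are my own choices. It assumes standard 4-line FASTQ records and only finds exact matches, so it will undercount adapters that contain sequencing errors or are cut off at the read end.

```python
import gzip

# Prefix shared by the AGATCGGAAGAGC... adapter sequences in Picard's default list.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(fastq_path, adapter=ADAPTER_PREFIX, max_reads=1_000_000):
    """Return the fraction of reads containing `adapter` as an exact substring.

    Assumes 4-line FASTQ records; reads at most `max_reads` records.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = 0
    hits = 0
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_number % 4 != 1:
                continue
            total += 1
            if adapter in line:
                hits += 1
            if total >= max_reads:
                break
    return hits / total if total else 0.0
```

A fraction close to zero on the raw FASTQ would agree with the FastQC report that there is no noticeable adapter contamination.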


OK, I think I found the issue: the run stopped prematurely because I ran out of memory.
