Hello everyone,
I am trying to remove duplicates from a BAM file using Picard MarkDuplicates
with the command below:
java -jar picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=hg38.r.bam O=hg38.dedup.bam M=metrices.txt
When I run this command, I get the following message:
21:48:20.762 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sun Feb 28 21:48:20 IRST 2021] MarkDuplicates INPUT=[hg38.r.bam] OUTPUT=hg38.dedup.bam METRICS_FILE=metrices.txt REMOVE_DUPLICATES=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sun Feb 28 21:48:20 IRST 2021] Executing as ptaklifi@ibb-server on Linux 5.4.0-45-generic amd64; OpenJDK 64-Bit Server VM 11.0.10+9-Ubuntu-0ubuntu1.18.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.25.0
INFO 2021-02-28 21:48:20 MarkDuplicates Start of doWork freeMemory: 178254872; totalMemory: 184549376; maxMemory: 16777216000
INFO 2021-02-28 21:48:20 MarkDuplicates Reading input file and constructing read end information.
INFO 2021-02-28 21:48:20 MarkDuplicates Will retain up to 60787014 data points before spilling to disk.
[Sun Feb 28 21:48:20 IRST 2021] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=671088640
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line:
@RG ID:SRR10984462; File /media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/hg38.r.bam; Line number 197
at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:258)
at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:46)
at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:358)
at htsjdk.samtools.SAMTextHeaderCodec.parseRGLine(SAMTextHeaderCodec.java:168)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:110)
at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:704)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:406)
at picard.sam.markduplicates.util.AbstractMarkDuplicatesCommandLineProgram.openInputs(AbstractMarkDuplicatesCommandLineProgram.java:262)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:508)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
I'm not sure whether the problem is with the command syntax or with the input file, or how I would fix it.
The error is written in the message itself:
Cannot read non-existent file: file:///media/kavousi/eaf2d15a-4cb1-4dee-ade8-6954bdc813e1/Taklifi/dbgap/fastq/SRR10984462/bam/hg38.rbam
Thank you, and sorry: I had misspelled the input file name. I fixed it and ran the command again, then edited my post to show the new error message. As you can see, I now get a different error.
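For what it's worth, the new exception points at the header rather than the command syntax: the @RG line has an ID but no SM (sample) tag, and htsjdk refuses to parse it. Below is a minimal sketch of how one might list the offending read group lines. The function name rg_missing_sm is hypothetical (not part of Picard or samtools), and it assumes the header is available as plain tab-separated text, for example from samtools view -H.

```shell
# Hypothetical helper: print every @RG header line that lacks an SM tag.
# Reads SAM header text on stdin; SAM header fields are tab-separated.
rg_missing_sm() {
    awk -F'\t' '
        $1 == "@RG" {
            has_sm = 0
            # Scan the tags after the record type for one starting with "SM:"
            for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) has_sm = 1
            if (!has_sm) print
        }'
}

# Demonstration using the read group line quoted in the error message
# (no BAM file or samtools needed for this part):
printf '@RG\tID:SRR10984462\n' | rg_missing_sm

# On the real file, assuming samtools is installed, the call would be:
#   samtools view -H hg38.r.bam | rg_missing_sm
```

If the line is reported, the read group needs a sample name before MarkDuplicates will accept the file. Picard's AddOrReplaceReadGroups is one tool intended for setting read group tags; which values to use for the sample and the other tags depends on your data, so I have not guessed them here.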