Hi,
I've used the NuGEN Ovation RRBS Methyl-Seq System. I'm trying to follow the instructions in https://github.com/nugentechnologies/NuMetRRBS. The only thing I don't understand is how to remove duplicates.
I have FASTQ files of reads that I can trim and align, but I don't have an index.fq file (according to the docs: "FASTQ file containing the molecular tag sequence for each read name in the corresponding SAM/BAM file"). Am I supposed to generate this file myself, or is it supposed to be provided by the sequencing service?
Do I need to have a 12 nucleotide index read to remove duplicates?
The user guide states: "If you wish to utilize this PCR duplicate marking feature, increase the index read from 6 to 12 nucleotides, then use the Tecan-provided Duplicate Marking tool, NuDup, to identify and discard any PCR duplicates found".
On the other hand, the nudup.py documentation (https://github.com/nugentechnologies/NuMetRRBS) states "If the index FASTQ read length is 6, 8, 12, 14, or 16nt long as expected for Tecan products, the molecular tag sequence to be extracted from the read according to -s and -l parameters, otherwise the molecular tag will be extracted from the header of the FASTQ entry."
As far as I can tell, my reads have only a 6-base index. Take for example a read from one of my FASTQ files, named FGC1866_s_4_2_GTCGTA.fastq.gz:
@K00315:137:HVJ5KBBXX:4:1101:1560:1490 2:N:0:GTCGTA
TTCGATTTCCAACGTATATATTTTTTTTTTTTTCTCACTCATATAAAATATTCTACAATATAATTTTCGTCATTTTCCATGTTTTTGATTATACCTCATTAATATACACTATTCTAAAATACCGAATTATCAAAAAAATACACATTTAAA
+
AAAFFJJJJJJJJJJJJJJJJJJJJ---<-A7--<AJ-7FJ7F<AFFA--AAA--AAFF-<-A<FAFJ-<--<-AA<----7<<FA--7--7--77--7A<AF-A----<-<FJFJ-77<--7------77---------7-----7---
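To check the index length across a whole file rather than a single read, here is a minimal Python sketch. It assumes Illumina-style headers in which the index is the last colon-separated field, as in the read above. The function names `index_from_header` and `index_length_counts` are my own and are not part of NuDup.

```python
import gzip
from collections import Counter


def index_from_header(header):
    """Return the index sequence from an Illumina-style FASTQ header.

    Assumes the index is the last colon-separated field,
    e.g. '... 2:N:0:GTCGTA' -> 'GTCGTA'.
    """
    return header.strip().rsplit(":", 1)[-1]


def index_length_counts(path, max_reads=100000):
    """Count index lengths over the first `max_reads` records of a FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no // 4 >= max_reads:
                break
            # Every fourth line, starting at the first, is a header line.
            if line_no % 4 == 0:
                counts[len(index_from_header(line))] += 1
    return counts


header = "@K00315:137:HVJ5KBBXX:4:1101:1560:1490 2:N:0:GTCGTA"
index = index_from_header(header)
print(index, len(index))  # GTCGTA 6
```

Calling `index_length_counts("FGC1866_s_4_2_GTCGTA.fastq.gz")` would then show whether every header carries a 6-base index or whether some carry a longer one.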
For future experiments, should I ask the sequencing core to use a 12-base index read? Should my FASTQ header format then look something like @K00315:137:HVJ5KBBXX:4:1101:1560:1490 2:N:0:GTCGTACTCACT?
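To make concrete what I imagine a 12-base index would contain, here is a small Python sketch that splits an index into a sample barcode and a molecular tag. The layout is purely my assumption for illustration: I am guessing the tag is the last 6 bases, whereas the real position is whatever the -s and -l parameters specify. `split_index` is a hypothetical helper, not a NuDup function.

```python
def split_index(index_seq, tag_length=6):
    """Split an index read into (sample_barcode, molecular_tag).

    ASSUMPTION: the molecular tag is the last `tag_length` bases.
    An index no longer than `tag_length` is treated as barcode only.
    """
    if len(index_seq) <= tag_length:
        return index_seq, ""
    return index_seq[:-tag_length], index_seq[-tag_length:]


print(split_index("GTCGTACTCACT"))  # ('GTCGTA', 'CTCACT')
print(split_index("GTCGTA"))        # ('GTCGTA', '')
```

Under this assumption, my current 6-base index would contain only the sample barcode, with no molecular tag to tell PCR duplicates apart.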
Now that you have seen the files I have, can I remove PCR duplicates from them after alignment or not?
I'm confused and would appreciate any help you can provide. Thanks!
Thanks! I'll check out clumpify.