I received FASTQ files from our sequencing core. They said the files have been demultiplexed, but when I ran FastQC I could still see some adapters (figure attached).
Here is my question. The FASTQ files have names like this:
_TAGTCTTG_S7_L001_R1_001.fastq.gz
_TAGTCTTG_S7_L001_R2_001.fastq.gz
I also received some files whose contents I am not sure about (I guess they are the index reads):
TAGTCTTG_S7_L001_I1_001.fastq.gz
TAGTCTTG_S7_L001_I2_001.fastq.gz
zcat TAGTCTTG_S7_L001_I1_001.fastq.gz | head
@someinfo:1:1101:15235:1340 1:N:0:TAGTCTTGAT+TCTTTCCC
TAGTCTTGAT
+
CCDDDFFFFF
@someinfo:1:1101:15815:1395 1:N:0:TAGTCTTGAT+TCTTTCCC
TAGTCTTGAT
+
CCCCCFFFFF
@someinfo:1:1101:15719:1398 1:N:0:TAGTCTTGAT+TCTTTCCC
TAGTCTTGAT
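The header lines above already carry the index information: the last colon-separated field of the comment holds the two index sequences joined by "+", and the first of them (TAGTCTTGAT) is the same sequence stored in the I1 file. A minimal Python sketch of splitting such a header, assuming the usual Illumina layout of "read:filtered:control:index" after the space (the function name and dictionary keys are made up for illustration):

```python
def parse_illumina_header(line):
    """Split an Illumina-style FASTQ header into its comment fields.

    Assumes the layout '@<name> <read>:<filtered>:<control>:<i7>+<i5>'.
    """
    name, _, comment = line.strip().lstrip("@").partition(" ")
    read, is_filtered, control, index = comment.split(":", 3)
    i7, _, i5 = index.partition("+")
    return {
        "name": name,
        "read": int(read),
        "filtered": is_filtered == "Y",
        "control": control,
        "i7": i7,
        "i5": i5,
    }


header = "@someinfo:1:1101:15235:1340 1:N:0:TAGTCTTGAT+TCTTTCCC"
fields = parse_illumina_header(header)
print(fields["i7"], fields["i5"])
```

If the sequence line of every record in the I1 file equals the first index from its header, the file contains index reads only and is not needed for alignment.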
When I look into the actual FASTQ file, I am not sure whether it contains both the index and the adapter (the core said they have demultiplexed it):
zcat _TAGTCTTG_S7_L001_R1_001.fastq.gz | head
@someinfo:1:1101:15235:1340 1:N:0:TAGTCTTGAT+TCTTTCCC
TGGGGCCTTAGTAAATGTGCCTGTGTGTGGGTCTCGGTCCAACACAGTTGATGTACATCTGTTTACCTGTTATAGTTGCAAGTTGTTCAGGCTGACATTGCTGTCGTTCACCCGACAAACACTGACTTCTACACCGGTGGTGAAGTAGGTAATGCGAGCTGGGTGCTGCCGAGTGTGTGTGTGCATGCTCAGCCGGCCGCGCAGACAGCTTGATCCTCTGACAGCTACGCAGATCGGAAGAGCACACGTC
+
DDCDDDCDFFFFGGGGGGGGGGHHHHHHHGGGHHHHGGGGHHHGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHGGGHHHHHGGGGGHHHHHHHHHHHHHHHHHGGFGGGGHHHHHHHHHHHHHGGGGGHHGHGGHHHHGGGGHHHGHHGHHHHGHHHHHHHHGGGGGGAGGGGGGGGGGGFFFFFFFFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
@someinfo:1:1101:15815:1395 1:N:0:TAGTCTTGAT+TCTTTCCC
TGGGGCCTTAGTAAATGTGCCTGTGTGTGGGTCTCGGTCCAACACAGTTGATGTACATCTGTTTACCTGTTATAGTTGCAAGTTGTTCAGGCTGACATTGCCTCGACAGTGATGCTGTCGTTCACCCGACAAACACTGACTTCTACACCGGTGGTGAAGTAGGTAATGCGAGCTGGGTGCTGCCGAGTGTGTGTGTGCATGCTCAGCCGGCCGCGCAGACAGCTTGATCCTCTGACAGCTACGCAGATCG
+
CCCCCCCCFFFFGGGGGGGGGGHHHHHHHGGGGHHHGGGGHHHGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHHHHHHHHHGGGHHHHHGGGGGHHHHHGGHHHHHGHHHHGGGGGGGFFGFHHGHHHHGHGGGGGHGGFEGHHHHG-CCGHHGHHHHHHHHGHGHHHGGGGGGGGGGFFFFFFFFFFFFFFFFEFFFFFFFFFFF?DFFFF
@someinfo:1:1101:15719:1398 1:N:0:TAGTCTTGAT+TCTTTCCC
TGGGGCCTTAGTAAATGTGCCTGTGTGTGGGTCTCGGTCCAACACAGTTGATGTACATCTGTTTACCTGTTATAGTTGCAAGTTGTTCAGGCTGACATTGCCTCGATCGACAGTGATGCTGTCGTTCACCCGACAAACACTGACTTCTACACCGGTGGTGAAGTAGGTAATGCGAGCTGGGTGCTGCCGAGTGTGTGTATGCATGCTCAGCCGGCCGCGCAGACAGCTTGATCCTCTGACAGCTACGCAG
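The first read above ends in AGATCGGAAGAGCACACGTC, which starts with the common prefix of the Illumina TruSeq adapters (AGATCGGAAGAGC), and the second read ends in a shorter piece of it (AGATCG). Demultiplexing only sorts reads by index; it does not remove adapters. A minimal Python sketch for locating the adapter in a read, using exact matching only and a hypothetical helper name (dedicated trimmers also tolerate mismatches):

```python
ADAPTER = "AGATCGGAAGAGC"  # common prefix of the Illumina TruSeq adapters


def adapter_start(read, adapter=ADAPTER, min_overlap=5):
    """Return the index where the adapter begins in `read`, or None.

    Looks for the full adapter first, then for a partial adapter
    (at least `min_overlap` bases) hanging off the 3' end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return pos
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return len(read) - overlap
    return None


# 3' ends of the first two reads shown above
tail_1 = "CTGACAGCTACGCAGATCGGAAGAGCACACGTC"
tail_2 = "CTGACAGCTACGCAGATCG"

for tail in (tail_1, tail_2):
    cut = adapter_start(tail)
    print(tail[:cut] if cut is not None else tail)
```

Both tails are cut back to the same genomic sequence, so nothing from the actual insert is lost by removing the adapter.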
I did know about this and went ahead and aligned the reads. Here is a snapshot of how the alignments look in IGV (4 samples, paired-end on a MiSeq, 2x250), sorted by base and with "show soft clips" enabled in the preferences (suggested by someone from the core).
How can I remove the adapters without losing any information from the actual reads?
Clearly, your DNA library prep was not optimal. I am not sure what's going on in your IGV images, but it's very obvious from your first (% adapter) graph that the insert size was too short compared to read length.
Your IGV images look like amplicon data. Can you describe this in more detail? Did you authorize the sequencing center to PCR-amplify your DNA sample? There's no way such a high proportion of reads would have the exact same start site without amplification. Considering that none of the reads you posted agree with the reference, it looks bad. How did you align the reads?
Also, knowing the specific reference would be helpful here, and a description of what you are trying to do is always useful information.
I encourage you to post an insert-size histogram and detail the platform and read length used. I'm guessing you ran 2x250bp on a MiSeq, but it's not really possible to tell from what you posted.
Also:
Posting those results would be useful, along with the screen output.
Apologies for the incomplete information. Yes, these were PCR amplicons that were sequenced. I aligned the reads using bwa mem. We were trying to induce a deletion and check whether it worked by sequencing exon 6 of a particular gene.
Oh... if you're looking for a somewhat long deletion, I suggest you try aligning with BBMap; it's very good at capturing those within the alignment of a read.
Sure, some additional info about the experiment: we are attempting to detect indels in a panel of clones resulting from a CRISPR-targeted deletion. Regions around the target were PCR-amplified to produce a roughly 150 bp amplicon, which was then sequenced as a PE250 run.
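With those numbers the adapter content is expected: any read longer than the amplicon runs off the end of the insert and into the adapter. A small Python sketch of that arithmetic (the function name is made up for illustration, and it ignores any extra bases added by the primers or library prep):

```python
def readthrough_bases(insert_len, read_len):
    """Number of bases at the 3' end of a read that lie beyond the insert."""
    return max(0, read_len - insert_len)


# Roughly 150 bp amplicon sequenced with 250 bp reads
print(readthrough_bases(150, 250))
```

For a 150 bp amplicon and 250 bp reads, about 100 bases of each read are adapter or whatever follows it, which is why trimming before alignment matters here.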
You can detect the adapter sequences and trim them like this:
Then map the trimmed (interleaved) reads and you'll get better results.
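To illustrate how adapters can be detected from the read pair itself, without knowing the adapter sequence: when the insert is shorter than the read, the first n bases of read 1 are the reverse complement of the first n bases of read 2, where n is the insert length, and everything after position n is adapter. A minimal Python sketch of that idea, using exact matching only and hypothetical function names (real overlap-based tools tolerate mismatches and use quality scores, so this is not their implementation):

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_insert_length(r1, r2, min_overlap=10):
    """Return the insert length n if r1[:n] is the reverse complement
    of r2[:n] (insert no longer than the reads), else None."""
    for n in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        if r1[:n] == revcomp(r2[:n]):
            return n
    return None


def trim_pair(r1, r2, min_overlap=10):
    """Cut both mates back to the inferred insert; leave them
    unchanged when no insert length can be inferred."""
    n = infer_insert_length(r1, r2, min_overlap)
    if n is None:
        return r1, r2
    return r1[:n], r2[:n]


# Toy pair: a 22 bp insert followed by the start of an adapter in each mate
insert = "ACGTTGCAAGGCTTAACCGGTA"
r1 = insert + "AGATCGGA"
r2 = revcomp(insert) + "AGATCGGA"
print(trim_pair(r1, r2))
```

Both mates are trimmed at the same insert boundary, so the pairs stay in sync and can be written out interleaved for mapping.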