6.9 years ago
jiangzhiyong12
Hi, I have paired-end (PE) resequencing data, but in some of the fastq files the same read name appears twice. I want to delete one of the two copies, so I would like some suggestions from you all. Thank you.
That is not possible unless the files were messed with in some way/mistreated. How did you determine (which program/error message) that you have this condition?
When I ran the MarkDuplicates step of the GATK workflow, I got this error:
So I searched for the read name HWI-ST1307:159:C48TVACXX:7:1109:1787:63474 in my fastq file and found two reads with that name. I also searched my .sam file and found four reads with that name. Weird... I think it is the fault of the sequencing company; maybe they copied some of the data within the same file and put the copies together.
Can you use grep -A and tell us whether the content of the two reads with identical names is the same in terms of sequence and quality scores?
I searched my SAM file.
Result:
I really don't know the reason. Any help would be appreciated.
That is odd. If you have not done anything to your SAM file then it is likely that your original fastq file has that read in there two times. Can you check that next?
It's true, I do have two reads with the same name in my original fastq file. I extracted all the read names from the fastq file and then took the unique ones. The weirder thing is: 43063238 (total read names) - 24218735 (unique read names) = 18844503 (duplicated read names). I don't understand...
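For anyone who wants to reproduce this kind of count, here is a minimal sketch in Python. It assumes a plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequences) and takes the read name to be the header up to the first whitespace. The function names `read_names` and `count_names` and the tiny demo input are made up for illustration; they are not from the original post.

```python
from collections import Counter
import io


def read_names(handle):
    """Yield the read name of each record in a 4-line-per-record FASTQ."""
    for i, line in enumerate(handle):
        # Every fourth line (0, 4, 8, ...) is a header such as "@name extra".
        if i % 4 == 0 and line.strip():
            yield line[1:].split()[0]


def count_names(handle):
    """Return (total names, unique names, surplus copies)."""
    counts = Counter(read_names(handle))
    total = sum(counts.values())
    unique = len(counts)
    return total, unique, total - unique


# Hypothetical three-record input in which read "r1" appears twice.
demo = io.StringIO(
    "@r1 1:N:0\nACGT\n+\nIIII\n"
    "@r2 1:N:0\nTTTT\n+\nIIII\n"
    "@r1 1:N:0\nACGT\n+\nIIII\n"
)
total, unique, extra = count_names(demo)
print(total, unique, extra)  # 3 2 1
```

With a real file you would pass `open("reads.fastq")` (a placeholder path) instead of the demo handle. Note that the counter keeps every distinct name in memory, which can be substantial for tens of millions of reads.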
So the problem is much bigger than you expected. If the sequence is identical for the duplicate reads then you will have to deduplicate them or get a new copy of the original data.
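A rough sketch of that deduplication in Python, in case it helps. It keeps the first record for each read name, drops later copies only when their sequence and quality string are identical, and reports names whose copies differ instead of silently discarding them. It assumes an uncompressed FASTQ with four lines per record; `fastq_records`, `dedup_by_name` and the demo data are hypothetical names and values chosen for illustration.

```python
import hashlib
import io


def fastq_records(handle):
    """Yield (name, header, seq, plus, qual) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        header = header.rstrip("\n")
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header[1:].split()[0], header, seq, plus, qual


def dedup_by_name(in_handle, out_handle):
    """Write the first record per name; return (kept, dropped, conflicts)."""
    seen = {}  # read name -> digest of (sequence, quality) of the first copy
    kept = dropped = 0
    conflicts = []  # names seen again with different sequence or quality
    for name, header, seq, plus, qual in fastq_records(in_handle):
        digest = hashlib.md5(f"{seq}\n{qual}".encode()).digest()
        if name not in seen:
            seen[name] = digest
            out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
        elif seen[name] == digest:
            dropped += 1  # exact copy of a record already written
        else:
            conflicts.append(name)  # same name, different content: not written
    return kept, dropped, conflicts


# Hypothetical input: "r1" is an exact duplicate, "r2" reappears with new content.
demo_in = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nTTTT\n+\nIIII\n"
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nGGGG\n+\nIIII\n"
)
demo_out = io.StringIO()
kept, dropped, conflicts = dedup_by_name(demo_in, demo_out)
print(kept, dropped, conflicts)  # 2 1 ['r2']
```

For paired-end data the two mate files have to stay in sync, so the same set of read names should be kept, in the same order, in both files. Any names that end up in `conflicts` are a sign that the file is not just a simple copy-and-paste duplicate, and getting a fresh copy of the original data is probably the safer route.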
Yes, you are right. I deduplicated them and kept only the unique reads, using the following command:
$ seqtk subseq /disk5/jiangzy/bowtie2/trimmomatic/1_1_clean.fastq remaining_1.list > 1_1.remain.fastq
This gave a 2.71G fastq file, compared to 10.22G for the original data. Then I used FastQC to get information about my data. Here is the weirdest thing: https://ibb.co/fDd3e6 https://ibb.co/jvrOe6 https://ibb.co/nvdQsR https://ibb.co/bWGdCR https://ibb.co/hCKbz6 https://ibb.co/jYWpK6
Still thank you.
Hi @jiangzhiyong12
Were you able to sort this out? I am having a similar kind of issue ...