Question

Problem in paired-end data due to annotation

5.7 years ago

priya120195 ▴ 20

Running the PRINSEQ tool to filter paired-end reads produces separate singletons, good, and bad files. I got the correct results when I worked on files with headers like this:

@HWI-ST1025:8:1101:1826:1992
TATGCTGAAGAAGACTCCTGTCAACTCGCTGAATGTTTCATTTGTAGCACGTAACTTGTGCTATCTGATGAAGCAC
+
JGIJJJG?GGIJIGIJFGHIJJJJIIJIJIJHHIGGHBHHHGDGGICHHGHFFFDEDEEECDCDDCACDCD:A<AC

Now, running the same tool on reads with a different annotation format (given below) produces only the singletons file and the bad file.

@SRX1797356.1.1 1 length=101
CCCGTTCGTCGTCGACGAGCATGGCACGGCGCGGTATCAGCTTCAACTCAAACTAACTTACTTCCAGAAAGGAGATCGCACCGTATGAAACCTGTCTCTTA
+SRX1797356.1.1 1 length=101
CCCFFFFFFHHFHJJJIIJJJGIIIIJJJGGIEF9??CDEEDCDDDDD<CCCDCD@CDDDDDDDD>::?CCD?BBCC?BB@DD<?BCCCACBDBDCCDDDC
@SRX1797356.2.1 2 length=100
GTGTACTACTCCGGCGACGCCATCACCATGATCGACGATAACCCCGACCTTGCCTGGGTGTTCCCGGAGGAGGGCAGTGTGCTGTCGGTGGACTGCATGG
+SRX1797356.2.1 2 length=100
???AD?DDDDD:DE)<EE@?DD@8BDD@<BCEIDII@D;5A;@@D@@???A>A>AA?79;<<A:>7&05;8>>9><93>:3>>>>:>?&2&2(48>A>A8
@SRX1797356.3.1 3 length=97
AATCAGGGCATACAGCGGGCGGCGGCTGTCACCGATGCTGCGCAGGATTGAGGAACTGAAATCATAAAGCATGATAAATGGCATGCCGAGAAAATAG

Is there a script to change the header annotation to match the reads in the former file, so that I get the good, singletons, and bad files separately?
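A minimal sketch of such a script, not a tested fix for PRINSEQ. It assumes that the tool pairs mates by identifier and accepts `/1` and `/2` suffixes, which should be checked against the PRINSEQ manual. The names `normalize_header` and `rewrite_records` are hypothetical, the regular expression only covers fastq-dump headers of the form shown above, and each record is assumed to span exactly four lines. It rewrites `@SRX1797356.1.1 1 length=101` to `@SRX1797356.1/1` and reduces the `+` line to a bare `+`.

```python
import re

# Assumed input: fastq-dump headers such as "@SRX1797356.1.1 1 length=101"
# (accession, spot number, optional read number, then free text).
SRA_HEADER = re.compile(
    r"^@(?P<acc>[A-Za-z]+\d+)\.(?P<spot>\d+)(?:\.(?P<read>[12]))?(?:\s.*)?$"
)


def normalize_header(line, mate):
    """Return '@<accession>.<spot>/<mate>' for one SRA-style header line."""
    match = SRA_HEADER.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("unrecognized header: " + line)
    return "@{}.{}/{}".format(match.group("acc"), match.group("spot"), mate)


def rewrite_records(lines, mate):
    """Yield FASTQ lines (without newlines) with rewritten headers.

    Lines are handled by their position within each 4-line record,
    because a quality line may itself start with '@' or '+'.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        position = index % 4
        if position == 0:
            yield normalize_header(line, mate)
        elif position == 2:
            yield "+"
        else:
            yield line
```

For two separate files, `rewrite_records(open("file_R1.fastq"), 1)` and `rewrite_records(open("file_R2.fastq"), 2)` would give the rewritten lines for each mate file, which can then be written out one per line.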

next-gen sequencing alignment • 1.6k views

ADD COMMENT • link 5.7 years ago by priya120195 ▴ 20

Please give more details about 1) the fastq files (were they downloaded from SRA? with which command? do you have R1 and R2 reads in separate files, or are they interleaved in one file?) and 2) how you ran prinseq. Without this information, there is no way to provide help. I will just point out that the second fastq file you showed is probably an interleaved fastq file. Is the snippet you showed the input or the output of prinseq?

P. S.: you probably mean header notation, not annotation.
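On the interleaving question, one way to check is to compare the spot identifiers of consecutive records. The sketch below is only an illustration under stated assumptions: `spot_id` and `looks_interleaved` are hypothetical names, and the pattern assumes fastq-dump identifiers of the form `accession.spot` with an optional `.1` or `.2` read suffix. In an interleaved file, records 1 and 2, 3 and 4, and so on would share a spot.

```python
import re

# Assumed ID layout: "@<accession>.<spot>[.<read>] ..." as written by fastq-dump.
SPOT_ID = re.compile(r"^@([A-Za-z]+\d+\.\d+)(?:\.[12])?(?:\s|$)")


def spot_id(header):
    """Return '<accession>.<spot>' from a header line, or None if it does not match."""
    match = SPOT_ID.match(header)
    return match.group(1) if match else None


def looks_interleaved(headers):
    """Return True if every consecutive pair of headers shares one spot ID.

    `headers` must contain only the header lines (every 4th line of the file).
    """
    headers = list(headers)
    if len(headers) < 2 or len(headers) % 2 != 0:
        return False
    pairs = zip(headers[0::2], headers[1::2])
    return all(
        spot_id(first) is not None and spot_id(first) == spot_id(second)
        for first, second in pairs
    )
```

This is a heuristic, so it is more convincing when run over the first few hundred headers than over a handful.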

ADD REPLY • link 5.7 years ago by h.mon 35k

1) They were downloaded from SRA with the prefetch command of the SRA Toolkit. Yes, I have R1 and R2 reads in separate files.

2) Command

./prinseq-lite.pl -verbose -fastq file_R1.fastq -fastq2 file_R2.fastq -max_qual_mean 20 -derep 1 -no_qual_header -log file.log

I have already processed plenty of fastq files. I know that the issue is only with the header line, but I do not understand how to annotate the header or what to write in it.

ADD REPLY • link updated 5.6 years ago by h.mon 35k • written 5.7 years ago by priya120195 ▴ 20

I don't have a good answer to your question. Did you use fastq-dump -F to retrieve the original fastq headers? Other than that, I would use some other software to process the reads, like Trimmomatic or BBDuk.

ADD REPLY • link 5.6 years ago by h.mon 35k

Yes, I have used fastq-dump.

ADD REPLY • link 5.6 years ago by priya120195 ▴ 20

But you did not use the -F option to recreate original Illumina fastq headers as suggested by @h.mon?

ADD REPLY • link 5.6 years ago by GenoMax 147k

No, I didn't use this. As it was paired-end data, I directly used fastq-dump with the split-files option to get the 2 reads.

ADD REPLY • link 5.6 years ago by priya120195 ▴ 20

The -F option recovers Illumina read headers in the format that you are familiar with. Unfortunately, the submitters in this case appear not to have provided the necessary data. Are you using prinseq to merge the reads?

ADD REPLY • link 5.6 years ago by GenoMax 147k