Problem in paired end data due to annotation
5.8 years ago
priya120195 ▴ 20

My raw files look like this:

@ERX123456.1.1 HWI-ST1018:118:D1RUFACXX:6:1101:1217:2216 length=101
NGGTCAAGAGCGCTTCCACCAACGCACAGCTGGTTGCTGAGGACATCGCTCGTCAGCTGGAGAACCGTGTGACCTTCCGCCGTGCTATGAAGCAGTGCATG
+ERX123456.1.1 HWI-ST1018:118:D1RUFACXX:6:1101:1217:2216 length=101
#0<BFFFFFFFFFIIFFFIIFIIIIIFFIIIIIFFIIIIIIIIIIIIIIIIIFFFFFFFFBFFF<BBFFFFFBBBBFFBFFFFFBBFFFFFFFFFFFFBFF
@ERX123456.2.1 HWI-ST1018:118:D1RUFACXX:6:1101:1642:2219 length=101
NATCACGTCCATAGGATAGTAGTTCATTCTCGCTATTTCAAGCGTTATCTGCTCGACCTCTTCGTCGGAGAAATGCTTGAATACTTCGGACGCATTCTCCG
+ERX123456.2.1 HWI-ST1018:118:D1RUFACXX:6:1101:1642:2219 length=101
#0<BFFFFFFFFFIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIIIIIFFFFFFBFFFFFBFFFFFFFFFFFFFFBFFFFFFFB<
@ERX123456.3.1 HWI-ST1018:118:D1RUFACXX:6:1101:1725:2240 length=101
NAGCTTCCGGGAAAAAAAGCGGAAAGCCAAAGAAGCCGGGAAATCTTGCGGCAGATGCACTTTCAGCCAAGGCACATCAGAGTGTGCGCAATGCCGACCAG

When I remove the sample ID and the length from the annotation (header) lines, prinseq works fine on the paired-end reads. Please suggest how to write a Perl script that removes the sample ID and length from the annotation lines of the entire file.

next-gen sequence • 847 views

Perl is terrible. I suspect awk '{print $1}' input.fastq > output.fastq would both work and be much faster. It keeps only the first whitespace-delimited field of every line.
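After running a header-trimming command like the one above, it may be worth confirming that no header line still carries extra fields. The following is only a sketch: `count_multi_field_headers` is a hypothetical helper name, and it assumes strict 4-line FASTQ records (no wrapped sequence lines).

```shell
# Hypothetical helper: reads FASTQ on stdin and prints how many header lines
# (every 4th line, starting at line 1) still contain more than one
# whitespace-delimited field. Assumes strict 4-line records.
count_multi_field_headers() {
    awk 'NR % 4 == 1 && NF > 1 { n++ } END { print n + 0 }'
}

# Usage (file name is a placeholder):
#   count_multi_field_headers < output.fastq
```

A result of 0 means every header is a single token.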


Depends on what priya120195 means by sample ID. If that refers to the ERX123456 accession, then the command above would not work, because it keeps the accession and drops the instrument read name. In that case this should work:

$  awk '{if (/^@ER/) {print "@"$2} else if (/^\+ER/) {print "+"$2} else {print $1}}' file.fq

@HWI-ST1018:118:D1RUFACXX:6:1101:1217:2216
NGGTCAAGAGCGCTTCCACCAACGCACAGCTGGTTGCTGAGGACATCGCTCGTCAGCTGGAGAACCGTGTGACCTTCCGCCGTGCTATGAAGCAGTGCATG
+HWI-ST1018:118:D1RUFACXX:6:1101:1217:2216
#0<BFFFFFFFFFIIFFFIIFIIIIIFFIIIIIFFIIIIIIIIIIIIIIIIIFFFFFFFFBFFF<BBFFFFFBBBBFFBFFFFFBBFFFFFFFFFFFFBFF
@HWI-ST1018:118:D1RUFACXX:6:1101:1642:2219
NATCACGTCCATAGGATAGTAGTTCATTCTCGCTATTTCAAGCGTTATCTGCTCGACCTCTTCGTCGGAGAAATGCTTGAATACTTCGGACGCATTCTCCG
+HWI-ST1018:118:D1RUFACXX:6:1101:1642:2219
#0<BFFFFFFFFFIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIIIIIFFFFFFBFFFFFBFFFFFFFFFFFFFFBFFFFFFFB<
@HWI-ST1018:118:D1RUFACXX:6:1101:1725:2240
NAGCTTCCGGGAAAAAAAGCGGAAAGCCAAAGAAGCCGGGAAATCTTGCGGCAGATGCACTTTCAGCCAAGGCACATCAGAGTGTGCGCAATGCCGACCAG

BTW: This is an interleaved paired-end file. You probably realize that.
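The awk command above decides what to rewrite by looking at line content. Quality strings may legitimately begin with `@` or `+`, so a quality line that happens to start with `@ER` or `+ER` would be mangled. A position-based variant avoids that. This is a sketch, not a tested pipeline step: `strip_fastq_ids` is a hypothetical helper name, and it assumes strict 4-line records where the second header field is the read name to keep.

```shell
# Hypothetical helper: reads FASTQ on stdin, writes FASTQ on stdout.
# Drops the leading accession (e.g. ERX123456.1.1) and the trailing length=...
# field from header and separator lines, selecting lines by position in the
# 4-line record rather than by content.
strip_fastq_ids() {
    awk '
        NR % 4 == 1 { print "@" $2; next }   # header line: keep only field 2
        NR % 4 == 3 { print "+" $2; next }   # separator line: same treatment
        { print }                            # sequence and quality: unchanged
    '
}

# Usage (file names are placeholders):
#   strip_fastq_ids < file.fq > file.renamed.fq
```

If the separator line is a bare `+`, field 2 is empty and the line is written back as `+`.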


Hello priya120195,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!
