10.4 years ago
Varun Gupta
Hello Everyone,
I have a FASTQ file and I want to extract only the reads that are longer than 25 bp, writing them to a new FASTQ file. How can I do this? Here are the first 100 lines of my FASTQ file:
@SRR1024131.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1911:1053 length=100
AGGGCAAGTATGAAGAAGTAGAATATT
+SRR1024131.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1911:1053 length=100
DDFHHFHHGGGHGGIFHIJIIDIIJJI
@SRR1024131.2 DBRHHJN1:259:D0PM7ACXX:1:1101:2522:1198 length=100
GGCTCAACTTTCGATGGT
+SRR1024131.2 DBRHHJN1:259:D0PM7ACXX:1:1101:2522:1198 length=100
FFFGHHHHJJJJJGFIJF
@SRR1024131.3 DBRHHJN1:259:D0PM7ACXX:1:1101:3117:1165 length=100
ACATTTTTGAGTGCTTACTACAGT
+SRR1024131.3 DBRHHJN1:259:D0PM7ACXX:1:1101:3117:1165 length=100
FFFHHHHHHHIHEHHFGHFHHGII
@SRR1024131.4 DBRHHJN1:259:D0PM7ACXX:1:1101:3474:1075 length=100
TAGTACTTAGCAAAGAGTGA
+SRR1024131.4 DBRHHJN1:259:D0PM7ACXX:1:1101:3474:1075 length=100
DDDFHDFHIAGHIGHG@33A
@SRR1024131.5 DBRHHJN1:259:D0PM7ACXX:1:1101:3952:1099 length=100
TGAGAACTGAATTCCATAGGCTGT
+SRR1024131.5 DBRHHJN1:259:D0PM7ACXX:1:1101:3952:1099 length=100
EFFHGHHHHJIJJJJJIBFHEHIG
@SRR1024131.9 DBRHHJN1:259:D0PM7ACXX:1:1101:5277:1092 length=100
GCGGCGGCGTTATTCCCATGACCCGCCGG
+SRR1024131.9 DBRHHJN1:259:D0PM7ACXX:1:1101:5277:1092 length=100
FDDDHHDHI@B>=B>?@BD>ACCCBC@BB
@SRR1024131.11 DBRHHJN1:259:D0PM7ACXX:1:1101:6019:1101 length=100
AGTAGATTTGTATGGATTT
+SRR1024131.11 DBRHHJN1:259:D0PM7ACXX:1:1101:6019:1101 length=100
DDDHHFFHIGHAGHEFIII
@SRR1024131.14 DBRHHJN1:259:D0PM7ACXX:1:1101:8423:1248 length=100
AGTCGGTGATGGGAGTCTCT
+SRR1024131.14 DBRHHJN1:259:D0PM7ACXX:1:1101:8423:1248 length=100
FFFHHHFHIJIIJIJBHIJJ
@SRR1024131.15 DBRHHJN1:259:D0PM7ACXX:1:1101:9484:1233 length=100
TGCTGGGTCACACCTGAAGCT
+SRR1024131.15 DBRHHJN1:259:D0PM7ACXX:1:1101:9484:1233 length=100
FFFHHGHFHIJHHHJJHHIJJ
@SRR1024131.16 DBRHHJN1:259:D0PM7ACXX:1:1101:9807:1100 length=100
ACTATTCCAGCGAGAGTTAACATAAATTCCAAT
+SRR1024131.16 DBRHHJN1:259:D0PM7ACXX:1:1101:9807:1100 length=100
FFFHHHHHJJIJJJJIHHGHIJJGJJJJIIJJI
@SRR1024131.17 DBRHHJN1:259:D0PM7ACXX:1:1101:10857:1034 length=100
TAATCATTTTAATTGTACAGTTCAGTAATGT
+SRR1024131.17 DBRHHJN1:259:D0PM7ACXX:1:1101:10857:1034 length=100
B?3CDFBFFFFFIIF:EFHAHIC?FE+ABHH
@SRR1024131.19 DBRHHJN1:259:D0PM7ACXX:1:1101:13257:1082 length=100
ATGTGTTTGTAGGTTGTTTGTTGTCTTTA
+SRR1024131.19 DBRHHJN1:259:D0PM7ACXX:1:1101:13257:1082 length=100
DFFHHHHHJFHHIHHJFGHIJJIFIIIIG
@SRR1024131.20 DBRHHJN1:259:D0PM7ACXX:1:1101:14103:1161 length=100
TGAGGTAGTAGGTTGTATAGTT
+SRR1024131.20 DBRHHJN1:259:D0PM7ACXX:1:1101:14103:1161 length=100
FFEHFCFHFGHGEFHC<HHIED
@SRR1024131.21 DBRHHJN1:259:D0PM7ACXX:1:1101:16005:1093 length=100
TTCTCTCTCTCTGTGTGTGCGTGTGTGTGTGT
+SRR1024131.21 DBRHHJN1:259:D0PM7ACXX:1:1101:16005:1093 length=100
DDFGHGGFJGIJFIBCBAFHHCGGFDCFGFED
@SRR1024131.24 DBRHHJN1:259:D0PM7ACXX:1:1101:17113:1023 length=100
TCCCTGAGACCCTAACTTGTGA
+SRR1024131.24 DBRHHJN1:259:D0PM7ACXX:1:1101:17113:1023 length=100
FFFHHHHHJJJIIJJIJJJJJJ
@SRR1024131.26 DBRHHJN1:259:D0PM7ACXX:1:1101:18596:1025 length=100
TGAGGTAGGAGGTTGTATAGTTAT
+SRR1024131.26 DBRHHJN1:259:D0PM7ACXX:1:1101:18596:1025 length=100
DDDDDACDEEEE:AF3CE@A9ABE
@SRR1024131.27 DBRHHJN1:259:D0PM7ACXX:1:1101:19286:1068 length=100
TCCCTGAGACCCTAACTTGTGA
+SRR1024131.27 DBRHHJN1:259:D0PM7ACXX:1:1101:19286:1068 length=100
DDDFHHHHIIIGG;CEGIEHHG
@SRR1024131.28 DBRHHJN1:259:D0PM7ACXX:1:1101:20016:1230 length=100
CAAATAATTACAGTTAT
+SRR1024131.28 DBRHHJN1:259:D0PM7ACXX:1:1101:20016:1230 length=100
DFFGBFBHG@HGHHGFA
@SRR1024131.29 DBRHHJN1:259:D0PM7ACXX:1:1101:20465:1216 length=100
GTTACGCTCGCCTTGGCCGT
+SRR1024131.29 DBRHHJN1:259:D0PM7ACXX:1:1101:20465:1216 length=100
FFFGHHHHJJJJGGHIFHGD
@SRR1024131.30 DBRHHJN1:259:D0PM7ACXX:1:1101:20573:1152 length=100
AGAAGGAACTTTTACAACTGTGTGGTTTT
+SRR1024131.30 DBRHHJN1:259:D0PM7ACXX:1:1101:20573:1152 length=100
DDBDBB+AFHGE>@<C<?:AA@HEE:)?F
@SRR1024131.32 DBRHHJN1:259:D0PM7ACXX:1:1101:21322:1217 length=100
ATTACTGAAGAAAAGTTTACCT
+SRR1024131.32 DBRHHJN1:259:D0PM7ACXX:1:1101:21322:1217 length=100
AADHHHHB<:EEF;C22A22AC
@SRR1024131.35 DBRHHJN1:259:D0PM7ACXX:1:1101:4318:1259 length=100
AAAAGCATTCATCAGCCCAA
+SRR1024131.35 DBRHHJN1:259:D0PM7ACXX:1:1101:4318:1259 length=100
FFFGHGHHJGCIJFGGIJII
@SRR1024131.36 DBRHHJN1:259:D0PM7ACXX:1:1101:4391:1407 length=100
CTGGACTCTTACTGCGTTTCATACATCT
+SRR1024131.36 DBRHHJN1:259:D0PM7ACXX:1:1101:4391:1407 length=100
FFFH?HHHIGGIGIII<FBEHIIIEIGE
@SRR1024131.39 DBRHHJN1:259:D0PM7ACXX:1:1101:6327:1406 length=100
AAGTACGCACGGCCGGTACAGTGAAG
+SRR1024131.39 DBRHHJN1:259:D0PM7ACXX:1:1101:6327:1406 length=100
FFFHGHHHIJIGIIII0?FHGHIJGH
@SRR1024131.43 DBRHHJN1:259:D0PM7ACXX:1:1101:7579:1334 length=100
TGTGTATAAATGTATTT
+SRR1024131.43 DBRHHJN1:259:D0PM7ACXX:1:1101:7579:1334 length=100
FFFHHHGHJJJJHGIJJ
Any help would be appreciated!
Regards
Varun
You may find a suitable answer faster by simply searching because a variety of similar questions have been asked before, e.g., Filtering Fastq Sequences Based On Lengths
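For reference, here is a minimal Python sketch of the filter being asked for. It assumes each record occupies exactly four lines (header, sequence, `+` line, qualities), as in the excerpt above, and it measures the actual sequence line rather than trusting the `length=100` tag in the header. The function name `filter_fastq_by_length` and the in-memory demo data are illustrative only, not part of any library.

```python
import io
from itertools import islice


def filter_fastq_by_length(in_handle, out_handle, min_length=25):
    """Copy records whose sequence is strictly longer than min_length.

    Assumes plain four-line FASTQ records (no wrapped sequence lines).
    Returns the number of records written.
    """
    kept = 0
    while True:
        # Read the next four lines as one record.
        record = list(islice(in_handle, 4))
        if not record:
            break
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        sequence = record[1].rstrip("\r\n")
        if len(sequence) > min_length:
            out_handle.writelines(record)
            kept += 1
    return kept


# Demo on the first two records from the question (27 bp and 18 bp).
demo = (
    "@SRR1024131.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1911:1053 length=100\n"
    "AGGGCAAGTATGAAGAAGTAGAATATT\n"
    "+SRR1024131.1 DBRHHJN1:259:D0PM7ACXX:1:1101:1911:1053 length=100\n"
    "DDFHHFHHGGGHGGIFHIJIIDIIJJI\n"
    "@SRR1024131.2 DBRHHJN1:259:D0PM7ACXX:1:1101:2522:1198 length=100\n"
    "GGCTCAACTTTCGATGGT\n"
    "+SRR1024131.2 DBRHHJN1:259:D0PM7ACXX:1:1101:2522:1198 length=100\n"
    "FFFGHHHHJJJJJGFIJF\n"
)
src = io.StringIO(demo)
dst = io.StringIO()
kept = filter_fastq_by_length(src, dst)
print(kept)
```

On a real file the same function can be called with two opened file handles, for example `open("reads.fastq")` for input and `open("reads.gt25.fastq", "w")` for output (both file names are placeholders). Because it reads one record at a time, memory use stays small regardless of file size.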