Hi All,
Could you suggest a way to split a read in a fastq file (on a particular motif) and keep the 2 resulting sequences as 2 independent reads?
I'll give an example of what I want to do:
@K00252:388:H2LM2BBXY:3:1101:1397:1138 1:N:0:ATCACG
TGTGACCTTCAGGACAGTCCTAAGGCTGTGGGAAAAACACTNAAAACATGAGTTCAAAAATATATATATATTTTCCCAACTATGCAAAAATATAAGGATGCAATATGGATTGTATAATGAGCTTCACAGATATAAAGGAACAGNGGCAT
+
AAAAJJ77<7JJJ7FAJJJJJJJFFFJF<FFF7AFJJJJFA#JFJJFJJJJ<AA-F-<JJFJAJFAAJ<JJJJJ--<<<-FFFF7AJJJJFFJJAFFFFA<<-7<FFJA<JJJJAJF<AAFF7-F<AF-A7A-<-<J-FFJ<F#AJAA<
Then I would search for a sequence, e.g. TATATATATA, cut the read at that string, and keep the 2 resulting pieces as 2 reads:
@K00252:388:H2LM2BBXY:3:1101:1397:1138 1:N:0:ATCACG
TGTGACCTTCAGGACAGTCCTAAGGCTGTGGGAAAAACACTNAAAACATGAGTTCAAAAATATATATAT
+
AAAAJJ77<7JJJ7FAJJJJJJJFFFJF<FFF7AFJJJJFA#JFJJFJJJJ<AA-F-<JJFJAJFAAJ<JJJJJ
@K00252:388:H2LM2BBXY:3:1101:1397:1138 1:N:0:ATCACG
TTTTCCCAACTATGCAAAAATATAAGGATGCAATATGGATTGTATAATGAGCTTCACAGATATAAAGGAACAGNGGCAT
+
--<<<-FFFF7AJJJJFFJJAFFFFA<<-7<FFJA<JJJJAJF<AAFF7-F<AF-A7A-<-<J-FFJ<F#AJAA<
Thank you
I'd suggest writing a Biopython script for something like that. Do you have any programming experience?
Thanks for your answer. I've coded a bit, but my background is different. What would you suggest? A link to put me on the right track would be more than enough.
I'd recommend going through some sections of the Biopython cookbook and tutorial. That would put you on track on how to solve this and further questions about handling common file formats.
While one-liners like Pierre's are pretty (and efficient), it would probably take me less time to write it in Python, especially if I have scripts saved from earlier, similar applications that I just have to adapt a bit.
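For anyone landing here later, this is a rough sketch of what such a Python script could look like. It uses only the standard library rather than Biopython so that it runs as-is. The function names split_record and split_fastq are made up for illustration. It assumes a plain 4-line-per-record FASTQ file (no wrapped sequence lines), cuts only at the first occurrence of the motif, drops the motif itself, and gives both pieces the original header as in the example above. Adjust those choices to your needs.

```python
def split_record(header, seq, qual, motif):
    """Split one FASTQ record at the first occurrence of motif.

    Returns a list of (header, seq, qual) tuples: two records if the motif
    is found, otherwise the original record unchanged.
    Assumption: the motif itself is discarded, not kept on either piece.
    """
    pos = seq.find(motif)
    if pos == -1:
        return [(header, seq, qual)]
    end = pos + len(motif)
    # Slice sequence and quality with the same coordinates so they stay in sync.
    return [(header, seq[:pos], qual[:pos]), (header, seq[end:], qual[end:])]


def split_fastq(in_path, out_path, motif):
    """Read a 4-line FASTQ file and write the split records to out_path."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = fin.readline().rstrip("\n")
            fin.readline()  # the "+" separator line
            qual = fin.readline().rstrip("\n")
            for h, s, q in split_record(header, seq, qual, motif):
                if s:  # skip empty pieces when the motif sits at a read end
                    fout.write(f"{h}\n{s}\n+\n{q}\n")
```

You would call it as split_fastq("reads.fastq", "reads.split.fastq", "TATATATATA"), where both file names are placeholders. Because both pieces keep the same header, you may want to append a suffix to the read name if a downstream tool needs unique IDs.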