Convert long Nanopore reads to Illumina paired-end reads
21 days ago
Mark ▴ 30

Is there a way to convert long nanopore reads into Illumina-style paired-end reads (not actual Illumina reads, just simulated ones)? By that I mean that only the ends of each nanopore read are retained, and these are split into paired-end files. I need this for a niche subworkflow that takes distant Illumina paired-end reads as input, but I only have nanopore reads of this section.

illumina sequencing reads nanopore • 318 views
21 days ago
Michael 55k

I assume it is possible to extract sequences of, e.g., 150 bp separated by a distance corresponding to the insert size you want; one of them would need to be reverse-complemented. The question is how much that would help: the error profile, quality scores, and adapter sequences would not resemble those of real Illumina data. If you just want to test some tools, it might be better to make an assembly from the long reads and then use a read simulator to simulate Illumina reads.
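
A minimal Python sketch of that extraction, assuming the read is already in memory as a sequence string and a quality string. The names `revcomp` and `pair_from_fragment` and the defaults (150 bp reads, 500 bp insert) are illustrative assumptions, not part of any existing tool. The second mate is reverse-complemented and its qualities reversed, so the pair ends up in the usual forward/reverse orientation.

```python
# Complement table for DNA; N is kept as N.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def pair_from_fragment(seq, qual, read_len=150, insert_size=500, start=0):
    """Cut one simulated pair out of a long read.

    The fragment spans seq[start:start + insert_size]. Mate 1 is the first
    read_len bases of the fragment; mate 2 is the last read_len bases,
    reverse-complemented. Returns None if the read is too short.
    """
    end = start + insert_size
    if insert_size < read_len or end > len(seq):
        return None
    mate1 = (seq[start:start + read_len], qual[start:start + read_len])
    mate2 = (revcomp(seq[end - read_len:end]), qual[end - read_len:end][::-1])
    return mate1, mate2
```

Note that the base qualities are still nanopore qualities, so the caveat about the error profile applies unchanged.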

21 days ago

I just wrote jvarkit/biostar9608448 for fun. It takes a single-end BAM and outputs a paired-end BAM with reads of length 'x': https://jvarkit.readthedocs.io/en/latest/Biostar9608448/

$ java -jar dist/jvarkit.jar  biostar9608448 src/test/resources/FAB23716.nanopore.bam -L 10 | samtools view | head
44767a9a-a0b9-4d7e-a324-d0d3ea113d8c_Basecall_Alignment_template    67  chr1    17123   38  6M2D4M  =   31047   13925   GTGCGCCGCT  -2.1.,(')-  MC:Z:10M
44767a9a-a0b9-4d7e-a324-d0d3ea113d8c_Basecall_Alignment_template    147 chr1    31047   38  10M =   17123   -13925  CACCTTGAAC  &#&$$$%$$%  MC:Z:6M2D4M
d324a4bc-aa2c-4ee8-be69-934cc58c0003_Basecall_Alignment_template    67  chr1    38469   1   10M =   43735   5267    ATGCTGCCTG  2-,.314443  MC:Z:10M
d324a4bc-aa2c-4ee8-be69-934cc58c0003_Basecall_Alignment_template    147 chr1    43735   1   10M =   38469   -5267   AGCAAACTTT  -',12()./.  MC:Z:10M
76862e2e-98eb-4ad3-a523-6a8709c0b56a_Basecall_Alignment_template    67  chr1    44403   0   10M =   44481   79  TCAACAACAA  &&&%&)'''&  MC:Z:10M
76862e2e-98eb-4ad3-a523-6a8709c0b56a_Basecall_Alignment_template    147 chr1    44481   0   10M =   44403   -79 GGTAGCCGAA  ''&$%(((%*  MC:Z:10M
3330d9a6-d2a9-423b-accc-92a6d1fe646e_Basecall_Alignment_template    67  chr1    52105   1   8M2D2M  =   53738   1634    ATTCCTACGA  %).,.%+$))  MC:Z:10M
3330d9a6-d2a9-423b-accc-92a6d1fe646e_Basecall_Alignment_template    147 chr1    53738   1   10M =   52105   -1634   ACTTAGGCAA  ,)((%%''((  MC:Z:8M2D2M
c6055e6a-9a1c-4126-84ec-64549fd4d264_Basecall_Alignment_template    67  chr1    63945   5   10M =   67887   3943    TCACCATGAT  *+'*-.111-  MC:Z:10M
c6055e6a-9a1c-4126-84ec-64549fd4d264_Basecall_Alignment_template    147 chr1    67887   5   10M =   63945   -3943   AGTATTATCA  +$+*+/((*&  MC:Z:10M
21 days ago
GenoMax 149k

Depending on how long your nanopore reads are (think 1 to several kb) and how many of them you have, you may be able to generate a set of Illumina reads using something like randomreads.sh from the BBMap suite. You may first need to convert the reads to FASTA format using reformat.sh and then use them as input for the Illumina read generation.

As Michael points out, this assumes that you are interested only in the sequence and not in the error profile of the data you have.

"In the sense that only the ends of the nanopore reads are retained and these are split into paired end files."

If that is an absolute requirement, you may need to write something custom to extract the ends.
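
If you do go the custom route, something along these lines might be a starting point. It is only a sketch, assuming uncompressed four-line FASTQ input; `fastq_records`, `split_ends`, and the `/1` and `/2` name suffixes are hypothetical choices for illustration. Reads shorter than twice the requested length are skipped so the two mates never overlap.

```python
# Complement table for DNA; N is kept as N.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fastq_records(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:].split()[0], seq, qual


def split_ends(in_fastq, out_r1, out_r2, read_len=150):
    """Write the two ends of each long read as a simulated pair.

    R1 is the first read_len bases. R2 is the last read_len bases,
    reverse-complemented, with qualities reversed to match.
    Returns the number of pairs written.
    """
    written = 0
    with open(in_fastq) as fin, open(out_r1, "w") as f1, open(out_r2, "w") as f2:
        for name, seq, qual in fastq_records(fin):
            if len(seq) < 2 * read_len:
                continue  # too short to give two non-overlapping mates
            tail_seq = seq[-read_len:].translate(_COMP)[::-1]
            tail_qual = qual[-read_len:][::-1]
            f1.write(f"@{name}/1\n{seq[:read_len]}\n+\n{qual[:read_len]}\n")
            f2.write(f"@{name}/2\n{tail_seq}\n+\n{tail_qual}\n")
            written += 1
    return written
```

Usage would be something like `split_ends("nanopore.fastq", "sim_R1.fastq", "sim_R2.fastq", read_len=150)`, where the file names are placeholders. The implied insert size then equals each read's full length, so it varies from pair to pair.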
