Question

Nanopore Direct RNA Sequencing - Reference sequence

2.9 years ago

ttt12 ▴ 20

Hi all, I am looking for a way to visualize my RNA reads from Nanopore direct RNA sequencing in IGV. I sequenced an IVT reaction and used minimap2 to map the FASTQ files against my own reference sequence (a FASTA file), which gave me BAM files. As a next step, I would like to look for variant transcripts and visualize them in IGV against my own reference sequence. How can I do that? Thank you.

sequences Nanopore IGV RNA Reference • 3.6k views

score 0 · Answer 1 · 2022-07-28

2.9 years ago

GenoMax 151k

You will need to sort and index your bam file. Create a custom genome (if you are not using a model organism) in IGV and then load the sorted BAM file.
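
In case it helps, the sort and index step could look roughly like the sketch below. The function names `sorted_bam_name` and `sort_and_index` and the file name `reads.bam` are placeholders made up for illustration; the only real tool calls are `samtools sort -o` and `samtools index`.

```shell
# Derive an output name: reads.bam -> reads.sorted.bam (the naming is only a convention).
sorted_bam_name() {
  printf '%s\n' "${1%.bam}.sorted.bam"
}

# Coordinate-sort a BAM and build its .bai index (needs samtools on PATH).
sort_and_index() {
  out=$(sorted_bam_name "$1")
  samtools sort -o "$out" "$1" && samtools index "$out"
}

# Usage (hypothetical file name):
#   sort_and_index reads.bam
```

Keep the resulting .bai file in the same folder as the sorted BAM so that IGV can find it.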

Thank you. I sorted and indexed the BAM file, but it still did not work :( When I load it in IGV, I get this error:

Error loading BAM file: htsjdk.samtools.SAMException: Sequence name 'saRNA_ref\' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'

How should I fix it? Thanks a lot.

ADD REPLY • link 2.9 years ago by ttt12 ▴ 20

Did you use the exact same reference file for creating the custom genome in IGV that you used for creating the minimap2 index? The reference names need to match (with the exception of some model genomes, e.g. human, where either 1 or chr1 can be used).
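
A quick way to check the names is sketched below. It lists the sequence names in a FASTA file and tests each against the pattern quoted in the htsjdk error message. `SAM_NAME_RE`, `fasta_names`, `is_valid_sam_name` and `ref.fa` are names made up for this example, so treat it as a rough check rather than a full validator.

```shell
# Pattern for reference names as quoted in the htsjdk error message, anchored for grep -E.
SAM_NAME_RE='^[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*$'

# Print the sequence names of a FASTA file (text after ">" up to the first whitespace).
fasta_names() {
  awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Succeed only if the given name matches the pattern above.
is_valid_sam_name() {
  printf '%s\n' "$1" | grep -Eq "$SAM_NAME_RE"
}

# Usage (hypothetical file name):
#   fasta_names ref.fa | while IFS= read -r n; do
#     is_valid_sam_name "$n" || echo "bad name: $n"
#   done
```

A name such as saRNA_ref\ fails this check because a backslash is not in the allowed character set.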

ADD REPLY • link 2.9 years ago by GenoMax 151k

Hi, I used the EPI2ME Desktop Agent with the Fastq Custom Alignment option to generate the BAM file, and I uploaded my reference sequence (a FASTA file) to that tool. I then used that exact FASTA file to create the custom genome in IGV. I am not sure what I did wrong :( If you know, please let me know. Thank you!!

ADD REPLY • link 2.9 years ago by ttt12 ▴ 20

GenoMax · Answer 2 · 2022-07-29

2.9 years ago

colindaven 7.6k

This should be fairly easy if you've got this far. Sorting and indexing the BAM is easy with samtools, which you can install via Bioconda.

Then load your reference as described at https://software.broadinstitute.org/software/igv/LoadGenome

Remember to look for a transcript annotation that fits your FASTA sequence.

I would also map to the whole genome and use a genome-wide annotation (e.g. GENCODE).

It sounds like the BAM header is broken. Check the header using

samtools view -h x.bam | less

Compare it to other examples and/or paste the output here; the reference name might contain special characters. Also check whether the reference FASTA can be indexed:

samtools faidx x.fasta
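
If you only want the header, `samtools view -H` (capital H) prints it without the reads. As a rough sketch, the helper below pulls just the reference names out of the @SQ lines so that stray characters are easier to spot. The name `sq_names` is made up for this example.

```shell
# Read SAM header text on stdin and print the reference name from each @SQ line.
sq_names() {
  awk -F'\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Usage (x.bam as in the command above; needs samtools on PATH):
#   samtools view -H x.bam | sq_names | cat -v
```

Piping through `cat -v` makes non-printing characters visible as well.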

ADD REPLY • link 2.9 years ago by colindaven 7.6k

samtools view -h x.bam | less

Result:

@HD     VN:1.6  SO:coordinate
@SQ     SN:saRNA_ref\   LN:9383
@RG     ID:none
@PG     PN:minimap2     ID:minimap2     VN:2.17-r941    CL:minimap2 -y -a -x map-ont -t 1 --MD /tmp/datasets/1bcf05961acd0e52b280d1e1a4e3cd8e5cbdc19b/reference_multi_index_8G.fa ./FAK73268_pass_cf5085a8_0-0002.fastq
@PG     ID:samtools     PN:samtools     PP:minimap2     VN:1.10 (pysam) CL:samtools sort --output-fmt BAM -@ 1 -o output/FAK73268_pass_cf5085a8_0-0002.fastq.bam output/FAK73268_pass_cf5085a8_0-0002.fastq.sam

samtools faidx x.fasta
[fai_build_core] different line length in sequence '(null)'.
Could not build fai index /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT/saRNA_ref.fa.fai

This generated saRNA_ref.fa.fai, but the file may be faulty, since the message says it could not build the fai index. Could you please show me how to build this index for a FASTA file that contains only my sequence of interest? Thank you very much!!

ADD REPLY • link updated 2.9 years ago by GenoMax 151k • written 2.9 years ago by ttt12 ▴ 20

How did you make /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT/saRNA_ref.fa? Is there more than one sequence in this file?

Can you show us the output of grep "^>" /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT/saRNA_ref.fa?
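
Two quick checks along the same lines, as a sketch only: whether the file is really plain text, and how many header lines it has. `looks_like_rtf` and `count_fasta_headers` are made-up helper names, and `ref.fa` stands in for your file.

```shell
# Succeed if the file starts with the RTF signature "{\rtf" (i.e. it is not plain text).
looks_like_rtf() {
  [ "$(head -c 5 "$1")" = '{\rtf' ]
}

# Count lines that start with ">"; a single-sequence FASTA should give 1.
count_fasta_headers() {
  grep -c '^>' "$1"
}

# Usage (hypothetical file name):
#   looks_like_rtf ref.fa && echo "this is RTF, not plain-text FASTA"
#   count_fasta_headers ref.fa
```

A count of 0 means no line begins with ">", which by itself shows the file is not valid FASTA.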

ADD REPLY • link 2.9 years ago by GenoMax 151k

grep "^" saRNA_ref.fa > output
head output

{\rtf1\ansi\ansicpg1252\cocoartf2638
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww51000\viewh27180\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0 >saRNA_ref\
TAATACGACTCACTATAATGGGCGGCGCATGAGAGAAGCCCAGACCAATTACCTACCCAAAATGGAGAAAGTTCACGTTGACATCGAGGAAGACAGCCCATTCCTCAGAGCTTTGCAGCGGAGCTTCCCGCAGTTTGAGGTAGAAGCCAAGCAGGTCACT

Here it is. Is this correct? I made the FASTA file in TextEdit, with > at the start of the first line and my sequence on the second line, and then saved it as a FASTA file. Please let me know if I am doing this correctly. Thank you!!

ADD REPLY • link updated 2.9 years ago by GenoMax 151k • written 2.9 years ago by ttt12 ▴ 20

Use a programmer's editor (like Notepad++ on PC or BBEdit on macOS) and remove all of this stuff at the beginning of the file.

{\rtf1\ansi\ansicpg1252\cocoartf2638
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww51000\viewh27180\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0

Also remove the trailing \ after reference name here: >saRNA_ref\

If IGV does not accept the edited file, you may still need to redo the entire process, starting from the alignment, with this edited file.
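
If you would rather do the cleanup on the command line, something like the sketch below might work for a file shaped like yours. It assumes the first ">" in the file starts the FASTA header and that RTF only adds a trailing "\" or "}" to the lines after it; `clean_rtf_fasta` is a made-up name. To avoid the problem in future, TextEdit can, as far as I know, save plain text if you choose Format > Make Plain Text before saving.

```shell
# Strip the RTF wrapper from a file that was meant to be FASTA.
# Assumes the first ">" in the file starts the FASTA header.
clean_rtf_fasta() {
  awk '
    !found {
      i = index($0, ">")
      if (i == 0) next            # still inside the RTF preamble
      $0 = substr($0, i)          # drop RTF codes before ">"
      found = 1
    }
    {
      sub(/\r$/, "")                                  # drop a carriage return, if any
      while (sub(/\\$/, "") || sub(/[}]$/, "")) { }   # drop trailing backslashes and braces
      if (length($0) > 0) print
    }
  ' "$1"
}

# Usage (hypothetical output name):
#   clean_rtf_fasta saRNA_ref.fa > saRNA_ref.clean.fa
```

Check the result with head before using it, since any other RTF formatting inside the sequence would not be caught by this.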

ADD REPLY • link 2.9 years ago by GenoMax 151k

Hi, I removed all of that as you suggested. head output now gives:

>saRNA_ref
TAATACGACTCACTATAATGGGCGGCGCATGAGAGAAGCCCAGACCAATTACCTACCCAAAATGGAGAAAGTTCACGTTGACATCGAGGAAGACAGCCCATTCCTCAGAGCTTTGCAGCGGAGCTTC

I loaded the newly edited FASTA file into IGV as the reference genome, along with the sorted BAM file, but I still get the error below:

/Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT_aligned/PASS/bam_files/FAK73268_pass_combined.sorted.bam: An error occurred while accessing: /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT_aligned/PASS/bam_files/FAK73268_pass_combined.sorted.bam Error loading BAM file: htsjdk.samtools.SAMException: Sequence name 'saRNA_ref\' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'

Does the problem come from the FASTA file or the BAM file? Thank you!!

ADD REPLY • link updated 2.9 years ago by GenoMax 151k • written 2.9 years ago by ttt12 ▴ 20

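The problem is in the BAM file. It was aligned against the original reference, so its header still carries the old name saRNA_ref\ (see the @SQ line in the header output above).

You could edit the header of your existing BAM file to remove the extra \ that follows the reference name, but it may simply be easier to redo the process with the clean reference file you now have.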
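
If you do want to try the header edit, a minimal sketch using `samtools view -H` and `samtools reheader` is below. The helper names `fix_sq_names` and `reheader_bam` and the file names are made up for illustration, and it only handles this one case (a trailing \ on the SN: field).

```shell
# Filter SAM header text: strip a trailing backslash from the SN: field of each @SQ line.
fix_sq_names() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) sub(/\\$/, "", $i) }
       { print }'
}

# Write a corrected copy of a BAM and index it (needs samtools on PATH).
# $1 = input BAM, $2 = output BAM; both names are placeholders.
reheader_bam() {
  samtools view -H "$1" | fix_sq_names > "$2.header.sam" &&
    samtools reheader "$2.header.sam" "$1" > "$2" &&
    samtools index "$2"
}

# Usage:
#   reheader_bam in.sorted.bam fixed.sorted.bam
```

The reference name in the fixed header then has to match the name in the cleaned FASTA that you load into IGV.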