Nanopore Direct RNA Sequencing - Reference sequence
2.9 years ago
ttt12 ▴ 20

Hi all, I am looking for a way to visualize my RNA reads from Nanopore direct RNA sequencing in IGV. I sequenced an IVT reaction and used minimap2 to map the FASTQ files against my own reference sequence (a FASTA file), which gave me BAM files. As a next step, I would like to look for variant transcripts and visualize them in IGV against my own reference sequence. How can I do that? Thank you.

sequences Nanopore IGV RNA Reference • 3.6k views
2.9 years ago
GenoMax 151k

You will need to sort and index your bam file. Create a custom genome (if you are not using a model organism) in IGV and then load the sorted BAM file.
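
For anyone following along, here is a minimal sketch of the sort-and-index step. It assumes samtools is installed and on your PATH. The file names are placeholders rather than files from this thread, and `run`, `sort_and_index` and `DRY_RUN` are hypothetical helpers that exist only in this sketch.

```shell
#!/usr/bin/env bash
# Sketch: coordinate-sort a BAM and build its index so IGV can load it.
# File names are placeholders; DRY_RUN is a convenience of this sketch only.

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

sort_and_index() {
    in_bam=$1
    out_bam=$2
    # Sort by coordinate, then index the sorted file.
    run samtools sort -o "$out_bam" "$in_bam" || return 1
    run samtools index "$out_bam"
}

# Example (prints the two commands without running them):
#   DRY_RUN=1 sort_and_index aligned.bam aligned.sorted.bam
```

In IGV you would then load the sorted BAM; the index file written by `samtools index` needs to sit next to it.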


Thank you. I did sort and index the BAM file, but it still did not work :( I loaded it in IGV and got this error:

Error loading BAM file: htsjdk.samtools.SAMException: Sequence name 'saRNA_ref\' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'

How should I fix it? Thanks a lot.


Did you use the exact same reference file that you used to create the minimap2 index when creating the custom genome in IGV? The reference names need to match (with the exception of some model genomes, e.g. human, where 1 or chr1 can be used).
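
One way to check that the names match is to list them side by side. The sketch below is only an illustration, not something posted in this thread: `fasta_names` and `header_names` are hypothetical helper names, and the usage in the comments assumes samtools is installed.

```shell
#!/usr/bin/env bash
# Sketch: list reference names from a FASTA and from a SAM/BAM header
# so mismatches (such as a stray trailing backslash) are easy to spot.

fasta_names() {
    # Header lines start with '>'; the name is everything up to the first whitespace.
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}

header_names() {
    # Reads SAM header text on stdin and prints the SN: value of each @SQ line.
    awk -F '\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }'
}

# Assumed usage (x.bam and x.fasta are placeholders):
#   samtools view -H x.bam | header_names | sort > bam_names.txt
#   fasta_names x.fasta | sort > fasta_names.txt
#   diff fasta_names.txt bam_names.txt && echo "names match"
```

If `diff` prints anything, the BAM was aligned against a reference whose names differ from the one loaded in IGV.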


Hi, I used the EPI2ME Desktop Agent with the Fastq Custom Alignment option to generate the BAM file. I uploaded my reference sequence (a FASTA file) to that tool, and I uploaded that exact FASTA file to IGV to create the custom genome. I am not sure what I did wrong :( If you know, please let me know. Thank you!!

2.9 years ago

This should be fairly easy if you've got this far; sorting and indexing the BAM is easy with samtools via Bioconda.

Then load your reference as described at https://software.broadinstitute.org/software/igv/LoadGenome

Remember to look for a transcript annotation that fits your FASTA sequence.

I would also map to the whole genome and use a genome-wide annotation, e.g. GENCODE.


Thank you. I did sort and index the BAM file, but it still did not work :( I loaded it in IGV and got this error:

Error loading BAM file: htsjdk.samtools.SAMException: Sequence name 'saRNA_ref\' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'

How should I fix it? Thanks a lot.


Sounds like the BAM header is broken. Check the header using

samtools view -h x.bam | less

Compare it to other examples and/or paste the output here; it might contain special characters. Also try indexing the FASTA:

samtools faidx x.fasta

samtools view -h x.bam | less

Result:

@HD     VN:1.6  SO:coordinate
@SQ     SN:saRNA_ref\   LN:9383
@RG     ID:none
@PG     PN:minimap2     ID:minimap2     VN:2.17-r941    CL:minimap2 -y -a -x map-ont -t 1 --MD /tmp/datasets/1bcf05961acd0e52b280d1e1a4e3cd8e5cbdc19b/reference_multi_index_8G.fa ./FAK73268_pass_cf5085a8_0-0002.fastq
@PG     ID:samtools     PN:samtools     PP:minimap2     VN:1.10 (pysam) CL:samtools sort --output-fmt BAM -@ 1 -o output/FAK73268_pass_cf5085a8_0-0002.fastq.bam output/FAK73268_pass_cf5085a8_0-0002.fastq.sam

samtools faidx x.fasta
[fai_build_core] different line length in sequence '(null)'.
Could not build fai index /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT/saRNA_ref.fa.fai

It generated saRNA_ref.fa.fai, but the file may be faulty, since the message said it could not build the fai index. Could you please show me how to build this index from a FASTA file containing just my sequence of interest? Thank you very much!!


How did you make /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT/saRNA_ref.fa? Is there more than one sequence in this file?

Can you show us the output of grep "^>" /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT/saRNA_ref.fa?


grep "^" saRNA_ref.fa > output
head output

{\rtf1\ansi\ansicpg1252\cocoartf2638
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww51000\viewh27180\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0 >saRNA_ref\
TAATACGACTCACTATAATGGGCGGCGCATGAGAGAAGCCCAGACCAATTACCTACCCAAAATGGAGAAAGTTCACGTTGACATCGAGGAAGACAGCCCATTCCTCAGAGCTTTGCAGCGGAGCTTCCCGCAGTTTGAGGTAGAAGCCAAGCAGGTCACT

Here it is. Does this look correct? I made the FASTA file in TextEdit, with the > header on the first line and my sequence on the second line, then saved it as a FASTA file. Please let me know if I am doing things correctly. Thank you!!


Use a programmer's editor (like Notepad++ on PC or BBEdit on macOS) and remove all of this stuff at the beginning of the file:

{\rtf1\ansi\ansicpg1252\cocoartf2638
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\margl1440\margr1440\vieww51000\viewh27180\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs24 \cf0

Also remove the trailing \ after the reference name here: >saRNA_ref\

You may still need to redo the entire process with this edited file if IGV does not accept it.
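
A quick way to confirm the cleanup worked is to test the file for the problems seen in this thread. This is a rough sketch under my own assumptions: `check_fasta` is a hypothetical helper, it only covers the RTF wrapper, a missing `>` header and a trailing backslash on the header, and it is not a full FASTA validator.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a hand-made FASTA before using it as a reference.

check_fasta() {
    f=$1
    # RTF documents begin with the literal bytes {\rtf
    if [ "$(head -c 5 "$f")" = '{\rtf' ]; then
        echo "RTF markup found: re-save as plain text"
        return 1
    fi
    # A FASTA file must start with a '>' header line.
    if [ "$(head -c 1 "$f")" != '>' ]; then
        echo "first line is not a FASTA header"
        return 1
    fi
    # Header lines must not end in a backslash (the problem seen in this thread).
    if grep -q '^>.*\\$' "$f"; then
        echo "header line ends with a backslash"
        return 1
    fi
    echo "looks like plain-text FASTA"
}

# Example: check_fasta saRNA_ref.fa
```

In TextEdit, choosing Format > Make Plain Text before saving should avoid the RTF wrapper in the first place.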


Hi, I removed all the stuff as you suggested. Here is the result of head output:

>saRNA_ref
TAATACGACTCACTATAATGGGCGGCGCATGAGAGAAGCCCAGACCAATTACCTACCCAAAATGGAGAAAGTTCACGTTGACATCGAGGAAGACAGCCCATTCCTCAGAGCTTTGCAGCGGAGCTTC

I loaded the newly edited FASTA file in IGV as the reference genome, along with the sorted BAM file, but I still got the error below:

/Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT_aligned/PASS/bam_files/FAK73268_pass_combined.sorted.bam: An error occurred while accessing: /Users/TrinhTat/Documents/Trinh_HM2022/Research/RNA_Core/Nanopore/072122_IVT_aligned/PASS/bam_files/FAK73268_pass_combined.sorted.bam Error loading BAM file: htsjdk.samtools.SAMException: Sequence name 'saRNA_ref\' doesn't match regex: '[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*'

Does the problem come from the FASTA file or the BAM file? Thank you!!


The problem is in the BAM file.

You could edit the header of your existing BAM file to remove the extra \ that follows the reference name, but it may simply be easier to redo the process with the clean reference file you now have.
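
If you want to try the header edit instead of realigning, a sketch along these lines might work. I have not run it against the BAM from this thread. `fix_header` is a hypothetical helper, the file names are placeholders, and the commented commands assume samtools is installed.

```shell
#!/usr/bin/env bash
# Sketch: strip a stray trailing backslash from @SQ reference names in a
# SAM header, then (optionally) write the fixed header back with samtools.

fix_header() {
    # Reads header text on stdin. On @SQ lines only, removes a backslash
    # at the end of the SN: field; all other lines pass through unchanged.
    awk 'BEGIN { FS = OFS = "\t" }
         $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) sub(/\\$/, "", $i) }
         { print }'
}

# Assumed usage (file names are placeholders):
#   samtools view -H in.sorted.bam | fix_header > fixed_header.sam
#   samtools reheader fixed_header.sam in.sorted.bam > fixed.sorted.bam
#   samtools index fixed.sorted.bam
```

Afterwards, `samtools view -H fixed.sorted.bam` should show `SN:saRNA_ref` without the backslash, matching the cleaned FASTA.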


Thank you very much for your help. I am able to visualize it on IGV now :)

