Hi all,
Apologies, I don't have much experience with this, but I have been struggling with this issue for a long time.
I am aligning nanopore RNA-seq data from a heterologous gene library to a reference database, and I am getting mismatches such as these.
I have verified my reference database and found no problem with it; I think some of the genes are misaligning to others. I would like your suggestions on which minimap2 parameters I can set so that these mismatches disappear. Currently, my command line looks like this:
minimap2 -ax map-ont reference.fasta rna_seq.fastq > output.sam
After this, I also filter the alignments to keep only primary ones (FLAG 0 and 16) with a mapping quality of at least 10 (MAPQ ≥ 10).
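For reference, the filtering step described above could be sketched in Python roughly as follows. This is only a minimal sketch that parses the SAM text directly; the function names `keep_alignment` and `filter_sam` are made up for illustration, and it assumes single-end reads, for which FLAG 0 and 16 are the mapped primary alignments on the forward and reverse strands.

```python
def keep_alignment(sam_line, min_mapq=10):
    """Return True for a mapped primary alignment with MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2 of a SAM record: bitwise FLAG
    mapq = int(fields[4])   # column 5 of a SAM record: mapping quality
    # FLAG 0 = forward-strand primary, FLAG 16 = reverse-strand primary.
    # Unmapped (4), secondary (256) and supplementary (2048) records
    # never have a FLAG of exactly 0 or 16, so they are dropped here.
    return flag in (0, 16) and mapq >= min_mapq


def filter_sam(lines, min_mapq=10):
    """Yield header lines unchanged and only the alignments that pass."""
    for line in lines:
        if line.startswith("@"):
            yield line
        elif keep_alignment(line, min_mapq):
            yield line


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python filter_sam.py < output.sam > filtered.sam
    sys.stdout.writelines(filter_sam(sys.stdin))
```

With samtools, `samtools view -h -F 2308 -q 10 output.sam` should give a comparable result, since 2308 is the sum of the unmapped, secondary, and supplementary flag bits.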
I would really appreciate your help. Thanks
What "heterologous" library is this? Is more than one gene/region included? Are you aligning to a combined reference of the human genome plus this heterologous library? You can't align to the heterologous library alone if the dataset is from the entire genome. What is the aim of this experiment? To show that the heterologous library is being expressed without acquiring any mutations?
Hi,
Thanks for your reply.
I am trying to express ~200 synonymous variants of a non-human gene in human cells to see how codon usage influences gene expression. Synonymous variants encode the same protein but have different nucleotide compositions, which is why I call it a heterologous gene library. The library is barcoded: each variant is associated with a unique barcode for identification. After expressing this library in human cells, I sequence the total mRNA using a nanopore MinION. This experiment has nothing to do with human genes, so I am only aligning the sequencing results to the synonymous variant database. But yes, the whole mRNA pool is sequenced in the experiment, and the data produced is a mix of human mRNA and synonymous variant mRNAs.
The aim of the experiment is to see the difference in the mRNA levels of each variant.
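If it helps, the per-variant comparison could start from simple read counts per reference sequence. Below is a rough sketch, assuming the reference names in the SAM RNAME column are the variant IDs and using the same FLAG and MAPQ filter as above; the function names `count_reads_per_variant` and `relative_abundance` are hypothetical, and the fractions are only normalized to the total number of retained reads.

```python
from collections import Counter


def count_reads_per_variant(sam_lines, min_mapq=10):
    """Count retained primary alignments per reference (variant) name."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        # Same filter as in the post: FLAG 0 or 16 and MAPQ >= min_mapq.
        if flag in (0, 16) and mapq >= min_mapq:
            counts[rname] += 1
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of all retained reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Assumption: output.sam is the file produced by the minimap2 command.
    with open("output.sam") as handle:
        counts = count_reads_per_variant(handle)
    for name, fraction in sorted(relative_abundance(counts).items()):
        print(name, counts[name], round(fraction, 4), sep="\t")
```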
I do not think I can provide a minimap2 parameter set that makes those mismatches disappear, as the problem might not lie with minimap2. In my opinion, you should do some checks.
Why can't the mismatches be SNVs? Most of them look real to me. Of course, you can't be sure unless you perform variant calling on the RNA-seq data (it's just an analysis that could be done in addition to the usual gene expression analysis, but, of course, it does not replace traditional whole genome sequencing data).
Hi, Thanks for your reply.
I am trying to express ~200 synonymous variants of a non-human gene in human cells to see how codon usage influences gene expression. Synonymous variants encode the same protein but have different nucleotide compositions. In each codon, the first and last nucleotides can vary. I think some other synonymous variants are aligning to this particular variant, but so far I have not checked the mismatched bases to see whether they correspond to other variants or to something else. I will check this.
Do you still think variant calling would be helpful, given that I already know the sequence of each variant in the library? Many thanks.
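One way to run the check mentioned above is to ask, for a read aligned to one variant, how many of its mismatched bases are exactly what another variant carries at the same positions. The sketch below assumes all variants have the same length with no indels between them (so positions are directly comparable) and that the mismatch positions and bases have already been extracted from the alignment; the function name `variants_explaining_mismatches` and the toy sequences are hypothetical.

```python
def variants_explaining_mismatches(mismatches, variants, aligned_to):
    """Rank the other variants by how many observed mismatches they explain.

    mismatches: dict of 0-based reference position -> base seen in the read
    variants:   dict of variant name -> nucleotide sequence
    aligned_to: name of the variant the read was aligned to (skipped)
    """
    ranked = []
    for name, seq in variants.items():
        if name == aligned_to:
            continue
        explained = sum(
            1
            for pos, base in mismatches.items()
            if pos < len(seq) and seq[pos].upper() == base.upper()
        )
        ranked.append((name, explained))
    # Most mismatches explained first; ties broken by variant name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Toy library: Met-Ala-Lys encoded with different synonymous codons.
    library = {
        "var1": "ATGGCTAAA",
        "var2": "ATGGCCAAG",
        "var3": "ATGGCAAAA",
    }
    # A read aligned to var1 that shows C at position 5 and G at position 8.
    observed = {5: "C", 8: "G"}
    for name, n in variants_explaining_mismatches(observed, library, "var1"):
        print(name, n, "of", len(observed), "mismatches explained")
```

If one other variant explains nearly all mismatches in a read, that read was probably derived from that variant and misassigned, rather than carrying real SNVs.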