4.3 years ago
yoshifumimiya
I am a beginner in RNA-seq.
I am studying it because I want to run an RNA-seq analysis of COVID-19 data.
I downloaded the SRA file, then created the complementary sequence of the genomic sequence with Biopython. Using that as the reference, I ran indexing and quant with salmon.
The mapping rate was 0.222043%, which is very low. What is the problem? I know I still have a lot to learn, but any comments would be helpful.
Which one?
Why didn't you use the genome and annotation from NCBI?
For the mapping you can just use the normal +RNA genome of SARS-CoV-2 without any further manipulation of the reference data. It's not clear from your question whether you included the human transcripts in your salmon index. If you did not, that could explain the low mapping rate.
Edit: Or maybe, if the paper used Vero cells (not human) and you used the human transcriptome in the reference, this low mapping rate could also happen.
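One way to act on this, as a rough sketch only: build a single reference FASTA that contains both the host transcripts and the unmodified viral genome, then index that combined file with salmon (for example `salmon index -t combined.fa -i combined_index`). The helper name `combine_fasta` and all file names below are hypothetical, and the host file has to match the cells actually used in the experiment.

```python
def combine_fasta(inputs, output):
    """Concatenate FASTA files into one file and return the record count.

    Raises ValueError if two records share the same name (the first
    whitespace-delimited token of the header), since duplicate names
    make the quantification output ambiguous.
    """
    seen = set()
    count = 0
    with open(output, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate record name: {name!r}")
                        seen.add(name)
                        count += 1
                    # Make sure every line ends with a newline so that
                    # records from different files never run together.
                    out.write(line if line.endswith("\n") else line + "\n")
    return count


# Hypothetical usage (file names are placeholders, not from this thread):
# combine_fasta(["host_transcripts.fa", "NC_045512.2.fa"], "combined.fa")
```

With a combined reference, host reads have somewhere to map, so the overall mapping rate says more about the library as a whole than a virus-only index does.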
Thanks for your valuable comments. As advised, I'll try using the normal +RNA genome of SARS-CoV-2.
Thanks for your kind comments.
I downloaded the SRA file from https://trace.ddbj.nig.ac.jp/DRASearch/study?acc=SRP262058. I didn't understand what you meant by the reference from NCBI. I will search NCBI for it.
That data is not COVID; it's human cells. Check the full description at https://www.ncbi.nlm.nih.gov/gds/?term=SRP262058
The reference genome I mean is https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2
Thanks for your comment. I'll try the analysis with that data.
If there are host sequences present in the file (very likely), then the rate of mapping to the SARS-CoV-2 genome may be very low (0.2% sounds extreme).
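To make the arithmetic concrete: the mapping rate is just mapped reads divided by total reads, times 100, so a library dominated by host reads gives a tiny rate against a virus-only reference. Below is a minimal sketch for recomputing it from a salmon output directory. It assumes salmon wrote `aux_info/meta_info.json` with `num_mapped` and `num_processed` fields, which is what I recall from recent versions, so check your own output. The function names are mine.

```python
import json
from pathlib import Path


def mapping_rate(num_mapped, num_processed):
    """Return the mapping rate as a percentage."""
    if num_processed <= 0:
        raise ValueError("num_processed must be positive")
    return 100.0 * num_mapped / num_processed


def salmon_mapping_rate(quant_dir):
    """Recompute the mapping rate from a salmon quant output directory.

    Assumption: aux_info/meta_info.json exists and contains the keys
    "num_mapped" and "num_processed".
    """
    meta_path = Path(quant_dir) / "aux_info" / "meta_info.json"
    meta = json.loads(meta_path.read_text())
    return mapping_rate(meta["num_mapped"], meta["num_processed"])


# Hypothetical usage (directory name is a placeholder):
# print(salmon_mapping_rate("quant_out"))
```

Comparing this number between a virus-only index and a host-plus-virus index is a quick way to see how much of the library is host.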
Thanks for your kind comment.
You pointed out that host sequences may be included. I will check with different SRA data.