I have single-cell RNA-seq reads from a patient-derived xenograft tumor. I want to estimate what fraction of cells contain mouse reads. This is the output I got when I aligned my reads to the human genome:
Started job on | Jun 26 11:25:38
Started mapping on | Jun 26 11:27:02
Finished on | Jun 26 13:02:31
Mapping speed, Million of reads per hour | 137.19
Number of input reads | 218324074
Average input read length | 119
UNIQUE READS:
Uniquely mapped reads number | 137430056
Uniquely mapped reads % | 62.95%
Average mapped length | 92.13
Number of splices: Total | 6342135
Number of splices: Annotated (sjdb) | 5771925
Number of splices: GT/AG | 5988664
Number of splices: GC/AG | 76096
Number of splices: AT/AC | 4963
Number of splices: Non-canonical | 272412
Mismatch rate per base, % | 0.30%
Deletion rate per base | 0.01%
Deletion average length | 1.40
Insertion rate per base | 0.01%
Insertion average length | 1.19
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 15005928
% of reads mapped to multiple loci | 6.87%
Number of reads mapped to too many loci | 311785
% of reads mapped to too many loci | 0.14%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 65565156
% of reads unmapped: too short | 30.03%
Number of reads unmapped: other | 11149
% of reads unmapped: other | 0.01%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
These are the results of the alignment to the mouse reference:
Started job on | Jun 26 08:29:55
Started mapping on | Jun 26 08:31:18
Finished on | Jun 26 11:14:04
Mapping speed, Million of reads per hour | 80.48
Number of input reads | 218324074
Average input read length | 119
UNIQUE READS:
Uniquely mapped reads number | 17341041
Uniquely mapped reads % | 7.94%
Average mapped length | 92.97
Number of splices: Total | 1190336
Number of splices: Annotated (sjdb) | 1140791
Number of splices: GT/AG | 1154502
Number of splices: GC/AG | 8153
Number of splices: AT/AC | 1012
Number of splices: Non-canonical | 26669
Mismatch rate per base, % | 0.93%
Deletion rate per base | 0.02%
Deletion average length | 1.46
Insertion rate per base | 0.03%
Insertion average length | 1.16
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 2503443
% of reads mapped to multiple loci | 1.15%
Number of reads mapped to too many loci | 56181
% of reads mapped to too many loci | 0.03%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 198421739
% of reads unmapped: too short | 90.88%
Number of reads unmapped: other | 1670
% of reads unmapped: other | 0.00%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
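For reference, the headline percentages can be recomputed from the counts in the two logs. The following is only a minimal sketch: the helper names `parse_star_log` and `mapped_percent` are made up for illustration, and the inputs are just the three relevant lines copied from each log above. Because the two alignments were run independently, the same read can be counted in both runs, so these per-run percentages do not add up to a species split.

```python
def parse_star_log(text):
    """Parse 'key | value' lines of a STAR-style log into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as 'UNIQUE READS:' carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def mapped_percent(stats):
    """Return (unique %, unique + multi-mapped %) relative to the input reads."""
    total = int(stats["Number of input reads"])
    unique = int(stats["Uniquely mapped reads number"])
    multi = int(stats["Number of reads mapped to multiple loci"])
    return 100 * unique / total, 100 * (unique + multi) / total


# Counts copied from the two logs in this post.
human_log = """\
Number of input reads | 218324074
Uniquely mapped reads number | 137430056
Number of reads mapped to multiple loci | 15005928
"""
mouse_log = """\
Number of input reads | 218324074
Uniquely mapped reads number | 17341041
Number of reads mapped to multiple loci | 2503443
"""

human_unique, human_any = mapped_percent(parse_star_log(human_log))
mouse_unique, mouse_any = mapped_percent(parse_star_log(mouse_log))

print(f"human: {human_unique:.2f}% unique, {human_any:.2f}% unique + multi")
print(f"mouse: {mouse_unique:.2f}% unique, {mouse_any:.2f}% unique + multi")
```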
However, I see a lot of recommendations to map the reads to a combined human and mouse reference genome instead. Can somebody explain the difference between mapping to each reference genome separately and mapping to a combined one?
- Can I say from the results above that only 8% of reads mapped to the mouse genome?
- Can I use just the BAM file produced by the alignment to the human genome for my further analyses?
I am a newbie to bioinformatics, so I would really appreciate any recommendations or links on what to read to understand the concepts of alignment and the questions I asked above.
Thank you!
Thank you! I used cellranger with the combined mouse+human reference as you advised. The results say that ~9% of reads map to mm10 and ~90% map to hg19. Do you know how to filter out the reads that map to mouse?
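One possible way to approach this, shown only as a rough sketch: count, per cell barcode, how many reads landed on human versus mouse contigs, then keep the barcodes that are overwhelmingly human. The sketch assumes that contig names in the combined reference carry a genome prefix such as `hg19_` or `mm10_` (check the BAM header, since the exact prefixes depend on the reference build), and that `(barcode, contig)` pairs have already been extracted from the BAM, for example from the `CB` tag of each read. The function names and the 0.9 threshold are hypothetical choices, not cellranger defaults.

```python
from collections import Counter, defaultdict


def species_of(contig, human_prefix="hg19_", mouse_prefix="mm10_"):
    """Map a contig name to 'human', 'mouse', or None (prefixes are assumptions)."""
    if contig.startswith(human_prefix):
        return "human"
    if contig.startswith(mouse_prefix):
        return "mouse"
    return None


def call_cells(read_records, min_fraction=0.9):
    """Label each barcode 'human', 'mouse', or 'mixed' from (barcode, contig) pairs."""
    counts = defaultdict(Counter)
    for barcode, contig in read_records:
        species = species_of(contig)
        if species is not None:
            counts[barcode][species] += 1

    calls = {}
    for barcode, per_species in counts.items():
        total = per_species["human"] + per_species["mouse"]
        if per_species["human"] / total >= min_fraction:
            calls[barcode] = "human"
        elif per_species["mouse"] / total >= min_fraction:
            calls[barcode] = "mouse"
        else:
            calls[barcode] = "mixed"  # possible doublet or ambient RNA
    return calls


# Toy input: in practice these pairs would be read from the BAM file.
records = (
    [("AAAC-1", "hg19_1")] * 19 + [("AAAC-1", "mm10_1")] * 1
    + [("CCCG-1", "mm10_2")] * 10
    + [("GGGT-1", "hg19_3")] * 5 + [("GGGT-1", "mm10_3")] * 5
)
calls = call_cells(records)
human_barcodes = sorted(bc for bc, label in calls.items() if label == "human")
print(calls)
print(human_barcodes)
```

The resulting list of human barcodes could then be used to subset the count matrix, and keeping only the `hg19_`-prefixed genes would drop the mouse features.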