Question

Human sequence data mapped to mouse genome

6 months ago

sh • 0

Hello everyone!

Our lab has a batch of full-length single-cell sequencing data:

FLASH-Seq,
Paired-end, 151bp
Library Protocol: Nextera XT DNA Library Prep Kit Reference Guide (15031942 v03)

After analysing it, I found that none of the cells map well to the human genome (about 8% mapping rate), but some cells have a higher mapping rate (about 30%) to the mouse genome. Do you have any advice on this, please? Thanks!

Mapping code (see https://www.protocols.io/view/flash-seq-protocol-kxygxzkrwv8j/v4?step=13.4):

## STAR (v2.7.11a)
##
## Human genome mapping (GENCODE v45)
for sn in $sample_names;
do
    STAR \
        --runThreadN 10 \
        --limitBAMsortRAM 20000000000 \
        --genomeLoad LoadAndKeep \
        --genomeDir /data/ref/hs/gencode/release_45/star_idx \
        --readFilesIn  /data/project/FPO/data/EN00005765_hdd2/${sn}_trim_*.fastq.gz \
        --readFilesCommand zcat \
        --limitSjdbInsertNsj 2000000 \
        --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix /data/project/FPO/mapping/${sn}_
done

## Mouse genome mapping (GENCODE vM33)
for sn in $sample_names;
do

    STAR \
        --runThreadN 10 \
        --limitBAMsortRAM 20000000000 \
        --genomeLoad LoadAndKeep \
        --genomeDir /data/ref/mm/gencode/release_M33/star_idx \
        --readFilesIn  /data/project/FPO/data/EN00005765_hdd2/${sn}_trim_*.fastq.gz \
        --readFilesCommand zcat \
        --limitSjdbInsertNsj 2000000 \
        --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix /data/project/FPO/mm.mapping/${sn}_
done
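
One thing that may be worth ruling out before interpreting the mapping rates: `--readFilesIn` takes mate 1 and mate 2 as two separate arguments, and the loops above rely on the glob `${sn}_trim_*.fastq.gz` expanding to exactly two files per sample. The following is only a sketch of a pre-flight check under that assumption; the function names `count_trimmed_fastqs` and `check_paired_inputs` are made up for illustration, and the file naming is taken from the paths in the post.

```shell
# Count the files matching <dir>/<sample>_trim_*.fastq.gz.
# $1 = FASTQ directory, $2 = sample name.
count_trimmed_fastqs() {
    local dir=$1 sample=$2 f n=0
    for f in "$dir"/"${sample}"_trim_*.fastq.gz; do
        # An unmatched glob stays literal, so only count files that exist.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Warn about every sample that does not have exactly two trimmed FASTQ files.
# $1 = FASTQ directory, remaining arguments = sample names.
# Returns non-zero if any sample fails the check.
check_paired_inputs() {
    local dir=$1
    shift
    local sn n status=0
    for sn in "$@"; do
        n=$(count_trimmed_fastqs "$dir" "$sn")
        if [ "$n" -ne 2 ]; then
            echo "WARNING: $sn has $n trimmed FASTQ files (expected 2)" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be called as `check_paired_inputs /data/project/FPO/data/EN00005765_hdd2 $sample_names` before either loop. It only checks the file count; it does not verify that the two files are really mate 1 and mate 2 of the same library.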

Example mapping log:

                                 Started job on |   Jun 19 16:40:29
                             Started mapping on |   Jun 19 16:42:25
                                    Finished on |   Jun 19 16:46:21
       Mapping speed, Million of reads per hour |   32.97

                          Number of input reads |   2161232
                      Average input read length |   250
                                    UNIQUE READS:
                   Uniquely mapped reads number |   70798
                        Uniquely mapped reads % |   3.28%
                          Average mapped length |   222.68
                       Number of splices: Total |   4629
            Number of splices: Annotated (sjdb) |   3761
                       Number of splices: GT/AG |   4542
                       Number of splices: GC/AG |   87
                       Number of splices: AT/AC |   0
               Number of splices: Non-canonical |   0
                      Mismatch rate per base, % |   0.60%
                         Deletion rate per base |   0.05%
                        Deletion average length |   1.32
                        Insertion rate per base |   0.02%
                       Insertion average length |   1.41
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   35608
             % of reads mapped to multiple loci |   1.65%
        Number of reads mapped to too many loci |   1046
             % of reads mapped to too many loci |   0.05%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |   0
       % of reads unmapped: too many mismatches |   0.00%
            Number of reads unmapped: too short |   2050883
                 % of reads unmapped: too short |   94.89%
                Number of reads unmapped: other |   2897
                     % of reads unmapped: other |   0.13%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%
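
Since the question compares rates across many cells and two genomes, it may help to tabulate the key fields of every `Log.final.out` rather than reading the logs one by one. Below is a minimal sketch, assuming the logs are named `<sample>_Log.final.out` (which follows from the `--outFileNamePrefix` values above); `star_log_field` and `summarise_star_logs` are hypothetical helper names, not part of STAR.

```shell
# Print the value of one field from a STAR Log.final.out file.
# $1 = path to the log, $2 = exact field label (the text left of the "|").
star_log_field() {
    awk -F'|' -v label="$2" '
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim whitespace around label
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim whitespace around value
            if (key == label) { print val; exit }
        }' "$1"
}

# Print a tab-separated per-sample summary for one STAR output directory.
# $1 = directory containing <sample>_Log.final.out files.
summarise_star_logs() {
    local log sample
    printf 'sample\tinput_reads\tunique_pct\tmulti_pct\ttoo_short_pct\n'
    for log in "$1"/*_Log.final.out; do
        [ -e "$log" ] || continue   # skip the literal pattern if nothing matched
        sample=$(basename "$log" _Log.final.out)
        printf '%s\t%s\t%s\t%s\t%s\n' "$sample" \
            "$(star_log_field "$log" 'Number of input reads')" \
            "$(star_log_field "$log" 'Uniquely mapped reads %')" \
            "$(star_log_field "$log" '% of reads mapped to multiple loci')" \
            "$(star_log_field "$log" '% of reads unmapped: too short')"
    done
}
```

Running `summarise_star_logs /data/project/FPO/mapping` and `summarise_star_logs /data/project/FPO/mm.mapping` would give one table per genome, which makes it easier to see which cells differ between the two references and how much of each cell is lost as "too short".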

FastQC report: [image not available]

RNA-seq • 419 views

ADD COMMENT • link 6 months ago by sh • 0

Isn't this basically the same as your earlier post, RNA-seq bacteria contamination?

Your samples are apparently not good quality, and may not be usable at all. It does not really matter what they map to. If they are human cells but do not map to the human genome, then the sample is lost, simple as that (it sucks to lose an experiment; been there, I feel you). But since you have been posting about this for several months, I wonder whether you should simply move on. You may not get anything out of this data.

ADD REPLY • link 6 months ago by ATpoint 86k

Hi. Thanks very much for your reply!

Our lab has run three experiments. In the first one, all the data mapped well to the mouse genome (70%-80% mapping rate). That is confusing, because we do not know why there is so much mouse DNA.

So we ran a small second experiment. It also had a problem, which I posted about: RNA-seq bacteria contamination.

This post is about the third experiment.

ADD REPLY • link 6 months ago by sh • 0