Good alignment rate for DNA-seq data
13 months ago
analyst ▴ 50

Hi scientists,

I have 80 DNA-seq samples. The alignment rate for 70 of them is good (above 95%), but the remaining 10 samples have low alignment rates, ranging from 4% to 78%. I aligned the reads to the reference genome. Could anyone suggest what the optimal alignment rate for DNA-seq data should be? Do I need to discard all 10 samples, or can I keep the samples above 70%?

Also, what is considered a good alignment rate for RNA-seq data? It would be helpful if you could share a paper for reference.

Your suggestions will be highly appreciated.

Alignment DNA-seq RNA-seq • 1.6k views

When this sort of thing happens, take a selection of reads that do not map and BLAST them at NCBI to see whether those samples are contaminated with something unexpected.
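A minimal sketch of that check, assuming the alignments are available as plain SAM text (for example from `samtools view -h`): it collects reads with the "unmapped" flag bit set and writes a random subset as FASTA, which can then be pasted into the NCBI BLAST web form. The function name, the default of 100 reads, and the fixed seed are illustrative choices, not anything prescribed by a tool.

```python
import random


def sample_unmapped_as_fasta(sam_lines, n=100, seed=0):
    """Return FASTA text for up to n randomly chosen unmapped reads."""
    unmapped = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        # 0x4 = read unmapped; 0x900 = secondary or supplementary record
        if flag & 0x4 and not flag & 0x900 and seq != "*":
            unmapped.append((name, seq))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    picked = rng.sample(unmapped, min(n, len(unmapped)))
    return "".join(f">{name}\n{seq}\n" for name, seq in picked)
```

For paired-end data both mates of a pair may appear under the same read name, so you may want to append a mate suffix before submitting the sequences.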

While alignment rates depend on the type and quality of the sample, you should expect alignment rates greater than 80% for most samples.
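As a rough illustration of applying such a cutoff across many samples, the sketch below computes an alignment rate from SAM records and lists the samples that fall under a threshold. It assumes plain SAM text as input and counts only primary records; tools differ in how they treat secondary and supplementary alignments, so the number may not match a given aligner's summary exactly. The function names and the 80% default are illustrative.

```python
def alignment_rate(sam_lines):
    """Percent of primary SAM records that are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        if flag & 0x900:  # ignore secondary and supplementary records
            continue
        total += 1
        if not flag & 0x4:  # 0x4 = read unmapped
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


def samples_below(rates, threshold=80.0):
    """Return the names of samples whose rate (in percent) is under threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)
```

With made-up values, `samples_below({"S1": 97.0, "S2": 4.0, "S3": 78.0})` picks out `S2` and `S3` for a closer look rather than automatic removal.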


Thanks for the suggestion. I will run BLAST on the unmapped reads.


Thank you all for your valuable comments.

13 months ago
e.r.zakiev ▴ 230

What does FastQC say about these 10 samples? Did you run it on them before doing any alignments?

My first guess is that these 10 samples may need adapter trimming (see e.g. the bbmap suite for that). After trimming, you should hopefully see a mapping rate of at least 80%. If not, the samples are probably borked and you should indeed discard them.
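Before trimming, a quick way to gauge how much adapter is present is to count the reads that contain it. The sketch below does this for uncompressed 4-line FASTQ records. It assumes the commonly cited Illumina adapter prefix `AGATCGGAAGAGC`; that sequence is an assumption here, so check the documentation of your library kit. The function name is illustrative, and the check is an exact substring match, so it will miss adapters containing sequencing errors.

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # assumed Illumina adapter prefix; verify for your kit


def adapter_fraction(fastq_lines, adapter=ADAPTER_PREFIX):
    """Fraction of reads whose sequence contains the adapter (exact match)."""
    total = hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each 4-line FASTQ record is the sequence
            total += 1
            if adapter in line.strip().upper():
                hits += 1
    return hits / total if total else 0.0
```

A dedicated trimmer remains the right tool for the actual removal; this only helps decide whether adapter content could plausibly explain a low mapping rate.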


Aligners should "soft-clip" the parts of reads that do not align, so adapter trimming is not strictly required when aligning to a good reference. In this case there may actually be a problem with the samples themselves.
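To see how much soft-clipping the aligner actually did, the CIGAR string of each alignment can be inspected: the `S` operation marks soft-clipped bases. This is a small sketch with an illustrative function name that returns the soft-clipped fraction of a read from its CIGAR.

```python
import re

# One CIGAR element: a length followed by an operation code from the SAM spec
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_fraction(cigar):
    """Fraction of the read's bases that are soft-clipped (CIGAR op 'S')."""
    read_len = clipped = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MIS=X":  # operations that consume bases of the read
            read_len += n
        if op == "S":
            clipped += n
    return clipped / read_len if read_len else 0.0
```

For example, `soft_clipped_fraction("80M20S")` gives 0.2. Heavy clipping concentrated at the 3' ends of many reads would point towards adapter read-through, while unmapped reads with no alignment at all point towards a problem with the sample itself.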


Is it a reliable approach to run a trimming tool, e.g. fastp with default options, on samples that are of good quality and have no adapter content?


I am not sure about fastp in particular, but I use bbmap's bbduk.sh for my RNA-seq analyses on a regular basis, even if FastQC doesn't show any adapter sequences. It never hurts, in my opinion. The mapping rate downstream with Salmon never goes down after adapter trimming.


Yes e.r.zakiev, I ran FastQC initially. I observed a comparatively high duplication rate and GC content for these samples. Base quality is good for all samples (above 30). However, two samples contained Illumina universal adapters, which were removed using fastp (the --detect_adapter_for_pe parameter was used).
