Bowtie alignment
9 months ago
daffodil ▴ 10

I ran flagstat on my BAM file. The mapping rate is high, but the percentages of properly paired reads and singletons are low. What should I do?

112103606 + 0 in total (QC-passed reads + QC-failed reads)
112103606 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
112103606 + 0 mapped (100.00% : N/A)
112103606 + 0 primary mapped (100.00% : N/A)
112102614 + 0 paired in sequencing
56339631 + 0 read1
55762983 + 0 read2
2720182 + 0 properly paired (2.43% : N/A)
109362326 + 0 with itself and mate mapped
2740288 + 0 singletons (2.44% : N/A)
101228694 + 0 with mate mapped to a different chr
88514230 + 0 with mate mapped to a different chr (mapQ>=5)

EDIT:

I used bcl2fastq with this command, including adapter-removal settings:

bcl2fastq -R /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV \
    --input-dir /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV/Data/Intensities/BaseCalls/ \
    --output-dir /proj/naiss2023-22-1174/Masomeh \
    --interop-dir /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV/Data/Intensities/InterOp/ \
    --stats-dir /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV/Data/Intensities/BaseCalls/Stats/ \
    --reports-dir /crex/proj/naiss2023-22-1174/Masomeh/BaseCalls/Reports/ \
    --sample-sheet 'ATACsamplesheet.csv' \
    -r 4 -p 40 -w 4 \
    --minimum-trimmed-read-length 35 \
    --mask-short-adapter-reads 22 \
    --adapter-stringency 0.9 \
    --ignore-missing-bcls \
    --ignore-missing-filter \
    --ignore-missing-positions \
    --no-lane-splitting

and then I used cutadapt to remove the adapter

cutadapt -a CTGTCTCTTATACACATCT -A CTGTCTCTTATACACATCT -o SPG_rep3_trimmed_R1.fastq -p SPG_rep3_trimmed_R2.fastq  SPG_rep3_S7_R1_001.fastq.gz  SPG_rep3_S7_R2_001.fastq.gz

Is it possible that these commands affected the reads and reduced the proportion of properly paired reads?

Bowtie2 singletons • 1.5k views

Did you check the read quality before alignment? It looks as if one of your read files failed or there is a problem with the read labels.
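
As a rough cross-check of the flagstat output in the question, the pairing figures can be recomputed with a short awk sketch. The function name `flagstat_pairing` is hypothetical, and it assumes the plain-text layout that `samtools flagstat` printed above, read from standard input.

```shell
# Hypothetical helper: summarise pairing from `samtools flagstat` text on stdin.
# Assumes the line layout shown in the question (count in the first field).
flagstat_pairing() {
  awk '
    / paired in sequencing/                 { paired  = $1 }
    / read1$/                               { r1      = $1 }
    / read2$/                               { r2      = $1 }
    / properly paired/                      { proper  = $1 }
    / with mate mapped to a different chr$/ { diffchr = $1 }
    END {
      if (paired == 0) { print "no paired reads found"; exit 1 }
      # Difference between the number of first and second mates in the BAM.
      printf "read1 - read2: %d\n", r1 - r2
      # Percentages relative to reads flagged as paired in sequencing.
      printf "properly paired: %.2f%%\n", 100 * proper / paired
      printf "mate on different chr: %.2f%%\n", 100 * diffchr / paired
    }'
}
```

For example, `samtools flagstat aln.bam | flagstat_pairing`, with `aln.bam` standing in for your own file. On the numbers posted above this reports a read1/read2 difference of 576648 and about 90% of paired reads with the mate on a different chromosome, which would fit reads being matched with the wrong mate. Unequal read1/read2 counts can also simply result from unmapped reads having been filtered out of the BAM, so treat this as a hint rather than proof.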

9 months ago
Huiyang ▴ 190

Without knowing the species of your data it is hard to be sure, but this may be primarily due to the poor quality of the reference genome used.


Yes, I have checked with FastQC.


How does this address Huiyang's concern about the reference genome?


You can provide the download link for your reference genome, and I will help you check the quality of the reference genome.


Sorry, do you mean that I should provide the reference genome (mm10) that I used for alignment? It is on the server, and I used it directly from there.


The mm10 genome is free of any issues. Your paired-end data is not genuinely paired, possibly due to errors during base-call conversion (bcl2fastq) or during other quality control procedures (cutadapt). Check the labels of your paired reads to see whether the read names are identical.
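
A minimal sketch of that read-name check, assuming standard 4-line FASTQ records and Illumina-style headers in which the read ID is the first whitespace-delimited field. The helper names `read_names` and `check_pair_sync` are hypothetical. The script writes the IDs to temporary files, so for a quick look it can be run on the first few thousand records only.

```shell
# Hypothetical helper: print the read ID of every FASTQ record in a file.
# Assumes 4-line records; files ending in .gz are decompressed on the fly.
read_names() {
  case "$1" in
    *.gz) gzip -cd -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | awk 'NR % 4 == 1 {
    id = $1                 # first field of the header line
    sub(/^@/, "", id)       # drop the leading "@"
    sub(/\/[12]$/, "", id)  # drop an old-style /1 or /2 suffix, if present
    print id
  }'
}

# Hypothetical helper: compare read IDs of R1 and R2 record by record.
# Returns 0 if every record matches, 1 at the first mismatch or length difference.
check_pair_sync() {
  ids1=$(mktemp) || return 2
  ids2=$(mktemp) || { rm -f "$ids1"; return 2; }
  read_names "$1" > "$ids1"
  read_names "$2" > "$ids2"
  paste "$ids1" "$ids2" | awk -F '\t' '
    $1 != $2 { printf "first mismatch at record %d: %s vs %s\n", NR, $1, $2; bad = 1; exit }
    END { if (!bad) printf "in sync: %d records\n", NR; exit bad + 0 }'
  status=$?
  rm -f "$ids1" "$ids2"
  return "$status"
}
```

For example: `check_pair_sync SPG_rep3_trimmed_R1.fastq SPG_rep3_trimmed_R2.fastq`. A non-zero exit status means the two files disagree at some record or differ in length.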


As far as adapter removal and quality control are concerned, I will try fastp. As I mentioned, I used this command for bcl2fastq:

bcl2fastq -R /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV \
    --input-dir /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV/Data/Intensities/BaseCalls/ \
    --output-dir /proj/naiss2023-22-1174/Masomeh \
    --interop-dir /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV/Data/Intensities/InterOp/ \
    --stats-dir /proj/naiss2023-22-1174/Masomeh/240131_NB551428_0057_AHT2V3BGXV/Data/Intensities/BaseCalls/Stats/ \
    --reports-dir /crex/proj/naiss2023-22-1174/Masomeh/BaseCalls/Reports/ \
    --sample-sheet 'ATACsamplesheet.csv' \
    -r 4 -p 40 -w 4 \
    --minimum-trimmed-read-length 35 \
    --mask-short-adapter-reads 22 \
    --adapter-stringency 0.9 \
    --ignore-missing-bcls \
    --ignore-missing-filter \
    --ignore-missing-positions \
    --no-lane-splitting

Should I remove this part?

    --minimum-trimmed-read-length 35 \
    --mask-short-adapter-reads 22 \
    --adapter-stringency 0.9 \
    --ignore-missing-bcls \
    --ignore-missing-filter \
    --ignore-missing-positions \

Please run the repair.sh script from the BBMap suite on your paired-end files to verify that the reads are in sync across your files. I don't think bcl2fastq is going to cause this problem.

Guide: https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/repair-guide/


I finally figured it out: the alignment rate for all samples rose to 99% when I used fastp instead of cutadapt.
