Dear all,
I'd like to align the WGS data derived from PE50 to the human genome with BWA
or Bowtie2
, but the overall mapping rate was as low as 40%. Below is my command.
nohup bwa mem -t 48 GRCh38.bwa L01_5-8_1.fq.gz L01_5-8_2.fq.gz -o align.pe.sam &
nohup time bowtie2 -k 1 -p 36 -x GRCh38.p12 -1 L01_5-8_1.fq.gz -2 L01_5-8_2.fq.gz -S align.pe.bowtie2.sam &
Then I extracted out the unmapped reads and wonder the reason why they could not be aligned to the reference, so I utilized blastn
to double check what the unmapped reads exactly were. Interesting, the result generated by blastn
revealed all most unmapped reads were from the human reference, indeed. The following is my command of blastn
and part of the result, the 4th column of which is mapped length.
nohup blastn -query unmapped_reads.fa -strand both -db GRCh38 -out out.outfmt6_nt.txt -outfmt "6 qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen sstrand covhsp stitle" -num_threads 40 -max_target_seqs 1 -max_hsps 1 -dust no &
CL200085426L1C001R001_1175_2 chr9 100.000 39 0 0 11 49 118071004 118070966 6.85e-12 73.1 49
CL200085426L1C001R001_740_1 chr5 93.878 49 3 0 2 50 49602840 49602888 5.53e-13 76.8 50
CL200085426L1C001R001_2083_1 chr7 100.000 30 0 0 1 30 154413262 154413291 7.20e-07 56.5 50
CL200085426L1C001R001_2722_1 chr1 98.000 50 1 0 1 50 225676657 225676706 7.10e-17 89.8 50
CL200085426L1C001R001_2771_1 chr2 98.000 50 1 0 1 50 74907622 74907573 7.10e-17 89.8 50
CL200085426L1C001R001_2906_1 chrX 97.826 46 1 0 3 48 68388262 68388307 1.09e-14 82.4 48
CL200085426L1C001R001_2906_2 chrY 100.000 38 0 0 1 38 12035127 12035164 2.57e-11 71.3 50
CL200085426L1C001R001_2717_1 chr7 96.000 50 2 0 1 50 19529382 19529431 3.30e-15 84.2 50
Is there anyone have encountered such problem, and how can I handle that? Thanks very much!
Thanks, I'm going to have a try, maybe there is something different.