I use BWA-MEM to map reads to a reference. The command line is below:
bwa mem -t 6 reference.fasta fq1.fastq fq2.fastq > result.sam
According to the source code, BWA takes read1 from fq1.fastq and read2 from fq2.fastq. By default, read1 and read2 are treated as paired-end reads from the same DNA fragment. So I expected the two reads to map to the same chromosome, because a DNA fragment cannot span two chromosomes. However, when I look at result.sam, many read pairs are mapped to different chromosomes! Why? Is it because of structural variation or repeat sequences, or is there another reason? If I just want to call SNVs and indels, can I remove these read pairs? Thanks in advance!
Biologically it could be explained by a translocation, but if a lot of reads are affected, a technical cause is more likely.
Hi, other possible reasons are homologous genes or pseudogenes, or conserved domains.
Best
Tristan
Another leading cause is artifacts in library prep.
Do you mean that read1 from fq1.fastq and read2 from fq2.fastq are not from the same DNA fragment?
You might have fusion genes, like the Philadelphia chromosome. Normally, an aligner will place a read wherever it finds the best match. The best match is decided by some sort of final score, and the way it is calculated can be tuned to the experimental need by changing the aligner's parameters. However, it is not the task of the aligner to force a biological interpretation on the results.
If your reads are not in identical order in R1/R2 files (i.e. if the files were scanned/trimmed individually) then you would get odd mapping like the one you are describing here.
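One way to rule out the out-of-order scenario is to compare read names record by record. The following is a minimal sketch, not part of BWA: it assumes plain 4-line FASTQ records (no wrapped sequences) and that mates share a name once an optional trailing /1 or /2 is removed. The function names `read_names` and `first_mismatch` are hypothetical helpers introduced here for illustration.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the read name of each record, assuming 4 lines per record."""
    for i, line in enumerate(handle):
        if i % 4 != 0:  # only the first line of each record holds the name
            continue
        fields = line.split()
        name = fields[0] if fields else ""
        if name.startswith("@"):
            name = name[1:]
        # Drop old-style mate suffixes so both mates compare equal
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name


def first_mismatch(fq1, fq2):
    """Return (record_index, name1, name2) for the first out-of-sync
    record, or None if both files list the same names in the same order."""
    pairs = zip_longest(read_names(fq1), read_names(fq2))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None


# Tiny in-memory example; real use would pass open("fq1.fastq") etc.
r1 = "@r1/1\nACGT\n+\nIIII\n@r2/1\nTTTT\n+\nIIII\n"
r2_ok = "@r1/2\nACGT\n+\nIIII\n@r2/2\nAAAA\n+\nIIII\n"
r2_swapped = "@r2/2\nAAAA\n+\nIIII\n@r1/2\nACGT\n+\nIIII\n"

in_sync = first_mismatch(io.StringIO(r1), io.StringIO(r2_ok))
out_of_sync = first_mismatch(io.StringIO(r1), io.StringIO(r2_swapped))
```

If the function returns None for the real files, the read order is consistent and the cause has to be looked for elsewhere.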
I got fq1.fastq and fq2.fastq directly from the sequencer. By default BWA takes read1 from fq1.fastq and read2 from the same position in fq2.fastq. Is it possible that these two reads are not actually from the same DNA fragment? Why?
A fusion gene event could cause that. Are those cancer samples?
Yes, it's a cancer sample. If it were a normal sample, would paired reads still map to different chromosomes? I think repeat sequences are frequent.
Are they mapped in proper pairs when you check the flags? If you have a lot of improper pairing, this suggests a technical issue.
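For reference, the pairing information sits in the FLAG column (column 2) of each SAM record as a bit field. The sketch below decodes the four bits relevant to this question, using the bit values defined in the SAM specification; the helper name `describe_flag` is made up for illustration.

```python
# Bit values from the SAM specification
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def describe_flag(flag):
    """Decode the pairing-related bits of a SAM FLAG value."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "unmapped": bool(flag & UNMAPPED),
        "mate_unmapped": bool(flag & MATE_UNMAPPED),
    }


proper = describe_flag(99)    # 1 + 2 + 32 + 64
improper = describe_flag(65)  # 1 + 64: paired, but not a proper pair
```

A pair whose mates land on different chromosomes would not normally carry the proper-pair bit, so the share of records without 0x2 gives a quick first impression of how widespread the problem is.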
Brent Wilson
How many reads aligned to different chromosomes, both in absolute numbers and as a proportion of the whole set? Where are they located (say, look at a few regions in IGV)? Please describe your biological sample. What is your reference? What are the MAPQ scores for these reads?
The sample is from a patient and my reference is hs37d5.fa. Around 2% of read pairs are mapped to different chromosomes; however, most of them have MAPQ 0. So, can I remove these read pairs? Will they influence the VCF result? My pipeline is bwa -> samtools -> picard -> local realignment -> BQSR -> MuTect2.
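Numbers like these can be tallied straight from the SAM text. The sketch below is one possible way to do it, under stated assumptions: it reads the standard SAM columns (FLAG, RNAME, MAPQ, RNEXT), counts records rather than pairs, and skips unpaired, unmapped, secondary and supplementary records. The function name `summarize_discordant` is hypothetical.

```python
def summarize_discordant(sam_lines):
    """Return (paired_mapped, diff_chrom, diff_chrom_mapq0) record counts."""
    total = diff_chrom = diff_chrom_mapq0 = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname = fields[2]
        mapq = int(fields[4])
        rnext = fields[6]
        if not flag & 0x1:
            continue  # not a paired read
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue  # unmapped read or mate, secondary, supplementary
        total += 1
        # RNEXT is "=" when the mate is on the same reference sequence
        if rnext not in ("=", "*", rname):
            diff_chrom += 1
            if mapq == 0:
                diff_chrom_mapq0 += 1
    return total, diff_chrom, diff_chrom_mapq0


# Made-up records for illustration only
example = [
    "@SQ\tSN:1\tLN:1000",
    "p1\t99\t1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "p1\t147\t1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "p2\t65\t1\t500\t0\t4M\t7\t900\t0\tACGT\tIIII",
    "p2\t129\t7\t900\t37\t4M\t1\t500\t0\tACGT\tIIII",
]
counts = summarize_discordant(example)
```

For a real file, the same function could be fed `open("result.sam")`. Because each mate is its own record, an inter-chromosomal pair contributes two records to the second count.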
Hi, I'm working on a cancer panel with 22 genes (about 90 amplicons) and I see the same result as you, and I think you have a bigger target than I do. You should look at the regions with MAPQ 0 and check whether they are low-complexity sequences.
Best