Question

Getting uniquely aligned properly paired reads

0

Entering edit mode

5.4 years ago

nadiabeg.comsats ▴ 10

Hi all,

I have bam files obtained from mapping 10x genomics data. My goal is to get the VCF file. To do the variant calling I want to get rid of the unaligned, multi mapping and other cases. Goal is to get uniquely aligned properly paired reads.

My initial statistics of mapping are (without filtering)

788172985 + 0 in total (QC-passed reads + QC-failed reads)
42854167 + 0 secondary
0 + 0 supplementary
50143481 + 0 duplicates
741131280 + 0 mapped (94.03% : N/A)
745318818 + 0 paired in sequencing
372659409 + 0 read1
372659409 + 0 read2
508751213 + 0 properly paired (68.26% : N/A)
677983578 + 0 with itself and mate mapped
20293535 + 0 singletons (2.72% : N/A)
90544338 + 0 with mate mapped to a different chr
77234446 + 0 with mate mapped to a different chr (mapQ>=5)

To get uniquely aligned read, I am using these filters as suggested by many online platforms and Samtools flagstats.

samtools view -h -q 20 -F 3340 -f 3 -b $path > unique_realigned_Semlo_properlypaired.bam

Results:

420866696 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
420866696 + 0 mapped (100.00% : N/A)
420866696 + 0 paired in sequencing
210157890 + 0 read1
210708806 + 0 read2
420866696 + 0 properly paired (100.00% : N/A)
420866563 + 0 with itself and mate mapped
133 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Problem: by applying the aforementioned filters, I am getting unique reads but the number of reads in read1 (r1) are not equal to read2(r2). So is there any other way to get uniquely aligned reads (aligned in proper pair)?

Thanks

10XGenomics Properly paired unique reads • 3.2k views

ADD COMMENT • link 5.4 years ago by nadiabeg.comsats ▴ 10

0

Entering edit mode

If this is 10x genomics data is the BAM file generated by longranger?

ADD REPLY • link 5.4 years ago by GenoMax 150k

0

Entering edit mode

And if so, what is the problem with the VCF that it generates?

ADD REPLY • link 5.4 years ago by igor 13k

0

Entering edit mode

Hi Igor and genomax,

I hope you can still see this message. I am just going to start using supernova to assembly the genome of geckos. I am having a hard time using it. Can you please suggest articles, websites or anything that can help me. I have installed the software at my cluster and created a script. however, I am stuck here. I have never did this before and I honestly don't know the following steps. Can you please help me? What are the steps that O should follow? what I have to know before starting. Please any help would be greatly appreciated

Thank you so much in advance.

ADD REPLY • link 5.4 years ago by samnioue ▴ 10

0

Entering edit mode

Yes I am using longranger for alignment. Well I am only aligning the reads using -- longranger align parameter so I am not getting any vcf file out of it. But before getting VCF,I want to filter the bam file. Then pre-processing bam file --local realignment using GATK --- variant calling using freebayes.

I am not using --longranger wgs because this parameter is used when you want to use gatk as well along-with the longranger. (this is more suitable for the human genomes where ploidy is diploid).

ADD REPLY • link 5.4 years ago by nadiabeg.comsats ▴ 10

1

Entering edit mode

Long Ranger can use GATK or Freebayes.

ADD REPLY • link 5.4 years ago by igor 13k

0

Entering edit mode

Yes it uses both but the parameters are tuned according to diploid genome. In my case I have tetraploid samples.

ADD REPLY • link 5.4 years ago by nadiabeg.comsats ▴ 10

0

Entering edit mode

You mean something is wrong with the 10x genomics sequenced file or with the alignment file?

Sure I will try that.

ADD REPLY • link 5.4 years ago by nadiabeg.comsats ▴ 10

score 0 · Answer 1 · 2019-11-12

Thats weird... looks like something is broken in your file because:

total reads (420866696) = properly paired (420866696) = Read 1 + Read 2; but Read 1 /= Read2 when there are no secondary/supplementary alignments.
There are also 133 singletons yet you used the samtools -f 3 filter so you should only have paired reads.

I would suggest to try to run samtools fixmate before your samtools view -h -q 20 -F 3340 -f 3 -b and see if that fixes it.