Hi all,
I have BAM files obtained from mapping 10x Genomics data. My goal is to produce a VCF file. Before variant calling I want to remove unaligned reads, multi-mapping reads and similar cases, so that only uniquely aligned, properly paired reads remain.
My initial mapping statistics (without filtering) are:
788172985 + 0 in total (QC-passed reads + QC-failed reads)
42854167 + 0 secondary
0 + 0 supplementary
50143481 + 0 duplicates
741131280 + 0 mapped (94.03% : N/A)
745318818 + 0 paired in sequencing
372659409 + 0 read1
372659409 + 0 read2
508751213 + 0 properly paired (68.26% : N/A)
677983578 + 0 with itself and mate mapped
20293535 + 0 singletons (2.72% : N/A)
90544338 + 0 with mate mapped to a different chr
77234446 + 0 with mate mapped to a different chr (mapQ>=5)
To get uniquely aligned reads, I am using the following filters, as suggested on many online platforms and based on the samtools flagstat output:
samtools view -h -q 20 -F 3340 -f 3 -b "$path" > unique_realigned_Semlo_properlypaired.bam
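For reference, the two flag values in the command above can be broken down with plain shell arithmetic. This is only a sketch: the bit values come from the SAM specification, while the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# SAM flag bits (values as defined in the SAM specification).
# Variable names are illustrative only.
PAIRED=1            # read is paired in sequencing
PROPER_PAIR=2       # read is mapped in a proper pair
UNMAPPED=4          # read itself is unmapped
MATE_UNMAPPED=8     # mate is unmapped
SECONDARY=256       # secondary alignment
DUPLICATE=1024      # PCR or optical duplicate
SUPPLEMENTARY=2048  # supplementary alignment

# -f 3: every one of these bits must be set on a read to keep it.
require=$(( PAIRED | PROPER_PAIR ))
# -F 3340: a read is dropped if any of these bits is set.
exclude=$(( UNMAPPED | MATE_UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY ))

echo "require=$require exclude=$exclude"
```

So the command keeps reads that are paired and properly paired, drops unmapped reads, reads with an unmapped mate, secondary and supplementary alignments and duplicates, and additionally requires MAPQ of at least 20 through -q 20. If available in your samtools version, `samtools flags 3340` should give the same breakdown.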
Results:
420866696 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
420866696 + 0 mapped (100.00% : N/A)
420866696 + 0 paired in sequencing
210157890 + 0 read1
210708806 + 0 read2
420866696 + 0 properly paired (100.00% : N/A)
420866563 + 0 with itself and mate mapped
133 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
Problem: after applying these filters I get unique reads, but the number of read1 (r1) records is not equal to the number of read2 (r2) records. Is there another way to get uniquely aligned reads that are still in proper pairs?
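A possible explanation, offered as an assumption rather than a diagnosis of this particular file: -q 20 is tested on each read separately, so whenever only one mate of a pair reaches MAPQ 20 the other mate is dropped and the survivor is left without a partner. In the results above read2 exceeds read1 by 550916 records. The sketch below shows one way to discard such orphaned mates from name-grouped SAM text using awk. The function name keep_complete_pairs and the file names in the comments are hypothetical, and it assumes secondary and supplementary alignments have already been removed.

```shell
#!/usr/bin/env bash
# Keep only records whose mate is also present.
# Input: SAM text on stdin in which mates are adjacent (name-sorted or collated).
# Output: header lines plus complete read1/read2 pairs on stdout.
keep_complete_pairs() {
  awk -F'\t' '
    /^@/ { print; next }                 # pass header lines through unchanged
    {
      is_r1 = int($2 / 64) % 2           # flag bit 64: first in pair
      is_r2 = int($2 / 128) % 2          # flag bit 128: second in pair
      if (held != "" && $1 == held_name && is_r1 != held_r1 && is_r2 != held_r2) {
        print held                       # both mates present: emit the pair
        print $0
        held = ""
      } else {
        # No partner seen yet; any previously held orphan is discarded here.
        held = $0; held_name = $1; held_r1 = is_r1; held_r2 = is_r2
      }
    }'
}

# Hypothetical usage (file names are placeholders; the output is sorted
# back to coordinate order for downstream tools):
#   samtools sort -n filtered.bam | samtools view -h - \
#     | keep_complete_pairs | samtools sort -o filtered_pairs.bam -
```

After a step like this, running samtools flagstat on the result should report equal read1 and read2 counts, since every kept record has its mate.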
Thanks
If this is 10x Genomics data, is the BAM file generated by longranger? And if so, what is the problem with the VCF that it generates?
Hi Igor and genomax,
I hope you can still see this message. I am about to start using Supernova to assemble gecko genomes, and I am having a hard time with it. Can you please suggest articles, websites or anything else that could help me? I have installed the software on my cluster and created a script, but I am stuck here. I have never done this before and I honestly don't know what the next steps are. What steps should I follow? What do I need to know before starting? Any help would be greatly appreciated.
Thank you so much in advance.
Yes, I am using longranger for alignment. I am only aligning the reads with longranger align, so I am not getting any VCF file out of it. Before producing the VCF I want to filter the BAM file. The plan is: filter the BAM file, pre-process it (local realignment using GATK), then call variants using freebayes.
I am not using longranger wgs because that pipeline is used when you want to run GATK along with longranger (this is more suitable for human genomes, which are diploid).
Long Ranger can use GATK or Freebayes.
Yes, it uses both, but the parameters are tuned for a diploid genome. In my case I have tetraploid samples.
Do you mean something is wrong with the 10x Genomics sequence file or with the alignment file?
Sure I will try that.