Entering edit mode
4.4 years ago
ocortes
•
0
Hi all,
I am using for the first time bbmerge and I do not know exactly how to understand my results. I would be very greatfully if someone can help me. I have paired end reads and I am running two commands to merge the reads in a single fastq file, first repair.sh and then bbmerge.sh.
This is a good merging result or If I understand the output well, I am losing 67% of the reads? Can I do something to improve the merging results?
Thanks in advance
repair.sh
is only needed if your reads were out of sync in R1/R2 files. With merging it is best to start with original files (even before doing any trimming). If you did not do that then I suggest that you runbbmerge.sh
on your original data files.Merging will only work if your reads were designed to do that i.e. length of sequencing was longer than the size of the insert.
Thanks for your quick answer. I run first bbmerge.sh but I got this message:
So I run first repair.sh and with the ouput files I run bbmerge.sh, this makes sense?
Thanks!
This is the reason why I suggested that you should run
bbmerge.sh
on original data files.If your files were out-of-sync then
repair.sh
will indeed fix that part up.Sorry but I am not sure that yuo can see the image inserted, this are the results:
Based on these results you can see that there are ambiguous overlaps where a large % of reads that BBMerge is not able to confidently resolve. You can see that there seems to be a wide range of insert sizes in your data.
What kind of an experiment is this (16S amplicons)?
There are bovine genomic sequences. I have a .bam and .bai file for each chromosome and each file has sequences of 42 animals First I run: picard.jar SamToFastq INPUT=chr6.bam OUTPUT_PER_RG=true RG_TAG=ID OUTPUT_DIR=dir_location VALIDATION_STRINGENCY=LENIENT
and then with the fastq files of each aniaml (animal1_1.fastq; animal1_2_fastq, and so on until 42 aniamls) I run repair.sh and then bbmerge.sh
Thanks!
P.D. Sorry for my questions....perhaps are so dumbs...but I am begining with genome sequence analysis
Not sure why you are trying to merge the reads if these are plain genomic sequences. Merging reads only makes sense when there is a specific reason to do that. For example in 16S amplicon sequencing.
What it is that you want to do with this data? Sounds like data is already aligned since you have SAM files?
SNP calling among the 42 genomic sequences using HaplotypeCaller As I have paired end reads (forward and reverse) I assume that the first step it is to merge both reads in a single one but just this afternoon I have read that it is not neccesary, sorry....
But when I do reference mapping the sequences using bwa mem the sam files that I get are empty:
bwa mem -M CHR6.fa file_1.fastq file_2.fastq > file_2R.sam
SO know I am trying:
bwa aln -I -m 100000 -o 1 -n 0.01 -l 200 -e 12 -d 12 -t 2 CHR6.fa file_1.fastq > file_1.sai
bwa aln -I -m 100000 -o 1 -n 0.01 -l 200 -e 12 -d 12 -t 2 CHR6.fa file_2.fastq > file}_2.sai
bwa sampe CHR6.fa file_1.sai file_2.sai file_1.fastq file_2.fastq > prueba${i}_2R.sam
Thanks for any suggestion!!
If you already have aligned data files why not use them directly for SNP calling?
Thaks for your comments you are helping me a lot. I am new in this kind of analysis and probably. So it is no neccesary to merge both reads and I as I have .bam files I could directly run SNP calling pipielines.
I am runing this pipieline:
picard.jar SortSam
picard.jar MarkDuplicates
picard.jar AddOrReplaceReadGroups
picard.jar BuildBamIndex
GenomeAnalysisTK.jar RealignerTargetCreator
GenomeAnalysisTK.jar IndelRealigner
samtools calmd picard.jar BuildBamIndex
GenomeAnalysisTK.jar HaplotypeCaller
Then I run GenomeAnalysisTK.jar SelectVariants and VariantFiltration for SNPs and INDELS and finally VariantRecalibrator and ApplyRecalibration
Regards!!!!!
Are you following a tutorial? A ot of these steps are unnecessary or outdated.