3.3 years ago
McKenna
I'm trying to understand the variance between results for the same individual, on the same chromosomes, sequenced by Dante (30x) and Nebula (100x). Is this expected? Other chromosomes do not differ as drastically. Which WGS is closer to "the truth"?
Are you sure they use the same reference?
The same person, different samples.
@German is asking whether the data from the two samples were aligned to the same reference.
hg19 vs hg38
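One quick way to confirm which build a BAM was aligned to is to look at the @SQ lines in its header (the text printed by `samtools view -H file.bam`): chromosome 1 is 249,250,621 bp in hg19/GRCh37 and 248,956,422 bp in hg38/GRCh38. Below is a minimal sketch, assuming the header text is already available as a string. The helper name `guess_build` is hypothetical, and it only checks chr1.

```python
from typing import Dict

# chr1 length in each build (hg19/GRCh37 vs hg38/GRCh38)
CHR1_LENGTH_TO_BUILD: Dict[int, str] = {
    249250621: "hg19/GRCh37",
    248956422: "hg38/GRCh38",
}


def guess_build(header_text: str) -> str:
    """Guess the reference build from SAM header @SQ lines (hypothetical helper).

    Finds the chr1 (or '1') sequence line and maps its LN: value to a build.
    Returns 'unknown' if chr1 is absent or its length is not recognised.
    """
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are tab-separated TAG:VALUE pairs, e.g. SN:chr1 and LN:248956422
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if fields.get("SN") in ("chr1", "1"):
            length = int(fields.get("LN", "0"))
            return CHR1_LENGTH_TO_BUILD.get(length, "unknown")
    return "unknown"
```

For example, the output of `samtools view -H t.bam` for each provider's BAM can be passed to `guess_build` to confirm the hg19 vs hg38 split before comparing anything.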
Trying to convert the hg19 BAM to hg38, with no luck so far.
CrossMap.py bam -a ./hg19ToHg38.over.chain.gz ./t.bam ./t.hg38
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/CrossMap.py", line 1744, in <module>
    crossmap_bam_file(mapping = mapTree, chainfile = chain_file, infile = in_file, outfile_prefix = out_file, chrom_size = targetChromSizes, IS_size=options.insert_size, IS_std=options.insert_size_stdev, fold=options.insert_size_fold,addtag=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/CrossMap.py", line 827, in crossmap_bam_file
    (new_header, name_to_id) = sam_header.bam_header_generator(orig_header = sam_ori_header, chrom_size = chrom_size, prog_name="CrossMap",prog_ver = __version__, format_ver=1.0,sort_type = 'coordinate',co=comments)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/cmmodule/sam_header.py", line 27, in bam_header_generator
    bamHeaderLine['HD'] = {'VN':format_ver,'SO':sort_type}
  File "pysam/libcalignmentfile.pyx", line 545, in pysam.libcalignmentfile.AlignmentHeader.__setitem__
TypeError: AlignmentHeader does not support item assignment (use header.to_dict()
It would be best to convert the data back to fastq files and remap it in both cases to a reference you know. Otherwise the comparison you are trying to make may not be useful. There can be subtle differences in how data is mapped/processed, and a simple CrossMap liftover may not be sufficient for a meaningful comparison.
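A rough sketch of that workflow (BAM to FASTQ, then realignment to one chosen reference), assuming `samtools` and `bwa` are installed and the reference FASTA has already been indexed with `bwa index`. The helper `remap_commands` and all file names are hypothetical; it only builds the shell pipelines so they can be reviewed before running, and BWA-MEM is just one common aligner choice, not necessarily what either provider used.

```python
import shlex
from typing import List


def remap_commands(bam: str, ref_fasta: str, prefix: str, threads: int = 4) -> List[str]:
    """Build shell pipelines that turn a BAM back into paired FASTQ and realign it.

    Hypothetical helper: assumes samtools and bwa are on PATH and that
    ref_fasta was indexed with `bwa index`. Nothing is executed here.
    """
    q = shlex.quote
    r1, r2 = f"{prefix}_R1.fastq.gz", f"{prefix}_R2.fastq.gz"
    sorted_bam = f"{prefix}.sorted.bam"
    # 1) Group mates together, write pairs to R1/R2, discard unpaired and
    #    singleton reads (-0, -s), and drop /1 /2 read-name suffixes (-n).
    to_fastq = (
        f"samtools collate -u -O {q(bam)} | "
        f"samtools fastq -1 {q(r1)} -2 {q(r2)} -0 /dev/null -s /dev/null -n -"
    )
    # 2) Align both mates to the chosen reference and coordinate-sort the output.
    align = (
        f"bwa mem -t {threads} {q(ref_fasta)} {q(r1)} {q(r2)} | "
        f"samtools sort -@ {threads} -o {q(sorted_bam)} -"
    )
    # 3) Index the sorted BAM for downstream tools.
    index = f"samtools index {q(sorted_bam)}"
    return [to_fastq, align, index]
```

When the original FASTQ files are still available, only the alignment and indexing steps are needed; the important part is that both samples go through the same reference and the same pipeline.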
Thanks. I have both original fastq files, but they are split into two. Do I need to merge them first?
As in, they are from two lanes, e.g. they have L001 or L002 etc. in their names? They can be processed in parallel for each sample and then merged at the BAM step. You could also cat them together, in the same order for the R1/R2 files, before aligning.
Yes, if these are two very different reference genomes, this is what is expected.
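If you go the concatenation route, what matters is that R1 and R2 are concatenated in the same lane order so mates stay paired. Below is a minimal sketch, assuming Illumina-style names such as `Sample_S1_L001_R1_001.fastq.gz`; that naming pattern and the helpers `plan_lane_merge` and `concatenate` are assumptions for illustration. Gzipped FASTQ files can be joined byte for byte, because a series of gzip members is itself a valid gzip stream.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_R<read>_<nnn>.fastq.gz
NAME_RE = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R(?P<read>[12])_\d{3}\.fastq\.gz$"
)


def plan_lane_merge(filenames: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group lane files per sample and read (R1/R2), sorted by lane number.

    Raises ValueError if a name does not match the assumed pattern, or if
    R1 and R2 of a sample do not cover the same lanes (pairing would break).
    """
    groups = defaultdict(lambda: {"R1": [], "R2": []})
    for name in filenames:
        m = NAME_RE.match(Path(name).name)
        if m is None:
            raise ValueError(f"unrecognised FASTQ name: {name}")
        groups[m["sample"]]["R" + m["read"]].append((int(m["lane"]), name))

    plan = {}
    for sample, reads in groups.items():
        r1, r2 = sorted(reads["R1"]), sorted(reads["R2"])
        if [lane for lane, _ in r1] != [lane for lane, _ in r2]:
            raise ValueError(f"R1/R2 lanes differ for sample {sample}")
        plan[sample] = {"R1": [n for _, n in r1], "R2": [n for _, n in r2]}
    return plan


def concatenate(inputs: List[str], output: str) -> None:
    """Byte-wise concatenation, equivalent to `cat in1 in2 > out`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For each sample, `concatenate(plan[sample]["R1"], ...)` and `concatenate(plan[sample]["R2"], ...)` then produce one R1 and one R2 file in matching lane order. Merging per-lane BAMs after alignment is the alternative mentioned above and avoids the concatenation step entirely.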
I agree with GenoMax that they'd be better converted back to fastq and remapped rather than lifted over.
thanks for the help