Question about an alignment
2
0
Entering edit mode
3 days ago
barslmn ★ 2.4k

Hi, Recently we recieved, fastq files and alignments from another group. This is a Clinical Exome Sequencing study, so it has been studied from DNA. It's a pairend Illumina sequencing.

We have another candidate that explains the phenotype fittingly but there is a peciluar view in one of the genes EIF4G1 which raises few questions. The BAM file just says clcsgenomicsgridworker which doesn't give much info about the alignment. I repated the alignment using sarek v3.5.1 but there was no large deletions though coverage peaks looked similar. When I BLAT some of the read sequences manually on UCSC we get the alignments accordance with the CLC with the large gaps. BLAT also shows nothing similar on other genomic regions.

The event looks like a retrotransposon activity. But in order to discover it should we align like RNAseq? Is anyone have experience with the CLC workflow?

Thanks, Barış

BAM header from CLC

@PG     ID:0    VN:21.0 PN:clcgenomicsgridworker

Example reads

>ex1
GGAAGGAATTTCTACCTGAAGGCCAGGACATTGGTGCATTCGTCGCTGAACAGAAGGTGGAGTATACCCTGGGAGAGGAGTCGGAAGCCCCTGGCCAGAGGGCACTCCCCTCC
>ex2
CCCCTGGCCAGAGGGCACTCCCCTCCGAGGAGCTGAACAGGCAGCTGGAGAAGCTGCTGAAGGAGGGCAGCAGTAACCAGCGGGTGTTCGACTGGATAGAGGCCAACCTGAGTGAGCAGCAGATAGTATCCAACACGTTAGTTCGAG
>ex3
GTTAGTTCGAGCCCTCATGACGGCTGTCTGCTATTCTGCAATTATTTTTGAGACTCCCCTCCGAGTGGACGTTGCAGTGCTGAAAGCGCGAGCGAAGCTGCTGCAGAAATACCTGTGTGACGAGCAGAAGGAGCTACAGGCGCTCTACG

The bam file from CLC

enter image description here

Same region processed by sarek

enter image description here

ces sequencing alignment • 553 views
ADD COMMENT
2
Entering edit mode
3 days ago
GenoMax 150k

AFAIK CLC uses a proprietary aligner. So if you are interested in recreating the analysis you probably can't do it without access to the software.

There is a possibility that analysis was run as RNAseq by mistake in CLC? That would make sense since you can recreate the analysis using that assumption. While it may explain this gene, it could lead to odd consequences elsewhere.

At least you have fastq files, so you are in a good spot to do your own analysis.

ADD COMMENT
0
Entering edit mode

There is a possibility that analysis was run as RNAseq by mistake in CLC?

About that, we would have seen this kinda pattern in other genes too, but this is the only one. :/ Maybe CLC aligner uses some hybrid approach?

At least you have fastq files, so you are in a good spot to do your own analysis.

Yes, I am fiddling with bwa gap parameters to get something similar to no success.

ADD REPLY
1
Entering edit mode
2 days ago

I'd look up individual reads and see how for a specific read, how the read is annotated in the CLC bam versus your bam.

ADD COMMENT
0
Entering edit mode

Looking at some examples we can see that bwa soft clips the reads.

BWA alignment


Read name = XXXXX:XXX:XXXXXX:2:1228:23628:28009
Sample = XXXX
Library = XXXX
Read group = XXXX
Read length = 139bp
Flags = 99
----------------------
Mapping = Primary @ MAPQ 60
Reference span = chr3:184,325,298-184,325,373 (+) = 76bp
Cigar = 76M63S
Clipping = Right 63 soft
----------------------
Mate is mapped = yes
Mate start = chr3:184325297 (-)
Insert size = 76
First in pair
Pair orientation = F1R2
----------------------
Supplementary Alignments
chr3:184,325,298-184,325,373 (+) = 77bp @MAPQ 60 NMnull
chr3:184,325,479-184,325,542 (+) = 64bp @MAPQ 60 NM0
----------------------
MC = 76M63S
PG = MarkDuplicates-3561BB13
MQ = 60
AS = 76
XS = 0
Hidden tags: SA, RG Location = chr3:184,325,301
Base = A @ QV 36

CLC alignment

Read name = XXXX:XXXX:XXXX:2:1228:23628:28009_2:N:0:TAGCTATGCA+GAGCCTGACA
Sample = XXXX
Read group = XXXX
Read length = 139bp
Flags = 99
----------------------
Mapping = Primary @ MAPQ 60
Reference span = chr3:184,325,298-184,325,542 (+) = 245bp
Cigar = 75M106D64M
Clipping = None
----------------------
Mate is mapped = yes
Mate start = chr3:184325297 (-)
Insert size = -245
First in pair
Pair orientation = R2F1
----------------------
NH = 1
NM = 106
Hidden tags: MD, RG Location = chr3:184,325,353
Base = G @ QV 37
ADD REPLY
1
Entering edit mode

we can see that bwa soft clips the reads.

But then marks supplementary alignments for the pieces.

Perhaps this is a matter of changing preferences in IGV of how these are displayed?

ADD REPLY
0
Entering edit mode

We are seein them as mismatches, weird enough there are some partially aligned reads on the CLC bam too.

CLC reads

enter image description here

BWA reads

enter image description here

ADD REPLY
0
Entering edit mode

So CLC is making the alignment work by inserting a big gap, and bwa is making the alignment work by throwing away half the read. I'm inclined to believe CLC is correct, and bwa is not.

Maybe try an aligner like STAR, which is more likely to put in a large gap. Then you might be able to replicate CLC's findings. bwa is just not going to cut it here.

Also, you could try making a double reference, one reference sequence with the wt sequence, and add in the sequence with all the gaps as well. My guess is, when you give bwa the right sequence to align to, all those reads will align fine to that with no soft-clipping.

ADD REPLY
0
Entering edit mode

bwa is making the alignment work by throwing away half the read.

We don't know that based on the info above. It looks like out of the 139 bp read there are two alignments reported (77 and 64 bp).

ADD REPLY
0
Entering edit mode

Okay, it's not throwing it away, it's making the supplementary alignment 106 bases away, instead of counting it as a single alignment with a 106 base gap like CLC is doing. CLC is not wrong, those big gaps are legitimate, but bwa just can't make that kind of alignment.

ADD REPLY
0
Entering edit mode

Yes CLC alignments are much better. Maybe there is some incantation of parameters to get that with bwa, but I will leave it at this for now. At least, I learned about a limitation of the method for future reference. Thanks swbarnes2 GenoMax

ADD REPLY

Login before adding your answer.

Traffic: 2597 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6