Hi,
I am having an issue with a sequencing run: after demultiplexing, aligning, and filtering, each individual has 1-2 million reads, but these reads are predominantly on one chromosome. For background, these are Oncorhynchus mykiss and O. clarkii samples. Chromosome 6 has an order of magnitude more reads than the other chromosomes. The libraries were prepared for RAD-seq (SbfI), with 19 plates on one NovaSeq 6000 lane.
Demultiplexing was done with deML. Reads were aligned to the O. mykiss reference genome with bwa, then filtered and sorted with samtools.
I can't figure out whether this is a library prep issue, a sequencing issue, or something that I am doing wrong when splitting the reads and creating the BAM files. If anyone has had a similar problem, please let me know.
Thanks, Sam
You should also consider possible biological reasons for this observation, like aneuploidy, repeat sequences, nucleotide composition biases, etc.
I strongly suggest visualizing the read coverage of this chromosome. Is the increase in read coverage uniform or are there specific regions with lots of reads?
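If a genome browser is not handy, a binned depth summary answers the same question. Below is a minimal sketch, assuming you have the three-column, tab-separated text that `samtools depth` writes (chromosome, 1-based position, depth). The function names and the bin size are my own choices for illustration, not part of any tool.

```python
from collections import defaultdict


def binned_mean_depth(depth_lines, bin_size=1000):
    """Mean depth per (chrom, bin_start) from `samtools depth`-style lines.

    Positions missing from the input are treated as depth 0, because the
    summed depth is divided by the full bin size. The last bin of a
    chromosome may therefore be slightly underestimated.
    """
    totals = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        # Convert the 1-based position to a 0-based bin start.
        bin_start = (int(pos) - 1) // bin_size * bin_size
        totals[(chrom, bin_start)] += int(depth)
    return {key: total / bin_size for key, total in totals.items()}


def top_bins(binned, n=5):
    """Return the n bins with the highest mean depth, highest first."""
    return sorted(binned.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Passing an open file handle of the depth output to `binned_mean_depth` and then calling `top_bins` on the result should show whether the extra reads are spread along the chromosome or stacked in a handful of bins.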
Hi jv,
I selected two individuals at random from the plate to look at the depth across chromosome 6. It seems that one small region accounts for the majority of the reads. The individuals are colored differently. Others have sequenced these species using RAD-seq without this issue, so I am not sure it is biological, but I am new to this. I appreciate your insight.
I mean, it's clearly not the whole chromosome; it's a specific locus. How long is this region? What does the underlying sequence look like?
These regions seem to be ~60 bases long. The sequence for the omy06 region is "ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT". When I do a BLAST search, it comes up as H. pylori, COVID-19, or synthetic construct. This seems to point toward contamination.
Check my answer. These are adapters and should be trimmed prior to alignment. Sometimes one round of trimming isn't enough (in case you have already trimmed your reads).
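To get a feel for how many reads are affected before re-running a proper trimmer, something like the following could help. It is only a rough sketch in plain Python: it uses exact matching on the sequence posted above, ignores sequencing errors, and is not a replacement for a dedicated adapter-trimming tool. The function names and the `min_overlap` default are hypothetical choices for illustration.

```python
# Sequence reported in this thread for the omy06 pile-up region.
ADAPTER = "ATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"


def trim_adapter(seq, qual, adapter=ADAPTER, min_overlap=5):
    """Cut a read (and its quality string) at the first adapter match.

    First look for the full adapter anywhere in the read. If it is not
    found, look for a prefix of the adapter hanging off the 3' end,
    requiring at least `min_overlap` matching bases.
    """
    idx = seq.find(adapter)
    if idx == -1:
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                idx = len(seq) - k
                break
    if idx == -1:
        return seq, qual  # no adapter found, read left unchanged
    return seq[:idx], qual[:idx]


def adapter_fraction(seqs, adapter=ADAPTER):
    """Fraction of reads that contain the full adapter sequence."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(adapter in s for s in seqs) / len(seqs)
```

Running `adapter_fraction` on the read sequences before and after trimming is a quick way to check whether a second round of trimming is needed.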
Could you do this plot for other chromosomes as well? The peak might be there, too.