Question

Left or right join between 2 BAM files (remove reads the should be filtered out for sex chromosomes)

0

Entering edit mode

2.7 years ago

beausoleilmo ▴ 610

I want to use this reference genome Camarhynchus parvulus but it lacks information about the sex chromosomes (it only has Z chromosome, but not W). So I'm interested in using the reference genome from Taeniopygia guttata (zebra finch) where the Z and W sex chromosomes are present.

# Get the reference genome 
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/902/806/625/GCA_902806625.1_Camarhynchus_parvulus_V1.1/GCA_902806625.1_Camarhynchus_parvulus_V1.1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/957/565/GCF_003957565.2_bTaeGut1.4.pri/GCF_003957565.2_bTaeGut1.4.pri_genomic.fna.gz

# Index 
samtools faidx GCF_003957565.2_bTaeGut1.4.pri_genomic.fna

# extract 2 sex chromosomes 
samtools faidx GCF_003957565.2_bTaeGut1.4.pri_genomic.fna NC_044241.2 > tae.gut.Z.fna.fa # NC_044241.2 ### Z 
samtools faidx GCF_003957565.2_bTaeGut1.4.pri_genomic.fna NC_045028.1 > tae.gut.W.fna.fa # NC_045028.1 ### W 

# index ref 
bwa index GCA_902806625.1_Camarhynchus_parvulus_V1.1_genomic.fna.gz

# Align Z and W car to reference 
bwa mem -t $SLURM_CPUS_PER_TASK \
  GCA_902806625.1_Camarhynchus_parvulus_V1.1_genomic.fna.gz \
  tae.gut.Z.fna.fa > Z_aligned.sam

bwa mem -t $SLURM_CPUS_PER_TASK \
  GCA_902806625.1_Camarhynchus_parvulus_V1.1_genomic.fna.gz \
  tae.gut.W.fna.fa > W_aligned.sam

# Make BAM fles and filter
cat Z_aligned.sam | samtools view -bSh -q $MQ - > Z_aligned.bam
cat W_aligned.sam | samtools view -bSh -q $MQ - > W_aligned.bam

# Get chromosomes where sex read were mapped from filtered BAM file 
samtools view W_aligned.bam | grep -v @ | cut -f 3 | sort -u > W_chr.parv.gut_filt.txt
samtools view Z_aligned.bam | grep -v @ | cut -f 3 | sort -u > Z_chr.parv.gut_filt.txt

Then I want to use the bam file to 'remove' all the reads that are from the sex chromosomes from my ID bam files.

I tried this:

bedtools intersect  -v \
                    -a BIRD1.bam \
                    -b W_aligned.bam Z_aligned.bam > BIRD1_filtered.bam

or

bedtools bamtobed -i W_aligned.bam > Walg.bed
bedtools bamtobed -i Z_aligned.bam > Zalg.bed
samtools view BIRD1_filtered.bam \
          -b -h \
          -L Walg.bed \
          -U BIRD1_outRegions.bam \
          -o BIRD1_in.Regions.bam

But wasn't able to extract the sex chromosome information. How could this be done. Is this an approach that could work using BAM files or another file type should be used? The end product should be similar to a 'left' or 'right join' between 2 BAM files.

sex chromosomes BAM filtering • 927 views

ADD COMMENT • link 2.7 years ago by beausoleilmo ▴ 610

0

Entering edit mode

Let me confirm if I understand the aim. You are trying to use the sex chromosome fasta from a related organism to remove contigs/scaffolds from an assembly that you are interested in? What kind of alignments did you get from this?

Using bwa mem for this purpose does not seem like a good idea unless you create a fake set of short reads from the two sex chromosomes and then use then for bwa search.

ADD REPLY • link 2.7 years ago by GenoMax 152k

0

Entering edit mode

My samples (like 'BIRD1' in the example above) are aligned using GCA_902806625.1_Camarhynchus_parvulus_V1.1_genomic.fna.gz. But the sex chromosomes in this reference are not well described (there is only the Z chromosome, but I'd like to make sure that all the sex chromosomes are found).

So my idea was to use another reference genome (GCF_003957565.2_bTaeGut1.4.pri_genomic.fna.gz) where the sex chromosomes are found and try to match these sex chromosomes to the reference of GCA_902806625.1_Camarhynchus_parvulus_V1.1_genomic.fna.gz. Then 'remove' the sex chromosome information that I would have intersected by doing a 'left join' (so that in the end, each bam file for all my individuals could be filtered in such a way as to remove the sex chromosomes). So the aim is to search within all my individual BAM files and remove the 'sex DNA' so that I can continue my downstream analyses without the 'sex' DNA.

I want to do that because for some reason, my Admixture plots (using NGSAdmix) are partly structured based on sex.

ADD REPLY • link 2.7 years ago by beausoleilmo ▴ 610