Question

how to extract intergenic regions from BRAKER output?

0

Entering edit mode

2.7 years ago

mthm ▴ 50

I have used softmaksed genome with the RNAseq bam file to generate gene annotation. the gff3 file contains: CDS, intron, exon, start-codon, stop-codon, mRNA, and gene.

xyz.00052 AUGUSTUS        gene    142183  143365  0.43    -       .       ID=g1;
xyz.00052 AUGUSTUS        mRNA    142183  143365  0.43    -       .       ID=g1.t1;Parent=g1;
xyz.00052 AUGUSTUS        stop_codon      142183  142185  .       -       0       ID=g1.t1.stop1;Parent=g1.t1;
xyz.00052 AUGUSTUS        CDS     142186  142347  1       -       0       ID=g1.t1.CDS1;Parent=g1.t1;
xyz.00052 AUGUSTUS        exon    142186  142347  .       -       .       ID=g1.t1.exon1;Parent=g1.t1;
xyz.00052 AUGUSTUS        intron  142348  142408  1       -       .       ID=g1.t1.intron1;Parent=g1.t1;

I assumed that "gene" covers all the other parts (am not interested in mRNA and CDS is basically the same as exon), so I extracted a bed file containing only gene coordinates.

awk '$4=="gene"' xyz.braker.bed > xyz.braker.gene.bed
xyz.00001 9196    11118   gene    -
xyz.00001 56048   61743   gene    -
xyz.00001 62586   67965   gene    -
xyz.00001 77364   79170   gene    +
xyz.00001 92556   102646  gene    +

then I created a genome file from the "unmasked" genome assembly

samtools faidx  --mark-strand sign xyz.fa
awk -v OFS='\t' {'print $1,$2,$3'} xyz.fa.fai >  xyz.genomefile
xyz.00001 25751438        17
xyz.00002 24213853        25751473
xyz.00003 21310674        49965344
xyz.00004 18560873        71276036
xyz.00005 14821237        89836927

finally I extracted the intergenic regions

bedtools complement -i xyz.braker.gene.bed -g xyz.genomefile > xyz.intergenic.bed

question 1. is this way of extracting intergenic regions correct?

question 2. how can I merge the "xyz.intergenic.bed" (only three columns) to xyz.braker.gene.bed (with four columns including the strand information column)?

intergenic braker annotation • 593 views

ADD COMMENT • link 2.7 years ago by mthm ▴ 50

0

Entering edit mode

intergenic regions do not have strand informations

ADD REPLY • link 2.7 years ago by andres.firrincieli 3.8k

0

Entering edit mode

oh that's right! I updated my post

ADD REPLY • link 2.7 years ago by mthm ▴ 50