Filetering low reads from vcf file/ download exonic region, coding regions of GRCh38 - hg38
1
0
Entering edit mode
21 months ago
minoo ▴ 10

I am analyzing variant calling on RNASeq data using somatic pipeline of mutect2 function in gatk. To fileter the resulting variants, I need to filter out low reads in a way that for example the sum of reads from ref and alt be more than 50. Like here:

0/1:4,21:25:64:615,0,64 4+21<50 so this variants needs to be omitted. Do you have any idea how to do it?

Also I need to remove noncoding variants from this vcf file as it is coming from RNaseq data. I need a bedfile including exonic region and coding regions to be able to filter out noncoding variants. Do you know where can I get such a file? I tried the UCSC website but I could not figure out the position criteria to get such a bed file.

picard gatk vcftools mutect2 • 697 views
ADD COMMENT
0
Entering edit mode
21 months ago

assuming FORMAT/AD[0]+FORMAT/AD[1] == FORMAT/DP

bcftools view -i 'FORMAT/DP >= 50'  in.vcf

I need a bedfile including exonic region and coding regions

get a GTF file, filter for exon/transcript with awk, convert to bed with awk, use bedtools merge followed by bedtools complement to get the BED for the non-coding genome.

ADD COMMENT
0
Entering edit mode

Thanks Pierre, I will try that. Do you think the above threshold can replace the filtering function from picard :

java -jar $picardTool FilterVcf \
 -I input.vcf \
 -O output.vcf \
 --MIN_DP 50

for the second part of my question, how should I do this filtering? I tried using the code below

wget -O - "http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gtf.gz" |\
gunzip -c | grep 'transcript_type "protein_coding"' |\
awk '($3=="exon") {printf("%s\t%s\t%s\n",$1,int($4)-1,$5);}' |\
sort -T . -t $'\t' -k1,1 -k2,2n | bedtools merge > coding.regions.hg38.bed

replacing 'transcript_type "protein_coding"' with 'exon "transcript"' but it didnt work. Do you think the above code will output the coding region wwithout specifying the exon/transcripts?

ADD REPLY

Login before adding your answer.

Traffic: 2343 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6