3.9 years ago
FL512
Dear all,
I am new to WGS/WES analysis. I already have VCF files obtained from WGS, and I would like to focus on SNPs located in non-coding regions such as promoters, enhancers, and 5'/3'-UTRs or, if possible, all non-coding regions.
- Is there any database, reference file (txt, BED, whatever), or even a script available to extract SNPs in non-coding regions from my VCF files?
- If this has already been asked and answered, please kindly let me know where the post is.
I do not want to use a biased approach, such as focusing only on the 5 kb to 20 kb upstream of genes of interest. Rather, I want to do a global analysis, and that is where I am struggling.
Thank you very much in advance. Any comments or suggestions would be helpful.
Extract the non-UTR exons in BED format, then use bedtools complement to get the complement of the exons with respect to the entire genome; that is the non-coding part. Intersect that file with your VCF, and these are your non-coding variants. bedtools intersect is probably what you want.

Adding to this answer: you can get the non-coding regions from your reference GTF file and then use bedtools intersect to get your regions.

Thank you for letting me know. By the way, how can I extract non-coding regions from my reference file? I have been struggling with that for a couple of days. I thought I had to download the non-UTR exons from the UCSC Genome Browser. Anyway, thank you for your help!
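
For readers who want to see what the complement-then-intersect idea from the answer above amounts to, here is a minimal sketch in plain Python. It is not a replacement for bedtools, and it makes several assumptions: exon intervals are 0-based, half-open (BED style), chromosome lengths are supplied by you, and a variant is judged by its POS only, so indels that span an exon boundary are not handled. The function names are hypothetical and chosen for illustration.

```python
from bisect import bisect_right

def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def complement(intervals, chrom_length):
    """Return the parts of [0, chrom_length) not covered by any interval."""
    gaps, cursor = [], 0
    for start, end in merge_intervals(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        gaps.append((cursor, chrom_length))
    return gaps

def in_intervals(pos0, intervals):
    """True if 0-based pos0 lies in one of the sorted, disjoint intervals."""
    starts = [s for s, _ in intervals]
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]

def noncoding_vcf_lines(vcf_lines, exons_by_chrom, chrom_lengths):
    """Yield header lines plus records whose POS lies outside all exons."""
    noncoding = {
        chrom: complement(exons_by_chrom.get(chrom, []), length)
        for chrom, length in chrom_lengths.items()
    }
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        # VCF POS is 1-based; the intervals here are 0-based, half-open.
        if in_intervals(int(pos) - 1, noncoding.get(chrom, [])):
            yield line
```

With exons at (100, 200) and (150, 300) on a 1000 bp chromosome, the non-coding part is (0, 100) and (300, 1000), so a variant at POS 50 or 301 is kept while one at POS 150 or 300 is dropped. For real data, bedtools is likely the better choice.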
Hello, a GTF/GFF file has a column called 'type', which you can use to filter for the biotype you need. In the GFF file specification, under mRNA, you will find the 'intron' attributes. I hope this helps.
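
As a rough illustration of filtering on the 'type' column mentioned above, the sketch below reads GTF lines and emits BED-style coordinates for rows of one feature type. It assumes the standard nine tab-separated GTF columns, with the feature type in the third column and 1-based, inclusive start/end. The function name and the default feature type "CDS" are hypothetical choices for the example; whether rows such as 'intron' exist at all depends on where your annotation file comes from.

```python
def gtf_features_to_bed(gtf_lines, feature_type="CDS"):
    """Yield (chrom, start0, end, strand) for rows whose type column matches."""
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # GTF is 1-based inclusive; BED is 0-based half-open.
        yield chrom, start - 1, end, strand
```

The resulting tuples can be written out tab-separated as a BED file and passed to bedtools complement and bedtools intersect as described in the first answer.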
Thank you for your help!
UPDATE: for anyone in the same situation, this is how I downloaded the regions of interest: Bed File With Introns Only
Thank you very much. That is also what I was thinking of, but I did not know how to do it because I lack the knowledge and experience. I will keep you updated. Thank you again.