Hi, I have a BED file with regions of interest that looks like this:
chr1 0 91923
chr1 323234 4596845
..
with the start and end coordinates for each gene for the respective chromosome.
But I would also like to include the gene name for each region, since I need it for my downstream analyses. I have the GTF file for the whole genome with the annotations (gene names, start/end coordinates).
How can I use this GTF file to add the gene names to the BED file? I thought of using bedtools intersect, but I am not sure how to do this. Maybe: bedtools intersect -a input.bed -b input.gtf > output.bed ?
But how is bedtools intersect aware of the column-specific format of the GTF file? Or is there also a way to filter the GTF file based on the coordinates in the BED file?
Convert the GTF to BED before calling bedtools intersect. See: How To Convert Gencode Gtf Into Bed Format ?; How to convert gtf to bed format; Converting gtf format to bed format; etc.
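For illustration, here is a minimal Python sketch of such a GTF-to-BED conversion. It is not taken from the linked threads. It assumes a tab-separated GTF whose attribute column carries `gene_name` (falling back to `gene_id`), and it keeps only `gene` records. The function name `gtf_genes_to_bed` and the example records are hypothetical. The one detail that is easy to miss: GTF coordinates are 1-based and inclusive, while BED is 0-based and half-open, so the start is shifted by one.

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_genes_to_bed(gtf_lines, feature="gene"):
    """Yield (chrom, start, end, name) BED tuples for the chosen GTF feature."""
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        # Assumption: gene_name is present; otherwise fall back to gene_id.
        name = attrs.get("gene_name") or attrs.get("gene_id", ".")
        # GTF start is 1-based inclusive, BED start is 0-based: subtract 1.
        yield cols[0], int(cols[3]) - 1, int(cols[4]), name


# Hypothetical example records, only to show the expected input shape.
example_gtf = [
    '#!genome-build hypothetical\n',
    'chr1\tsrc\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1"; gene_name "DDX11L1";\n',
    'chr1\tsrc\texon\t11869\t12227\t.\t+\t.\tgene_id "G1"; gene_name "DDX11L1";\n',
]

bed_rows = list(gtf_genes_to_bed(example_gtf))
for row in bed_rows:
    print("\t".join(map(str, row)))
```

For a real file, you would pass an open file handle instead of the example list and write the rows to a four-column BED file, which can then be used as the -b input of bedtools intersect.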
Hi, thanks! Is there also a way to filter the GTF file to include the records based on the BED file coordinates?
You may want to check out the AGAT toolkit. Here is one promising option (not BED-based), but it may work: https://agat.readthedocs.io/en/latest/tools/agat_sp_filter_feature_from_keep_list.html
I am already aware of the AGAT toolkit. But the script you mentioned only takes a GFF/GTF file and a list of genes as inputs and extracts the overlaps based on the provided list.
But in my case I don't have the gene names, only the coordinates in bed format as mentioned before.
But you can use bedtools intersect directly with a GTF and a BED file.
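To make the overlap logic concrete, here is a small Python sketch of what such an intersection does. It is an illustration, not how bedtools is implemented. It assumes whitespace-separated BED lines, a tab-separated GTF with a `gene_name` attribute, and inputs small enough for a simple nested loop. All function names and the GENE_A to GENE_D records are hypothetical. With bedtools itself, something like `bedtools intersect -a input.bed -b input.gtf -wa -wb` should report each BED region next to the overlapping GTF record, and swapping the inputs with `-u` (`-a input.gtf -b input.bed -u`) should keep only the GTF records that overlap a region.

```python
import re


def parse_bed(bed_lines):
    """Return a list of (chrom, start, end) from whitespace-separated BED lines."""
    regions = []
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines and track/browser/comment header lines.
        if len(fields) < 3 or fields[0] in ("track", "browser") or line.startswith("#"):
            continue
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlapping_gtf_records(gtf_lines, regions, feature="gene"):
    """Yield (gtf_line, region) for each GTF record that overlaps a BED region."""
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != feature:
            continue
        # GTF is 1-based inclusive; shift the start to compare with 0-based BED.
        g_start, g_end = int(cols[3]) - 1, int(cols[4])
        for chrom, start, end in regions:
            if chrom == cols[0] and g_start < end and start < g_end:
                yield line.rstrip("\n"), (chrom, start, end)


def gene_name(gtf_line):
    """Extract gene_name from a GTF line, or '.' if it is missing."""
    match = re.search(r'gene_name "([^"]+)"', gtf_line)
    return match.group(1) if match else "."


# BED regions from the question; the GTF records are hypothetical.
regions = parse_bed(["chr1 0 91923", "chr1 323234 4596845"])
example_gtf = [
    'chr1\tsrc\tgene\t11869\t14409\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n',
    'chr1\tsrc\tgene\t100000\t200000\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n',
    'chr2\tsrc\tgene\t500\t900\t.\t+\t.\tgene_id "G3"; gene_name "GENE_C";\n',
    'chr1\tsrc\tgene\t400000\t410000\t.\t+\t.\tgene_id "G4"; gene_name "GENE_D";\n',
]

names_by_region = {region: [] for region in regions}  # BED annotated with names
kept_gtf = []  # GTF records filtered by the BED coordinates
for record, region in overlapping_gtf_records(example_gtf, regions):
    names_by_region[region].append(gene_name(record))
    if record not in kept_gtf:
        kept_gtf.append(record)

for (chrom, start, end), names in names_by_region.items():
    print(chrom, start, end, ",".join(names) or ".", sep="\t")
```

`names_by_region` corresponds to the BED file with gene names added, and `kept_gtf` to the GTF filtered by the BED coordinates. For a whole-genome GTF, bedtools will be much faster than this loop.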