How to subset a bed or gtf file to some genes of interest?
2
2
9.6 years ago
cyril-cros ▴ 950

Hi,

I have a list of genes of particular interest, and some data in GTF format, obtained from RNA-Seq and from the official annotation (I can easily convert it to BED format with gtf2bed). These files contain an identifier for each gene or transcript.
I would like to do some operations (compare my data to the official annotation, for example), but ONLY on these genes of interest. I select them because they share the same gene ontology term, for example.

Is there a parser that lets me read a GTF (or BED) file, select the lines containing the identifiers I request, and write the results to a new GTF (BED) file?

I can code a simple parser, but if a solution already exists (which is very likely), I am interested. I am currently coding in R, but a Python script would be fine too.

Thanks for your advice

bed gtf gene genome • 12k views
9
9.6 years ago
Dan D 7.4k

Can't you just grep the GTF file for those gene/transcript names? You can output the list of names into a file, and then supply that file using the -f parameter, like this:

grep -f my_gene_and_transcript_list.txt genes.gtf > selected_genes.gtf

1

You are perfectly correct, I was going for something needlessly complicated. Thanks a lot for your help.

0

I'm glad it was that easy :)

1

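You might also need the -w flag to match whole words only, or grep could find MAPK inside a line for MAPK3. The capital -F flag treats each pattern as a fixed string rather than a regular expression, which is usually what you want for a list of identifiers, but on its own it does not prevent substring matches.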
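
Since the original question asked for a parser, the substring problem (MAPK matching inside MAPK3) can also be avoided by comparing identifiers exactly against the attribute column instead of searching the whole line. This is only a minimal sketch: it assumes a standard 9-column, tab-separated GTF whose last column holds pairs like gene_id "..."; the function name filter_gtf_lines and the default attribute keys are my own choices for illustration, not part of any existing tool.

```python
import re

# Matches GTF attribute pairs such as: gene_id "ENSG00000102882";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def filter_gtf_lines(lines, wanted_ids,
                     keys=("gene_id", "gene_name", "transcript_id")):
    """Yield GTF lines where one of the chosen attributes exactly equals a wanted identifier.

    Hypothetical helper: `keys` lists the attributes to check (an assumption,
    adjust it to whatever identifiers your list contains).
    """
    wanted = set(wanted_ids)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF record
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        if any(attributes.get(key) in wanted for key in keys):
            yield line


def filter_gtf_file(gtf_path, ids_path, out_path):
    """Read one identifier per line from ids_path and write the matching GTF lines to out_path."""
    with open(ids_path) as handle:
        wanted = {line.strip() for line in handle if line.strip()}
    with open(gtf_path) as gtf, open(out_path, "w") as out:
        out.writelines(filter_gtf_lines(gtf, wanted))
```

With the file names from the grep answer, this would be called as filter_gtf_file("genes.gtf", "my_gene_and_transcript_list.txt", "selected_genes.gtf"). Unlike grep, it drops comment lines and only looks at the attributes listed in keys.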

0
5.7 years ago
dimitrischat ▴ 210

I have a similar question, because grep -f and grep -F don't work for me. I have a list of genes common to two cell types. Now I want to pick only the common genes for each condition (each cell type) from an Excel file that contains:

id    baseMean       baseMeanA  baseMeanB     foldChange  log2FoldChange  pval      padj      h6
XAF1  34.9158621585  0          69.831724317  Inf         Inf             1.86E-15  1.13E-12  up

But 1) what if I want to grep only the common genes and one other condition, e.g. the 4th column, baseMeanB? 2) What if I want to grep the whole line, with all the conditions that follow each gene (only for the common genes, of course)?

0

You should perhaps make this a new question, but I would recommend awk. I unfortunately don't understand what you want, so perhaps you could give more detail for your questions 1) and 2) in a new post.
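
If the two questions mean "keep only the rows for the common genes, and optionally only some of the columns such as baseMeanB", then a small Python sketch along these lines might be a starting point. It assumes the Excel sheet has been exported as a tab-delimited text file with the header shown above; select_rows is a hypothetical helper name, and I may well be misreading what is wanted.

```python
import csv


def select_rows(table_lines, common_genes, columns=None, id_column="id"):
    """Return the rows whose id is in common_genes.

    table_lines: iterable of tab-delimited lines, header first (assumed layout).
    columns: extra columns to keep next to the id; None keeps the whole row.
    """
    wanted = set(common_genes)
    reader = csv.DictReader(table_lines, delimiter="\t")
    selected = []
    for row in reader:
        if row[id_column] not in wanted:
            continue  # exact match on the gene id, no substring matching
        if columns is None:
            selected.append(dict(row))  # question 2: the whole line
        else:
            # question 1: only the gene id plus the requested columns
            selected.append({name: row[name] for name in [id_column, *columns]})
    return selected
```

For example, select_rows(open("results.txt"), common_genes, columns=["baseMeanB"]) would keep the gene id and baseMeanB only, where results.txt and common_genes stand in for your own exported file and gene list.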
