Hello. I'm having trouble filtering a GTF file. None of the filters I apply produce any results, and I think it is because of this:
Long story short, one of the filters I tried was in R. The first time I tried to open the GTF file, I got an error saying that line 714 did not have XX elements, so I added fill = TRUE, and this is the result:
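A plausible explanation, offered as a guess rather than a diagnosis: a GTF file has nine tab-separated columns, and the ninth (the attributes) itself contains spaces. By default R's read.table() treats any run of whitespace as a separator (sep = ""), so lines with more attributes than their neighbours, such as the ones with an extra contained_in entry, split into more fields. The two shell functions below (ws_field_counts and tab_field_counts are names made up for this example) tally the number of fields per line both ways. If the tab-based tally shows a single count of 9 while the whitespace-based one shows several counts, the file is probably well formed and only the whitespace splitting is at fault; in R, passing sep = "\t" (and quote = "") to read.table() should then keep the attributes in a single column.

```shell
# Tally how many fields each line has when split on any run of blanks,
# roughly what read.table()'s default sep = "" does (quote handling aside).
ws_field_counts() {
    awk '{ print NF }' "$1" | sort -n | uniq -c
}

# Same tally, but splitting on tabs only, as the GTF format intends.
tab_field_counts() {
    awk -F'\t' '{ print NF }' "$1" | sort -n | uniq -c
}

# Usage with the file from the question:
# ws_field_counts  merged.gtf.class_code_15m
# tab_field_counts merged.gtf.class_code_15m
```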
I guess this formatting problem is the reason why I can't filter this file. I have a list of TCONS IDs, and neither grep, awk, nor the filter in R works. Here are the pipelines I'm trying:
grep –f result_cpc2_15m_clean.txt /short_path/merged_asm/merged.gtf.class_code_15m | cut -c 1- > outgrep.merged_15m.gtf
awk -F'"' 'FNR==NR {block[$0];next} $2 in block' result_cpc2_15m_clean.txt /media/disk4/gopec/cracco/lncRNA/15m/merged_asm/merged.gtf.class_code_15m > outgrep.merged_15m.gtf
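A possible reason the awk command returns nothing: with -F'"', $2 is the first quoted value on the line. In a Cufflinks-style GTF the attributes normally start with gene_id "XLOC_...", so $2 would be the XLOC gene ID, never a TCONS transcript ID. The sketch below (filter_by_transcript_id is a hypothetical helper name) splits on tabs instead, pulls the transcript_id value out of column 9, and also strips Windows carriage returns from the ID list in case it was edited on Windows. It assumes one ID per line in the list file.

```shell
# Keep the GTF lines whose transcript_id is listed in ID_FILE (one ID per line).
# Usage: filter_by_transcript_id ID_FILE GTF > OUT.gtf
filter_by_transcript_id() {
    awk -F'\t' '
        # First file (the ID list): drop a trailing carriage return, skip blank lines.
        FNR == NR { sub(/\r$/, ""); if ($0 != "") keep[$0]; next }
        # Second file (the GTF): column 9 holds the attributes.
        match($9, /transcript_id "[^"]*"/) {
            # Skip the 15-character prefix (transcript_id, a space, a quote)
            # and the closing quote to get the bare ID.
            id = substr($9, RSTART + 15, RLENGTH - 16)
            if (id in keep) print
        }
    ' "$1" "$2"
}

# Example with the files from the question:
# filter_by_transcript_id result_cpc2_15m_clean.txt merged.gtf.class_code_15m > outgrep.merged_15m.gtf
```

Every exon line of a listed transcript is kept, since each one carries the same transcript_id.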
Does anyone know whether this is the problem and how I can fix it? I'm a newbie and know almost nothing about bioinformatics.
Thanks in advance
PS: I forgot to mention that, looking more carefully, these lines (714, 716, 721) seem to have an extra "contained_in" column right before the "nearest_ref" column. Also, this file was filtered by class codes (x, u and j).
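Since the file was filtered by class code, a quick sanity check is to count how many distinct transcripts carry each class_code; only x, u and j should show up. This is a sketch with a hypothetical helper name (class_code_tally), assuming each line has both a class_code and a transcript_id attribute. An extra contained_in attribute does not affect it, because attributes are looked up by name rather than by position.

```shell
# Count distinct transcripts per class_code in a GTF.
# Usage: class_code_tally GTF
class_code_tally() {
    awk -F'\t' '
        match($9, /class_code "[^"]*"/) {
            # Skip the 12-character prefix (class_code, a space, a quote)
            # and the closing quote.
            code = substr($9, RSTART + 12, RLENGTH - 13)
            if (match($9, /transcript_id "[^"]*"/)) {
                tid = substr($9, RSTART + 15, RLENGTH - 16)
                # Count each transcript once, not once per exon line.
                if (!(tid in seen)) { seen[tid]; count[code]++ }
            }
        }
        END { for (c in count) print c "\t" count[c] }
    ' "$1" | sort
}

# Example: class_code_tally merged.gtf.class_code_15m
```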
grep –f result_cpc2_15m_clean.txt
What's in this file?
cut -c 1-
Why this?
awk -F'"' 'FNR==NR {block[$0];next} $2 in block'
What is that awk script? It would be easier if you could describe your final aim with this GTF.
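On the grep and cut questions: in the post, the -f in the grep command appears as an en dash (–f). If it was really typed that way, grep does not see an option at all; it takes –f as the search pattern and treats the ID file as one more file to search, which on its own could explain the empty output. cut -c 1- prints each line from its first character to its last, i.e. unchanged, so it can simply be dropped. If grep is still preferred, here is a sketch (grep_ids is a hypothetical helper name) that uses an ASCII -f, adds -F so IDs are matched as fixed strings and -w so TCONS_00000001 does not also match TCONS_000000010, and first strips carriage returns and blank lines from the list, since an empty pattern line can match every line of the GTF. One caveat: grep matches an ID anywhere on the line, so lines of other transcripts whose contained_in attribute names a listed ID would be kept as well; matching on transcript_id with awk avoids that.

```shell
# Print the GTF lines that contain any ID from ID_FILE as a whole word.
# Usage: grep_ids ID_FILE GTF > OUT.gtf
grep_ids() {
    clean_ids=$(mktemp)
    # Remove carriage returns and blank lines, and keep only the first field
    # (drops stray trailing spaces), one ID per line.
    tr -d '\r' < "$1" | awk 'NF { print $1 }' > "$clean_ids"
    # -w: whole-word matches only; -F: IDs are fixed strings, not regexes;
    # -f: read the patterns from a file (written with an ASCII hyphen).
    grep -wF -f "$clean_ids" "$2"
    status=$?
    rm -f "$clean_ids"
    return "$status"
}

# Example: grep_ids result_cpc2_15m_clean.txt merged.gtf.class_code_15m > outgrep.merged_15m.gtf
```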
Hi!!
result_cpc2_15m_clean.txt is the file containing all the IDs I want to filter the GTF file by. This GTF file will be used as input to cuffdiff for differential expression analysis. These IDs are lncRNAs, and I want to test their differential expression between my groups.
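Since the filtered GTF is meant for cuffdiff, it may be worth checking that every lncRNA ID in the list actually made it through, for example that none were removed by the earlier class-code filter. The sketch below (missing_ids is a hypothetical helper name) prints the IDs from the list that have no matching transcript_id in the GTF; an empty result means every ID was found.

```shell
# List IDs from ID_FILE that never appear as a transcript_id in GTF.
# Usage: missing_ids ID_FILE GTF
missing_ids() {
    awk -F'\t' '
        # First file (the ID list): drop a trailing carriage return, skip blank lines.
        FNR == NR { sub(/\r$/, ""); if ($0 != "") want[$0]; next }
        # Second file (the GTF): record every transcript_id seen in column 9.
        match($9, /transcript_id "[^"]*"/) {
            found[substr($9, RSTART + 15, RLENGTH - 16)]
        }
        END { for (id in want) if (!(id in found)) print id }
    ' "$1" "$2" | sort
}

# Example: missing_ids result_cpc2_15m_clean.txt merged.gtf.class_code_15m
```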