Hi All,
I want to remove the mitochondrial genes from a human annotation file (.gtf).
If the GTF is from Ensembl:
grep -v "^MT" genes.gtf > genes_noMT.gtf
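One caveat: "^MT" drops every line that merely starts with the letters MT, not only lines whose sequence name is exactly MT. Below is a minimal sketch of a stricter filter that compares the whole first column with awk. The toy file and its records are made up for illustration, so swap in the path to your own GTF:

```shell
# Build a tiny toy GTF (made-up records) so the commands can be run anywhere.
tmpdir=$(mktemp -d)
gtf="$tmpdir/toy.gtf"
printf '1\thavana\tgene\t11869\t14412\t.\t+\t.\tgene_id "G1";\n' >  "$gtf"
printf 'MT\tinsdc\tgene\t577\t647\t.\t+\t.\tgene_id "G2";\n'     >> "$gtf"
printf 'MT\tinsdc\texon\t577\t647\t.\t+\t.\tgene_id "G2";\n'     >> "$gtf"
printf 'X\thavana\tgene\t100\t200\t.\t-\t.\tgene_id "G3";\n'     >> "$gtf"

# Keep every line whose first tab-separated field is not exactly "MT".
awk -F'\t' '$1 != "MT"' "$gtf" > "$tmpdir/toy_noMT.gtf"

# Compare line counts before and after filtering.
wc -l < "$gtf"
wc -l < "$tmpdir/toy_noMT.gtf"
```

On a file where MT is the only sequence name starting with those letters, this should give the same result as grep -v "^MT".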
I tried both grep commands on the Ensembl annotation file, release 75 (Homo_sapiens.GRCh37.75). Counting lines with wc -l, the original GTF file has 2828317 lines:
2828317 Homo_sapiens.GRCh37.75.gtf
With grep -v "^MT" the number of lines goes down, as shown below:
grep -v "^MT" Homo_sapiens.GRCh37.75.gtf > Homo_sapiens.GRCh37.75_noMT.gtf
wc -l Homo_sapiens.GRCh37.75_noMT.gtf
2828173 Homo_sapiens.GRCh37.75_noMT.gtf
With grep -v "^ChrM", however, the number of lines is the same as in the original file:
grep -v "^ChrM" Homo_sapiens.GRCh37.75.gtf > Homo_sapiens.GRCh37.75_noMT_M.gtf
wc -l Homo_sapiens.GRCh37.75_noMT_M.gtf
2828317 Homo_sapiens.GRCh37.75_noMT_M.gtf
Could anyone explain that?
If you have an Ensembl GTF, use grep -v "^MT". grep matches a pattern, and with -v it prints only the lines that do not match, so the matching lines are left out of the output. Which pattern to use depends on how the mitochondrial chromosome is named in the first column. Ensembl names it MT, while other sources such as UCSC name it chrM (lowercase c; grep is case-sensitive by default). Because you are using an Ensembl file, the pattern '^ChrM' matches nothing, so the number of lines stays the same, whereas '^MT' removed 144 lines (2828317 - 2828173).
Read a tutorial to understand grep. Here is one: http://rous.mit.edu/index.php/Unix_commands_applied_to_bioinformatics#grep
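To see which pattern applies before filtering, it can help to list the sequence names in the first column and count what each pattern would match. This is a rough sketch on a made-up toy file (the records and file name are assumptions for illustration); point it at your own GTF instead:

```shell
# Build a tiny toy GTF (made-up records) that uses Ensembl-style names.
tmpdir=$(mktemp -d)
gtf="$tmpdir/toy.gtf"
printf '1\thavana\tgene\t11869\t14412\t.\t+\t.\tgene_id "G1";\n' >  "$gtf"
printf 'MT\tinsdc\tgene\t577\t647\t.\t+\t.\tgene_id "G2";\n'     >> "$gtf"
printf 'MT\tinsdc\texon\t577\t647\t.\t+\t.\tgene_id "G2";\n'     >> "$gtf"
printf 'X\thavana\tgene\t100\t200\t.\t-\t.\tgene_id "G3";\n'     >> "$gtf"

# List the distinct names in column 1, skipping any "#" header lines.
names=$(grep -v '^#' "$gtf" | cut -f1 | sort -u)
echo "$names"

# Count the lines each pattern matches, i.e. what grep -v would remove.
# grep -c exits non-zero when the count is 0, hence the "|| true".
mt_hits=$(grep -c '^MT' "$gtf" || true)
chrm_hits=$(grep -c '^ChrM' "$gtf" || true)
echo "lines matching ^MT:   $mt_hits"
echo "lines matching ^ChrM: $chrm_hits"
```

A count of 0 for a pattern means grep -v with that pattern will copy the file unchanged, which is what happened with '^ChrM' above.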
Edited, thanks. I was just trying to indicate that grep -v would work in this case.