Greetings!
Is it possible to ignore pseudogenes or miRNAs while aligning with TopHat?
Thanking you!
I am not sure about the context of your analyses, but emerging evidence suggests that several pseudogenes have a role in cancer and that miRNA could act as decoys. So if you are out to understand novel biology from your RNA-seq data, retain them or analyze them for new insights.
Yes. Give TopHat a GTF/GFF3 file containing the genes you want to map the reads to (using -G), and add -T to make it match the reads only against the genes you provided. Without -T, it searches first in the list of genes you provided and then in the rest of the genome.
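A rough sketch of what that command line could look like. The file names, index basename and output directory are placeholders I made up, not anything from the original question; only the -G, -T and -o options are TopHat's own. The command is printed rather than executed so you can review it first.

```shell
# All names below are hypothetical placeholders; substitute your own.
GTF="genes.filtered.gtf"   # annotation containing only the genes of interest
INDEX="genome_index"       # basename of the Bowtie index (assumption)
READS="reads.fastq"        # input reads (assumption)
OUTDIR="tophat_out"        # output directory (assumption)

# -G supplies the annotation; -T restricts alignment to those transcripts.
TOPHAT_CMD="tophat -G $GTF -T -o $OUTDIR $INDEX $READS"

# Print the command for review; run it with: eval "$TOPHAT_CMD"
echo "$TOPHAT_CMD"
```

Dropping -T from the command gives the default behaviour: annotated transcripts first, then the rest of the genome.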
You can exclude specific biotypes from the annotation you give to TopHat. I generally mask only rRNA and mitochondrial genes, or rRNAs/tRNAs.
For example, download the GTF from Ensembl: http://uswest.ensembl.org/info/data/ftp/index.html
Script 1 lists every biotype present in the GTF. Run it as: awk -f get.biotypes.awk Homo_sapiens.GRCh37.71.gtf | sort -u > all.biotypes.txt
Contents of get.biotypes.awk:
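BEGIN { OFS = FS = "\t" }
# Skip header/comment lines, which start with "#".
$0 !~ /^#/ {
    # Column 9 holds the attributes, separated by ";".
    n = split($9, format, ";");
    for (i = 1; i <= n; i++) {
        # Ensembl GTFs use gene_biotype; GENCODE GTFs use gene_type.
        if (format[i] ~ /gene_biotype|gene_type/) {
            # Strip the key (and leading space) and the quotes, keeping only the value.
            sub(/^ *(gene_biotype|gene_type) /, "", format[i]);
            gsub(/"/, "", format[i]);
            print format[i];
        }
    }
}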
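Script 2 writes a new GTF without the biotypes you want to mask. Run it as: awk -f get.gtf.mask.biotypes.awk Homo_sapiens.GRCh37.71.gtf > output.gtf
Contents of get.gtf.mask.biotypes.awk: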
BEGIN { OFS = FS = "\t" }
# Skip header/comment lines, which start with "#".
$0 !~ /^#/ {
    n = split($9, format, ";");
    for (i = 1; i <= n; i++) {
        if (format[i] ~ /gene_type|gene_biotype/) {
            # Change the pattern to the biotypes you want ( ~ ) or don't want ( !~ ).
            # (I generally mask Mt and rRNA in RNA-seq.)
            if (format[i] !~ /pseudogene|miRNA/) {
                print;
            }
        }
    }
}
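Before handing the filtered GTF to TopHat, it may be worth checking that the masked biotypes are really gone. Here is a small self-contained sketch of such a check: it builds a four-record toy GTF (made-up gene IDs and coordinates, not real annotation), applies a condensed version of the same filter, and counts the records per biotype before and after. The file names and the count_biotypes helper are my own invention for illustration.

```shell
# Toy GTF with made-up records, for illustration only.
printf '1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n' > sample.gtf
printf '1\tensembl\tgene\t300\t400\t.\t+\t.\tgene_id "G2"; gene_biotype "processed_pseudogene";\n' >> sample.gtf
printf '1\tensembl\tgene\t500\t600\t.\t-\t.\tgene_id "G3"; gene_biotype "miRNA";\n' >> sample.gtf
printf '1\tensembl\tgene\t700\t800\t.\t-\t.\tgene_id "G4"; gene_biotype "protein_coding";\n' >> sample.gtf

# Hypothetical helper: print "<count><TAB><biotype>" for each gene biotype in a GTF.
count_biotypes() {
  awk -F'\t' '
    $0 !~ /^#/ {
      n = split($9, attr, ";")
      for (i = 1; i <= n; i++) {
        if (attr[i] ~ /^ *(gene_biotype|gene_type) /) {
          v = attr[i]
          sub(/^ *(gene_biotype|gene_type) /, "", v)
          gsub(/"/, "", v)
          count[v]++
        }
      }
    }
    END { for (b in count) print count[b] "\t" b }
  ' "$1" | sort -k2,2
}

# Condensed version of the mask: keep records whose gene biotype
# does not match pseudogene or miRNA.
awk -F'\t' '
  $0 !~ /^#/ {
    n = split($9, attr, ";")
    for (i = 1; i <= n; i++)
      if (attr[i] ~ /^ *(gene_biotype|gene_type) / && attr[i] !~ /pseudogene|miRNA/)
        print
  }
' sample.gtf > sample.filtered.gtf

echo "Before filtering:"
count_biotypes sample.gtf
echo "After filtering:"
count_biotypes sample.filtered.gtf
```

On a real annotation you would call count_biotypes on output.gtf and confirm that no pseudogene or miRNA biotypes appear in the listing.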
I don't think you want to ignore them. If you have reads that align to those things, you need your aligner to report their correct mapping position. The last thing you want is for the aligner to place those reads in the wrong gene, because you told it not to put them in the right place.
Thanks, very helpful... I will consider it.