Hi -
I have bulk RNA-seq data for 4 different species, for which I created 4 separate count.txt files using their respective annotations.gtf files. I then removed all the unnecessary info from them to get a simple count matrix, using the commands shown below. What I want to do now is merge these 4 count files into one, but I am unsure how to do this. The problem is that each count file has a different number of rows, because the gtf files used to create them contain a different number of gene annotations. Ideally, the merged file would have all of the columns from each of the 4 count files but omit any rows that do not appear in all 4 files. Any help would be appreciated.
featureCounts -T 5 -p -t exon -g gene_id -a panTro6.ncbiRefSeq.gtf -o panTro6_counts.txt bam1 bam2 bam3 bam4 bam5 bam6 bam7
featureCounts -T 5 -p -t exon -g gene_id -a panPan3.ncbiRefSeq.gtf -o panPan3_counts.txt bam1 bam2 bam3 bam4 bam5 bam6 bam7
featureCounts -T 5 -p -t exon -g gene_id -a ponAbe3.ncbiRefSeq.gtf -o ponAbe3_counts.txt bam1 bam2 bam3 bam4 bam5 bam6 bam7
featureCounts -T 5 -p -t exon -g gene_id -a gorGor6.ncbiRefSeq.gtf -o gorGor6_counts.txt bam1 bam2 bam3 bam4 bam5 bam6 bam7
grep -v "#" panTro6_counts.txt | cut -d$'\t' -f1,7- > panTro6_counts.matrix
grep -v "#" panPan3_counts.txt | cut -d$'\t' -f1,7- > panPan3_counts.matrix
grep -v "#" ponAbe3_counts.txt | cut -d$'\t' -f1,7- > ponAbe3_counts.matrix
grep -v "#" gorGor6_counts.txt | cut -d$'\t' -f1,7- > gorGor6_counts.matrix
Is there a list of homologous genes (orthologs) you could use for this merge? Maybe this would be helpful
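If the gene_id values turn out to be comparable across the four annotations (an assumption; if they are not, the IDs would first need to be translated through an ortholog table), the merge itself is an inner join on the first column. Here is a minimal sketch with awk. The function name `merge_counts` and the output file name are made up for illustration, and it assumes tab-separated .matrix files with a header line, as produced by the grep/cut commands above.

```shell
#!/usr/bin/env bash
# merge_counts (hypothetical helper): inner-join tab-separated count matrices
# on their first column. Only IDs present in every input file are kept, and
# the rows come out in the order they appear in the first file.
merge_counts() {
  awk -F'\t' -v OFS='\t' -v nfiles="$#" '
    FNR == 1 { f++ }                     # first line of the next input file
    {
      id = $1
      sub(/^[^\t]*\t/, "")               # strip the ID, keep only the count columns
      if (seen[id] == f - 1) {           # ID was found in every earlier file
        if (f == 1) order[++n] = id      # remember row order from the first file
        row[id] = (f == 1) ? $0 : row[id] OFS $0
        seen[id] = f                     # also skips repeated IDs within one file
      }
    }
    END {
      for (i = 1; i <= n; i++)
        if (seen[order[i]] == nfiles) print order[i], row[order[i]]
    }
  ' "$@"
}

# Example call (file names taken from the question):
# merge_counts panTro6_counts.matrix panPan3_counts.matrix \
#              ponAbe3_counts.matrix gorGor6_counts.matrix > merged_counts.matrix
```

The header line is joined like any other row because its first field (Geneid) is the same in every file, so the merged file keeps one header with all sample columns. Since the sample columns may carry the same names in each species' file, it may be worth renaming them before merging so the species stay distinguishable.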