How to find the common genes from different samples?
11.7 years ago

Hello, I am working on NGS data. I have 4 samples and I have the coding gene information as well. Each of the four samples has around 2,800-2,900 genes, and now I need to find the genes that are common to all four samples, but I have no idea how to do it. Can anyone help, please?

dna protein genes • 3.2k views

What do you mean by "common" genes? Do you mean orthologous genes? If so, you can use OrthoMCL or Proteinortho to find them. You'll need the FASTA files of all the proteins.
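As a rough sketch of the last step, assuming the ortholog table is tab-separated with three leading summary columns followed by one column per input FASTA file, and with `*` marking a sample that has no member in the group (this is how I understand Proteinortho's output, so check it against your own file), the groups present in every sample could be pulled out like this. The function name, the `n_meta` parameter and the toy table are all made up for illustration.

```python
import csv
import io

def orthogroups_in_all(tsv_text, n_meta=3):
    """Return the rows that have at least one gene in every sample column.

    Assumption: the first `n_meta` columns are summary columns and
    a "*" entry means "no gene from this sample in the group".
    """
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(rows)
    samples = header[n_meta:]
    shared = []
    for row in rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and extra comment lines
        members = row[n_meta:]
        if len(members) == len(samples) and all(m != "*" for m in members):
            shared.append(dict(zip(samples, members)))
    return shared

# Hypothetical toy table; real files will have thousands of rows.
demo_table = "\n".join("\t".join(r) for r in [
    ["# Species", "Genes", "Alg.-Conn.", "s1.faa", "s2.faa", "s3.faa", "s4.faa"],
    ["4", "4", "1", "gene_1", "gene_7", "gene_3", "gene_12"],
    ["2", "2", "1", "gene_2", "*", "gene_9", "*"],
])
shared_groups = orthogroups_in_all(demo_table)
print(len(shared_groups), "group(s) found in all samples")
```

To run it on a real table, read the file into a string first and pass that in.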

I have the FASTA files, and mine are orthologous sequences. I'll try it, thank you!

Could you please expand a little bit on that?

In your "coding gene information" files, do you have annotation from the same source, so that you only want to compare the names of the genes in all four files? Or do you have roughly similar sequences and want to know which software to use that aligns these sequences and tells you the shared areas?

I am doing NGS analysis. I have a contig file for each of the four samples. I got the total number of coding genes from the Prodigal software, but I don't know the names of the genes. So how can I find the common genes in the four samples?

11.7 years ago
Rm 8.3k

Assuming files with one gene name per line:

cat file1 file2 file3 file4 | sort | uniq -c > gene_sample.counts

gene_sample.counts will contain how many times each gene is present across the four samples. A count of 4 means the gene is in all four, provided each name appears only once per file.
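The same counting idea can be sketched in Python. This version counts each name at most once per sample, so a name repeated inside one file cannot push the count up to 4 by itself. The function names and the toy gene names are made up for illustration, and the approach still assumes the four files use the same naming scheme.

```python
from collections import Counter

def genes_in_all_samples(samples):
    """Return the sorted names that occur in every sample.

    `samples` is a list of iterables of gene names, one per sample.
    Each name is counted once per sample (duplicates are collapsed).
    """
    counts = Counter()
    for names in samples:
        counts.update({n.strip() for n in names if n.strip()})
    return sorted(g for g, c in counts.items() if c == len(samples))

def read_names(path):
    """Read one gene name per line from a text file."""
    with open(path) as handle:
        return [line.strip() for line in handle]

# Toy data standing in for file1..file4 (hypothetical gene names).
demo = [
    ["dnaA", "recA", "gyrB"],
    ["recA", "dnaA", "rpoB"],
    ["dnaA", "recA", "recA"],
    ["recA", "ftsZ", "dnaA"],
]
common = genes_in_all_samples(demo)
print(common)
```

With real files, something like `genes_in_all_samples([read_names(p) for p in ("file1", "file2", "file3", "file4")])` should give the shared names.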

If you have the gene info in BED format, then you can use bedtools to get their intersection.

If the inputs are FASTA sequence files: you can use cd-hit-est to cluster the sequences at a specific sequence identity, and with a little scripting you can count how many sequences in each cluster come from different samples. You could even try UCLUST.
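The "little scripting" part could look roughly like the sketch below. It assumes the four FASTA files were concatenated before clustering, that every sequence ID starts with a sample label followed by `|` (for example `s1|gene_1`, a convention invented here), and that the `.clstr` file has the usual layout of a `>Cluster N` line followed by one line per member with the ID between `>` and `...`. CD-HIT may shorten long names in the `.clstr` file (see its `-d` option), so check that the sample prefix survives.

```python
import re
from collections import defaultdict

# A member line looks like: "0<TAB>900nt, >s1|gene_1... *"
MEMBER = re.compile(r">(.+?)\.\.\.")

def clusters_by_sample(clstr_lines, sep="|"):
    """Map cluster name -> sample label -> list of sequence IDs."""
    clusters = defaultdict(lambda: defaultdict(list))
    current = None
    for line in clstr_lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = line[1:]
        elif line and current is not None:
            match = MEMBER.search(line)
            if match:
                seq_id = match.group(1)
                sample = seq_id.split(sep, 1)[0]  # assumed "sample|id" naming
                clusters[current][sample].append(seq_id)
    return clusters

def shared_clusters(clusters, samples):
    """Keep only the clusters that contain a sequence from every sample."""
    wanted = set(samples)
    return {name: members for name, members in clusters.items()
            if wanted <= set(members)}

# Hypothetical .clstr content for illustration.
demo = [
    ">Cluster 0",
    "0\t900nt, >s1|gene_1... *",
    "1\t897nt, >s2|gene_7... at +/98.50%",
    "2\t900nt, >s3|gene_3... at +/99.10%",
    "3\t894nt, >s4|gene_12... at +/97.80%",
    ">Cluster 1",
    "0\t600nt, >s1|gene_2... *",
    "1\t600nt, >s3|gene_9... at +/96.00%",
]
shared = shared_clusters(clusters_by_sample(demo), ["s1", "s2", "s3", "s4"])
print(sorted(shared))
```

The number of entries in `shared` is then the number of gene clusters found in all four samples at the chosen identity threshold.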

I am working on NGS data. I don't know the names of the genes; I only know the number of genes, which I got from the Prodigal server. So how can I find them?

Can you update your post with a few lines of your input files?

The data generated by next-generation sequencing was assembled, and I have the contigs in FASTA format. Using the Prodigal server I calculated the number of genes, but I don't know their names. I want to find the common genes found in all four samples, which have 2871, 2828, 2983 and 2895 genes respectively.

I have updated the above answer to use FASTA files.

Okay, thanks! I'll try it, but if it uses the gene names to distinguish the genes then I can't use it, because all the genes are named gene 1, gene 2, gene 3, and so on.
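Sequence clustering compares the sequences themselves rather than the names, so generic names are not a blocker, but the IDs do need to be unique once the four files are pooled. A minimal sketch of one way to do that is to put a sample label in front of every FASTA header. The `sample|` prefix and the function name are conventions invented here, and the demo headers only imitate the general style of Prodigal output.

```python
def tag_fasta_headers(lines, sample, sep="|"):
    """Yield FASTA lines with `sample` + `sep` added to each header.

    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield f">{sample}{sep}{line[1:].lstrip()}"
        else:
            yield line

# Hypothetical records for sample "s1".
demo = [
    ">contig_1_1 # 3 # 98 # 1 # ID=1_1",
    "ATGAAACGT",
    ">contig_1_2 # 120 # 260 # -1 # ID=1_2",
    "ATGCCCTAA",
]
tagged = list(tag_fasta_headers(demo, "s1"))
print("\n".join(tagged))
```

After tagging each of the four files with its own label, they can be concatenated and clustered, and the label in front of `|` tells you which sample each cluster member came from.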
