How To Compute The Nucleotide Composition Of A Large Number Of Sequences (Preferably With An Online Tool)

0


12.1 years ago

mahsa.alemi • 0

I need a server or software that can calculate the nucleotide composition of many sequences simultaneously (other than the CAIcal server). Can you help me?

Please answer soon.

Thank you

server • 8.3k views

updated 22 months ago by Ram 45k • written 12.1 years ago by mahsa.alemi • 0

2


When you say nucleotide composition, what exactly do you mean? It is also important to understand the scope of the problem: how many sequences do you need to process, and how long is the longest?

Solutions could be as simple as writing something like:

s = "ATGCA"
# how many As?
s.count("A")

If your sequences are longer than, say, ten million bases, or you have tens of millions of sequences to process, then the solution needs to be a little more sophisticated.
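For inputs between those two extremes, the same idea can be stretched to a whole FASTA file. The following is only a minimal sketch: the function names `read_fasta` and `composition` are made up for illustration, and it assumes plain-text FASTA with `>` header lines and sequences small enough to hold one at a time in memory.

```python
from collections import Counter
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def composition(sequence):
    """Count every symbol in a sequence, treating lower and upper case alike."""
    return Counter(sequence.upper())


# Hypothetical usage: python composition.py sequences.fasta
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, seq in read_fasta(handle):
            counts = composition(seq)
            print(name, " ".join(f"{b}={n}" for b, n in sorted(counts.items())))
```

Because the file is read line by line, only one sequence is kept in memory at a time; ambiguity codes such as N are counted like any other symbol.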

12.1 years ago by Istvan Albert 102k

4


12.1 years ago

Pierre Lindenbaum 166k

If the number of sequences is not too big, you can save and open the following HTML page (it uses the HTML5 File API to load the data):

EDIT: I've updated my code. It now handles multiple FASTA files.

	<html>
	<body>
	<form>
	<label for="ZmlsZXMKfiles">Select a FASTA file</label>:<input type="file" id="ZmlsZXMKfiles" multiple/>
	</form>
	<script type="application/ecmascript">




	function get_counts(dna)
	{
	var counts={};
	for(var i=0;i < dna.length;i++)
	{
	var base=dna[i];
	if(base in counts)
	{
	counts[base]++;
	}
	else
	{
	counts[base]=1;
	}

	}
	return counts;
	}

	function readingFasta(evt)
	{
	if (!evt.lengthComputable) return;
	var loaded = (evt.loaded / evt.total);
	if (loaded > 1) return;
	}

	function endReadFasta(e)
	{
	if(e.target.result==null) return;
	var pre=document.getElementById('ZmlsZXMKtext');
	var lines=e.target.result.split("\n");
	var dna="";
	var i=0;
	var line;
	for(;;)
	{
	if(i==lines.length || (line=lines[i].replace(/^\s+|\s+$/g,""))[0]=='>')
	{
	if(dna.length > 0)
	{
	var counts=get_counts(dna);
	for(var j in counts)
	{
	pre.appendChild(document.createTextNode(j+"="+counts[j]+" "));
	}
	pre.appendChild(document.createElement("br"));
	}
	if(i===lines.length) break;
	pre.appendChild(document.createTextNode(line+":"));
	dna="";
	}
	else
	{
	dna+=line;
	}
	++i;
	}
	}

	function handleFileSelect(evt)
	{
	var files = evt.target.files; // FileList object
	if(files.length==0) return;

	var pre=document.getElementById('ZmlsZXMKtext');
	while(pre.hasChildNodes()) pre.removeChild( pre.firstChild );
	for(var i=0;i< files.length;++i)
	{
	var reader = new FileReader();
	reader.onprogress=readingFasta;
	reader.onloadend=endReadFasta;
	reader.readAsText(files[i]);
	}
	}

	document.getElementById('ZmlsZXMKfiles').addEventListener('change', handleFileSelect, false);

	</script>
	<pre id="ZmlsZXMKtext"></pre>
	</body>
	</html>


10.2 years ago by Pierre Lindenbaum 166k

1


That's a neat solution Pierre!

12.1 years ago by Istvan Albert 102k

2


Indeed, I'm even willing to overlook your ridiculous bracket usage in this case and commend you for a very nice solution! :)

12.1 years ago by Daniel Standage 4.1k

0


original post: http://plindenbaum.blogspot.fr/2011/04/playing-with-html5-file-api-translating.html ( dna -> protein )

12.1 years ago by Pierre Lindenbaum 166k

0


I've updated the code (multiple FASTA files)

12.1 years ago by Pierre Lindenbaum 166k

1


12.1 years ago

mlongo ▴ 40

faCount, an easy command-line tool from UCSC (Linux download page: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/), will give you the nucleotide composition of each sequence, or a summary of the total nucleotide content of the input FASTA.

(faSize is another handy one to summarize the number of reads and total number of nt)


0


12.1 years ago

qiyunzhu ▴ 430

Simply type the command:

grep -o "A" * | wc -l
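Note that this one-liner counts every uppercase A in every file in the current directory, header lines included. If headers should be skipped and lowercase bases counted too, a rough Python equivalent might look like the sketch below; `count_base` is a hypothetical helper name, and it assumes FASTA headers start with `>`.

```python
def count_base(lines, base="A"):
    """Count one base across FASTA lines, skipping '>' headers and ignoring case."""
    base = base.upper()
    return sum(
        line.upper().count(base)
        for line in lines
        if not line.startswith(">")
    )
```

It accepts any iterable of lines, so an open file handle can be passed directly, for example `count_base(open("sequences.fasta"))` with a file name of your own.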


0


12.1 years ago

Manu Prestat 4.1k

You can also use any k-mer counter and set k=1. My own Python k-mer counter is here (it needs the khmer and screed Python libraries).
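To illustrate why k=1 works, here is a small pure-Python stand-in. It is not the khmer or screed API and not the counter linked above; `count_kmers` is a hypothetical name. It simply slides a window of length k along the sequence.

```python
from collections import Counter


def count_kmers(sequence, k=1):
    """Count overlapping k-mers in a sequence; k=1 reduces to nucleotide composition."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = sequence.upper()
    # A sequence of length n has n - k + 1 overlapping windows of length k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

With k=1 each window is a single base, so the result is the per-base count; with k=2 the same function gives dinucleotide counts.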

