<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: A Script to Calculate GC content</title>
	<atom:link href="http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/feed/" rel="self" type="application/rss+xml" />
	<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/</link>
	<description>Genomes, Web 2.0 and Bioethics</description>
	<lastBuildDate>Wed, 23 May 2012 15:51:09 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Glenn Proctor</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-202</link>
		<dc:creator><![CDATA[Glenn Proctor]]></dc:creator>
		<pubDate>Fri, 09 Apr 2010 08:43:04 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-202</guid>
		<description><![CDATA[The Ensembl Slice.pm module has a method for calculating %GC content (and some other stuff) which uses tr// - given that this is a built-in Perl function I&#039;m pretty sure it&#039;s going to be faster and more memory efficient.  It also uses Keith&#039;s method of counting A,C,G and T and using that as the divisor rather than the length of the sequence.]]></description>
		<content:encoded><![CDATA[<p>The Ensembl Slice.pm module has a method for calculating %GC content (and some other stuff) which uses tr// &#8211; given that this is a built-in Perl function I&#8217;m pretty sure it&#8217;s going to be faster and more memory efficient.  It also uses Keith&#8217;s method of counting A,C,G and T and using that as the divisor rather than the length of the sequence.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nona</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-201</link>
		<dc:creator><![CDATA[Nona]]></dc:creator>
		<pubDate>Thu, 08 Apr 2010 20:12:52 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-201</guid>
		<description><![CDATA[@Anon 

That&#039;s awesome, I didn&#039;t know so many accented characters even existed. Thanks for the tip!]]></description>
		<content:encoded><![CDATA[<p>@Anon </p>
<p>That&#8217;s awesome, I didn&#8217;t know so many accented characters even existed. Thanks for the tip!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Fields</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-200</link>
		<dc:creator><![CDATA[Chris Fields]]></dc:creator>
		<pubDate>Thu, 08 Apr 2010 18:01:46 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-200</guid>
		<description><![CDATA[Transliteration tends to be faster, primarily because the transliteration table is built at compile time. However, this limits it (no variable interpolation, no character classes).]]></description>
		<content:encoded><![CDATA[<p>Transliteration tends to be faster, primarily because the transliteration table is built at compile time. However, this limits it (no variable interpolation, no character classes).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keith Bradnam</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-199</link>
		<dc:creator><![CDATA[Keith Bradnam]]></dc:creator>
		<pubDate>Thu, 08 Apr 2010 17:32:34 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-199</guid>
		<description><![CDATA[Dear Anon,

I haven&#039;t shown you my own GC calculating code, but you might be placated in knowing that it does indeed count lower case as well as upper case characters. The sequence data that our lab deals with does often include unknown (N) characters and sometimes other nucleotide ambiguity codes (R for purine etc). In these situations, using the length of the sequence in the GC calculation leads to an incorrect value.

Regards,

Keith]]></description>
		<content:encoded><![CDATA[<p>Dear Anon,</p>
<p>I haven&#8217;t shown you my own GC calculating code, but you might be placated in knowing that it does indeed count lower case as well as upper case characters. The sequence data that our lab deals with does often include unknown (N) characters and sometimes other nucleotide ambiguity codes (R for purine etc). In these situations, using the length of the sequence in the GC calculation leads to an incorrect value.</p>
<p>Regards,</p>
<p>Keith</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anon</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-198</link>
		<dc:creator><![CDATA[Anon]]></dc:creator>
		<pubDate>Wed, 07 Apr 2010 19:25:50 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-198</guid>
		<description><![CDATA[@Keith

It is unfortunate that your script doesn’t also count up the number of lower case a, c, g, and t, or even accented As/Cs/Ts &amp; Gs (ÀÁÂÃÄÅĀĄĂÇĆČĈĊŤŢŦȚĜĞĠĢ) as this would also be “safer”.

Perhaps you could implement some sort of near-key substitution, for example, if ‘q’, ‘w’, ‘s’, ‘x’ or ‘z’ appeared, you could assume they meant ‘a’ as it’s next to it on the keyboard.
]]></description>
		<content:encoded><![CDATA[<p>@Keith</p>
<p>It is unfortunate that your script doesn’t also count up the number of lower case a, c, g, and t, or even accented As/Cs/Ts &amp; Gs (ÀÁÂÃÄÅĀĄĂÇĆČĈĊŤŢŦȚĜĞĠĢ) as this would also be “safer”.</p>
<p>Perhaps you could implement some sort of near-key substitution, for example, if ‘q’, ‘w’, ‘s’, ‘x’ or ‘z’ appeared, you could assume they meant ‘a’ as it’s next to it on the keyboard.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Keith Bradnam</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-188</link>
		<dc:creator><![CDATA[Keith Bradnam]]></dc:creator>
		<pubDate>Wed, 17 Mar 2010 20:38:45 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-188</guid>
		<description><![CDATA[From personal experience I would say that using transliteration is much faster than using substitution, but you might not notice the benefits if your sequences are short.

A possible flaw with your code is that you don&#039;t take into account that there may be unspecified (&#039;N&#039;) characters in the sequence. For example, if a 100 nt sequence contains 10 Ns and 50 G+C nucleotides, then your code would calculate the GC content as 50%, which is incorrect (though of course it might be true if you knew what those Ns are).

It is safer to count A, C, G, T separately and then total up their counts to make the effective length for calculating GC%. In my GC subroutine I also count Ns and then &#039;Other&#039; in order to capture any other use of nucleotide ambiguity codes.]]></description>
		<content:encoded><![CDATA[<p>From personal experience I would say that using transliteration is much faster than using substitution, but you might not notice the benefits if your sequences are short.</p>
<p>A possible flaw with your code is that you don&#8217;t take into account that there may be unspecified (&#8216;N&#8217;) characters in the sequence. For example, if a 100 nt sequence contains 10 Ns and 50 G+C nucleotides, then your code would calculate the GC content as 50%, which is incorrect (though of course it might be true if you knew what those Ns are).</p>
<p>It is safer to count A, C, G, T separately and then total up their counts to make the effective length for calculating GC%. In my GC subroutine I also count Ns and then &#8216;Other&#8217; in order to capture any other use of nucleotide ambiguity codes.</p>
]]></content:encoded>
	</item>
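	<!--
	The "effective length" idea in the comment above can be sketched in Perl as
	follows. This is only an illustration under stated assumptions, not the
	commenter's actual subroutine: the name gc_percent and the test sequence are
	hypothetical, and only A, C, G and T (either case) count toward the divisor.

	```perl
	use strict;
	use warnings;

	# Hypothetical helper: GC% computed over unambiguous bases only.
	sub gc_percent {
	    my ($seq) = @_;
	    # tr/// with an empty replacement list counts matching characters
	    # without modifying the string.
	    my $gc = ($seq =~ tr/GCgc//);
	    my $at = ($seq =~ tr/ATat//);
	    my $effective = $gc + $at;   # Ns and other ambiguity codes are excluded
	    return if $effective == 0;   # no unambiguous bases: GC% is undefined
	    return 100 * $gc / $effective;
	}

	# Assumed example matching the comment: 100 nt, 10 Ns, 50 G+C.
	my $seq = ('G' x 25) . ('C' x 25) . ('A' x 20) . ('T' x 20) . ('N' x 10);
	printf "%.2f\n", gc_percent($seq);   # 50 of 90 called bases: prints 55.56
	```

	Dividing by length($seq) instead would give 50% for the same sequence.
	-->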
	<item>
		<title>By: manuelcorpas</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-187</link>
		<dc:creator><![CDATA[manuelcorpas]]></dc:creator>
		<pubDate>Mon, 08 Mar 2010 10:53:39 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-187</guid>
		<description><![CDATA[@Mahsa

Your line matches the first six consecutive non-whitespace characters in $num and stores them in $dec.]]></description>
		<content:encoded><![CDATA[<p>@Mahsa</p>
<p>Your line matches the first six consecutive non-whitespace characters in $num and stores them in $dec.</p>
]]></content:encoded>
	</item>
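	<!--
	A small sketch of what that match does, assuming $num holds a GC percentage
	with many decimal places. The value below is made up for illustration. In
	list context the match returns its captured group, so $dec receives the
	first six non-whitespace characters.

	```perl
	use strict;
	use warnings;

	my $num = "55.5555555";          # assumed example value

	# List-context assignment: $dec gets the text captured by (\S{6}).
	my ($dec) = $num =~ /(\S{6})/;
	print "$dec\n";                  # prints 55.555

	# The capture truncates the string; sprintf rounds instead.
	my $rounded = sprintf "%.3f", $num;
	print "$rounded\n";              # prints 55.556
	```

	Whether truncation or rounding is wanted depends on how the value is used;
	sprintf is the usual choice for display.
	-->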
	<item>
		<title>By: Mahsa</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-186</link>
		<dc:creator><![CDATA[Mahsa]]></dc:creator>
		<pubDate>Sat, 06 Mar 2010 12:09:54 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-186</guid>
		<description><![CDATA[guys, what does this line mean? 
 my ($dec)=$num =~ /(\S{6})/;

thanks in advance for your help.]]></description>
		<content:encoded><![CDATA[<p>guys, what does this line mean?<br />
 my ($dec)=$num =~ /(\S{6})/;</p>
<p>thanks in advance for your help.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Busybody</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-183</link>
		<dc:creator><![CDATA[Busybody]]></dc:creator>
		<pubDate>Thu, 25 Feb 2010 06:53:45 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-183</guid>
		<description><![CDATA[The Perl function substr starts counting at zero,

so shouldn&#039;t your script read

for (my $i = 0; $i &lt; $len; $i++) {
    my $base = substr $seq, $i, 1;
    $count++ if $base =~ /[GC]/i;
}

instead?]]></description>
		<content:encoded><![CDATA[<p>The Perl function substr starts counting at zero,</p>
<p>so shouldn&#8217;t your script read</p>
<p>for (my $i = 0; $i &lt; $len; $i++) {<br />
    my $base = substr $seq, $i, 1;<br />
    $count++ if $base =~ /[GC]/i;<br />
}</p>
<p>instead?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Martijn van Iersel</title>
		<link>http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/comment-page-1/#comment-178</link>
		<dc:creator><![CDATA[Martijn van Iersel]]></dc:creator>
		<pubDate>Sun, 07 Feb 2010 10:10:44 +0000</pubDate>
		<guid isPermaLink="false">http://manuelcorpas.com/?p=406#comment-178</guid>
		<description><![CDATA[I think so. tr goes through the string in a single pass. 

With the substr method you make a copy of every character in the string and operate on that. Making copies is a little extra work for the processor.

But other effects could come into play. You can&#039;t know for sure unless you try it out and measure it.]]></description>
		<content:encoded><![CDATA[<p>I think so. tr goes through the string in a single pass. </p>
<p>With the substr method you make a copy of every character in the string and operate on that. Making copies is a little extra work for the processor.</p>
<p>But other effects could come into play. You can&#8217;t know for sure unless you try it out and measure it.</p>
]]></content:encoded>
	</item>
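	<!--
	One way to "try it out and measure it" is Perl's core Benchmark module. The
	sketch below is an assumption-laden illustration: the sub names gc_tr and
	gc_substr, the random 100,000-base sequence, and the one-second run time
	are all hypothetical choices, and results will vary by machine and Perl
	version.

	```perl
	use strict;
	use warnings;
	use Benchmark qw(cmpthese);

	# Assumed test input: a random 100,000-base sequence.
	my $seq = join '', map { (qw(A C G T))[rand 4] } 1 .. 100_000;

	# Single pass with tr///, counting G and C in either case.
	sub gc_tr {
	    my ($s) = @_;
	    return ($s =~ tr/GCgc//);
	}

	# Character-by-character loop using substr and a regex.
	sub gc_substr {
	    my ($s) = @_;
	    my $count = 0;
	    for my $i (0 .. length($s) - 1) {
	        $count++ if substr($s, $i, 1) =~ /[GC]/i;
	    }
	    return $count;
	}

	# Run each variant for at least one CPU second and print a comparison table.
	cmpthese(-1, {
	    'transliterate' => sub { gc_tr($seq) },
	    'substr_loop'   => sub { gc_substr($seq) },
	});
	```

	Both subs return the same count, so the table compares speed only.
	-->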
</channel>
</rss>

