Coding with DNA

This post was published 10 years ago. Due to the rapidly evolving world of technology, some concepts may no longer be applicable.

Alright, this one isn’t quite as exciting as the title suggests. I needed a quick script for a bit of biology, specifically genetics. It's pretty simple stuff, but I thought I would post it anyway.

Generating random sequences of DNA

function randdna($len){
    $length=intval($len);
    $bases=array('A','C','G','T');
    $dna="";
    // Append one uniformly chosen base per position.
    for ($i=0; $i<$length;$i++){
        $dna .=$bases[mt_rand(0,3)];
    }
    return $dna;
}

Generating an mRNA sequence from a DNA sequence (transcribing)

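function transcribe($dna){
    $dna=strtoupper($dna);
    // Drop anything that is not a DNA base (whitespace, digits, ambiguity codes).
    $dna = preg_replace("/[^ACGT]/", "", $dna);
    // Complement each base, pairing A with U; the input is treated as the template strand.
    $rna= strtr($dna,array('A'=>"U",'C'=>"G",'G'=>"C",'T'=>"A"));
    return $rna;
}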
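
The mapping in transcribe() complements every base without reversing the sequence, so it effectively assumes the input is the template (antisense) strand written 3' to 5'. If the sequence at hand is the coding (sense) strand instead, the mRNA is simply the same sequence with T replaced by U. A minimal sketch of both cases follows; the names transcribeTemplate and transcribeCoding are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper names; only the complement mapping comes from the post.

// Template (antisense) strand, written 3'->5': complement each base, pairing A with U.
function transcribeTemplate(string $dna): string {
    $dna = preg_replace('/[^ACGT]/', '', strtoupper($dna));
    return strtr($dna, array('A' => 'U', 'C' => 'G', 'G' => 'C', 'T' => 'A'));
}

// Coding (sense) strand, written 5'->3': the mRNA matches it with T swapped for U.
function transcribeCoding(string $dna): string {
    $dna = preg_replace('/[^ACGT]/', '', strtoupper($dna));
    return str_replace('T', 'U', $dna);
}

echo transcribeTemplate('TACGGCATT'), PHP_EOL; // AUGCCGUAA
echo transcribeCoding('ATGCCGTAA'), PHP_EOL;   // AUGCCGUAA
```

Both calls describe the same gene from opposite strands, which is why they print the same mRNA.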

Generating a protein sequence from an mRNA sequence (translation)

function translate($rna){
    // Codon table. The stop codons are labelled after their nicknames: UAA (ochre), UAG (amber), UGA (opal).
    $trans=array(
        "UUU"=>"phe ","UUC"=>"phe ","UUA"=>"leu ","UUG"=>"leu ","CUU"=>"leu ","CUC"=>"leu ","CUA"=>"leu ","CUG"=>"leu ",
        "AUU"=>"ile ","AUC"=>"ile ","AUA"=>"ile ","AUG"=>"met ","GUU"=>"val ","GUC"=>"val ","GUA"=>"val ","GUG"=>"val ",
        "UCU"=>"ser ","UCC"=>"ser ","UCA"=>"ser ","UCG"=>"ser ","CCU"=>"pro ","CCC"=>"pro ","CCA"=>"pro ","CCG"=>"pro ",
        "ACU"=>"thr ","ACC"=>"thr ","ACA"=>"thr ","ACG"=>"thr ","GCU"=>"ala ","GCC"=>"ala ","GCA"=>"ala ","GCG"=>"ala ",
        "UAU"=>"tyr ","UAC"=>"tyr ","UAA"=>"chr ","UAG"=>"mbe ","CAU"=>"his ","CAC"=>"his ","CAA"=>"gln ","CAG"=>"gln ",
        "AAU"=>"asn ","AAC"=>"asn ","AAA"=>"lys ","AAG"=>"lys ","GAU"=>"asp ","GAC"=>"asp ","GAA"=>"glu ","GAG"=>"glu ",
        "UGU"=>"cys ","UGC"=>"cys ","UGA"=>"pal ","UGG"=>"trp ","CGU"=>"arg ","CGC"=>"arg ","CGA"=>"arg ","CGG"=>"arg ",
        "AGU"=>"ser ","AGC"=>"ser ","AGA"=>"arg ","AGG"=>"arg ","GGU"=>"gly ","GGC"=>"gly ","GGA"=>"gly ","GGG"=>"gly ");

    $rna=strtoupper($rna);
    $rna = preg_replace("/[^ACGU]/", "", $rna);
    $start = strpos($rna,"AUG");
    if ($start===false){return "";} // no start codon, so nothing to translate
    $end = strlen($rna);
    foreach(array("UAA", "UGA", "UAG") as $bp){
        // Find the first occurrence of this stop codon that is in frame with the start codon.
        $end1=strpos($rna,$bp,$start);
        while($end1!==false && ($end1-$start)%3!=0){
            $end1=strpos($rna,$bp,$end1+1);
        }
        if ($end1!==false && $end1<$end){$end=$end1;}
    }
    // Trim to whole codons; this only matters when no stop codon was found.
    $gene = substr($rna,$start,intval(($end-$start)/3)*3);
    echo "Coding Sequence: $gene";

    $prot = strtr($gene,$trans);
    return $prot;
}

Not quite the most elegant code, but it should be fairly accurate: the last function (translate) finds the first start codon, uses it to set the reading frame, and locates the first in-frame stop codon.
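
The three steps described above (find the start codon, fix the reading frame, stop at the first stop codon) can also be sketched by walking the sequence one codon at a time, which makes the reading frame explicit. This is only an illustrative alternative, not the approach used in the script; the function name findCodingSequence is hypothetical, and returning null when there is no start codon is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper, not from the original script: returns the coding
// sequence from the first AUG up to (not including) the first in-frame stop
// codon, or null when the sequence has no start codon.
function findCodingSequence(string $rna): ?string {
    $rna = preg_replace('/[^ACGU]/', '', strtoupper($rna));
    $start = strpos($rna, 'AUG');
    if ($start === false) {
        return null;
    }
    // Splitting from the start codon onward keeps every chunk in frame.
    $codons = str_split(substr($rna, $start), 3);
    $gene = '';
    foreach ($codons as $codon) {
        // Stop at a stop codon or at a trailing partial codon.
        if (strlen($codon) < 3 || in_array($codon, array('UAA', 'UAG', 'UGA'), true)) {
            break;
        }
        $gene .= $codon;
    }
    return $gene;
}

echo findCodingSequence('AUGACCUAGGG'), PHP_EOL; // AUGACC
```

In the sample input, the UGA that overlaps the start codon (positions 1 to 3) is out of frame and is ignored; translation stops at the in-frame UAG instead.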
