Building DNA for Monte Carlo physics simulations


The X-gene, of Marvel Comics’ X-men fame, can be activated by radiation damage. With a good DNA model, Charles Xavier could probably have made a model of the process and saved everyone a whole lot of time…. (Wolverine and the X-Men #4, Marvel Comics)

A large part of my thesis work involves making models of DNA in a computer. The idea is that we make a model of DNA, shoot it with a whole bunch of radiation, and see how many times the DNA breaks. And DNA breaks are worth quantifying as they are one of the origins of cellular damage.

Modelling a base pair of DNA however isn’t the most straightforward of tasks. A whole range of complexities are possible, from modelling the position of each atom in the DNA macromolecule, to modelling a long snake-y cylinder and calling it DNA. For simulations that throw radiation at DNA chains though, the best approach is to consider a modelling approach for which maximises the accuracy of the physical and chemical models we have. Physically, this means for what approximation of DNA can we best evaluate the likelihood an incident particle reacts with the DNA (or in other words, can we model the DNA using something with reasonably accurate cross-sections). Chemically, this asks us to find an approximation of DNA where the constituents of the molecule have well defined reaction rates with water radicals.

Modelling each base pair as six ellipsoids

The best way to model these is by considering the DNA chain as built of three types of molecules, triphosphate, deoxyribose and base pairs (be they adenine, guanine, cytosine or thymine). This is largely because cross-sectional models for physics interactions with these molecules exist and are more useful than the simple atomic cross-sections, because chemical reaction rates (in simulations incorporating water radiolysis) are defined in respect to these molecules. It also simplifies these molecules in their digital representation, each base pair is representable as six ellipsoids.

A single base pair of DNA (AT pair) highlighting the phosphate (yellow circles), deoxyribose (red circles) and base pair (blue circles) parts of the DNA.

Working out the exact ellipsoids that replace the constituent elements of each molecule is a little difficult. Ideally, they should occupy the same volume as the the atoms that compose them (atoms can be considered spheres in this situation, with a radius corresponding to their van der Waal’s radii). Considering the triphosphate group then, there are three oxygen molecules to consider and one phosphate molecule, as below:

A very loose picture of a triphosphate molecule, drawn showing the van der Waal’s radii of the constituent atoms.

To find the volume of the entire molecule, the volumes of each atomic sphere are summed, and then the volumes of the overlapping regions are subtracted from this sum. Finding the intersection volume of two overlapping spheres is reasonably simple. Things get a little more complicated when you consider regions with three overlapping spheres.

The volume of these three overlapping atoms is a little tricky to calculate, it’s the sum of the three spheres’ volumes, minus the three overlaps between the each pair of spheres, plus the volume overlapped by the Nitrogen and the two Carbons.

The total volume occupied by three spheres, A, B and C for example, is then found as follows:

V_t = V_A + V_B + V_C - V_A \cap V_B - V_A \cap V_C - V_B \cap V_C + V_A \cap V_B \cap V_C

The volume of the region overlapped by the three spheres can be calculated easily enough by a Monte Carlo integration. Alternatively, I wrote a small Python utility to do this find the exact solution (, based on this paper. Once the volume of each constituent molecule is found, the dimensions of an ellipsoid with this volume can be found. In the model then, this ellipsoid should be placed at the barycentre of the molecule it replaces.

Dealing with overlapping ellipsoids

A problem that may arise in this approach is that the six ellipsoids may overlap at their edges. Whilst this may be fine in certain simulation platforms, it is a significant problem in Geant4, which I use, as it confuses the particle tracker. There are a couple of ways around this, but the approach I chose is to model the base pair using cut ellipsoids (an approach similar to this is used by Meylan et al.). Essentially, the ellipsoids for each molecule are cut along the z-axis following the following scheme:

Overlaps can be avoided when placing base pairs by orienting ellipsoids to face each other along a principal axis and cutting them along this axis. Grey ellipsoids represent the GATC bases, red represents deoxyribose groups and yellow represents phosphates.

In some cases, it may be necessary to slightly shrink the ellipsoids for the bases so they also do not overlap. These cuts and changes will slightly alter the size of molecules, however the volumes of the regions cut away are typically small enough that the systematic errors they introduce are negligible compared to the other errors that will occur in a simulation.

Implementing this scheme in Geant4, you can build a DNA chain that looks something like this:

A DNA chain in Geant4, built with cut ellipsoids. Red ellipsoids are deoxyribose, yellow ones are triphosphate groups, and the other colours represent the four different base pairs.

The final word

So that’s a rough guide to how DNA can be modelled for a simulation platform such as Geant4. There are other ways things could be done, but I know this one works. Have fun modelling DNA and evaluating damage, or, you know, trying to make Marvel comics a reality….

(Incredible Hulk #632, Marvel comics)




Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s