# Building DNA for Monte Carlo physics simulations

A large part of my thesis work involves making models of DNA in a computer. The idea is that we make a model of DNA, shoot it with a whole bunch of radiation, and see how many times the DNA breaks. And DNA breaks are worth quantifying as they are one of the origins of cellular damage.

Modelling a base pair of DNA however isn’t the most straightforward of tasks. A whole range of complexities are possible, from modelling the position of each atom in the DNA macromolecule, to modelling a long snake-y cylinder and calling it DNA. For simulations that throw radiation at DNA chains though, the best approach is to consider a modelling approach for which maximises the accuracy of the physical and chemical models we have. Physically, this means for what approximation of DNA can we best evaluate the likelihood an incident particle reacts with the DNA (or in other words, can we model the DNA using something with reasonably accurate cross-sections). Chemically, this asks us to find an approximation of DNA where the constituents of the molecule have well defined reaction rates with water radicals.

## Modelling each base pair as six ellipsoids

The best way to model these is by considering the DNA chain as built of three types of molecules, triphosphate, deoxyribose and base pairs (be they adenine, guanine, cytosine or thymine). This is largely because cross-sectional models for physics interactions with these molecules exist and are more useful than the simple atomic cross-sections, because chemical reaction rates (in simulations incorporating water radiolysis) are defined in respect to these molecules. It also simplifies these molecules in their digital representation, each base pair is representable as six ellipsoids.

Working out the exact ellipsoids that replace the constituent elements of each molecule is a little difficult. Ideally, they should occupy the same volume as the the atoms that compose them (atoms can be considered spheres in this situation, with a radius corresponding to their van der Waal’s radii). Considering the triphosphate group then, there are three oxygen molecules to consider and one phosphate molecule, as below:

To find the volume of the entire molecule, the volumes of each atomic sphere are summed, and then the volumes of the overlapping regions are subtracted from this sum. Finding the intersection volume of two overlapping spheres is reasonably simple. Things get a little more complicated when you consider regions with three overlapping spheres.

The total volume occupied by three spheres, A, B and C for example, is then found as follows:

$V_t = V_A + V_B + V_C - V_A \cap V_B - V_A \cap V_C - V_B \cap V_C + V_A \cap V_B \cap V_C$

The volume of the region overlapped by the three spheres can be calculated easily enough by a Monte Carlo integration. Alternatively, I wrote a small Python utility to do this find the exact solution (https://github.com/natl/threespheres), based on this paper. Once the volume of each constituent molecule is found, the dimensions of an ellipsoid with this volume can be found. In the model then, this ellipsoid should be placed at the barycentre of the molecule it replaces.

## Dealing with overlapping ellipsoids

A problem that may arise in this approach is that the six ellipsoids may overlap at their edges. Whilst this may be fine in certain simulation platforms, it is a significant problem in Geant4, which I use, as it confuses the particle tracker. There are a couple of ways around this, but the approach I chose is to model the base pair using cut ellipsoids (an approach similar to this is used by Meylan et al.). Essentially, the ellipsoids for each molecule are cut along the z-axis following the following scheme:

In some cases, it may be necessary to slightly shrink the ellipsoids for the bases so they also do not overlap. These cuts and changes will slightly alter the size of molecules, however the volumes of the regions cut away are typically small enough that the systematic errors they introduce are negligible compared to the other errors that will occur in a simulation.

Implementing this scheme in Geant4, you can build a DNA chain that looks something like this:

## The final word

So that’s a rough guide to how DNA can be modelled for a simulation platform such as Geant4. There are other ways things could be done, but I know this one works. Have fun modelling DNA and evaluating damage, or, you know, trying to make Marvel comics a reality….