Applying a traditional Benford analysis to DNA is problematic for two reasons:
- The four-letter DNA alphabet (A, C, G, T) has no self-evident order of magnitude, so any choice of which base ranks lower than another is arbitrary, which is a problem for Benford's logarithmic scale.
- Converting the HexDec (hexadecimal) encoding of a twomer DNA sequence to decimal inflates the number of “1” digits, which invalidates a Benford integrity result.
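The second point can be checked with a little arithmetic. The sketch below assumes that each twomer is encoded as a single hexadecimal digit (0 through F) and that the inflation comes from the digits A through F becoming 10 through 15 in decimal; the text does not spell out the mechanism, so treat this as one plausible reading rather than the study's own calculation.

```python
from collections import Counter

# All 16 values a single hexadecimal digit can take (0x0 .. 0xF).
hex_values = range(16)

# Leading decimal digit of each nonzero value; zero has no significant digit.
leading = Counter(str(value)[0] for value in hex_values if value != 0)

# Print how many of the 15 nonzero codes start with each decimal digit.
for digit in sorted(leading):
    print(digit, leading[digit])
```

Under this reading, 7 of the 15 nonzero codes (1 and 10 through 15) begin with “1”, while each of the digits 2 through 9 leads exactly one code. Even a perfectly uniform twomer distribution would therefore show “1” far more often than any other leading digit.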
The solution to both of these concerns is to redefine the Benford approach for DNA. Initial observations on SRR006041.fastq show a consistent count of significant digits in HexDec (base 16) throughout the file, with a count taken every 1,000,000 DNA twomers.
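A minimal sketch of such a count might look like the following. Several details are assumptions, because the text does not specify them: the base values (A=0, C=1, G=2, T=3), the use of non-overlapping twomers, the skipping of pairs containing other symbols such as N, the four-line FASTQ record layout, and the choice of cumulative snapshots. The function names are hypothetical.

```python
from collections import Counter
from itertools import islice

# Assumed base-to-value mapping; the source does not define one.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def twomer_hex_digits(sequence):
    """Yield one hex digit per non-overlapping twomer in a read."""
    for i in range(0, len(sequence) - 1, 2):
        first, second = sequence[i], sequence[i + 1]
        # Skip pairs containing symbols outside A, C, G, T (for example N).
        if first in BASE_VALUE and second in BASE_VALUE:
            yield format(4 * BASE_VALUE[first] + BASE_VALUE[second], "X")


def fastq_sequences(lines):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for line in islice(lines, 1, None, 4):
        yield line.strip().upper()


def checkpoint_counts(lines, interval=1_000_000):
    """Return cumulative hex-digit counts, one snapshot per `interval` twomers."""
    counts = Counter()
    seen = 0
    snapshots = []
    for sequence in fastq_sequences(lines):
        for digit in twomer_hex_digits(sequence):
            counts[digit] += 1
            seen += 1
            if seen % interval == 0:
                snapshots.append(Counter(counts))
    # Keep a final snapshot for any twomers after the last full interval.
    if seen % interval != 0:
        snapshots.append(Counter(counts))
    return snapshots
```

Passing an open file handle, for example `checkpoint_counts(open("SRR006041.fastq"))`, would give one snapshot per million twomers. Comparing the relative frequencies between snapshots is one way to judge how consistent the counts remain through the file.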
If a redefined Benford method yields a baseline for DNA that holds consistently across different DNA types and DNA disorders, it may prove useful to the medical industry. It may also provide a baseline for consistency checks and checksums of DNA samples, for validation purposes in both DNA research and criminal DNA evidence. The strong consistency initially observed across the twomer sets in the SRR006041.fastq sample is promising for remodeling the Benford approach for DNA forensic analysis.
Here are the start and end of the twomer counts for SRR006041.fastq: