RandomHash 2


  PIP: PIP-0036
  Title: RandomHash2: Enhanced GPU & ASIC Resistant Hash Algorithm
  Type: Protocol
  Impact: Hard-Fork
  Author: Herman Schoenfeld
  Comments-URI: https://discord.gg/sJqcgtD  (channel #pip-0036)
  Status: Active
  Created: 2019-07-31


Summary

A major revision of RandomHash, a GPU and ASIC resistant hashing algorithm, is proposed.

Motivation

Since the fundamental motivations behind RandomHash2 follow that of RandomHash , only the changes and motivations of moving from RandomHash to RandomHash2 are discussed in this document. The reader is advised to familiarize themselves with RandomHash  for full context.

In short, the purpose of RandomHash2 is to address the the need for faster block-header validation currently afflicting PascalCoin due to the predecessor RandomHash PoW algorithm.

Background

Whilst RandomHash appears to have empirically achieved it's GPU and ASIC resistivity goals, it's computationally-heavy nature has resulted in an unforeseen consequence of slow blockchain validation.

First of all, RandomHash introduces the concept of nonce-dependencies between themselves. This means in order to evaluate the hash for a block-header with some nonce, the partial evaluation of the same header with random "neighbouring nonces" is required. This allows RandomHash to operate in two modes, validation mode and mining mode.

In validation mode, RandomHash is simply used to hash a block-header in order to verify it's position in a blockchain -- just the same as how SHA2-256D does in Bitcoin.

In mining mode, RandomHash is used to mine the next block-header by enumerating many nonces to find an acceptable Proof-of-Work solution for the next block. In this mode, RandomHash exploits the partial calculations from previous rounds by resuming them in subsequent hashing rounds. In RandomHash 1, this mode operates at twice the hashrate as validation mode (200%), and is the basis for the "CPU Bias" and important to achieve GPU resistivity. It is biased towards CPU's since re-using the partially calculated nonces is a serial optimization that cannot be efficiently exploited in a parallelized, batch-computation context (such as GPU mining). In other words, the optimal nonce-set is enumerated on-the-fly and cannot be pre-determined into a range for parallel mining as GPU mining requires.

However, on a typical 2019 desktop computer, the validation hashrate of RandomHash is approximately 20 hashes per second. At 288 blocks per day, that's 14 seconds to validate a days worth of blocks and over 1 hour to validate 1 year of blocks. Whilst multi-threading, performance tuning and other optimizations have significantly optimized this performance oversight, it remains a fundamental a long-term issue that needs to be resolved.

RandomHash2 offers an order of magnitude improvement in both validation and mining hashrate. In RandomHash2, the same machine validates at ~400 hashes per second yet mines at ~2,000 hashes per second with far less memory-hardness. Whilst not empirically tested against GPU performance, these numbers suggest a 500% CPU bias.

RandomHash vs RandomHash2 Measured Performance

Algorithm Mean Hashrate (H/s) Mean Mem Per Hash (b) Min Mem Per Hash (b) Max Mem Per Hash (b) Sample Std Dev
RandomHash (validation) 23 5,018,876 5,018,628 5,019,116 83.86
RandomHash (mining) 44 7,528,288 7,527,872 7,528,732 136
RandomHash2 (validation) 401 16,361 560 73,872 14,934
RandomHash2 (mining) 2,109 74,642 1,048 157,944 15,460

Machine: AMD FX-8150 8 Core 3.60 Ghz utilizing 1 thread

ℹ️ RandomHash2 (validation) is the mode used to validate blocks during load and sync whereas RandomHash2 (mining) is the mode the mining network will use on V5 activation.

Specification

RandomHash2 is similarly structured to RandomHash, but has some key differences. The below overview outlines the full algorithm.

Overview

  1. Hashing a nonce requires N iterations (called levels), which are evaluated recursively;
  2. N varies per nonce in a non-deterministic manner between MIN_N=2 and MAX_N=4, inclusive;
  3. Each level in (1) also depends on J neighbouring nonces, determined randomly and non-deterministically;
  4. The value J is restricted to MIN_J=0 and MAX_J=4;
  5. Each level selects a random hash function from a set of 77 well-known cryptographic hash algorithms;
  6. The input digest hashed at a level is the compression of the transitive closure of the hash outputs of all it's prior-levels (1) and it's neighbouring nonce's prior-levels (3);
  7. The output of every level is expanded for (low) memory-hardness;
  8. Randomness is generated using Mersenne Twister algorithm;
  9. Randomness is always seeded using last DWORD of a cryptographic hash algorithm output;
  10. The first input is pre-hashed as using SHA2-256;
  11. The last output is post-hashed using SHA2-256;

Differences to RandomHash 1

The primary differences to predecessor are:

Constants


    MIN_N = 2;
    MAX_N = 4;
    MIN_J = 1;
    MAX_J = 8;
    M = 256;

N is now random per nonce


    Result := (ARound = MAX_N) OR ((ARound >= MIN_N) AND (GetLastDWordLE(LOutput) MOD MAX_N = 0));

Block Header is pre-hashed before used (last DWORD is seed)


    if ARound = 1 then begin
        LRoundInput := FHashAlg[0].ComputeBytes(ABlockHeader).GetBytes;
        LSeed := GetLastDWordLE( LRoundInput );
        LGen.Initialize(LSeed);   
        ...

Random number of dependent neighbour nonces


    LNumNeighbours := (LGen.NextUInt32 MOD (MAX_J - MIN_J)) + MIN_J;
    for i := 1 to LNumNeighbours do begin

MurMur3 checksumming removed completely*


   // MurMur3 checksumming has been removed completely and seeding now uses last dword of a cryptographic hash
   LSeed := GetLastDWordLE( LRoundInput );
  ...
   // Compression & Expansion no longer use MurMur3 checksum for seeding mersenne twister, instead seed is passed as argument
   function Expand(const AInput: TBytes; AExpansionFactor: Int32; ASeed : UInt32) : TBytes;
   function Compress(const AInputs: TArray<TBytes>; ASeed : UInt32): TBytes; inline;

Neighbouring nonces are only cached when fully evaluated


LNeighbourNonceHeader := SetLastDWordLE(ABlockHeader, LGen.NextUInt32); // change nonce
LNeighbourWasLastRound := CalculateRoundOutputs(LNeighbourNonceHeader, ARound - 1, LNeighborOutputs);
LRoundOutputs.AddRange(LNeighborOutputs);     

if LNeighbourWasLastRound then begin
    LCachedHash.Nonce := GetLastDWordLE(LNeighbourNonceHeader);
    LCachedHash.Header := LNeighbourNonceHeader;
    LCachedHash.Hash := ComputeVeneerRound(LNeighborOutputs);
    // if header is different (other than nonce), clear cache
    if NOT BytesEqual(FCachedHeaderTemplate, LCachedHash.Header, 0, 32 - 4) then begin
        FCachedHashes.Clear;
        FCachedHeaderTemplate := SetLastDWordLE(LCachedHash.Header, 0);
    end;
    FCachedHashes.Add(LCachedHash);
end;

There are now 77 internally used hash algorithms, not 18

In order to strengthen ASIC and GPU resistance, RandomHash2 now employs 77 internal hash algorithms to transform the data. This moves the economic costs of an ASIC far beyond viability for any rational economic actor, and since the sequence in which these 77 hash algorithms are used is in-determinable and random on a nonce-by-nonce basis. This number is also such that it makes GPU-programs (kernels) similarly non-viable, although not to the same degree as an ASIC.

The below are the algorithms in their indices as they appear within RandomHash2 algorithm.

Blake2B_160, Blake2B_256, Blake2B_512, Blake2B_384, Blake2S_128, Blake2S_160, Blake2S_224, Blake2S_256, Gost, GOST3411_2012_256, GOST3411_2012_512, Grindahl256, Grindahl512, HAS160, Haval_3_128, Haval_3_160, Haval_3_192, Haval_3_224, Haval_3_256, Haval_4_128, Haval_4_160, Haval_4_192, Haval_4_224, Haval_4_256, Haval_5_128, Haval_5_160, Haval_5_192, Haval_5_224, Haval_5_256, Keccak_224, Keccak_256, Keccak_288, Keccak_384, Keccak_512, MD2, MD5, MD4, Panama, RadioGatun32, RIPEMD, RIPEMD128, RIPEMD160, RIPEMD256, RIPEMD320, SHA0, SHA1, SHA2_224, SHA2_256, SHA2_384, SHA2_512, SHA2_512_224, SHA2_512_256, SHA3_224, SHA3_256, SHA3_384, SHA3_512, Snefru_8_128, Snefru_8_256, Tiger_3_128, Tiger_3_160, Tiger_3_192, Tiger_4_128, Tiger_4_160, Tiger_4_192, Tiger_5_128, Tiger_5_160, Tiger_5_192, Tiger2_3_128, Tiger2_3_160, Tiger2_3_192, Tiger2_4_128, Tiger2_4_160, Tiger2_4_192, Tiger2_5_128, Tiger2_5_160, Tiger2_5_192, WhirlPool .

Analysis

Cryptographic Strength

Since hashing starts and ends with a SHA2-256 the cryptographic strength of RandomHash2 is at least that of SHA2-256D as used in Bitcoin. Even if one were to assume the data transformations in between the start/end were cryptographically insecure, it wouldn't change this minimum security guarantee.

However, the transformations in between are not cryptographically weak and involve the use of 77 other cryptographically strong hash algorithms. As a result, RandomHash2 is orders of magnitude stronger than any single cryptographic hash algorithm since it fundamentally is composed of a random combination of many cryptographic hash algorithms. It should be noted that this achievement is paid for by significant performance overhead, which is intentional since the purpose of this algorithm if Proof-of-Work more than cryprographic obfuscaion.

One criticism levelled against RandomHash2 is that within the set of 77 hash algorithms used, some are considered "cryptographically weak", such as MD5. These criticisms are misguided since the use of some weak algorithms is inconsequential to the cryptographic security since their purpose is only to add computational complexity to prevent ASIC implementations, not to complement security.

In order to get a grasp of the minmum security provided by RandomHash2, consider the high-level algorithmic structure as essentially a recurrence of nested hashes as follows:


RandomHash2(Data) = SHA2_256( H1( H2( .... H_N( SHA2_256( DATA ) ) ...) )
where
   H_i = a randomly selected hash function based on the output of H_(i-1)   
   N = a random number determined by the nonce and neighbouring nonces (indeterminable but bound)

It follows that the weakest possible RandomHash2 for some WeakestDigest would comprise of 2 levels of evaluation (MIN_N=2) with each of those 2 levels having 1 neighbouring nonce dependency (MIN_J=1). Also, we assume the hash algorithms used at all levels were MD5, as it is considered weakest of the set of 18 possible algorithms. In this case,


RandomHash2(WeakestDigest) = SHA2_256( MD5 ( MD5 ( MD5 ( MD5 ( SHA2_256( WeakestDigest ) ) ) ) ) )

Clearly the above shows the weakest possible RandomHash2 is still far stronger than a SHA2-256D as typically employed in Bitcoin-based cryptocurrencies.


SHA2-256D(WeakestDigest) = SHA2-256 ( SHA2-256 ( WeakestDigest ) )

In addition to the above, RandomHash2 internally transforms data using expansions and compressions which are themselves cryptographically secure. As a result, it's clear that RandomHash2's cryptographic strength is at least as strong as Bitcoin's SHA2-256D with the likelihood of also being orders of magnitude stronger.

Nonce Scanning Attack

In RandomHash2, the number of levels N required to mine a nonce is now random and varies per nonce in a non-deterministic manner. The randomization of N introduces new level of randomness and executive decision-making into the core algorithm in order to enhance GPU and ASIC resistivity. However, it introduces a new attack vector called "Nonce Scanning Attack". In this attack, a miner can implement a simplified miner that only tries to mine "simple nonces" that require few levels to evaluate whilst rejecting "complex nonces" that require more levels to evaluate. By reducing the number of computations required to evaluate a nonce and simplifying the algorithm implementation, a higher hashrate could be achieved and an ASIC implementation made viable.

To thwart this attack, RandomHash2 restricts the range of values N can take to be between MIN_N = 2 and MAX_N = 4, inclusive.

By forcing a nonce evaluation to have at least 2 levels of computation, the miner necessarily requires the full algorithm implementation which prevents simplified ASIC miners. Also, since each nonce requires at least 2 levels of evaluation, and each of those levels is likely to depend other nonces to 1 level each, the number of computations saved by nonce-scanning must be balanced by the number of pre-computed cached nonces a miner would get if they were honestly mining without nonce-scanning (due to higher number of dependent neighboring nonces).

In order to determine if this balance is achieved, an empirical nonce-scanning attack was conducted. The below table shows empirical results from nonce-scanning N=MIN_N to N=MAX_N.

Unsupported Object

{
  "type": "child_database",
  "child_database": {
    "title": ". (1)"
  },
  "object": "block",
  "id": "bd840a0f-a021-4cb1-8ca5-e71143987a16",
  "created_time": "2022-12-03T22:57:00Z",
  "last_edited_time": "2022-12-03T22:57:00Z",
  "has_children": false,
  "created_by": {
    "id": "041aeccd-4487-458d-bd0a-b394868bbf29",
    "object": "user"
  },
  "last_edited_by": {
    "id": "041aeccd-4487-458d-bd0a-b394868bbf29",
    "object": "user"
  },
  "parent": {
    "Type": "page_id",
    "page_id": "10c7bd48-3437-4e1a-a0e2-cb8b53746c1f"
  }
}

    

Machine: AMD FX-8150 8 Core 3.60 Ghz utilizing 1 thread

As the above table shows, this balance is achieved. Nonce-scanning (via CPU) yields no observable benefit and in fact incurs a hashrate penalty. Also, it is the opinion of the author that any future optimization would not change this balance since an optimization would benefit all nonce scanning levels proportionally. However, a worthy line of inquiry is whether or not the reduced memory-hardness of nonce-scanning at lower levels may yield some optimization benefit for GPU-based miner. In any event, the result of this attack is only to gain higher hashrate and does not compromise the cryptographic security of the blockchain whatsoever.

CPU Bias

The RandomHash2 algorithm, like it's predecessor, is inherently biased towards CPU mining due to it's highly serial nature, use of non-deterministic recursion and executive-decision making. In addition, RandomHash2 can now evaluate many nonces when evaluating one, allowing CPU miners to enumerate the optimal nonce-set on the fly. Testing shows a 300% - 400% advantage for serial mining over batch mining, which indicates a proportional CPU bias.

Memory Complexity

RandomHash is memory-light in order to support low-end hardware. A CPU will only need 300KB of memory to verify a hash. Unlike RandomHash, mining does not consume additional memory since the cached nonces are fully evaluated.

GPU Resistance

GPU performance is generally driven by parallel execution of identical non-branching code-blocks across private regions of memory. RandomHash2 is a highly serial and recursive algorithm requiring a lot of executive-decision making, and decisions driven by Mersenne Twister random number generator. These characteristics make GPU implementations quite tedious and inefficient. Since the predecessor algorithm was shown to be GPU resistant, and this algorithm only exarcerbates these characteristics (except for memory hardness), it is expected that GPU resistance is maintained, although not confirmed as of the writing of this PIP.

ASIC Resistance

ASIC-resistance is fundamentally achieved on an economic basis. Due to the use of 18 sub-hash algorithms and the use of recursion in the core algorithm, it is expected that the R&D costs of a RandomHash ASIC will mirror that of building 18 independent ASICs rather than 1. This moves the economic viability goal-posts away by an order of magnitude. For as long as the costs of general ASIC development remain in relative parity to the costs of consumer grade CPUs as of today, a RandomHash ASIC will always remain "not worth it" for a "rational economic actor".

Furthermore, RandomHash offers a wide ASIC-breaking attack surface. This is due to it's branch-heavy, serial, recursive nature and heavy dependence on sub-algorithms. By making minor tweaks to the high-level algorithm, or changing a sub-algorithm, an ASIC design can be mostly invalidated and sent back the drawing board with minimal updates to the CPU miner.

This is true since ASIC designs tend to mirror the assembly structure of an algorithm rather than the high-level algorithm itself. Thus by making relatively minor tweaks at the high-level that necessarily result in significant low-level assembly restructuring, an ASIC design is made obsolete. So long as this "tweak-to-break-ASIC" policy is maintained by the PascalCoin Developers and Community, ASIC resistance is guaranteed.

Hard-Fork Activation

The PIP requires a hard-fork activation involving various aspects discussed below.

Rationale

Aside from a hash algorithm change, the only other known option to resolve slow validation time is to ship the client with precomputed lookup tables to speed up verification. This has already been done for RandomHash1 periods, but is not a viable option long-term.

Backwards Compatibility

This PIP is not backwards compatible and requires a hard-fork activation. Previous hashing algorithm must be retained in order to validate blocks mined prior to the hard-fork.

Reference Implementation

A reference implementation of RandomHash can be found here. A full implementation is provided below.


TRandomHash2 = class sealed(TObject)
const
    MIN_N = 2; // Min-number of hashing rounds required to compute a nonce
    MAX_N = 4; // Max-number of hashing rounds required to compute a nonce
    MIN_J = 1; // Min-number of dependent neighbouring nonces required to evaluate a nonce round
    MAX_J = 8; // Max-number of dependent neighbouring nonces required to evaluate a nonce round
    M = 64;    // The memory expansion unit (in bytes)
    NUM_HASH_ALGO = 77; // Number of internal cryptographic hash algorithms used
    SHA2_256_IX = 47; // index of SHA2-256 within the 77 algorithms used

private
    FHashAlg : array[0..NUM_HASH_ALGO-1] of IHash;  // declared here to avoid race-condition during mining

    function MemTransform1(const AChunk: TBytes): TBytes; inline;
    function MemTransform2(const AChunk: TBytes): TBytes; inline;
    function MemTransform3(const AChunk: TBytes): TBytes; inline;
    function MemTransform4(const AChunk: TBytes): TBytes; inline;
    function MemTransform5(const AChunk: TBytes): TBytes; inline;
    function MemTransform6(const AChunk: TBytes): TBytes; inline;
    function MemTransform7(const AChunk: TBytes): TBytes; inline;
    function MemTransform8(const AChunk: TBytes): TBytes; inline;
    function Expand(const AInput: TBytes; AExpansionFactor: Int32; ASeed : UInt32) : TBytes;
    function Compress(const AInputs: TArray<TBytes>; ASeed : UInt32): TBytes; inline;
    function ComputeVeneerRound(const ARoundOutputs : TArray<TBytes>) : TBytes; inline;
    function CalculateRoundOutputs(const ABlockHeader: TBytes; ARound: Int32; out ARoundOutputs : TArray<TBytes>) : Boolean; overload;
public
    constructor Create;
    destructor Destroy; override;
    function TryHash(const ABlockHeader: TBytes; AMaxRound : UInt32; out AHash : TBytes) : Boolean;
    function Hash(const ABlockHeader: TBytes): TBytes; overload; inline;
    class function Compute(const ABlockHeader: TBytes): TBytes; overload; static; inline;
end; 

{ ERandomHash2 }

ERandomHash2 = class(Exception);

{ TRandomHash2 implementation }

constructor TRandomHash2.Create;
begin
  FHashAlg[0] := THashFactory.TCrypto.CreateBlake2B_160();
  FHashAlg[1] := THashFactory.TCrypto.CreateBlake2B_256();
  FHashAlg[2] := THashFactory.TCrypto.CreateBlake2B_512();
  FHashAlg[3] := THashFactory.TCrypto.CreateBlake2B_384();
  FHashAlg[4] := THashFactory.TCrypto.CreateBlake2S_128();
  FHashAlg[5] := THashFactory.TCrypto.CreateBlake2S_160();
  FHashAlg[6] := THashFactory.TCrypto.CreateBlake2S_224();
  FHashAlg[7] := THashFactory.TCrypto.CreateBlake2S_256();
  FHashAlg[8] := THashFactory.TCrypto.CreateGost();
  FHashAlg[9] := THashFactory.TCrypto.CreateGOST3411_2012_256();
  FHashAlg[10] := THashFactory.TCrypto.CreateGOST3411_2012_512();
  FHashAlg[11] := THashFactory.TCrypto.CreateGrindahl256();
  FHashAlg[12] := THashFactory.TCrypto.CreateGrindahl512();
  FHashAlg[13] := THashFactory.TCrypto.CreateHAS160();
  FHashAlg[14] := THashFactory.TCrypto.CreateHaval_3_128();
  FHashAlg[15] := THashFactory.TCrypto.CreateHaval_3_160();
  FHashAlg[16] := THashFactory.TCrypto.CreateHaval_3_192();
  FHashAlg[17] := THashFactory.TCrypto.CreateHaval_3_224();
  FHashAlg[18] := THashFactory.TCrypto.CreateHaval_3_256();
  FHashAlg[19] := THashFactory.TCrypto.CreateHaval_4_128();
  FHashAlg[20] := THashFactory.TCrypto.CreateHaval_4_160();
  FHashAlg[21] := THashFactory.TCrypto.CreateHaval_4_192();
  FHashAlg[22] := THashFactory.TCrypto.CreateHaval_4_224();
  FHashAlg[23] := THashFactory.TCrypto.CreateHaval_4_256();
  FHashAlg[24] := THashFactory.TCrypto.CreateHaval_5_128();
  FHashAlg[25] := THashFactory.TCrypto.CreateHaval_5_160();
  FHashAlg[26] := THashFactory.TCrypto.CreateHaval_5_192();
  FHashAlg[27] := THashFactory.TCrypto.CreateHaval_5_224();
  FHashAlg[28] := THashFactory.TCrypto.CreateHaval_5_256();
  FHashAlg[29] := THashFactory.TCrypto.CreateKeccak_224();
  FHashAlg[30] := THashFactory.TCrypto.CreateKeccak_256();
  FHashAlg[31] := THashFactory.TCrypto.CreateKeccak_288();
  FHashAlg[32] := THashFactory.TCrypto.CreateKeccak_384();
  FHashAlg[33] := THashFactory.TCrypto.CreateKeccak_512();
  FHashAlg[34] := THashFactory.TCrypto.CreateMD2();
  FHashAlg[35] := THashFactory.TCrypto.CreateMD5();
  FHashAlg[36] := THashFactory.TCrypto.CreateMD4();
  FHashAlg[37] := THashFactory.TCrypto.CreatePanama();
  FHashAlg[38] := THashFactory.TCrypto.CreateRadioGatun32();
  FHashAlg[39] := THashFactory.TCrypto.CreateRIPEMD();
  FHashAlg[40] := THashFactory.TCrypto.CreateRIPEMD128();
  FHashAlg[41] := THashFactory.TCrypto.CreateRIPEMD160();
  FHashAlg[42] := THashFactory.TCrypto.CreateRIPEMD256();
  FHashAlg[43] := THashFactory.TCrypto.CreateRIPEMD320();
  FHashAlg[44] := THashFactory.TCrypto.CreateSHA0();
  FHashAlg[45] := THashFactory.TCrypto.CreateSHA1();
  FHashAlg[46] := THashFactory.TCrypto.CreateSHA2_224();
  FHashAlg[47] := THashFactory.TCrypto.CreateSHA2_256();
  FHashAlg[48] := THashFactory.TCrypto.CreateSHA2_384();
  FHashAlg[49] := THashFactory.TCrypto.CreateSHA2_512();
  FHashAlg[50] := THashFactory.TCrypto.CreateSHA2_512_224();
  FHashAlg[51] := THashFactory.TCrypto.CreateSHA2_512_256();
  FHashAlg[52] := THashFactory.TCrypto.CreateSHA3_224();
  FHashAlg[53] := THashFactory.TCrypto.CreateSHA3_256();
  FHashAlg[54] := THashFactory.TCrypto.CreateSHA3_384();
  FHashAlg[55] := THashFactory.TCrypto.CreateSHA3_512();
  FHashAlg[56] := THashFactory.TCrypto.CreateSnefru_8_128();
  FHashAlg[57] := THashFactory.TCrypto.CreateSnefru_8_256();
  FHashAlg[58] := THashFactory.TCrypto.CreateTiger_3_128();
  FHashAlg[59] := THashFactory.TCrypto.CreateTiger_3_160();
  FHashAlg[60] := THashFactory.TCrypto.CreateTiger_3_192();
  FHashAlg[61] := THashFactory.TCrypto.CreateTiger_4_128();
  FHashAlg[62] := THashFactory.TCrypto.CreateTiger_4_160();
  FHashAlg[63] := THashFactory.TCrypto.CreateTiger_4_192();
  FHashAlg[64] := THashFactory.TCrypto.CreateTiger_5_128();
  FHashAlg[65] := THashFactory.TCrypto.CreateTiger_5_160();
  FHashAlg[66] := THashFactory.TCrypto.CreateTiger_5_192();
  FHashAlg[67] := THashFactory.TCrypto.CreateTiger2_3_128();
  FHashAlg[68] := THashFactory.TCrypto.CreateTiger2_3_160();
  FHashAlg[69] := THashFactory.TCrypto.CreateTiger2_3_192();
  FHashAlg[70] := THashFactory.TCrypto.CreateTiger2_4_128();
  FHashAlg[71] := THashFactory.TCrypto.CreateTiger2_4_160();
  FHashAlg[72] := THashFactory.TCrypto.CreateTiger2_4_192();
  FHashAlg[73] := THashFactory.TCrypto.CreateTiger2_5_128();
  FHashAlg[74] := THashFactory.TCrypto.CreateTiger2_5_160();
  FHashAlg[75] := THashFactory.TCrypto.CreateTiger2_5_192();
  FHashAlg[76] := THashFactory.TCrypto.CreateWhirlPool();
end;

destructor TRandomHash2.Destroy;
var i : integer;
begin
  for i := Low(FHashAlg) to High(FHashAlg) do
    FHashAlg[i] := nil;
  inherited Destroy;
end;

class function TRandomHash2.Compute(const ABlockHeader: TBytes): TBytes;
var
  LHasher : TRandomHash2;
  LDisposables : TDisposables;
begin
 LHasher := LDisposables.AddObject( TRandomHash2.Create ) as TRandomHash2;
 Result := LHasher.Hash(ABlockHeader);
end;

function TRandomHash2.TryHash(const ABlockHeader: TBytes; AMaxRound : UInt32; out AHash : TBytes) : Boolean;
var
  LOutputs: TArray<TBytes>;
  LSeed: UInt32;
begin
  if NOT CalculateRoundOutputs(ABlockHeader, AMaxRound, LOutputs) then
    Exit(False);
  AHash := ComputeVeneerRound(LOutputs);
  Result := True;
end;

function TRandomHash2.Hash(const ABlockHeader: TBytes): TBytes;
begin
  if NOT TryHash(ABlockHeader, MAX_N, Result) then
    raise ERandomHash2.Create('Internal Error: 984F52997131417E8D63C43BD686F5B2'); // Should have found final round!
end;

function TRandomHash2.ComputeVeneerRound(const ARoundOutputs : TArray<TBytes>) : TBytes;
var
  LSeed : UInt32;
begin
  LSeed := GetLastDWordLE(ARoundOutputs[High(ARoundOutputs)]);
  // Final "veneer" round of RandomHash is a SHA2-256 of compression of prior round outputs
  Result := FHashAlg[SHA2_256_IX].ComputeBytes(Compress(ARoundOutputs, LSeed)).GetBytes;
end;

function TRandomHash2.CalculateRoundOutputs(const ABlockHeader: TBytes; ARound: Int32; out ARoundOutputs : TArray<TBytes>) : Boolean;
var
  LRoundOutputs: TList<TBytes>;
  LNeighbourWasLastRound : Boolean;
  LSeed, LNumNeighbours: UInt32;
  LGen: TMersenne32;
  LRoundInput, LNeighbourNonceHeader, LOutput : TBytes;
  LParentOutputs, LNeighborOutputs, LToArray, LBuffs2: TArray<TBytes>;
  LHashFunc: IHash;
  i: Int32;
  LDisposables : TDisposables;
  LBuff : TBytes;
begin
  if (ARound < 1) or (ARound > MAX_N) then
    raise EArgumentOutOfRangeException.CreateRes(@SInvalidRound);

  LRoundOutputs := LDisposables.AddObject( TList<TBytes>.Create() ) as TList<TBytes>;
  LGen := LDisposables.AddObject( TMersenne32.Create(0) ) as TMersenne32;
  if ARound = 1 then begin
    LRoundInput := FHashAlg[SHA2_256_IX].ComputeBytes(ABlockHeader).GetBytes;
    LSeed := GetLastDWordLE( LRoundInput );
    LGen.Initialize(LSeed);
  end else begin
    if CalculateRoundOutputs(ABlockHeader, ARound - 1, LParentOutputs) = True then begin
      // Previous round was the final round, so just return it's value
      ARoundOutputs := LParentOutputs;
      Exit(True);
    end;

    // Add parent round outputs to this round outputs
    LSeed := GetLastDWordLE( LParentOutputs[High(LParentOutputs)] );
    LGen.Initialize(LSeed);
    LRoundOutputs.AddRange( LParentOutputs );

    // Add neighbouring nonce outputs to this round outputs
    LNumNeighbours := (LGen.NextUInt32 MOD (MAX_J - MIN_J)) + MIN_J;
    for i := 1 to LNumNeighbours do begin
      LNeighbourNonceHeader := SetLastDWordLE(ABlockHeader, LGen.NextUInt32); // change nonce
      LNeighbourWasLastRound := CalculateRoundOutputs(LNeighbourNonceHeader, ARound - 1, LNeighborOutputs);
      LRoundOutputs.AddRange(LNeighborOutputs);
    end;
    // Compress the parent/neighbouring outputs to form this rounds input
    LRoundInput := Compress( LRoundOutputs.ToArray, LGen.NextUInt32 );
  end;

  // Select a random hash function and hash the input to find the output
  LHashFunc := FHashAlg[LGen.NextUInt32 mod NUM_HASH_ALGO];
  LOutput := LHashFunc.ComputeBytes(LRoundInput).GetBytes;

  // Memory-expand the output, add to output list and return output list
  LOutput := Expand(LOutput, MAX_N - ARound, LGen.NextUInt32);
  LRoundOutputs.Add(LOutput);
  ARoundOutputs := LRoundOutputs.ToArray;

  // Determine if final round
  Result := (ARound = MAX_N) OR ((ARound >= MIN_N) AND (GetLastDWordLE(LOutput) MOD MAX_N = 0));
end;

function TRandomHash2.Compress(const AInputs : TArray<TBytes>; ASeed : UInt32): TBytes;
var
  i: Int32;
  LSource: TBytes;
  LGen: TMersenne32;
  LDisposables : TDisposables;
begin
  SetLength(Result, 100);
  LGen := LDisposables.AddObject( TMersenne32.Create( ASeed ) ) as TMersenne32;
  for i := 0 to 99 do
  begin
    LSource := AInputs[LGen.NextUInt32 mod Length(AInputs)];
    Result[i] := LSource[LGen.NextUInt32 mod Length(LSource)];
  end;
end;

function TRandomHash2.Expand(const AInput: TBytes; AExpansionFactor: Int32; ASeed : UInt32): TBytes;
var
  LSize, LBytesToAdd: Int32;
  LOutput, LNextChunk: TBytes;
  LRandom: UInt32;
  LGen: TMersenne32;
  LDisposables : TDisposables;
begin
  LGen := LDisposables.AddObject( TMersenne32.Create (ASeed) ) as TMersenne32;
  LSize := Length(AInput) + (AExpansionFactor * M);
  LOutput := Copy(AInput);
  LBytesToAdd := LSize - Length(AInput);

  while LBytesToAdd > 0 do
  begin
    LNextChunk := Copy(LOutput);
    if Length(LNextChunk) > LBytesToAdd then
      SetLength(LNextChunk, LBytesToAdd);

    LRandom := LGen.NextUInt32;
    case LRandom mod 8 of
      0: LOutput := ContencateBytes(LOutput, MemTransform1(LNextChunk));
      1: LOutput := ContencateBytes(LOutput, MemTransform2(LNextChunk));
      2: LOutput := ContencateBytes(LOutput, MemTransform3(LNextChunk));
      3: LOutput := ContencateBytes(LOutput, MemTransform4(LNextChunk));
      4: LOutput := ContencateBytes(LOutput, MemTransform5(LNextChunk));
      5: LOutput := ContencateBytes(LOutput, MemTransform6(LNextChunk));
      6: LOutput := ContencateBytes(LOutput, MemTransform7(LNextChunk));
      7: LOutput := ContencateBytes(LOutput, MemTransform8(LNextChunk));
    end;
    LBytesToAdd := LBytesToAdd - Length(LNextChunk);
  end;
  Result := LOutput;
end;

function TRandomHash2.MemTransform1(const AChunk: TBytes): TBytes;
var
  i, LChunkLength : UInt32;
  LState : UInt32;
begin
  // Seed XorShift32 with last byte
  LState := GetLastDWordLE(AChunk);
  if LState = 0 then
    LState := 1;

  // Select random bytes from input using XorShift32 RNG
  LChunkLength := Length(AChunk);
  SetLength(Result, LChunkLength);
  for i := 0 to High(AChunk) do
    Result[i] := AChunk[TXorShift32.Next(LState) MOD LChunkLength];
end;

function TRandomHash2.MemTransform2(const AChunk: TBytes): TBytes;
var
  i, LChunkLength, LPivot, LOdd: Int32;
begin
  LChunkLength := Length(AChunk);
  LPivot := LChunkLength SHR 1;
  LOdd := LChunkLength MOD 2;
  SetLength(Result, LChunkLength);
  Move(AChunk[LPivot + LOdd], Result[0], LPivot);
  Move(AChunk[0], Result[LPivot + LOdd], LPivot);
  // Set middle-byte for odd-length arrays
  if LOdd = 1 then
    Result[LPivot] := AChunk[LPivot];
end;

function TRandomHash2.MemTransform3(const AChunk: TBytes): TBytes;
var
  i, LChunkLength: Int32;
begin
  LChunkLength := Length(AChunk);
  SetLength(Result, LChunkLength);
  for i := 0 to High(AChunk) do
    Result[i] := AChunk[LChunkLength - i - 1];
end;

function TRandomHash2.MemTransform4(const AChunk: TBytes): TBytes;
var
  i, LChunkLength, LPivot, LOdd: Int32;
begin
  LChunkLength := Length(AChunk);
  LPivot := LChunkLength SHR 1;
  LOdd := LChunkLength MOD 2;
  SetLength(Result, LChunkLength);
  for i := 0 to Pred(LPivot) do
  begin
    Result[(i * 2)] := AChunk[i];
    Result[(i * 2) + 1] := AChunk[i + LPivot + LOdd];
  end;
  // Set final byte for odd-lengths
  if LOdd = 1 THEN
    Result[High(Result)] := AChunk[LPivot];
end;

function TRandomHash2.MemTransform5(const AChunk: TBytes): TBytes;
var
  i, LChunkLength, LPivot, LOdd: Int32;
begin
  LChunkLength := Length(AChunk);
  LPivot := LChunkLength SHR 1;
  LOdd := LChunkLength MOD 2;
  SetLength(Result, LChunkLength);
  for i := Low(AChunk) to Pred(LPivot) do
  begin
    Result[(i * 2)] := AChunk[i + LPivot + LOdd];
    Result[(i * 2) + 1] := AChunk[i];
  end;
  // Set final byte for odd-lengths
  if LOdd = 1 THEN
    Result[High(Result)] := AChunk[LPivot];
end;

function TRandomHash2.MemTransform6(const AChunk: TBytes): TBytes;
var
  i, LChunkLength, LPivot, LOdd: Int32;
begin
  LChunkLength := Length(AChunk);
  LPivot := LChunkLength SHR 1;
  LOdd := LChunkLength MOD 2;
  SetLength(Result, LChunkLength);
  for i := 0 to Pred(LPivot) do
  begin
    Result[i] := AChunk[(i * 2)] xor AChunk[(i * 2) + 1];
    Result[i + LPivot + LOdd] := AChunk[i] xor AChunk[LChunkLength - i - 1];
  end;
  // Set middle-byte for odd-lengths
  if LOdd = 1 THEN
    Result[LPivot] := AChunk[High(AChunk)];
end;

function TRandomHash2.MemTransform7(const AChunk: TBytes): TBytes;
var
  i, LChunkLength: Int32;
begin
  LChunkLength := Length(AChunk);
  SetLength(Result, LChunkLength);
  for i := 0 to High(AChunk) do
    Result[i] := TBits.RotateLeft8(AChunk[i], LChunkLength - i);
end;

function TRandomHash2.MemTransform8(const AChunk: TBytes): TBytes;
var
  i, LChunkLength: Int32;
begin
  LChunkLength := Length(AChunk);
  SetLength(Result, LChunkLength);
  for i := 0 to High(AChunk) do
    Result[i] := TBits.RotateRight8(AChunk[i], LChunkLength - i);
end;

Links

  1. RandomHash
  2. RandomHash2 Reference Implementation