Our variant of iterative refinement uses the same basic structure as previous implementations. That is, the system is solved numerically in a loop, with the solution at each iteration contributing some bits to the dyadic numerators and common denominator. Specifically, we call the solution in a step of the iteration $\hat{x}$ and divide it into two parts: $\hat{x}_{int}$ contains the higher order bits and is incorporated into the dyadic estimate, while $\hat{x}_{frac}$ contains the lower order bits and is unused in Wan's algorithm. The residual vector obtained by applying $A$ to $\hat{x}_{int}$ provides the right-hand side for the next iteration.
\begin{algorithm}
\caption{{\bf Overlap:} Confirmed continuation iterative refinement to solve $Ax = b$}
\label{alg1}
\begin{algorithmic}
\STATE Input: $A \in \Z^{n\times n}, b \in \Z^n, k$. Output: $x \in \Z^n$, $0 < d \in \Z$ such that $Ax=db$.
\STATE Compute $A^{-1}$. \hfill \COMMENT{Numeric LU decomposition}
\STATE $N_{1..n} \gets 0$. \hfill \COMMENT{dyadic numerators}
\STATE $D \gets 1$. \hfill \COMMENT{common denominator}
\STATE loopbound $\gets 2 \times\prod_{i=1}^n\left\|A_i\right\|_2\times\left\|b\right\|_{\infty}$. \hfill \COMMENT{Hadamard-based bound; $A_i$ are the rows of $A$}
\STATE $r \gets b$. \hfill \COMMENT{Residue of intermediate solutions}
\STATE $s \gets 52 - ${\bf bitlength}($n\times\left\|A\right\|_{\infty}\times\left\|b\right\|_{\infty}$). \hfill \COMMENT{Maximal number of bits to use per iteration}
\STATE thresh $\gets \frac{1}{2^{k}}$. \hfill \COMMENT{Threshold for overlap confirmation}
\STATE Numerically solve: $\hat{x} \gets A^{-1}r$.
\WHILE{$D <$ loopbound}
\STATE $\hat{x}_{int} \gets \lfloor\hat{x}\times2^{s} + 0.5\rfloor$. \hfill \COMMENT{Separate solution into two parts}
\STATE $\hat{x}_{frac} \gets \hat{x}\times2^{s} - \hat{x}_{int}$.
\STATE $r \gets r \times2^{s} - A\hat{x}_{int}$. \hfill \COMMENT{Update residual}
\STATE Numerically solve: $\hat{x} \gets A^{-1}r$.
\IF{ $||\hat{x} - \hat{x}_{frac}||_{\infty} > $ thresh } \label{alg:overlap}
\STATE Shrink $s$, repeat iteration.
\ELSE
\STATE $N_{1..n} \gets N_{1..n}\times2^{s} + \hat{x}_{int}$. \hfill \COMMENT{Update dyadics}
\STATE $D \gets D\times2^{s}$.
\IF{ $r = 0$ }
\STATE Return: $N, D$ as $x, d$.
\ENDIF
\ENDIF
\ENDWHILE
\STATE Return: $x, d \gets$ Algorithm \ref{algv}($N, D$).
\end{algorithmic}
\end{algorithm}
To ensure the accuracy of the numeric solver's solution at each iteration, we verify that the current iteration's numeric solution agrees with the discarded fractional portion of the previous solution. That is, we set a threshold $t = \frac{1}{2^k}$ to require $k$ bits of overlap, and check that $||\hat{x} - \hat{x}_{frac}||_{\infty} \leq t$. In practice, we find one bit of overlap (i.e., $k = 1$) suffices to confirm continuation except for very small $n$.
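For concreteness, the loop body of Algorithm~\ref{alg1} can be sketched in a few lines of Python. This is only an illustration under our own naming (\texttt{refine\_step} is not from any implementation), and a dense \texttt{numpy} solve stands in for the LU-based numeric solver.
\begin{verbatim}
import numpy as np

def refine_step(A, A_inv, r, x_hat, s, k=1):
    # Split the numeric solution of A x = r into two parts.
    x_int = np.floor(x_hat * 2.0**s + 0.5)   # higher-order bits
    x_frac = x_hat * 2.0**s - x_int          # discarded lower-order bits
    r_new = r * 2.0**s - A @ x_int           # residual update
    x_next = A_inv @ r_new                   # numeric solve for next step
    # Confirmed continuation: the new solution must reproduce the
    # previous fractional part to k bits of agreement.
    ok = np.max(np.abs(x_next - x_frac)) <= 2.0**(-k)
    return ok, x_int, r_new, x_next
\end{verbatim}
On success the caller updates the dyadics, $N \gets N\times2^{s} + \hat{x}_{int}$ and $D \gets D\times2^{s}$; on failure it shrinks $s$ and repeats the iteration from the same residual.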
The initialization of $s$ in Algorithm~\ref{alg1} is chosen so that the first residual computations are exact in floating point. Additionally, we take an adaptive approach to residual computation. If the step applying $A$ to $\hat{x}_{int}$ is done in double precision and produces values that cannot fit into the mantissa of a double floating point number, this operation computes an inexact residual, and the next iteration then solves the wrong problem. This error is detected neither by the norm-based approaches nor by the overlap method, since both guard only against numerical inaccuracy of the partial solutions themselves. If the numeric solver itself is accurate, repeated divergence from the problem we intend to solve goes undetected: the algorithm completes after sufficiently many iterations and reports dyadic estimates that have no hope of being reconstructed into the correct rational solution of the original problem. To avoid this error, we employ big-integer arithmetic (using GMP) in the residual update, but only when necessary, namely when $n\times\left\|A\right\|_{\infty}\times\left\|\hat{x}_{int}\right\|_{\infty} \ge 2^{52}$, a conservative condition under which an entry of $A\hat{x}_{int}$, or a partial sum contributing to it, may overflow a double's mantissa. The matrix norm is computed beforehand, so it costs only $\O(n)$ work per iteration to compute the vector norm. The flexibility of this approach both prevents the aforementioned divergent behavior and allows for the quicker, double precision computation of the exact residual when possible. Our experience is that for borderline problems requiring some bignum residual computation, the need arises in only a small fraction of the iterations.
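The adaptive test can be illustrated with the following Python sketch (our naming; Python's arbitrary-precision integers stand in for GMP). It assumes $r$ and $\hat{x}_{int}$ hold exactly representable integer values and that $\left\|A\right\|_{\infty}$ is precomputed.
\begin{verbatim}
import numpy as np

MANT = 52  # fraction bits of an IEEE-754 double

def residual_update(A, r, x_int, s, A_norm):
    n = len(r)
    x_norm = int(max(abs(v) for v in x_int))  # O(n) per iteration
    if n * A_norm * x_norm < 2**MANT:
        # Provably exact in double precision: every partial sum
        # of A @ x_int fits in a double's mantissa.
        return r * 2.0**s - A @ x_int
    # Conservative test failed: redo the update exactly with
    # big integers (Python ints here, GMP in compiled code).
    ri = [int(v) for v in r]
    xi = [int(v) for v in x_int]
    return np.array(
        [ri[i] * 2**s - sum(int(A[i][j]) * xi[j] for j in range(n))
         for i in range(n)], dtype=object)
\end{verbatim}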
Sometimes the numerators and denominator of the final rational solution are significantly smaller than the {\em a priori} bound on their size. When this is the case, it is possible to obtain dyadic approximants of sufficient length to reconstruct the solution before the iterative refinement would normally end. Our early termination strategy is designed to improve running time for these cases. It is sketched in Algorithm~\ref{alg2}.
\begin{algorithm}
\caption{{\bf Ov-ET:} Confirmed continuation iterative refinement with Early Termination to solve $Ax = b$}
\label{alg2}
\begin{algorithmic}
\STATE Algorithm \ref{alg1}, wrapping its iterative refinement loop with:
\STATE bound $\gets \prod_{i=1}^n\left\|A_i\right\|_2$. \hfill \COMMENT{Hadamard bound}
\WHILE{bound $<$ loopbound}
\WHILE{$D <$ bound}
\STATE body of the while loop in Algorithm \ref{alg1}.
\ENDWHILE
\STATE bound $\gets \sqrt{bound \times loopbound}.$
\STATE $i \gets$ {\bf random}($1..n$). \hfill \COMMENT{Select random element}
\IF{Algorithm \ref{algv}($N_{i}, D$) is {\bf success}}
\IF{$x, d \gets$ Algorithm \ref{algv}($N, D$) is {\bf success}}
\STATE Return: $x, d.$
\ENDIF
\ENDIF
\ENDWHILE
\STATE Return: $x, d \gets$ Algorithm \ref{algv}($N, D$).
\end{algorithmic}
\end{algorithm}
The core iterative refinement loop is still in place, but every so often it is paused to attempt a rational reconstruction from the current dyadic approximation. It is first paused at its bitwise halfway point, as soon as $D$ exceeds the Hadamard bound on $\det(A)$, the initial value of $bound$ in Algorithm~\ref{alg2}. A single-element rational reconstruction is attempted using a random element of the numerator vector $N$ and the denominator $D$. Success there justifies attempting a full vector reconstruction from all elements of $N$, which is then performed.
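The two reconstruction attempts can be pictured with Python's \texttt{fractions} module, whose \texttt{limit\_denominator} method performs the continued-fraction search for a best rational approximant. The sketch below stands in for Algorithm~\ref{algv} under the assumption that the true common denominator is at most \texttt{den\_bound}; the names are ours.
\begin{verbatim}
import math, random
from fractions import Fraction

def try_reconstruction(N, D, den_bound):
    # Probe a single random entry first.
    i = random.randrange(len(N))
    probe = Fraction(N[i], D).limit_denominator(den_bound)
    if abs(probe - Fraction(N[i], D)) * 2 * den_bound**2 >= 1:
        return None   # not close enough for a unique reconstruction
    # Probe succeeded: reconstruct the whole vector over a
    # common denominator (the lcm of the entry denominators).
    fracs = [Fraction(v, D).limit_denominator(den_bound) for v in N]
    d = 1
    for f in fracs:
        d = d * f.denominator // math.gcd(d, f.denominator)
    return [f.numerator * (d // f.denominator) for f in fracs], d
\end{verbatim}
A candidate returned this way is still speculative until $Ax = db$ is verified.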
Success on the vector reconstruction provides a speculative or guaranteed solution, depending on the reconstructed denominator and the length of the dyadic approximation. After a solution verification, we terminate, potentially saving many iterations. Upon failure to rationally reconstruct the solution on an early attempt, $bound$ is set to the bitwise halfway point between itself and $loopbound$, the point at which iterative refinement would end without early termination. The new value of $bound$ serves as the next checkpoint for an early termination attempt. This is akin to a binary search that keeps reporting ``higher'' after failed guesses. The strategy ensures the number of attempts is logarithmic in the number of iterations required. Reconstruction attempts also grow denser as the full iteration bound is approached, matching the expectation that successful early termination becomes increasingly likely. An exploration, including running time analysis, of different ways to update the $bound$ checkpoint is planned, especially with regard to the relative time costs of early rational reconstruction attempts (measured online) versus proceeding with the refinement iterations (iteration times also taken online).
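As a small illustration of this schedule, the following Python sketch (our naming) enumerates the checkpoints produced by the geometric-mean update; their count is logarithmic in the bit-gap between the Hadamard bound and $loopbound$, and they crowd toward $loopbound$.
\begin{verbatim}
import math

def checkpoints(hadamard_bound, loopbound):
    # Yield successive early-termination checkpoints; each is the
    # bitwise halfway point between the previous one and loopbound.
    bound = hadamard_bound
    while bound < loopbound:
        yield bound
        bound = math.isqrt(bound * loopbound) + 1

# e.g. for a 10-bit start and a 1000-bit end, only about a dozen
# checkpoints are generated, increasingly dense near loopbound.
\end{verbatim}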