\documentclass[]{seminar}
\usepackage{fancybox}
\usepackage{sem-a4}
\usepackage{psfig}
\usepackage{landscape}
\usepackage{epsfig}

\centerslidesfalse

\def\printlandscape{\special{landscape}}

\articlemag{1}

\slideframe{none}

\newcommand{\heading}[1]{%
  \begin{center}
    \large\bf
    \shadowbox{#1}%
  \end{center}
  \vspace{1ex minus 1ex}}

\newcommand{\BF}[1]{{\bf #1:}\hspace{1em}\ignorespaces}

\newpagestyle{pstyle}
  {rsync in http\hspace{\fill}\rightmark \hspace{\fill}\thepage}{}
\pagestyle{pstyle}

\begin{document}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
  \heading{The rsync algorithm}

The rsync algorithm is a remote differencing and update algorithm. It
allows you to efficiently update a file on one machine with the
contents of a file on another machine while taking advantage of the
common content between the old and new file.

The basic algorithm consists of the following steps:

\begin{description}
\item [signature generation] A signature block is generated for the
  old file.
\item [signature search] The differences between the old and new data
  are computed using a checksum search.
\item [reconstruction] The new file is reconstructed.
\end{description}
\end{slide}
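
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Sketch: the three steps}

The three steps above can be sketched in Python. This is a minimal
illustration, not the code used by rsync: the 4-byte block size, the
Adler-style weak checksum, the use of MD5 as the strong hash and all
function names are assumptions made for the example. For brevity the
weak checksum is recomputed at every offset rather than rolled.

\begin{verbatim}
```python
import hashlib

BLOCK = 4  # hypothetical block size, tiny so the example is easy to trace


def weak_sum(data: bytes) -> int:
    """Cheap 32-bit checksum (Adler-style); not rsync's exact formula."""
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) & 0xFFFF
    return (b << 16) | a


def strong_sum(data: bytes) -> bytes:
    """Strong hash that confirms a weak match (MD5 is an assumption)."""
    return hashlib.md5(data).digest()


def make_signature(old: bytes) -> dict:
    """Step 1: map weak checksum -> [(strong hash, block index), ...]."""
    sig = {}
    for index in range(len(old) // BLOCK):  # only full blocks are signed
        block = old[index * BLOCK:(index + 1) * BLOCK]
        sig.setdefault(weak_sum(block), []).append((strong_sum(block), index))
    return sig


def compute_delta(sig: dict, new: bytes) -> list:
    """Step 2: emit ('copy', block_index) or ('literal', bytes) items."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        match = None
        if len(window) == BLOCK:
            for strong, index in sig.get(weak_sum(window), []):
                if strong == strong_sum(window):
                    match = index
                    break
        if match is None:
            literal.append(new[pos])  # no match here: send one literal byte
            pos += 1
        else:
            if literal:  # flush pending literal data before the copy
                delta.append(("literal", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", match))
            pos += BLOCK
    if literal:
        delta.append(("literal", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Step 3: rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)
```
\end{verbatim}

With an old file of \verb+abcdefghijkl+ and a new file of
\verb+abcdXefghijkl+, only the single byte \verb+X+ travels as literal
data; the rest is expressed as references to blocks of the old file.
\end{slide}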

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
  \heading{rsync in http}

The web is moving to dynamic content, which traditional web caches
can't cache. Integrating rsync into HTTP solves this problem.

\begin{itemize}
\item builds on existing web infrastructure
\item all content is cacheable
\item no extra round trips
\end{itemize}

\end{slide}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Integrating rsync into HTTP}
  
Integrating rsync and HTTP can be done with either server-generated
or client-generated signatures.

Using client-generated signatures involves one extra HTTP header and
a new HTTP Content-Encoding type. If we assume that the client has a
cached file to work with then this is what happens:

\begin{itemize}
\item The client generates a signature from the cached file, base64
  encodes it, and adds it to the request as an Rsync-Signature header.
\item The server generates the page as usual, then performs a checksum
  search with the signature to generate the differences.
\item The client receives a ``Content-Encoding: rsync'' reply and
  decodes it to give the new page.
\end{itemize}

\end{slide}
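
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Sketch: the extra header}

The header handling described above might look roughly like this in
Python. The header name and the ``rsync'' content encoding come from
the previous slide; the function names and the use of a plain
dictionary for headers are assumptions made for the example, and the
signature is treated as opaque bytes.

\begin{verbatim}
```python
import base64
from typing import Optional

SIGNATURE_HEADER = "Rsync-Signature"  # header name taken from the slides


def add_signature(headers: dict, signature: bytes) -> dict:
    """Client side: return a copy of headers carrying the base64 signature."""
    out = dict(headers)
    out[SIGNATURE_HEADER] = base64.b64encode(signature).decode("ascii")
    return out


def read_signature(headers: dict) -> Optional[bytes]:
    """Server side: recover the raw signature, or None if it is absent."""
    value = headers.get(SIGNATURE_HEADER)
    return None if value is None else base64.b64decode(value)


def is_rsync_reply(headers: dict) -> bool:
    """Client side: does the reply need rsync decoding?"""
    return headers.get("Content-Encoding", "").strip().lower() == "rsync"
```
\end{verbatim}

A client that has no cached file simply leaves the header out and
receives an ordinary reply.
\end{slide}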


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Server generated signatures}

An alternative is to use server-generated signatures. I'll assume the
client starts out with no suitable cache file.

\begin{itemize}
\item The client initially generates a null signature block and adds
  it to the request as an Rsync-Signature header.
\item The server generates the page as usual, then performs an rsync
  differencing run with the null signature to generate an
  rsync-encoded page. The result is deflate-compressed literal data
  in a reply marked as ``Content-Encoding: rsync''.
\item The server also generates a full rsync signature for the new
  page and appends this to the rsync-encoded reply.
\item The client receives the ``Content-Encoding: rsync'' reply and
  decodes it to give the new page. The client saves the
  server-generated signature alongside the page in its cache.
\item The next time the client requests that URL (or a related URL) it
  sends the signature back to the server.
\end{itemize}
\end{slide}
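
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Sketch: client-side cache}

The client's bookkeeping in the steps above could be sketched as
follows. The class and method names are hypothetical, and representing
the null signature as an empty byte string is an assumption; the
slides do not specify its format. The client never inspects the
signature, it only stores it and sends it back.

\begin{verbatim}
```python
NULL_SIGNATURE = b""  # assumption: empty bytes stand for the null signature


class SignatureCache:
    """Keeps each cached page next to the signature the server sent."""

    def __init__(self):
        self._entries = {}  # url -> (page bytes, opaque signature bytes)

    def signature_for(self, url: str) -> bytes:
        """Signature to send with the next request for this URL."""
        entry = self._entries.get(url)
        return entry[1] if entry else NULL_SIGNATURE

    def page_for(self, url: str):
        """Cached page used as the basis for decoding, or None."""
        entry = self._entries.get(url)
        return entry[0] if entry else None

    def store(self, url: str, page: bytes, signature: bytes) -> None:
        """Save the decoded page and the server-generated signature."""
        self._entries[url] = (page, signature)
```
\end{verbatim}
\end{slide}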

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Which is better?}

We think that server-generated signatures are a better choice.

\begin{itemize}
\item The client doesn't need to know the signature format. This
  allows the server a lot of flexibility. It also allows the server to
  use a signature-token instead of a full signature if it wants to.
\item The signature only passes over the wire between clients and
  servers that both know about rsync.
\item There are possible patent problems with client-generated
  signatures. There are no patent problems with server-generated
  signatures.
\end{itemize}

Note that in both cases all operations can be streamed, thus avoiding
a latency increase.

\end{slide}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Disadvantages}

The main downside of rsync in HTTP is that it puts a higher
computational burden on the server. The main factor in the
computational cost is whether the data is deflate compressed as well
as rsync encoded. Without deflate compression the rsync algorithm can
easily run at 6--10~MB/sec on a cheap PC. Very few web servers are on
links that fat.

With deflate compression this reduces to about 1.5 MB/sec on my
PII/350, which is still quite acceptable for most servers.

In either case, it is often much easier to add more CPU power than it
is to add network bandwidth. 
\end{slide}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Proxy servers}

Rsync doesn't need to be implemented in the end-points of an HTTP
connection. There can be a lot of benefits to implementing it in web
proxy servers. In that case the proxy supplies the signature header if
the downstream client doesn't. The proxy then has four scenarios to
deal with:

\begin{itemize}
\item The downstream client supplied an Rsync-Signature header and we
  got an rsync-encoded reply from the upstream server. We send the
  reply on to the client as is.

\item The downstream client didn't supply an Rsync-Signature header and
  we didn't get an rsync-encoded reply. We send the reply on to the
  client as is.

\item The downstream client didn't supply an Rsync-Signature header but
  we got an rsync-encoded reply. We need to decode it before sending
  it on to the client.

\item The downstream client supplied an Rsync-Signature header but we
  got a non-encoded reply. We need to rsync encode the reply before
  sending it on to the client.
\end{itemize}
\end{slide}
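
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Sketch: the proxy decision}

The four scenarios above reduce to a two-input decision. A minimal
Python sketch, with a hypothetical function name and return values
chosen only for illustration:

\begin{verbatim}
```python
def proxy_action(client_sent_signature: bool,
                 reply_is_rsync_encoded: bool) -> str:
    """Decide what the proxy does with an upstream reply."""
    if client_sent_signature == reply_is_rsync_encoded:
        # Both ends agree (both rsync-aware, or neither): pass it through.
        return "forward"
    if reply_is_rsync_encoded:
        # The client cannot decode rsync, so the proxy decodes for it.
        return "decode"
    # The client wants rsync but upstream sent a plain reply: encode it.
    return "encode"
```
\end{verbatim}

The proxy only does real work when the two sides disagree about rsync
support.
\end{slide}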


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
  \heading{Failure probability}

The rsync algorithm is probabilistic, as all algorithms of this kind
must be. That means there is a non-zero chance of failure. The
probability of failure can be reduced to arbitrarily small levels by
the choice of appropriate signature lengths. 

With the signature algorithm and sizes currently in use in rsync and a
rate of one million transfers per second we should see a failure in
about $10^{11}$ years. The universe is thought to be about $10^{10}$
years old.
\end{slide}
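
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Failure probability: arithmetic}

As a rough check of what the figures above imply, take a year to be
about $3.15 \times 10^{7}$ seconds. The expected number of transfers
before a failure is then
\[
  N \approx 10^{6}\,\mathrm{s^{-1}} \times 3.15 \times 10^{7}\,\mathrm{s/yr}
    \times 10^{11}\,\mathrm{yr} \approx 3 \times 10^{24},
\]
which corresponds to a per-transfer failure probability of roughly
\[
  p \approx \frac{1}{N} \approx 3 \times 10^{-25}.
\]
This only restates the quoted numbers; it is not an independent
derivation from the signature sizes.
\end{slide}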

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Cache file selection}

An interesting property of an rsync-based web cache is that cache
lookups don't need to be exact. This allows the cache to choose a
cached page that doesn't exactly match a URL.

The most obvious thing to do is to truncate the URL at the first
`?', thus removing CGI parameters. Even better would be to first try an
exact URL then progressively trim the URL until a match is found. If
none is found then a null signature is used.

The result is that you get the speedup of rsync encoding if there is
any common content between the page you want and a previous page from
the same site. With the widespread use of large boilerplate HTML
templates, this is a big win.
\end{slide}
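
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Sketch: progressive URL trimming}

One possible reading of ``progressively trim the URL'' is sketched
below in Python. The trimming order (exact URL, then without the query
string, then one path segment at a time down to the site root) is an
assumption, as are the function names and the use of a dictionary
keyed by URL as the cache.

\begin{verbatim}
```python
from urllib.parse import urlsplit


def candidate_urls(url: str):
    """Yield cache keys to try, from most to least specific."""
    yield url
    base, sep, _query = url.partition("?")
    if sep:
        yield base  # same URL with the CGI parameters removed
    parts = urlsplit(base)
    root = f"{parts.scheme}://{parts.netloc}"
    path = parts.path
    while path not in ("", "/"):
        path = path.rstrip("/")
        path = path[:path.rfind("/") + 1]  # drop the last path segment
        yield root + path


def select_cache_key(url: str, cache: dict):
    """Return the first candidate present in the cache, else None."""
    for candidate in candidate_urls(url):
        if candidate in cache:
            return candidate
    return None  # caller falls back to the null signature
```
\end{verbatim}

A request for a story page that has never been fetched can then be
encoded against the cached section index from the same site.
\end{slide}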

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{A working prototype: rproxy}

rproxy is a simple fork-per-connection web proxy written in C. It
implements both the client and server side of rsync in HTTP.

\begin{itemize}
\item It can be chained with other proxy servers.
\item It uses a maximum signature size of 512 bytes.
\item Files smaller than 2~kB are not cached.
\item Cache files are based on a hash of the URL truncated at the
  first `?'.
\item All requests are pipelined for minimum latency impact.
\item The proxy is implemented on top of librsync, a simple rsync
  library implementation.
\item zlib is used for deflate compression of the rsync encoded data.
\end{itemize}
\end{slide}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{slide}
\heading{Initial results}

Overall, rproxy has reduced the repeat URL traffic on my link by 76\%.

\vspace*{5mm}
\begin{tabular}{|l|c|} \hline
{\bf Site} & {\bf Saving (\%)} \\ \hline \hline
linuxtoday     &  81 \\ \hline
slashdot.org   &  82 \\ \hline
linux.com      &  93 \\ \hline
excite.com     &  93 \\ \hline
\end{tabular}
\vspace*{5mm}

\end{slide}

\end{document}
