CyberCFO@gkmweb.com
Created on August 25, 2002
Last Updated on September 11, 2004
Changelog:
2004-09-11 - Updated for 10.1 cycle and the new Mandrake mirror structure
2003-03-02 - Added reference about dealing with symbolic links, changed date format to International
2002-09-07 - Updated e-mail address for change in host domain
© 2002-2004 Gregory K. Meyer
MandrakeSoft's beta testing process is perhaps the most open of all of the major Linux distribution publishers. There are people like you and me all over the world who like to download the latest beta and give it a run just for fun. The problem is that downloading 3 iso images every week or so for a month or two can eat up a lot of bandwidth. Wouldn't it be great if there were a way to download only the differences between the existing iso images on your hard drive and the brand new images from the beta that was just released? There is a way to do that, and it is a command line utility called rsync.
rsync is a very powerful tool for mirroring or syncing files in different locations. It has many features and can be used within many contexts, but like many GNU/Linux utilities, the exact commands that are used to perform a particular task are often hard to figure out from the man or info pages unless you already know what you are doing. The scope of this document is how to use it to upgrade an iso set when moving from one beta to another so that you don't have to download the complete set of image files again.
I wrote this article in response to a request from Texstar, the founder of PCLinuxOnline.com and a moderator at that site. He was responding to a post that someone had made about using rsync and he asked for instructions on how to do this. Since the topic interested me, I did some research, found the answer and posted it. After the ensuing discussion, I decided I had learned a lot about using rsync and wanted to share this newfound knowledge. Although this article is written specifically for Mandrake users who want to sync the beta iso's (because I use Mandrake Linux and like to test the beta versions), the information should apply quite easily to other distributions that go through a public beta test period (Red Hat comes to mind).
I also have to add the standard disclaimer that I am not responsible for anything YOU do to YOUR computer to screw it up. Keep backups of your valuable data (including config files) and know what you are doing before you do it. I have tried to be as accurate as possible, but if anyone notices that I made a technical error, please let me know so that I can correct it.
I won't state it as a fact, but I do recall reading documents on both the Debian and Lycoris websites that are encouraging the use of rsync as the bandwidth friendly way to synchronize iso images. As I write this, I cannot find the links to the documents I am referring to. If anyone happens upon them, please let me know and I will add the links here. Although Mandrake has not taken any position whatsoever (that I can tell), it is my opinion that rsync is better because it can use up far less bandwidth. And as Mandrake Linux 9.0beta4 becomes RC1 and then final, it will be a lot easier and faster to update your images with rsync than by re-downloading all of the images.
There may also be some difference of opinion among the sysadmin community about whether rsync use by the masses is a good idea. I spent a significant amount of time searching for posted documents that argue against its use but could only find this one, made by TheDarb, a moderator at PCLinuxOnline.com:
"Rsync is one of the more popular methods ftp sites mirror each other. It provides them an easy method to mirror software depots while still having ftp bandwidth shaping for users. Now most sites I've seen allow at the very least 3, at the very most 10 connections to their rsync daemon. These were setup that way to allow the other ftp sites to connect via rsync and mirror... hence down the road alieviating the ftp traffic to the original host or hosts. If you suddenly throw the majority of our readers at their rsync depots, then the ftp sites run the high risk of not being able to get their mirrors updated. Like being slashdotted or a DoS attack, basicly... but it isn't meant as an attack. Just hundreds of well meaning users."
Posted on pclinuxonline.com August 25, 2002
Let me simply say that the number of rsync connections allowed to a public mirror may be significantly lower than the number of allowed ftp or http sessions, so there is a chance you will not be able to connect using this method during busy times. On the other hand, because rsync sessions are shorter, connections should turn over faster.
To show the actual effects, I performed a test operation by syncing my Mandrake Linux 9.0beta3 iso's to beta4. Here are the stats:
The mirror used for this experiment was carroll.cac.psu.edu
CD1 - total size of iso file 734560256 bytes, transferred 452127940 bytes
Time to transfer 00:20:24
Avg speed 604.56KB/s (Total file size/time to transfer)
Bandwidth used 359.91KB/s (Total KB transferred/time to transfer)
only 61.5% of the file was transferred
CD2 - total size of iso file 731512832 bytes, transferred 254115840 bytes
Time to transfer 00:11:24
Avg speed 1044.48KB/s
Bandwidth used 362.43KB/s
only 34.7% of the file was transferred
CD3 - total size of iso file 545259520 bytes, transferred 310001664 bytes
Time to transfer 00:13:43
Avg speed 646.71KB/s
Bandwidth used 367.68KB/s
only 56.9% of the file was transferred
Total - total size of all files 2011332608 bytes, transferred 1016245444 bytes
Time to transfer 00:45:31
Avg speed 719.22KB/s
Bandwidth used 363.39KB/s
only 50.5% of the file was transferred
Just over 50% of the total file size was transferred between the mirror and my computer, which is pretty efficient compared to ftp if you ask me. Using anonymous ftp with the same mirror and downloading each iso in series, it would have taken closer to three hours (I am on a cable modem, by the way). If I download them in parallel, as I am sometimes selfish enough to do, it usually takes about 1:15 to get all of them. rsync is still 40% faster, and I use 1 connection to the server instead of 3.
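For anyone who wants to repeat this test, the "percent transferred" figures can be recomputed from the two byte counts. The following is only a minimal sketch: the function name pct_transferred is made up for illustration, and it assumes awk is available. It is shown with the CD2 numbers from the stats above.

```shell
# pct_transferred TRANSFERRED_BYTES TOTAL_BYTES
# Hypothetical helper: prints the share of the file that actually
# crossed the wire, as a percentage with one decimal place.
pct_transferred() {
    awk -v moved="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * moved / total }'
}

# CD2 from the test above: 254115840 of 731512832 bytes were transferred.
pct_transferred 254115840 731512832   # prints 34.7
```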
The syntax for using rsync in this fashion looks like:
rsync -switches --options [host]::[rsyncmodule/path/srcfile] [destination]
Some options and switches that look like they might be useful are --stats, which displays the statistics of the transfer session; -P, which shows progress and keeps the partial files created during the session so that you can resume from where you left off if the session is interrupted; and -z, which compresses the transmission. I personally have had trouble with compression on these large files, but you can try it, particularly if you have a slow connection.
The host can be your favorite local mirror, but you have to find out if there is an active rsync module running on that server. Just enter
rsync fully.qualified.hostname::
to get a list of modules running on the server.
For my example mirror, I get
[cybercfo@aurora ~]$ rsync ftp.example.edu::
apache          Apache
Once you get a list of modules, you also need to get some info about the directory structure that exists below the module. Type
rsync fully.qualified.hostname::modulename
to get a directory listing in the module. The remaining path should then be identical to the ftp path.
For my example mirror, I get
[cybercfo@aurora ~]$ rsync ftp.example.edu::Mandrakelinux
drwxr-xr-x 4096 2002/08/25 14:15:01 .
drwxr-xr-x 4096 2000/09/06 10:38:25 devel
drwxr-xr-x 4096 2002/08/24 02:07:57 official
drwxr-xr-x 4096 2001/11/30 09:46:53 old
The module Mandrakelinux houses the cooker and community trees in the devel branch and there is a directory devel/iso/i586 where the beta iso's for the Intel i586 architecture exist. Looking up the ftp path on that server will confirm that the path below Mandrakelinux is ../Mandrakelinux/devel/iso/i586/
Now, put all three of your existing iso images in one directory (hint: maybe /mnt/10.1 so you can export the directory via NFS and use the iso's as urpmi sources for the other machines on the network). Change to that directory and rename them to the same name as the current beta:
mv Mandrakelinux-10.1beta1-CD1.i586.iso Mandrakelinux-10.1beta2-CD1.i586.iso
mv Mandrakelinux-10.1beta1-CD2.i586.iso Mandrakelinux-10.1beta2-CD2.i586.iso
mv Mandrakelinux-10.1beta1-CD3.i586.iso Mandrakelinux-10.1beta2-CD3.i586.iso
rsync only compares a remote file against a local file with the same name, so you have to rename your old images to the new names for this to work.
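If you would rather not type the three mv commands by hand, the renaming can be done in a loop. This is just a sketch under my own assumptions: rename_beta is a made-up function name, and it expects to be run in the directory that holds the old images.

```shell
# rename_beta OLD_TAG NEW_TAG
# Hypothetical helper: renames every *.iso in the current directory whose
# name contains OLD_TAG, replacing the first occurrence with NEW_TAG.
rename_beta() {
    old=$1
    new=$2
    for f in *"$old"*.iso; do
        # If nothing matched, the pattern is left as-is; skip it.
        [ -e "$f" ] || continue
        prefix=${f%%"$old"*}   # text before the first OLD_TAG
        suffix=${f#*"$old"}    # text after the first OLD_TAG
        mv -- "$f" "${prefix}${new}${suffix}"
    done
}

# Example, matching the mv commands above:
# rename_beta 10.1beta1 10.1beta2
```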
Now try
rsync -Pz --stats ftp.example.edu::Mandrakelinux/devel/iso/i586/Mandrakelinux-10.1*-CD1* .
If you get an error, try leaving out the z switch.
I use the carroll.cac.psu.edu mirror because it is close to me and seems to be quite fast, although it is very busy during the day. Substitute the mirror of your choice. The path to the iso's is different on every mirror, so if you use another one, make sure you get the path right.
Note the space followed by a period at the end of the command. It designates the destination, and the period means the current directory.
When the first one finishes, you should get some output telling you how things went that looks like this (the comments marked with <=== were inserted by me):
731512832 100% 1.02MB/s 0:00:00
rsync[3355] (receiver) heap statistics:
arena: 51224 (bytes from sbrk)
ordblks: 2 (chunks not in use)
smblks: 0
hblks: 0 (chunks from mmap)
hblkhd: 0 (bytes from mmap)
usmblks: 0
fsmblks: 0
uordblks: 44576 (bytes used)
fordblks: 6648 (bytes free)
keepcost: 6600 (bytes in releasable chunk)
Number of files: 1
Number of files transferred: 1
Total file size: 731512832 bytes <=== this is the total size of the ISO
Total transferred file size: 731512832 bytes
Literal data: 254115840 bytes <=== this is how much of the ISO you actually transferred
Matched data: 477396992 bytes <=== this is how much of the ISO matched your local version
File list size: 54
Total bytes written: 268674
Total bytes read: 254264872
wrote 268674 bytes read 254264872 bytes 351808.63 bytes/sec
total size is 731512832 speedup is 2.87 <=== this is the acceleration you received
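The speedup figure on the last line can be checked by hand. As far as I can tell, it is the total file size divided by the sum of the bytes written and the bytes read. Here is a small sketch of that arithmetic (rsync_speedup is a made-up name and awk is assumed to be available), using the numbers from the output above.

```shell
# rsync_speedup TOTAL_SIZE BYTES_WRITTEN BYTES_READ
# Hypothetical helper: total size divided by all bytes that crossed the wire.
rsync_speedup() {
    awk -v size="$1" -v sent="$2" -v recv="$3" \
        'BEGIN { printf "%.2f\n", size / (sent + recv) }'
}

# Values taken from the --stats output above.
rsync_speedup 731512832 268674 254264872   # prints 2.87
```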
Now finish the other two ISO's:
rsync ftp.example.edu::Mandrakelinux/devel/iso/i586/Mandrakelinux-10.1*-CD2* .
and
rsync ftp.example.edu::Mandrakelinux/devel/iso/i586/Mandrakelinux-10.1*-CD3* .
I am told that if you want to get all 3 at once, you can use the source file name *.iso instead. It sounds right, but I cannot confirm that it works because I have not tried it myself.
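Another way to handle all three images in one go, without relying on the untested *.iso form, is a small loop that runs the same command once per CD. This is only a sketch: sync_isos and the RSYNC variable are names I made up, and the source path is the one from my example mirror. Setting RSYNC=echo gives a dry run that just prints the arguments that would be passed.

```shell
# RSYNC is a hypothetical override; set RSYNC=echo to print the
# arguments instead of contacting the server.
RSYNC=${RSYNC:-rsync}

# sync_isos SOURCE_DIR [DEST]
# Hypothetical helper: syncs CD1, CD2 and CD3 one after another.
sync_isos() {
    src_dir=$1
    dest=${2:-.}
    for n in 1 2 3; do
        # The wildcard is quoted so the server, not the local shell, expands it.
        "$RSYNC" -P --stats "${src_dir}/Mandrakelinux-10.1*-CD${n}*" "$dest" || return 1
    done
}

# Example, using the path from the example mirror:
# sync_isos ftp.example.edu::Mandrakelinux/devel/iso/i586
```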
The final step is to check the md5sums to make sure that you have good images. Another good use of rsync is to fix an image that has a bad md5 checksum. Instead of downloading the whole file again, use rsync to fix it by only downloading the part that is broken.
Now fire up your CD burner and have fun with the new release.
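The md5 check itself can be scripted. The sketch below assumes the md5sum utility is installed and that you have copied the expected checksum from your mirror's published list. verify_iso is a made-up name, not part of any tool.

```shell
# verify_iso FILE EXPECTED_MD5
# Hypothetical helper: returns 0 if the file's md5 checksum matches,
# 1 if it does not, and 2 if the file cannot be read.
verify_iso() {
    file=$1
    expected=$2
    [ -r "$file" ] || { echo "$file: cannot read" >&2; return 2; }
    # md5sum prints "<checksum>  <filename>"; keep only the checksum.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "$file: OK"
    else
        echo "$file: MISMATCH, run rsync again to repair it" >&2
        return 1
    fi
}
```

If the check fails, running the same rsync command again against the damaged file should only transfer the blocks that differ.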
Thanks to a kind reader, it has been pointed out to me that if the server is using symbolic links to point to the actual image files in the rsync module, you will need to use the -L switch to update the file referenced by the link. If this is the case, you will get an error something like this:
skipping non-regular file "MandrakeLinux-9.1rc1-CD1.i586.iso"
My thanks to John McQuillen for that little tip.
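For reference, here is what the command might look like with the -L switch added. It is a sketch only: fetch_following_links and the RSYNC variable are names I invented, and the source is whatever path your mirror uses. Setting RSYNC=echo prints the arguments instead of running the transfer.

```shell
# RSYNC is a hypothetical override; set RSYNC=echo for a dry run.
RSYNC=${RSYNC:-rsync}

# fetch_following_links SOURCE DEST
# Hypothetical helper: -L makes rsync transfer the file a symbolic link
# points to instead of skipping the link.
fetch_following_links() {
    "$RSYNC" -L -P --stats "$1" "$2"
}

# Example, using the example mirror's path:
# fetch_following_links 'ftp.example.edu::Mandrakelinux/devel/iso/i586/Mandrakelinux-10.1*-CD1*' .
```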
Thanks also to Peter Lamm who provided me the information about his experiences with the 10.1 beta cycle and reminded me to update this document.
To come up with this, I used the rsync man and info pages included with Mandrake Linux 8.2, and a Mini How-To written by J-L Boers which was posted on lycoris.org. Many thanks also to TheDarb and Texstar of PCLinuxOnline.com for technical information, lively debate and the inspiration to write this hopefully helpful article.