ISO images (and why I hate them)

Oct 17, 08:00 am

One of my duties at Apress is to commission books on Linux. I also author Linux books myself. My job involves keeping a close eye on the whole Linux distro scene. This in turn involves downloading many ISO images so I can install and evaluate various Linux distros.

Playing around with Linux all day isn’t a bad job to have, but it’s hampered by the fact that I have the world’s slowest DSL connection. OK, I exaggerate. At 512Kbps it’s one of the slowest broadband connections available where I live, because of my distance from the nearest telephone switch.

At around 60KB/sec, downloading ISOs is tricky. CD ISOs usually arrive intact, free of corruption, but with DVD ISOs my hit/miss rate is roughly 50/50. And, frankly, I’ve had enough.

The last DVD ISO I tried to download was Mandriva 2007, which I wanted to review on this blog. I started the download early on Friday and left my office computer switched on over the weekend to fetch it, checking every few hours to make sure the download hadn’t stalled (I work from home).

Real life got in the way of my checking and, sure enough, the download stalled several times, for several hours at a time, so it took until Sunday evening to finish. At that point I checked the md5sum and, yup, it was wrong. I decided to burn it anyway, but there must have been significant corruption, because the ISO wasn’t recognized and wouldn’t burn.

Arrgghh! Must… suppress… murderous… impulse…

Even if only one byte in the entire file is wrong, the md5sum will be different, and the checksum gives me no way of knowing whether the corruption is serious. In fact, I once burned a SUSE Linux DVD ISO that had a bad md5sum, and it transpired that a single package file was corrupted. By careful package selection I could work around it and still use the DVD.
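For the curious, the whole-file check that md5sum performs can be sketched in Python with the standard hashlib module. The function names below are illustrative rather than part of any tool, and the expected digest is assumed to come from the distro's published checksum file. The demo at the bottom shows the problem described above: flipping a single byte in a megabyte of data produces a completely unrelated digest, so a mismatch tells you that the file is damaged, never where or how badly.

```python
import hashlib


def md5_of_file(path: str, block_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB blocks so that
    a multi-gigabyte DVD image never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_iso(path: str, expected_md5: str) -> bool:
    """Compare the file's digest with the published one (case-insensitive).
    A False result says the file is damaged, but not where."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # One flipped byte in a megabyte of data yields an unrelated digest,
    # so the checksum alone cannot tell minor damage from severe damage.
    original = bytes(range(256)) * 4096          # 1 MiB of sample data
    damaged = bytearray(original)
    damaged[500_000] ^= 0xFF                     # corrupt exactly one byte
    print(hashlib.md5(original).hexdigest())
    print(hashlib.md5(bytes(damaged)).hexdigest())
```

In practice you would call verify_iso() with the path to the downloaded image and the digest from the distro's checksum file; on Linux the md5sum command does the same job from the shell.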

There has to be a better way of distributing Linux ISOs than this. Linux is all about accessibility and availability, but by relying on ISO images as one of the main distribution media, we’re effectively limiting Linux to those who have fast Internet connections. To everyone else we’re saying they can take their best shot at downloading, but success isn’t guaranteed.

What I’d like to see is some system of managed downloads, perhaps one where the image is split into smaller chunks that are each downloaded and checksummed there and then. If a chunk is corrupt, only that chunk is downloaded afresh. Almost certainly the solution is to download smaller files; it’s when you bet the house on one large file that the risk of corruption becomes unacceptably high.
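To make the idea concrete, here is a minimal Python sketch of such a managed download. It is an illustration under stated assumptions, not an existing protocol: the per-chunk SHA-256 manifest, the 256 KiB chunk size, and the fetch_chunk callback (which a real client might implement with an HTTP range request) are all hypothetical, and the "network" is simulated so the example runs offline.

```python
import hashlib
import random
from typing import Callable, List

CHUNK_SIZE = 256 * 1024  # assumed chunk size: 256 KiB


def build_manifest(image: bytes, chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Server side (hypothetical): publish one SHA-256 digest per chunk."""
    return [hashlib.sha256(image[i:i + chunk_size]).hexdigest()
            for i in range(0, len(image), chunk_size)]


def managed_download(manifest: List[str],
                     fetch_chunk: Callable[[int], bytes],
                     max_attempts: int = 10) -> bytes:
    """Client side: fetch each chunk, verify it immediately, and re-fetch
    only that chunk when its digest does not match the manifest."""
    parts = []
    for index, expected in enumerate(manifest):
        for _ in range(max_attempts):
            chunk = fetch_chunk(index)
            if hashlib.sha256(chunk).hexdigest() == expected:
                parts.append(chunk)
                break
        else:  # runs only if every attempt failed (no break)
            raise RuntimeError(
                f"chunk {index} failed verification {max_attempts} times")
    return b"".join(parts)


if __name__ == "__main__":
    image = bytes(range(256)) * 8192             # 2 MiB stand-in for an ISO
    manifest = build_manifest(image)
    rng = random.Random(2006)

    def flaky_fetch(index: int) -> bytes:
        """Simulated slow link: about 30% of transfers arrive with a bad byte."""
        chunk = bytearray(image[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE])
        if rng.random() < 0.3:
            chunk[0] ^= 0xFF
        return bytes(chunk)

    result = managed_download(manifest, flaky_fetch)
    print(f"{len(manifest)} chunks verified; image intact: {result == image}")
```

The key design choice is that verification happens per chunk, as each one arrives: a corrupt chunk costs one 256 KiB re-fetch instead of a whole weekend's download.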

Some distros have FTP-based installers, whereby packages are downloaded on demand. I guess this is a step in the right direction, although in my experience FTP-based installs have always involved complicated setup (more complicated than a straight install, anyway).

Maybe a scheme like the one I’ve described already exists, although I suspect an entirely new file transfer protocol is called for. If you know of one, let me know in the comments below. In fact, let me know what you think of this whole issue, because it’s a problem that needs fixing, especially as ISO images get larger and larger. Let’s try to get some momentum going on this.



    1. Please see Jigdo (http://www.atterer.net/jigdo/) ;)



    1. I’ve only used Jigdo a few times, but it isn’t the solution to this problem because:

      (a) It’s Debian-only (effectively it’s like the FTP installers of other distros, except that it builds an installation ISO at the end);

      (b) It’s not exactly easy for newcomers to use. As I said, this must be something simple: an executable of some kind that the user can run to initiate the download, or a file transfer protocol for which ready-made clients and servers can be used.



    1. I only have a passing knowledge of the BitTorrent specification, but doesn’t BitTorrent do exactly this? It breaks the file into smaller chunks and then checks a hash for each chunk.



    1. The problem with BitTorrent is that it requires you to punch a hole through your firewall/NAT for uninvited incoming traffic. Not only is this complicated for less knowledgeable users, but it’s also insecure. Effectively you’re running an Internet-accessible server from your home computer, which seems crazy to me. The whole security of your computer hinges on the security (or otherwise) of your BitTorrent client.


    1. Andrew Hill says:

      BitTorrent does exactly what you’re talking about: it splits the file into small chunks and verifies each one after download.

      Most distros offer a BitTorrent download option now.



    1. Though Jigdo is commonly used only by the Debian-based distros (including Ubuntu), I believe it supports any ISO-formatted disc image. You could probably jigdo Windows XP with Service Pack 2 slipstreamed, if the EULA allowed it.

      Unfortunately, because no one else uses it, it won’t help you on your end of the connection. You’d have to convince the rest of the planet.



    1. “The problem with BitTorrent is that it requires you to punch a hole through your firewall/NAT for uninvited incoming traffic”

      You don’t have to. Some clients (I use uTorrent) can download through a firewall without any ports opened. Of course, your transfer rate will increase if you set up your firewall correctly, but a popular distro usually has hundreds of seeds, so you’ll most likely saturate your 60KB/sec anyway.

      Most BitTorrent clients are pretty secure, and opening a port through a firewall isn’t really that bad. As long as the software listening on that port isn’t buggy, there is nothing to fear. uTorrent also tries to configure your NAT automatically using UPnP, so many routers will be set up for you.



