Lately, I haven't exactly been spending much time on my AMD laptop; instead I've been traversing the Brisbane landscape with my trusty Samsung netbook - what a joy! Not knowing where a power point is isn't a major issue now. Unfortunately, I end up spending much of my time on the netbook even when I get home, and the downside is that I just don't have the same processing power to do some real computer stuff - sshing, reading Ubuntu documentation (a 15" screen is the only way), and listening to the radio!
In this quick blog, I'm just going to demonstrate how to Wget some stuff in a way that doesn't download the same thing twice. Wget is a useful tool for downloading files from the web without a browser. If you want an idea of its power, just watch 'The Social Network'. Basically, it can be used to download whole websites and all the different files a website contains. For example, you can download the files that make up the Google website - the logo, the HTML, any animations, any JavaScript files and so on (though I probably wouldn't try to download all of Google...).
Wget is a simple program to use and should come with your Ubuntu distribution; if not, just grab it from the Ubuntu Software Centre.
Open a terminal (I'm assuming you're using a Linux distro) and go to the folder where you want to save the downloaded files. Then invoke Wget. Invoking is simple - the program name, then the URL:
wget http://www.aurl.com
This is without the use of any options. To see the options available, just type:
wget --help
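Wget's help output is pretty long, so if you're after one option in particular, piping it through grep can save some scrolling - the search term below is just an example:
wget --help | grep -i recursive   # the search term is only an example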
Now, I won't be going through them all, but I'm just going to detail those that are useful to me.
Say my friend has a folder on his server containing a number of files I need for a university assignment, and I want to download that folder to my computer. I'll need to do a recursive download, but I don't want to download anything else on his server - just everything below the folder of choice. So I'll need a recursive download with 'no parent', in Wget jargon. Do it like this:
wget -r -np http://www.myfriendsurl.com
All files will be downloaded to the directory that the terminal is currently in, and Wget will neatly put them in a folder called 'www.myfriendsurl.com'.
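Side note: if you'd rather the files end up somewhere other than the current directory, Wget also has a -P (--directory-prefix) option to point it at a destination folder. A quick sketch - ~/uni-assignment is just a made-up folder name:
wget -r -np -P ~/uni-assignment http://www.myfriendsurl.com   # ~/uni-assignment is a hypothetical folder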
Having fun yet? But you're annoyed that the path everything gets saved under is far too deep, aren't you? Well, there is a solution: --cut-dirs. That's right!
So, I'm guessing your friend is a bit of a neat-freak and keeps their server nice and tidy with some sort of naming hierarchy that doesn't use a date or genre structure! And now you're downloading from a URL that is longer than 60 characters and looks like http://www.thesaferhaven.com/home/themonolounge/hinduphilosophy/yoyo-sponge-cake. What a nightmare! Well, what you do is use --cut-dirs to get rid of the extra directory levels that you will never need. Using:
wget -r -np --cut-dirs=4 http://www.thesaferhaven.com/home/themonolounge/hinduphilosophy/yoyo-sponge-cake
--cut-dirs=4 will cut four of those folders off the saved path, saving you clicking time and your sanity.
Now, when you go to the directory where you downloaded all these files, instead of having to click through www.thesaferhaven.com > home > themonolounge > hinduphilosophy > yoyo-sponge-cake, you can just open www.thesaferhaven.com and you'll be right at the files you're interested in.
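And if you want to go one step further and drop the www.thesaferhaven.com folder as well, Wget's -nH (--no-host-directories) option pairs nicely with --cut-dirs. Roughly:
wget -r -np -nH --cut-dirs=4 http://www.thesaferhaven.com/home/themonolounge/hinduphilosophy/yoyo-sponge-cake   # -nH skips creating the hostname folder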
The last option I'd like to tell you about is 'no clobber'. It tells Wget to skip any file it's about to download if a copy already exists in the download directory, so re-running the same command won't fetch everything all over again. Use it like this:
wget -r -np -nc http://www.myfriendsurl.com
Of course, you must run it from the same directory as last time, otherwise Wget won't see the files you already have!
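So a typical second run, assuming the first download went into a hypothetical ~/downloads folder, looks something like this:
cd ~/downloads   # hypothetical folder where the first download went
wget -r -np -nc http://www.myfriendsurl.com   # files you already have get skipped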
Too easy! In all honesty, Wget is most useful to those who build websites and want to download whole directories in their entirety. It's commonly used to download websites and mirror them on a home server. Something I need to learn more about...
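From what I can tell so far, the mirroring side usually starts with the --mirror option, which bundles recursion with timestamp checking so repeated runs only fetch files that have changed on the server. Something like:
wget --mirror --no-parent http://www.myfriendsurl.com   # re-run now and then to keep the local copy up to date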
Hope you enjoyed this instalment of my Linux knowledge - write back and tell me what I should learn next!