Ok, after a long time, I'm back with my articles. I've been adapting the whole website to HTML5, and some sections still need to be tweaked.

This time I'll show you how to download files over HTTP with Python, including the case where the links are protected with a username and password. All of this is done with the request module from Python's urllib package. As the documentation indicates, you simply call the urlretrieve function, which receives the URL of the resource to download and the path of the file where you want to store the downloaded content, like this:
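import urllib.request
# urlretrieve returns the local path and the response headers
local_filename, headers = urllib.request.urlretrieve('http://python.org/', 'c:/Temp/filename.png')
# Open the saved file to confirm it was written, then close it
html = open(local_filename)
html.close()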
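If you want feedback while a single file downloads, urlretrieve also accepts a reporthook callback. Here is a minimal sketch of that idea; the helper names percent_done and download are my own and purely illustrative, not part of urllib.

```python
import urllib.request


def percent_done(block_count, block_size, total_size):
    """Return progress as a percentage, or None when the size is unknown.

    Hypothetical helper: urlretrieve passes total_size as -1 when the
    server does not report a content length.
    """
    if total_size <= 0:
        return None
    return min(100.0, block_count * block_size * 100.0 / total_size)


def download(url, destination):
    """Download url to destination, printing progress on a single line."""

    def report(block_count, block_size, total_size):
        pct = percent_done(block_count, block_size, total_size)
        if pct is not None:
            # "\r" moves the cursor back so the next update overwrites this one
            print(f"{pct:.0f}%", end="\r", flush=True)

    # urlretrieve returns the local path and the response headers
    return urllib.request.urlretrieve(url, destination, reporthook=report)


# Example call (the path is an assumption, adjust it to your machine):
# local_filename, headers = download('http://python.org/', 'c:/Temp/python.html')
```

The hook is called once when the connection opens and then after each block is read, so the percentage is capped at 100 in case the last block overshoots the reported size.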
Now, if you want to authenticate against a server with a username and password, you need to change the default opener so it's capable of handling such authentication.
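from urllib import request
# Password manager; None as the realm means "use these credentials for any realm"
passman = request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://thecodebeats.com/", "username", "pass")
authhandler = request.HTTPBasicAuthHandler(passman)
opener = request.build_opener(authhandler)
# Make this opener the default for urlopen and urlretrieve
request.install_opener(opener)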
The code above installs a new URL opener that is used whenever urlretrieve is called. Since the site we'll be calling is password protected, this new opener passes requests through the handler authhandler, which deals with the authentication process and holds the usernames and passwords. I originally needed to do this to download a bunch of files with the same naming convention from a server. The example below does exactly that. We use a try/except block to handle any exception when opening the links in the loop (particularly 404 errors). I've also included a convenient error log and a progress indicator.
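One small aside before the full script: install_opener changes the default opener for every later urlopen and urlretrieve call in the process. If you'd rather keep the credentials local, you can call the opener's own open method instead. The sketch below shows that variant; the function names and the example.com URL are assumptions of mine for illustration only.

```python
from urllib import request


def make_password_manager(base_url, username, password):
    """Return a password manager holding one set of credentials.

    Hypothetical helper. Passing None as the realm makes the credentials
    apply to any realm the server announces under base_url.
    """
    passman = request.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, base_url, username, password)
    return passman


def build_auth_opener(passman):
    """Build an opener that handles HTTP Basic authentication, without installing it."""
    return request.build_opener(request.HTTPBasicAuthHandler(passman))


# Example use (placeholder URL and credentials):
# passman = make_password_manager("http://example.com/private/", "username", "pass")
# opener = build_auth_opener(passman)
# with opener.open("http://example.com/private/1.png") as response:
#     data = response.read()
```

Note that urlretrieve always goes through the installed default opener, so this approach means reading the response and writing the file yourself.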
from urllib import request

# Register the credentials and install an opener that handles Basic authentication
passman = request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, "http://thecodebeats.com/", "showme", "thecandies")
authhandler = request.HTTPBasicAuthHandler(passman)
opener = request.build_opener(authhandler)
request.install_opener(opener)

number_of_files = 3
for i in range(1, number_of_files + 1):
    file_to_download = 'http://thecodebeats.com/gallery/download-files-through-http-with-python/candies/' + str(i) + '.png'
    local_file = 'c:/Temp/' + str(i) + '.png'
    progress = round((i / number_of_files) * 100, 2)
    progresstext = str(progress) + '% -> Downloading file # ' + str(i) + ' of ' + str(number_of_files)
    try:
        local_filename, headers = request.urlretrieve(file_to_download, local_file)
    except Exception as err:
        # Report the failure (for example a 404) and append it to the log
        message = "Error Downloading file # " + str(i) + " : " + str(err)
        print(message)
        with open("c:/Temp/log.txt", "a+") as text_file:
            text_file.write(message + '\n')
    else:
        # Open the saved file to confirm it was written, then close it
        html = open(local_filename)
        html.close()
    # "\r" returns the cursor to the start of the line so the next update overwrites this one
    print(progresstext, end="\r", flush=True)
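If you need this loop in more than one place, it can be wrapped in a function that returns what succeeded and what failed instead of writing straight to a log file. This is only a sketch under my own assumptions: download_numbered and its parameters are hypothetical names, and it expects the opener with credentials to be installed beforehand if the server needs one.

```python
import os
from urllib import request


def download_numbered(base_url, dest_dir, count, extension=".png"):
    """Download base_url + '1.png', '2.png', ... into dest_dir.

    Hypothetical helper. Returns (saved, failed), where saved is a list of
    local paths and failed is a list of (number, error message) tuples.
    """
    saved, failed = [], []
    for i in range(1, count + 1):
        name = f"{i}{extension}"
        target = os.path.join(dest_dir, name)
        try:
            request.urlretrieve(base_url + name, target)
        except OSError as exc:
            # URLError and HTTPError (404 and friends) are subclasses of OSError
            failed.append((i, str(exc)))
        else:
            saved.append(target)
        # Overwrite the same console line with the current progress
        print(f"{i / count:.0%} done", end="\r", flush=True)
    print()
    return saved, failed


# Example call (placeholder URL and folder):
# saved, failed = download_numbered("http://example.com/candies/", "c:/Temp", 3)
```

Returning the failures lets the caller decide whether to log them, retry them, or just print them.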
Hope you found it useful!