UPDATE – This article now uses BeautifulSoup instead of regular expressions to parse the HTML, which makes the parsing code much simpler.
Reddit is a popular site where users post and vote on interesting web links. It is divided into many topical communities called subreddits. Many Redditors host their images on Imgur (which I highly recommend: it is free and easy to use). This tutorial shows you how to write a Python script that scans Reddit and downloads the images from any Imgur submissions it finds. It is aimed at beginner-level programmers with a small amount of Python experience.
This post will cover:
- Basic web scraping concepts.
- Command line options.
- Accessing Reddit with the PRAW module.
- Using regular expressions to find text patterns in a web page.
- Downloading files with the Requests module.
- Detecting which files are on our computer with the
- Opening files using Python’s
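
To give a feel for the "text patterns" item in the list above, here is a minimal sketch that uses only the standard library. The pattern, the `find_imgur_images` helper, and the sample HTML are all made up for illustration and are not taken from the tutorial's script. It assumes direct Imgur image links look like `http://i.imgur.com/<id>.<ext>`. Since the updated article parses HTML with BeautifulSoup, treat this purely as an illustration of the regular-expression idea.

```python
import re

# Hypothetical pattern (an assumption, not the tutorial's own): matches
# direct Imgur image links such as http://i.imgur.com/abc123.jpg.
IMGUR_IMAGE_RE = re.compile(
    r'https?://i\.imgur\.com/\w+\.(?:jpe?g|png|gif)',
    re.IGNORECASE,
)


def find_imgur_images(html):
    """Return the unique direct Imgur image URLs found in html, in order."""
    urls = []
    for match in IMGUR_IMAGE_RE.finditer(html):
        url = match.group(0)
        if url not in urls:  # skip links that appear more than once
            urls.append(url)
    return urls


# Made-up sample page: two Imgur images and one unrelated link.
sample_html = '''
<a href="http://i.imgur.com/abc123.jpg">cat</a>
<a href="http://example.com/not-imgur.png">other</a>
<img src="https://i.imgur.com/XyZ789.png">
'''

print(find_imgur_images(sample_html))
```

Only the two `i.imgur.com` links are returned; the `example.com` link is ignored because it does not match the pattern.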