23CONTROLLING THE KEYBOARD AND MOUSE

Whether you need to click through repetitive menus, fill out forms, or interact with graphical applications, PyAutoGUI enables your scripts to mimic human interactions with the computer. Mastering its use will help you automate almost any task you perform, saving you time and reducing your manual effort.

A simple drawing of a light bulb. LEARNING OBJECTIVES

  • Control the mouse cursor to make it click, drag, and scroll in other applications running on your computer.
  • Control the keyboard to send key presses to other applications, including shortcut hotkey combinations.
  • Know the limitations of GUI automation and how to do an emergency shutdown if your program gets stuck or goes out of control.
  • Use PyAutoGUI’s MouseInfo feature to help you plan where your Python script should move and click the mouse.
  • Locate images on the screen with PyAutoGUI’s screenshot and image-recognition features.
  • Obtain information about the precise coordinates and sizes of various application windows on the screen.

A grey circle with a white question mark at the center Practice Questions

These questions test your ability to use PyAutoGUI to automate GUI tasks such as moving the mouse, clicking buttons, typing text, and taking screenshots.

Staying on Track

Before you jump into a GUI automation, you should know how to escape problems that may arise. Python can move your mouse and enter keystrokes at an incredible speed. In fact, it might be too fast for other programs to keep up with. Also, if something goes wrong but your program keeps moving the mouse around, it will be hard to tell exactly what the program is doing or how to recover from the problem. Fortunately, there are several ways to prevent or recover from GUI automation problems.

  1. 1. Why isn’t it easy to press ctrl-C to stop a Python script using PyAutoGUI?

  2. 2. How can you use the mouse to stop a Python script using PyAutoGUI?

  3. 3. What is the pyautogui.PAUSE setting, and what does it mean when it’s set to 0.1?

  4. 4. What exception is raised if you move the mouse to one of the four corners of the screen?

Controlling Mouse Movement

PyAutoGUI’s mouse functions use x- and y-coordinates similar to the coordinate system used for images, discussed in Chapter 21 of this workbook. Answer the following questions about moving the mouse cursor around the screen.

  1. 5. What are the x- and y-coordinates of the origin?

  2. 6. Which corner of the screen is the origin?

  3. 7. Which letter represents the horizontal coordinate?

  4. 8. Which letter represents the vertical coordinate?

  5. 9. As you move the mouse down the screen toward the bottom, how does the y-coordinate change?

  6. 10. As you move the mouse down the screen toward the bottom, how does the x-coordinate change?

  7. 11. If the screen’s resolution is 1,920×1,080, what is the coordinate of the lower-right corner?

  8. 12. What function returns the size of the screen resolution?

  9. 13. If a variable named screen_size contains a named tuple value Size(width=1920, height=1080), what does screen_size[1] evaluate to?

  10. 14. If a variable named screen_size contains a named tuple value Size(width=1920, height=1080), what does screen_size.height evaluate to?

  11. 15. What function call do you make if you want to move the mouse cursor to 10 pixels from the left edge of the screen and 20 pixels from the top edge of the screen?

  12. 16. The top-left corner of a window is at the coordinates 100, 200. What function call do you make if you want to move the mouse cursor to 10 pixels from the left edge of the window and 20 pixels from the top edge of the window?

  13. 17. The function call pyautogui.move(0, 10) moves the mouse cursor 10 pixels down. How far left or right does it move the mouse cursor?

  14. 18. If the function call pyautogui.move(100, 0) moves the mouse cursor right by 100 pixels, what function call moves the mouse cursor left by 100 pixels?

  15. 19. What is the difference between the pyautogui.moveTo() and pyautogui.move() functions?

  16. 20. What does adding the duration=0.25 keyword to a pyautogui.move() or pyautogui.moveTo() function call do?

  17. 21. Imagine that calling pyautogui.position() returned Point(300, 200); then, you called pyautogui.move(10, 20). What would calling pyautogui.position() return?

  18. 22. If you ran pos = pyautogui.position(), what would pos[0] == pos.x and pos[1] == pos.y evaluate to?

Controlling Mouse Interaction

Once you know how to move the mouse and figure out where it is on the screen, you’re ready to start clicking, dragging, and scrolling. PyAutoGUI has functions for sending these virtual mouse actions to your computer.

  1. 23. What is the difference between pyautogui.click() and pyautogui.click(10, 20)?

  2. 24. How can you make PyAutoGUI click the right mouse button?

  3. 25. What functions are analogous to pyautogui.move() and pyautogui.moveTo(), except they also hold down the left mouse button as the mouse cursor moves?

  4. 26. What is the active, or focused, window?

  5. 27. What does pyautogui.scroll(10) do?

  6. 28. How can you scroll the mouse wheel down?

Planning Your Mouse Movements

One of the challenges when writing a program that will automate clicking the screen is finding the x- and y-coordinates of the things you’d like to click. In those cases, it can be helpful to launch a small application named MouseInfo that is included in PyAutoGUI.

  1. 29. What information does the MouseInfo app give you?

  2. 30. How can you launch the MouseInfo app?

  3. 31. Say you want to quickly record the x, y positions of 20 buttons on the screen. How can you do this with the MouseInfo app?

Taking Screenshots

Your GUI automation programs don’t have to click and type blindly. PyAutoGUI’s screenshot features can create image files based on the current contents of the screen. These functions can also return a Pillow Image object of the current screen’s appearance.

  1. 32. What is a screenshot?

  2. 33. What Python package handles screenshots and images for PyAutoGUI?

  3. 34. What does the pyautogui.pixel() function return?

  4. 35. Does the value returned by pyautogui.pixel() contain alpha (transparency) information?

  5. 36. How can checking the color of a pixel let your program know if something is wrong?

  6. 37. The pyautogui.screenshot() function returns an Image object; how can you save this image as a file named screenshot.png?

Image Recognition

If you’re not sure exactly where PyAutoGUI should click, you can use image recognition to figure it out. PyAutoGUI can take an image of the element you want to click and figure out its coordinates with the pyautogui.locateOnScreen() function. Because you can’t be sure that your program will always find the image, it’s a good idea to use the try and except statements when calling the function.

  1. 38. Say you want PyAutoGUI to locate a Submit button on the screen using the pyautogui.locateOnScreen() function. How can you obtain an image file of this Submit button?

  2. 39. Say you have an image of the Submit button stored in submit.png. How can you determine where on the screen this image is?

  3. 40. What data type does the pyautogui.locateOnScreen() function return?

  4. 41. If the Submit button you’re searching for is partially obscured by another window, will the pyautogui.locateOnScreen() function still be able to find it?

  5. 42. What happens if the pyautogui.locateOnScreen() function is unable to find the image you pass it on the screen?

  6. 43. What does the pyautogui.locateOnScreen() function return if the same Submit button appears multiple times on the screen?

  7. 44. What happens if you run the following code and the same Submit button exists in three separate places?

    import pyautogui
    for box in pyautogui.locateAllOnScreen('submit.png'):
        print('Found submit button on screen.')

Getting Window Information

Image recognition is a fragile way to find things on the screen; if a single pixel is a different color, pyautogui.locateOnScreen() won’t find the image. To locate a particular window, it’s faster and more reliable to use PyAutoGUI’s window features.

  1. 45. Imagine that you run win = pyautogui.getActiveWindow(). What code evaluates to the text in the window’s title bar?

  2. 46. How could you obtain a Window object of every window on the screen?

  3. 47. What function call would return Window objects for all of the windows of the Notepad application currently running?

  4. 48. Imagine that pyautogui.position() returns Point(x=100, y=200). What function call would return Window objects for all of the windows underneath the mouse cursor?

  5. 49. What happens when you change the top or left attributes of a Window object?

  6. 50. What happens when you change the width or height attributes of a Window object?

Controlling the Keyboard

The pyautogui.write() function sends virtual key presses to the computer, which enables you to fill out forms or enter text into applications. In PyAutoGUI, keyboard keys are represented by short string values, such as 'esc' for the esc key and 'enter' for the enter key.

  1. 51. When you call pyautogui.write(), which application receives the keyboard key presses?

  2. 52. Write code that would simulate 1,000 presses of a lowercase x character.

  3. 53. What is the difference between pyautogui.write('leftleft') and pyautogui.write(['left', 'left'])?

  4. 54. What variable in the pyautogui module contains all the keyboard key strings in PyAutoGUI?

  5. 55. What is the difference between pyautogui.write('left') and pyautogui.press('left')?

  6. 56. Write code that could simulate pressing the ctrl-C hotkey combination.

A simple drawing of a sharpened pencil. Practice Projects

In the following projects, you’ll automate the creation of Pollock-like paintings, and record and play back mouse movements.

Jackson Pollock Bot

The American artist Jackson Pollock is known for his “drip technique” of painting, which produced interesting art. Let’s write a program that moves the mouse randomly in a graphics app (such as MS Paint on Windows or Paintbrush on macOS) to create our own Pollock-like paintings.

Your program should use PyAutoGUI to perform the following actions while the graphics app is in focus:

  1. Prompt the user to “Hover the mouse cursor at the top-left corner of the canvas . . .” and pause for five seconds. This pause gives the user time to determine where on the screen the program should draw.

  2. Record the current mouse cursor position in the variables left and top.

  3. Prompt the user to “Hover the mouse cursor at the bottom-right corner of the canvas . . .” and pause for five seconds. This position sets the lower bounds of the canvas in which to draw.

  4. Record the current mouse cursor position in the variables right and bottom.

  5. Move the mouse to a random position between these two corners. This will be the starting point for a randomly drawn line in the graphics app.

  6. Drag the mouse to a random position between these two corners, drawing a line.

  7. Repeat the previous two steps 30 times in total.

You could set the color and brush style in the graphics app before running the program to generate images that have more than simple black lines. You could also change to the color and brush style before running the program a second time to create a more dynamic image. I created the painting in Figure 23-1 using this program.

A Microsoft Paint window featuring many overlapping brushstrokes.

Figure 23-1: A bot-created Jackson Pollock painting

Hint: Calling pyautogui.countdown(5) can not only produce a five-second pause but also print a numeric countdown in the terminal window so that the user knows when each pause ends.

Save this program in a file named pollockBot.py.

Mouse Movement Recorder

Write a program that monitors the position of the mouse every 0.1 seconds and records its coordinates to a JSON file. We’ll use this data in the next practice project, “Mouse Movement Playback.”

The program should run an infinite while loop that records the mouse position by calling pyautogui.position(), waits one-tenth of a second by calling pyautogui.sleep(0.1), and then repeats. When the user presses the ctrl-C keyboard combination, Python will raise a KeyboardInterrupt exception. Your program should catch this exception, write the recorded mouse positions as JSON data to a file named mousePositions.json, and then exit.

The JSON file may look something like this:

[[1331, 1073], [1517, 944], [1619, 727], [1615, 562], [1566, 452],
--snip--
[1855, 948], [1855, 948], [1855, 948], [1855, 948], [1855, 948]]

To indicate to the user that the program is working, print the text “Recording mouse positions. Press ctrl-C to quit.” when the program first starts. When the user presses ctrl-C to stop the program, print the text “Done. 473 positions recorded.” (or however many mouse positions the program recorded).

Mouse Movement Playback

Write a program that takes the x- and y-coordinates recorded by the “Mouse Movement Recorder” program from the previous exercise and moves the mouse cursor to each location, with a 0.1-second pause in between. Doing so effectively “plays back” the movements of the mouse. You could extend this functionality in your own programs to carry out actions the user recorded.

The first part of the program reads the JSON data from mousePositions.json. The second part of the program calls pyautogui.moveTo() to move the mouse cursor to a given position, then calls pyautogui.sleep(0.1) to pause for one-tenth of a second.

While this program isn’t particularly useful on its own, you could use this sort of record-and-playback feature as part of a more elaborate GUI automation program.

Save this program in a file named playbackMouseMove.py.