Appium + Sauce Labs Tutorial

By Jonathan Lipps, Director of Engineering, Ecosystems, at Sauce Labs. You can find him here on StackShare and Twitter

Welcome to this brand-new guide to Appium and Sauce Labs using Ruby. We'll dive quickly into the basic concepts involved in mobile automation and, specifically, mobile automation with Appium. Once the basics are out of the way, we'll go in-depth with a selection of more advanced topics: performing touch gestures and automating hybrid and mobile web apps.



Appium is a tool that makes automation possible for iOS and Android devices. It's architected around 4 key principles:

  • You should modify your app as little as possible in order to test it. You don't want strange test libraries affecting the operation of your app. Ideally, you wouldn't modify your app at all.
  • You should be able to write your tests in any programming language, using any test runner and framework. Your organization is already skilled in particular languages and frameworks; you should be able to use those successfully with Appium.
  • The API and mental model around automation don't need to be rewritten from whole cloth. There is already a very successful automation standard, so we should reuse and extend that rather than create an entirely new model.
  • Mobile automation is for everyone. The best mobile automation tool will be open source, and not just in terms of having its code available for view. It should be governed using open source project management practices, and eagerly welcome new users and contributors.

Appium satisfies all these requirements in the following ways:

  • Appium allows you to automate your Android and iOS apps without modification, because it relies on underlying automation APIs supported by the mobile OS vendors (Apple and Google) themselves. These APIs are integrated into the development process and thus don't require any 3rd-party libraries to be embedded in test versions of your apps.
  • Appium is built around a client/server architecture, which means Appium automation sessions are really HTTP conversations (just as when you use any kind of REST API). This means clients can be written in any language. Appium has a number of clients already written, for Ruby, Java, JavaScript, Python, C#, PHP, Objective-C, and even Perl.
  • Selenium WebDriver is without a doubt the most widely known framework for automating web browsers from a user's perspective. Its API is well documented, well understood, and already a W3C Working Draft. Appium uses this specification and extends it cleanly with additional, mobile-specific automation behaviors.
  • Appium is open source. It should be clear from the volume of issues and pull requests that we have a very active community (who also engage with one another on our discussion forum).
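To make the client/server point concrete, here is a sketch of the request a client sends to open a session. It assumes the JSON Wire Protocol shape used by Appium 1.x servers (a POST to the server's /wd/hub/session route carrying a desiredCapabilities object); new_session_request is a hypothetical helper, and real clients such as appium_lib build and send this for you.

```ruby
require "json"

# Hypothetical helper: describes (but does not send) the HTTP request that
# starts an Appium session. Assumption: JSON Wire Protocol-style payload with
# a "desiredCapabilities" object, POSTed to <server>/wd/hub/session.
def new_session_request(desired_caps)
  {
    method: "POST",
    path: "/wd/hub/session",
    body: JSON.generate({ "desiredCapabilities" => desired_caps })
  }
end

request = new_session_request(
  "platformName" => "Android",
  "platformVersion" => "5.0",
  "deviceName" => "Android Emulator"
)
puts "#{request[:method]} #{request[:path]}"
puts request[:body]
```

Every later command (finding an element, tapping it, reading its text) is another request of the same kind, which is why a client can be written in any language that speaks HTTP and JSON.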

Sauce Labs

In this guide we'll be using Sauce Labs's cloud of Appium servers, so that we don't need to download and configure the Appium server on our own machines, not to mention the Android and iOS development platforms and associated system dependencies. It also means we'll be able to run iOS tests from our, say, Windows machine, even though iOS development itself requires a Mac with Xcode.

Because Appium is built with a client/server architecture, it doesn't matter whether an Appium server is hosted on the same machine as the tests. We'll be running our tests (i.e., using the Appium clients) locally, but the Appium servers (and attached mobile devices) will be in the Sauce Labs cloud. It's that simple! Of course, Sauce provides a large number of optional desired capabilities that help tweak the behavior of your Sauce sessions. Sauce also provides ways to pre-upload your app to our cloud, or to ensure that web requests are directed through a secure tunnel to your own infrastructure. Those topics are outside the scope of this guide, but can be explored at your own leisure at the Sauce Labs docs.
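Since only the server location changes, pointing a client at Sauce Labs amounts to building a different endpoint URL. The sketch below is a hypothetical helper (not part of appium_lib) that reads the SAUCE_USERNAME and SAUCE_ACCESS_KEY environment variables used later in this guide; the ondemand.saucelabs.com hub address is an assumption based on Sauce's classic endpoint, so check the Sauce Labs docs for the host that applies to your account.

```ruby
# Hypothetical helper: builds a remote WebDriver endpoint URL for Sauce Labs.
# Assumption: the classic ondemand.saucelabs.com hub; newer accounts may use a
# region-specific host instead.
def sauce_server_url(env = ENV)
  user = env.fetch("SAUCE_USERNAME")   # raises KeyError if not set
  key  = env.fetch("SAUCE_ACCESS_KEY") # raises KeyError if not set
  "https://#{user}:#{key}@ondemand.saucelabs.com:443/wd/hub"
end

# Usage with an explicit hash instead of the real environment:
url = sauce_server_url("SAUCE_USERNAME" => "alice", "SAUCE_ACCESS_KEY" => "secret")
puts url
```

Passing the environment in as a parameter keeps the helper easy to test without touching real credentials.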

The most useful tool in those docs is probably the Platforms Configurator, which is a little wizard app that guides you through choosing the right desired capabilities based on the platform and kind of test you want to automate. In this tutorial, the desired capabilities will all be specified already in the code, but the Platforms Configurator will be useful if you want to experiment with other platforms.

Basic Concepts

Writing Appium tests is easy and straightforward, but it's still important to understand some basic concepts. If you've ever written a WebDriver test, you already know everything you need to know about writing Appium tests. For those who haven't, here is some essential terminology:

  • Driver Client: Appium is considered a "driver", because it "drives" mobile applications as though it were a user. You write your Appium tests using a client library that takes your test steps and wraps them up as HTTP requests to the Appium server. Thus in our Appium test code, we'll instantiate the client object and call it @driver. (NB: Because Appium simply extends the WebDriver protocol, Appium clients have very little work to do: they can simply wrap the appropriate WebDriver client and add a few mobile bits. This keeps the code clean but places a conceptual burden on you as the test writer; if you don't already know about WebDriver, you'll likely need to familiarize yourself with the Appium client documentation and the documentation for the WebDriver client it extends.)
  • Appium Session: Appium tests take place in sessions. You have to first initialize a session, which triggers Appium to kick off automation for your app, with certain parameters you specify. When you're done automating, you need to end the session (by calling @driver.quit) to let Appium know it can safely shut down your app and wait for another session.
  • Desired Capabilities: The parameters you pass to initialize an Appium session are called "desired capabilities", because they specify the kind of automation session you want from the Appium server. If Appium can't satisfy your request, it will error on session start. The following capabilities are essential for any Appium session:
    • platformName: the mobile OS you want to automate (e.g., Android or iOS)
    • platformVersion: the mobile OS version (e.g., 5.0 or 8.3)
    • deviceName: the name of the type of mobile device (e.g., iPhone Simulator or Android Emulator or Galaxy S4)
    • app or browserName: the path (in our case, a URL) to your mobile app, or the name of the mobile browser if you want a web session (e.g., chrome or safari)
    • appiumVersion: when running on Sauce Labs, it's important to specify the version of Appium you want to use, since new versions are released frequently (e.g., 1.3); if you're running Appium locally, you can ignore this
  • Driver Commands: Once you have a session, you can realize your test steps using a large and expressive vocabulary of commands. For example, you can tap on an element with a certain accessibility id with the following code:

    @driver.find_element(:accessibility_id, "Login").click

    We'll get to know some of the available commands through the examples in this guide, though of course a complete reference is beyond our scope for today. For that, you'll want to familiarize yourself with the appium_lib documentation and the Selenium WebDriver Ruby client documentation (which is what appium_lib extends).

...and that's basically it! We use a client library in our favorite language (today, that's Ruby) to get a driver instance, then we start the driver using appropriate desired capabilities to get a session. Once we've got a session, we can continue to use library methods on the driver object to act out our test steps. When we're done with our test (or set of tests), we end the session.
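Because Appium errors on session start when it can't satisfy the requested capabilities, it can be handy to check the essential ones locally first. This is a minimal sketch; missing_caps and ESSENTIAL_CAPS are hypothetical names, and the rules simply mirror the list of essential capabilities above.

```ruby
# Hypothetical pre-flight check mirroring the essential capabilities listed
# above. One of :app or :browserName must be present; :appiumVersion is only
# required when the session will run on Sauce Labs.
ESSENTIAL_CAPS = %i[platformName platformVersion deviceName].freeze

def missing_caps(caps, on_sauce: true)
  missing = ESSENTIAL_CAPS.reject { |name| caps.key?(name) }
  missing << :"app or browserName" unless caps.key?(:app) || caps.key?(:browserName)
  missing << :appiumVersion if on_sauce && !caps.key?(:appiumVersion)
  missing
end

p missing_caps({ platformName: "iOS", deviceName: "iPhone Simulator", browserName: "Safari" })
# => [:platformVersion, :appiumVersion]
```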

Writing Tests

Let's jump in and run a basic Android native app test! To follow along, you'll need Git and Ruby available. If you haven't already, you'll also need to make sure you have Bundler installed:

$ gem install bundler

Now you can clone the repo with the code for this tutorial, navigate to the code, and set up the dependencies:

$ git clone
$ cd appium-ruby-example
$ bundle install

At this point you'll need to set up your Sauce Labs username and access key. If you don't have an account, you can create a free one at the signup page. Then navigate to your account page, where your access key will be displayed to you. Now you want to set them up as environment variables:

$ export SAUCE_USERNAME="xxx"
$ export SAUCE_ACCESS_KEY="yyy"

Of course, you probably want to add these to your .bashrc, .zshrc, or equivalent so you don't have to do this more than once. At this point, we're ready to run the example test:

$ ruby android_basic.rb

To see your test running, navigate to your Sauce dashboard and you should see a test in the running state. If you click on the link, you'll be able to watch the test execution as it happens (or after the fact, via the video recording). So far, so good! When the test completes, you should see output like this locally (where the single dot means the single test passed):

Run options: --seed 24283

# Running:

.

Finished in 51.602959s, 0.0194 runs/s, 0.0194 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

If there was a failure or an error during test execution, you'll get that information printed out here and can use it as well as the Sauce Labs test page to dig into what could have gone wrong.

So we ran a test. But what have we actually done? Let's step through the code in android_basic.rb:

require "rubygems"
require "appium_lib"
require "minitest/autorun"

Here we set up our dependencies, notably the Appium client library and the Minitest test framework for Ruby (we could use any test framework or none; Minitest is a nice, compact one).

describe "Basic Android Test" do
    def caps
        {
            caps: {
                appiumVersion: "1.3.7",
                platformName: "Android",
                platformVersion: "5.0",
                deviceName: "Android Emulator",
                app: "", # URL of the ApiDemos .apk under test
                name: "Basic Android Native Test"
            }
        }
    end

Using the Minitest organization pattern, we describe a test block, and set up a helper function that gives us our desired capabilities as an object. All of these were described above! (The app we're using is a variant of the Android "ApiDemos" application, included in the Android developer sample code.)

    before do
        @driver = Appium::Driver.new(caps)
        @driver.start_driver
    end

    after do
        @driver.driver_quit
    end

In this section we define what we want to do before and after the tests in this block. Before the tests, we get an Appium driver object using Appium::Driver.new(caps). This driver object wraps a WebDriver object, which we kick off with @driver.start_driver (using our Sauce Labs username and access key).

And of course after the tests we need to end the session; we use @driver.driver_quit, which is a safe wrapper around the session end command. If for whatever reason ending the session fails, we don't want that in and of itself to fail a test!
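The idea behind a "safe" quit can be sketched in a few lines of plain Ruby. This is only the pattern, not appium_lib's driver_quit source; safe_quit and FlakyDriver are hypothetical names, with FlakyDriver standing in for a driver whose session end fails.

```ruby
# Minimal sketch of the "safe quit" pattern: try to end the session, but never
# let a failure to end it propagate and fail the test.
def safe_quit(driver)
  driver.quit
  true
rescue StandardError => e
  warn "Ignoring error while ending session: #{e.message}"
  false
end

# Hypothetical stand-in for a driver whose session end fails.
class FlakyDriver
  def quit
    raise IOError, "connection reset"
  end
end

safe_quit(FlakyDriver.new) # prints a warning and returns false instead of raising
```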

Finally, we have the test itself:

    describe "when I open the app" do
        it "should be able to navigate to the Action Bar" do
            list_el = @driver.find_element :accessibility_id, "App"
            list_el.click
            texts = @driver.find_elements :class_name, "android.widget.TextView"
            texts[1].text.must_equal "Action Bar"
        end
    end
end

Here we're specifying that a certain navigation path should exist in the app. We're using the WebDriver API to find an element with the accessibility ID (content-description on Android) of "App", tapping it, then asserting that the second TextView we find should have the text "Action Bar" on it.

A basic test indeed, but it illustrates the three most essential actions you will use in your scripts: finding elements, interacting with them (e.g., by tapping), and inspecting them (e.g., by asking for their text).

Important Commands

There are quite a lot of commands available for inspecting elements present on the UI of a device and interacting with them. So many, in fact, that it can be overwhelming to learn them all at once. The complete list would be a combination of all the API endpoints described in the Selenium Documentation and the Appium Documentation.

The first commands to learn are the following:

  • Finding Elements
    • el = @driver.find_element(strategy, selector)
    • els = @driver.find_elements(strategy, selector)
  • Inspecting Elements
    • el.text
    • el.location
    • el.size
  • Interacting with Elements
    • el.click
    • el.send_keys(text)

Finding Elements

In order to perform any meaningful command, you need a UI element to work with. Appium allows for finding UI elements in a number of ways. The preferred method is to find elements by their accessibility ID. These are identifiers that app developers attach to important elements so that assistive technologies (such as screen readers) can meaningfully interpret the UI. The Android and iOS platforms both have accessibility programs (iOS, Android).

button = @driver.find_element :accessibility_id, "play-button"

Elements can also be found by using the name of their class. On Android devices, these names start with “android.widget.”, e.g., “android.widget.TextView” and “android.widget.LinearLayout”. On iOS, class names start with “UIA”, e.g., “UIATextField” and “UIATableView”.

button = @driver.find_element :class_name, "UIAButton"

If multiple elements are found by these commands, only the first is returned. For finding multiple elements a pluralized version of each command exists. These commands return arrays of elements.

buttons = @driver.find_elements :accessibility_id, "play-button"
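The difference between the singular and plural commands can be modelled in plain Ruby. This toy, in-memory sketch assumes the usual Selenium contract (find_element returns the first match and raises when nothing matches, while find_elements returns an Array, possibly empty); ToyScreen and NoSuchElement are hypothetical names, not appium_lib classes.

```ruby
# Hypothetical error type standing in for the client's "no such element" error.
class NoSuchElement < StandardError; end

# Toy screen holding elements in memory, supporting one locator strategy.
class ToyScreen
  Element = Struct.new(:accessibility_id, :label)

  def initialize(elements)
    @elements = elements
  end

  # Plural form: always an Array, empty when nothing matches.
  def find_elements(strategy, selector)
    raise ArgumentError, "unsupported strategy #{strategy}" unless strategy == :accessibility_id
    @elements.select { |el| el.accessibility_id == selector }
  end

  # Singular form: the first match, or an error when there is none.
  def find_element(strategy, selector)
    match = find_elements(strategy, selector).first
    raise NoSuchElement, "no element with #{strategy} #{selector.inspect}" if match.nil?
    match
  end
end

screen = ToyScreen.new([
  ToyScreen::Element.new("play-button", "Play (top)"),
  ToyScreen::Element.new("play-button", "Play (bottom)")
])
puts screen.find_element(:accessibility_id, "play-button").label # Play (top)
puts screen.find_elements(:accessibility_id, "play-button").size # 2
p screen.find_elements(:accessibility_id, "stop-button")         # []
```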

One can find elements contained within another element:

table_views = @driver.find_elements :class_name, "UIATableView"
button = table_views[2].find_element :accessibility_id, "play-button"

These different approaches to finding elements (by class name or by accessibility ID) are called locator strategies. Appium has additional locator strategies for finding elements by id (:id), xpath (:xpath), and platform-specific locators like iOS UIAutomation (:uiautomation) commands and Android UIAutomator (:uiautomator) selectors.
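Platform-specific locators are passed as strings in the platform's own syntax. As an example for Android, the sketch below builds a UiAutomator selector string; UiSelector's className() and text() are real UiAutomator methods, but the ui_selector helper itself is hypothetical and only assembles the string you would hand to the :uiautomator strategy.

```ruby
# Hypothetical helper that assembles an Android UiAutomator selector string.
# String#inspect yields a double-quoted, escaped literal for simple values.
def ui_selector(text: nil, class_name: nil)
  selector = +"new UiSelector()"
  selector << ".className(#{class_name.inspect})" if class_name
  selector << ".text(#{text.inspect})" if text
  selector
end

sel = ui_selector(class_name: "android.widget.TextView", text: "Action Bar")
puts sel
# Usage against a live session (not run here):
#   @driver.find_element :uiautomator, sel
```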

Inspecting Elements

By inspecting the properties of elements visible on the UI, we can detect whether or not the app behaves as expected. We can test for the presence of a popup, look for a user’s name when logged in, check that lists are populated, that images are in the right place, etc.

Whenever a UI element is “found” through Appium, the server returns an id, not an object populated with UI properties. Additional functions need to be called (and HTTP requests made to the server) in order to get the specific properties of an element.
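To see why each property costs a round trip, it helps to look at the requests involved. This sketch assumes JSON Wire Protocol-style routes (as used by Appium 1.x) in which a found element is addressed only by the id the server returned; property_request is a hypothetical helper that builds, but does not send, each request.

```ruby
# Assumption: JSON Wire Protocol-style routes, one GET per element property.
ELEMENT_PROPERTY_ROUTES = {
  text: "text",
  location: "location",
  size: "size"
}.freeze

# Hypothetical helper: the HTTP verb and path needed to read one property.
def property_request(session_id, element_id, property)
  route = ELEMENT_PROPERTY_ROUTES.fetch(property)
  ["GET", "/wd/hub/session/#{session_id}/element/#{element_id}/#{route}"]
end

# Reading text, location and size of one element means three separate requests.
requests = %i[text location size].map { |prop| property_request("abc123", "7", prop) }
requests.each { |verb, path| puts "#{verb} #{path}" }
```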

The “text” command returns the textual contents of the element.

button_text = button.text

The “location” command returns the current location of the element on the screen, measured in pixels.

location = button.location

The “size” command returns the size of the element on the screen, measured in pixels.

dimension = button.size

Since these properties are calculated when the command is called, if the element is no longer visible on the UI the command will fail.

Interacting with Elements

By interacting with elements, we simulate the actions of a user, typing into fields, pressing buttons, tapping the screen, and performing touch gestures.

Use the “click” command to simulate tapping on an element:

button.click

Use the “send keys” command to type into a text field.

text_field.send_keys('Hi, my name is Bob')

Touch gestures will be discussed later on. Meanwhile, let's use these new commands to write another test, this time for iOS! You can go ahead and run the test if you want, the same way as before, but with a different test file:

$ ruby ios_basic.rb

You can, as before, watch the test run live on the Sauce Labs VM, which is pretty fun (and which will show you what kind of app we're automating: a very basic calculator app). This test, while simple, demonstrates almost all of the commands you'll need to know about while using Appium. Here's the full test code:

require "rubygems"
require "appium_lib"
require "minitest/autorun"

describe "Basic iOS Test" do
    def caps
        {
            caps: {
                appiumVersion: "1.3.7",
                platformName: "iOS",
                platformVersion: "8.2",
                deviceName: "iPhone Simulator",
                app: "", # URL of the calculator app under test
                name: "Basic iOS Native Test"
            }
        }
    end

    before do
        @driver = Appium::Driver.new(caps)
        @driver.start_driver
    end

    after do
        @driver.driver_quit
    end

    describe "when I use the calculator" do
        it "should sum two numbers correctly" do
            # populate text fields with values
            field_one = @driver.find_element :accessibility_id, "TextField1"
            field_one.send_keys "12"

            field_two = @driver.find_elements(:class_name, "UIATextField")[1]
            field_two.send_keys "8"

            # they should be the same size, and the first should be above the second
            assert field_one.location.y < field_two.location.y
            assert_equal field_one.size, field_two.size

            # trigger computation by using the button
            @driver.find_element(:accessibility_id, "ComputeSumButton").click

            # the displayed sum should be 20 (expected value comes first in assert_equal)
            sum = @driver.find_elements(:class_name, "UIAStaticText")[0].text
            assert_equal 20, sum.to_i
        end
    end
end

The boilerplate is basically the same (in a real test suite, we would certainly encapsulate these similarities in some kind of code abstraction). In the test steps, however, we see the use of the various new commands we've just learned (different locator strategies, send_keys, size, etc.).
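One possible shape for that abstraction is a helper that merges per-test capabilities over shared defaults. This is a sketch only; BASE_CAPS and sauce_caps are hypothetical names, and the nested { caps: ... } hash mirrors the caps helpers in the two tests above.

```ruby
# Shared defaults for every test in the suite (hypothetical constant).
BASE_CAPS = { appiumVersion: "1.3.7" }.freeze

# Hypothetical helper: each test supplies only what differs from the defaults.
def sauce_caps(**overrides)
  { caps: BASE_CAPS.merge(overrides) }
end

ios = sauce_caps(platformName: "iOS", platformVersion: "8.2",
                 deviceName: "iPhone Simulator", name: "Basic iOS Native Test")
p ios
# In a spec, `def caps; sauce_caps(...); end` would replace the hand-written hash.
```

Because merge returns a new hash, the frozen defaults are never modified, so tests can't leak settings into each other.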

Reporting Test Status

You'll notice that in the Sauce Labs UI, the tests we've run so far show a status of complete, even though locally we saw that the tests didn't just complete, they passed! The reason is that the pass/fail logic is determined solely by our local test script. Sauce Labs is merely the automation endpoint; it doesn't know about tests per se, and so we need to let Sauce know if we want to see the status correctly in the Sauce UI. Luckily, there's a nice gem for that: Sauce Whisk. We can incorporate Sauce Whisk into our Minitest suite, and I've done that in the ios_with_reporting.rb file. Let's give it a run:

$ ruby ios_with_reporting.rb

Notice that this time, the test gets a passed status once it finishes. Looking at the code, all we had to do was add this dependency:

require "sauce_whisk"

And then modify our after block to use Sauce Whisk. We also print out a helpful link if the test doesn't pass, so that we can navigate directly to the failed test page:

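after do
    # grab the session id from the inner WebDriver object before ending the session
    session_id = @driver.driver.session_id
    @driver.driver_quit

    unless passed?
        puts "Failed test link: #{session_id}"
    end
    SauceWhisk::Jobs.change_status session_id, passed?
end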

The magic here has to do with accessing the session_id of the internal driver object, then using that to set the status of the test using the Sauce Whisk library.
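Under the hood, reporting a status is a single REST call against the job, keyed by the session id. The sketch below builds (but does not send) such a request with Ruby's standard library. It assumes Sauce Labs' v1 REST API, where a job is updated with a PUT to /rest/v1/<username>/jobs/<job_id> and a JSON body such as {"passed": true}; job_status_request is a hypothetical helper, and Sauce Whisk handles all of this for you.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper. Assumption: Sauce Labs v1 REST API job-update route,
# authenticated with HTTP basic auth (username + access key).
def job_status_request(username, access_key, job_id, passed)
  uri = URI("https://saucelabs.com/rest/v1/#{username}/jobs/#{job_id}")
  request = Net::HTTP::Put.new(uri)
  request.basic_auth(username, access_key)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ passed: passed })
  [uri, request]
end

uri, request = job_status_request("alice", "secret", "abc123", true)
puts "#{request.method} #{uri}"
# To actually send it:
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```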

We're now all done with the basics of writing Appium tests and running them on Sauce Labs. Let's explore a few advanced topics.

Touch Actions

One aspect of mobile devices that needs to be automated in order to fully test applications, whether native, hybrid, or web, is utilizing gestures to interact with elements. In Appium this is done through the Touch Action and Multi Touch APIs. These two APIs come from an early draft of the WebDriver W3C Specification, and are an attempt to atomize the individual actions that make up complex actions. That is to say, they provide the building blocks for any particular gesture that might be of interest.

The specification has changed recently and the current implementation will be deprecated in favor of an implementation of the latest specification. That said, the following API will remain for some time within Appium, even as the new API is rapidly adopted in the server.


The Touch Action API provides the basis of all gestures that can be automated in Appium. At its core is the ability to chain together ad hoc individual actions, which will then be applied to an element in the application on the device. The basic actions that can be used are:

  • press
  • long_press
  • tap
  • move_to
  • wait
  • release
  • cancel
  • perform

Of these, the last deserves special mention. The action perform actually sends the chain of actions to the server. Before calling perform, the client is simply recording the actions in a local data structure, but nothing is done to the application under test. Once perform is called, the actions are wrapped up in JSON and sent to the server where they are actually performed!
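The record-then-perform model can be illustrated with a toy recorder: chained calls only append to a local list, and nothing leaves the client until perform. GestureRecorder is hypothetical, the block passed to it stands in for the HTTP call, and the {"actions": [...]} payload only loosely mirrors what Appium's touch endpoint accepts (real action names are camel-cased, e.g. longPress).

```ruby
require "json"

# Toy illustration of record-then-perform; not appium_lib's TouchAction.
class GestureRecorder
  attr_reader :actions

  def initialize(&sender)
    @actions = []
    @sender = sender # stands in for the HTTP request to the server
  end

  # Each gesture method just records itself and returns self for chaining.
  %i[press long_press move_to wait release].each do |name|
    define_method(name) do |**options|
      @actions << { action: name, options: options }
      self
    end
  end

  # Only here is the recorded chain serialized and "sent".
  def perform
    @sender.call(JSON.generate({ actions: @actions }))
    @actions = []
    self
  end
end

sent = []
gesture = GestureRecorder.new { |json| sent << json }
gesture.long_press(element: "source").move_to(element: "destination").wait(ms: 500).release
puts sent.empty? # true: nothing has been sent yet
gesture.perform
puts sent.first
```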

The simplest action is tap. It is the only one that cannot be chained with other actions, since it is a press and release put together.

The rest of the actions are straightforward, and cover the sorts of touch screen interactions that one would expect. Most interactions begin with either press or long_press, which can be performed on a point on the screen, an element, or an element with an offset from its top-left corner. The only difference between the two methods is, as their names suggest, how long the gesture stays pressed down.

After pressing, the gesture can include waiting and moving, to automate complex interactions. For instance, to simulate dragging an element onto another element, you might automate a long_press, move_to, wait, and release. In Ruby, assuming you have a driver instance, this would look like:

source = @driver.find_element :accessibility_id, "Source"
destination = @driver.find_element :accessibility_id, "Destination"
dnd = Appium::TouchAction.new
dnd.long_press(element: source).move_to(element: destination).wait(500).release.perform

Again, what's going on here is that we are finding two elements and describing a drag and drop action in relation to them. The wait function takes a time in milliseconds, which will be the minimum amount of time after the previous action that the subsequent action is performed. It is therefore useful for synchronization, as well as for actions, like the one above, that generally need some pause in order for the position to be registered by the application itself.

For the Touch Action API documentation in various languages, see: Java, Ruby, Python, PHP, Perl, C#, and JavaScript.


A note on what the position arguments mean is in order. The most basic way to specify a position is to use an element. All the methods that deal with position (i.e., tap, press, long_press, and move_to) can take an element as their point of action. Alone, this is interpreted as the center of the element. At the same time as the element, a point can be passed in, in the form of x and y. If both an element and a point are given to the method, the point is interpreted as an offset from the top-left corner of the element.

The final possibility is a point alone. In the absence of an element, a point is taken literally, as the position on the screen, for all the “static” methods. That is, tap, press, and long_press. In the move_to method, however, the point is interpreted as an offset from the point from which it is a move. This leads to many conceptual errors, mostly indicated by either wildly erroneous moves, or out of bounds errors (the errors "The coordinates provided to an interactions operation are invalid" and "Coordinate [x=500.0, y=820.0] is outside of element rect: [0,0][480,800]" are common in this case).
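The positioning rules above are easy to mix up, so here is a small model of them. Elements are plain hashes with :x, :y (top-left corner), :width and :height; resolve_point and resolve_chain are hypothetical helpers that compute where each action in a chain lands, not part of appium_lib.

```ruby
# Model of the positioning rules for one gesture chain.
def resolve_point(action, previous: nil)
  el = action[:element]
  x, y = action[:x], action[:y]

  if el && x && y
    # element + point: offset from the element's top-left corner
    [el[:x] + x, el[:y] + y]
  elsif el
    # element alone: the element's center
    [el[:x] + el[:width] / 2.0, el[:y] + el[:height] / 2.0]
  elsif action[:type] == :move_to
    # point alone in move_to: relative to where the previous action ended
    raise ArgumentError, "move_to needs a starting point" unless previous
    [previous[0] + x, previous[1] + y]
  else
    # point alone in tap/press/long_press: absolute screen coordinates
    [x, y]
  end
end

def resolve_chain(chain)
  last = nil
  chain.map { |action| last = resolve_point(action, previous: last) }
end

button = { x: 100, y: 300, width: 80, height: 40 }
p resolve_chain([
  { type: :press, element: button }, # center of the button
  { type: :move_to, x: 0, y: -200 }  # 200px straight up from there
])
# => [[140.0, 320.0], [140.0, 120.0]]
```

Treating the move_to point as absolute here would send the finger to (0, -200), which is exactly the kind of out-of-bounds error described above.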

Multi Touch Actions

Mobile applications, however, are not simply interacted with using a single gesture. Simple actions such as pinching and zooming require two fingers, and more complex interactions may take even more. In order to automate such actions Appium supports the Multi Touch API, which allows you to specify multiple Touch Action chains which will be run near-simultaneously.

If, for instance, you wanted to drag one element to the position of a second while at the same time dragging the second to the position of the first, you would first build the individual actions, then add them to a multi-action object:

el1 = @driver.find_element :accessibility_id, "Element 1"
el2 = @driver.find_element :accessibility_id, "Element 2"

action1 = Appium::TouchAction.new
action1.long_press(element: el1).move_to(element: el2).wait(500).release

action2 = Appium::TouchAction.new
action2.long_press(element: el2).move_to(element: el1).wait(500).release

multi_drag = Appium::MultiTouch.new
multi_drag.add action1
multi_drag.add action2
multi_drag.perform

Notice that there is a convenience method multi_touch added on the @driver object that simply takes an array of TouchAction objects' actions and takes care of sending them to the server so that they run in parallel.

So, once you have the individual gestures working, getting complex multi-pointer gestures working is as simple as adding them to the list of actions sent to multi_touch. Appium does the rest!

Of course, gestures like pinch and zoom are so common that the client library provides helper methods for them. We don't even need to understand the Touch Action API to use them. For example, if we want to pinch the screen from its outer corners all the way in to 25% of the screen size, we can do this:

@driver.pinch 75 # pinch 75% of the screen

Likewise, we can zoom in by a percentage factor:

@driver.zoom 200 # zoom in by a factor of 2

For language-specific documentation on the Multi Action API, see: Java, Python, PHP, C#, and JavaScript.

Finally, if you want to see an example of this API in action, go ahead and run android_gestures.rb!

$ ruby android_gestures.rb

Working with Hybrid and Mobile Web Apps

Appium handles hybrid and web apps in similar ways. Both require a different “context” in which to function. A “context” is basically an independent application mode. Appium knows about two kinds of contexts: native contexts and webview contexts. Hybrid apps are apps that have multiple contexts, both native and webview. Mobile web apps are simply websites served from within a mobile web browser such as Safari or Chrome.

Appium enables not just native app automation but also hybrid and mobile web automation. The genius is that webviews (whether in your own hybrid app or inside a web browser) are essentially invisible little web browsers, and web browsers are what the WebDriver protocol was originally meant to drive! This makes it very natural to switch from automating native contexts to webview contexts in Appium. When you switch into a webview context, your driver becomes a regular old WebDriver, and you can do things like find elements by their CSS selectors, or get the HTML page source, just like you could if you were automating a web browser.

But Appium needs to know that we want to automate a webview context instead of a native context. This is taken care of automatically if we specify a mobile web browser in our capabilities. For example, this set of capabilities would run a test on mobile Safari:

def caps
    {
        caps: {
            appiumVersion: "1.3.7",
            platformName: "iOS",
            platformVersion: "8.2",
            deviceName: "iPhone Simulator",
            browserName: "Safari",
            name: "Basic iOS Web Test"
        }
    }
end

What's different about this set is that we are not using the app capability; instead we're using the browserName capability, which lets Appium know we want a mobile web browser rather than an app we've developed. When we get our session started with this set of capabilities, we are put automatically into the webview context, which means, for all intents and purposes, we are running a WebDriver session. Give it a try:

$ ruby ios_safari.rb

The test code associated with this file is as follows:

describe "when I go to Google" do
    it "should be able to search for Sauce Labs" do
        @webdriver = @driver.driver
        @webdriver.get "" # load the Google home page

        search = @webdriver.find_element :name, "q"
        search.send_keys "sauce labs"
        search.send_keys :enter

        # allow the page to load
        wait { assert_equal "sauce labs", @webdriver.title[0..9] }
    end
end

Here we are using driver methods that we haven't seen before, because they have to do with web automation (like sending a browser to a URL, or getting the title of a webpage). Note that we first accessed the wrapped inner driver and called it @webdriver, to make sure we're using the pure Selenium WebDriver client rather than the Appium wrapper.
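The wait { ... } block keeps retrying the assertion while the page loads. The idea can be sketched as a polling loop that re-runs a block until it stops raising or a timeout expires; poll_until is a hypothetical helper, and appium_lib's own wait may differ in defaults and details. Note that Minitest assertion failures inherit from Exception rather than StandardError, so inside a Minitest spec you would pass retry_on: [Minitest::Assertion].

```ruby
# Minimal polling wait: retry the block until it stops raising or time runs out.
def poll_until(timeout: 5.0, interval: 0.2, retry_on: [StandardError])
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  begin
    yield
  rescue *retry_on
    # give up and re-raise the last error once the deadline has passed
    raise if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
    retry
  end
end

# Simulated page load: the "title" is only ready on the third attempt.
attempts = 0
title = poll_until(timeout: 2.0, interval: 0.01) do
  attempts += 1
  raise "page still loading" if attempts < 3
  "sauce labs"
end
puts "#{title} after #{attempts} attempts"
```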

What's interesting is that we have access to all these web-based automation commands even inside a webview in a hybrid app. If we have an app with both native controls and webviews, we can switch back and forth between them. This is exactly what takes place in ios_hybrid.rb, a test of a hybrid app (note the app capability being used) where we make use of the context commands:

  • @driver.available_contexts: get a list of available contexts
  • @driver.set_context(context): set a context
  • @driver.current_context: get the current context
  • @driver.switch_to_default_context: go back to the default context

Basically what we do is get a list of the available contexts, then set our current context to be the webview one (below, the string "WEBVIEW_1"). At this point we can run the exact same test we did before with Safari, even though we are automating a webview in a custom app we created. The code is as follows:

describe "when I go to Google" do
    it "should be able to search for Sauce Labs" do

        @driver.available_contexts  # ['NATIVE_APP', 'WEBVIEW_1']
        @driver.set_context "WEBVIEW_1"
        assert_equal "WEBVIEW_1", @driver.current_context

        @webdriver = @driver.driver
        @webdriver.get "" # load the Google home page

        search = @webdriver.find_element :name, "q"
        search.send_keys "sauce labs"
        search.send_keys :enter

        # allow the page to load
        wait { assert_equal "sauce labs", @webdriver.title[0..9] }
    end
end


Running the test is as easy as:

$ ruby ios_hybrid.rb

Notice that in this example, we've hard-coded the context name into our test. In more complex examples, we may want to loop through the available contexts and find one that matches our criteria (for example, the first non-native context). But this is the basic idea: you have native and webview contexts that you can switch between in the same test flow. This flexibility is a powerful and natural way to work with all kinds of apps, including hybrid and mobile web apps.
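Picking the first non-native context instead of hard-coding "WEBVIEW_1" can be as simple as the following sketch. first_webview_context is a hypothetical helper, and it assumes the native context is named "NATIVE_APP", as in the available_contexts output shown above.

```ruby
# Hypothetical helper: the first context that isn't the native one, or nil.
def first_webview_context(contexts)
  contexts.find { |name| name != "NATIVE_APP" }
end

p first_webview_context(["NATIVE_APP", "WEBVIEW_1"]) # => "WEBVIEW_1"
p first_webview_context(["NATIVE_APP"])              # => nil
# In a test:
#   @driver.set_context first_webview_context(@driver.available_contexts)
```

Returning nil when no webview exists lets the test fail with a clear assertion rather than an obscure error from set_context.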

Wrapping Up

In this guide we've explored Appium's foundations, specifically with reference to Ruby. Appium is an extremely flexible platform for mobile automation, allowing you to work with any language or test framework you're familiar with. We picked Ruby and Minitest, but we could have chosen Ruby and Cucumber, or Python and Nose, or Java and TestNG. We also went in-depth with some of the most commonly-used features of an Appium automation session, before discussing the more-complex gestures API and automating webviews within your apps.

Appium has a lot more to it, and so for next steps, you might want to check out the following resources:

Happy testing!

[special thanks to Isaac Murchie and Jonah Stiennon who provided drafts of some of the content presented here]

