Sunday, June 11, 2017

Lessons learned from Google Test Automation Conference 2016

It took me half a year to get to the GTAC 2016 videos, but it was again absolutely worth it. Engineers from top companies present the challenges they have overcome. IMHO most talks at regular IT conferences either serve a purely marketing purpose without delivering any value, or are too beginner-oriented. GTAC is different! That is why I thought it might again be worth writing down notes from this year. Someone similar to me (6 years of experience in Test Engineering) could skim them in 5 minutes and decide more easily which talk suits them.

  • Evolution of Business and Engineering Productivity
    • (34:49) When is duplication technical debt and when can it be a benefit?
      • Duplication done in an organized way brings flexibility, competition, and collaboration; when it goes unnoticed, it becomes debt
    • (37:22) Metrics and Measurements, e.g. how long does it take for my code to get to production? Senior managers have to support this, and develop quarterly goals for employees based on these metrics.
    • (40:12) Test and Release strategy 2.0: continuous deployment, canary testing (testing during canary release), production monitoring, …
    • Build, test, release, repeat will not be sufficient in the near future
    • Machine learning to find out which tests are completely useless, so they do not need to be run.
    • (52:20) Automated vs manual test ratio within Google Ads: 95%:5%
    • (53:00) Other metrics at Google: how long do developers spend running tests before submitting their change (a.k.a. presubmit latency)?
    • In general, it takes 10 days to get a change into production
  • Automating Telepresence Robot Driving
    • Telepresence - refers to a set of technologies which allow a person to feel as if they were present, to give the appearance of being present, or to have an effect, via telerobotics, at a place other than their true location [https://en.wikipedia.org/wiki/Telepresence].
    • Beam Telepresence System - https://suitabletech.com/beampro/
    • Beacon
    • (19:12) LiDAR - measures distance with a laser
    • (20:50) - Hardware stack for testing such a robot in the lab
      • Beam robot with modifications for Lidar scanning, Beam charging Dock, Hokuyo Lidar, NUC small form computer
    • It is difficult to determine the orientation of a symmetric object, so make it asymmetric, but in a way that does not affect the test (do not change the weight much, etc.)
    • (22:08) Lab room considerations - it has to be isolated (also for safety reasons, the robot weighs 100 pounds), lighting, flooring
  • What’s in your Wallet?
    • Galen - automate test of look and feel for responsive websites
    • Hygieia - a single, easy to use dashboard to visualize near real time status of entire software delivery pipeline
  • Using test run automation statistics to predict which tests to run
    • (8:55) Which tests not to run?
      • Tests that were 100% successful during the last month, have > 100 test runs, and run on all branches
      • Key point: they are disabled only on trunk and stay enabled on the branches from which merges go to trunk. So when they fail during the merge process, they are enabled again and run for at least another month
      • They were able to save about 50% of build time.
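The selection rule from this talk can be sketched as a small filter. This is only my illustration, not the speakers' implementation: the `RunStats` record and its field names are hypothetical, and I assume the statistics are already aggregated over the last month. The thresholds (no failures, more than 100 runs, runs on all branches) come from the notes above.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-test summary, aggregated over the last month."""
    name: str
    runs: int                    # executions in the window
    failures: int                # failed executions in the window
    runs_on_all_branches: bool   # test is executed on every branch


def can_skip_on_trunk(stats: RunStats, min_runs: int = 100) -> bool:
    """True when the test is a candidate to be disabled on trunk only."""
    return (
        stats.failures == 0
        and stats.runs > min_runs
        and stats.runs_on_all_branches
    )


def split_tests(all_stats):
    """Partition tests into (skip_on_trunk, keep_on_trunk) name lists."""
    skip = [s.name for s in all_stats if can_skip_on_trunk(s)]
    keep = [s.name for s in all_stats if not can_skip_on_trunk(s)]
    return skip, keep
```

The skipped tests would still run on the branches, so a failure during a merge is what puts them back on trunk.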
  • Selenium-based test automation for Windows and Windows Phone
    • Winium.Mobile - an option besides Appium for test automation of Windows apps on mobile devices
    • Winium.Desktop automation - open source; WPF, WinForms, any accessible app
  • The Quirkier Side of Testing
    • Funny one, must see :)
  • ML Algorithm for Setting up Mobile Test Environment
    • (09:50) Machine Learning algorithm to choose devices for test lab
      • Decision tree, random forest classifier
  • “Can you hear me?” - Surviving Audio Quality Testing
    • (06:58) Audio software testing pyramid
    • (16:40) POLQA algorithm for testing audio quality. The inputs for this algorithm are a reference audio file and a recorded one. The result is a Mean Opinion Score (MOS), a grade on a scale from 1 (bad) to 5 (excellent).
    • (18:08) Frequency analysis - it identifies the speakers in the audio recording. Each person speaks at a different frequency.
    • (18:46) Speech presence - finds the regions in the recording where speech is present.
    • (19:01) Amplitude analysis - verifies that no speaker is too loud or too quiet
    • (19:40) Live demo of web service which employs those algorithms
  • IATF: A new Automated Cross-platform and Multi-device API Test Framework
    • (21:25) Test steps sequence diagram for testing communication between two clients connected to a server (via WebRTC protocol)
  • Using Formal Concept Analysis in software testing
    • Can be used for finding dependencies among method parameters, in the form of implications
    • (14:27) Can be used for analysis of test reports. Nice example.
    • Lattice usage analysis is equivalent to finding the most common descriptions of failed tests. In big systems, a lattice is a good representation for finding similar functionality.
    • Possible extension: ML to find out possible reason why some test failed.


  • Flaky Tests in Continuous Integration: Current Practice at Google and Future Directions
    • SLA for developers: the time from commit to getting an answer is 3 hours in general
    • Not every change triggers test jobs right away; about ⅓ do
    • With ML they can be 90% sure that a test is flaky, so they do not have to rerun it 10 times as usual
    • (14:10) how to identify that tests are flaky, patterns, features, correlations
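For comparison, the plain rerun approach that the ML model is meant to replace can be sketched like this. It is my own minimal illustration, not Google's tooling: `run_test` is a hypothetical callable that executes the test once on unchanged code and returns True on pass, and the default of 10 reruns is taken from the note above.

```python
def looks_flaky(run_test, reruns=10):
    """Rerun a test that just failed on unchanged code.

    If any rerun passes, the test both fails and passes on the same code,
    which is the usual working definition of a flaky test.
    """
    for _ in range(reruns):
        if run_test():
            return True
    return False
```

Every rerun costs a full test execution, which is why predicting flakiness from patterns and features instead is attractive.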
  • Developer Experience, FTW!
    • Firebase Test Lab for Android devices; supports Espresso, Robotium, or UI Automator 2.0
    • Espresso Test Recorder available in Android Studio
    • (55:29) In the future, Firebase Test Lab may be able to use real user actions to test the application


  • Docker Based Geo Dispersed Test Farm - Test Infrastructure Practice in Intel Android Program
    • Release and deliver test suites as Docker images


  • OpenHTF - The Open-Source Hardware Testing Framework
    • Test harness OSS Python library with Web GUI
    • Plugins for sensors, platforms, chips… any other hardware stuff. Currently not many plugins available.


  • Directed Test Generation to Detect Loop Inefficiencies
    • Performance issue: redundant traversal of loops
    • Toddler: detecting performance problems via similar memory-access patterns
    • Glider - suggested approach to address redundant traversal
    • Implemented in Soot bytecode framework


  • Need for Speed - Accelerate Automation Tests From 3 Hours to 3 Minutes
    • Enablers:
      • dedicated environment, saved 57 minutes
      • empty DBs instead of shared DBs, saved 34 minutes
      • simulate dependencies (stub external dependencies), saved 24 minutes, and also made tests more stable
      • Moved to containers - this slowed things down, as the tests did a lot of I/O operations
      • Run databases in memory - 4 minutes saved
      • Do not clean test data: when the data lives in containers, the container disappears once the tests end, so there is no need for cleanup, 15 minutes saved
      • Run tests in parallel: everybody starts with this, but one should end with this step, 41 min saved. One has to find the right number of threads; too many threads can slow things down.
      • Equalize workload: not every thread was executing an equal number of test cases
      • By vertical scaling (RAM and CPU) they were able to run in 1:38 min
      • They want to go below one minute by scaling horizontally
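The "equalize workload" step can be illustrated with a greedy longest-first assignment. This is a sketch under my own assumptions, not the method from the talk: I assume the duration of each test is known from previous runs, and the function name and test names are made up.

```python
import heapq


def balance(durations, workers):
    """Assign tests to workers so that total run time per worker is similar.

    Greedy rule: take tests from longest to shortest and always give the
    next one to the worker with the smallest load so far.
    """
    heap = [(0.0, i) for i in range(workers)]   # (current load, worker index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(workers)]
    ordered = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in ordered:
        load, idx = heapq.heappop(heap)         # least-loaded worker
        buckets[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    loads = [0.0] * workers
    for load, idx in heap:
        loads[idx] = load
    return buckets, loads


# Example with made-up durations in seconds.
buckets, loads = balance(
    {"t1": 60, "t2": 30, "t3": 30, "t4": 10, "t5": 10}, workers=2
)
```

Splitting by test count alone would put three tests on one thread and two on the other regardless of how long they take; balancing by duration keeps the slowest thread from dominating the total time.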


  • ClusterRunner: making fast test-feedback easy through horizontal scaling
  • Integration Testing with Multiple Mobile Devices and Services
    • Most frameworks target a single device, whereas E2E testing brings extra challenges: synchronizing steps between multiple devices, a large range of equipment - attenuator, call box, power meter, wireless AP
    • Mobly - OSS Python library from Google, used to test Android; controls a collection of devices/equipment in a test bed (isolated mobile devices, network switch, IoT, etc.)
    • Centralized vs decentralized way of executing/dispatching test logic. Mobly is centralized - they found it easier to debug
    • (18:24) Cool demonstration - two phones and one watch: phone A gives a voice command to the watch, the watch initiates a call to phone B, phone B gets a call notification
    • Similar frameworks: openHTF, Firebase
  • Scale vs Value: Test Automation at the BBC
    • They became overwhelmed by manual regression tests > BDD to define with different stakeholders what to automate > separate team to ensure devices for testing are available, inventory, status of devices > test lab (lots of smart TVs, test lab in a fire corridor :D)
    • PUMA - they plan to adopt the framework broadly within the company
      • Prove core functionality, automated checks for the core value of your product or system. Regularly audited to combat bloat
      • Understood by all - everyone cares, anyone can execute, visibility to all
      • Mandatory - part of the delivery pipeline, any failed check stops the build
      • Automated
    • Whatever framework you use, you need to step back and see what value it brings to you: only important tests should run on real devices, etc
  • Finding bugs in C++ libraries using LibFuzzer
    • What to fuzz: anything that consumes untrusted or complicated inputs: parsers of any kind, media codecs, network protocols, crypto, compression, compilers and interpreters, regular expression matchers, databases, browsers, text editors/processors, OS kernels, drivers, supervisors, Chrome UI
    • How to fuzz: generation based fuzz or mutation based fuzz or guided mutation-based
    • Mutation: e.g. bit flipping
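To make the bit flipping mutation concrete, here is a tiny sketch in Python. It is not libFuzzer code (libFuzzer is a C++ engine and is coverage-guided); it only shows the single mutation step, and the function name is my own.

```python
import random


def flip_random_bit(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of data with exactly one randomly chosen bit inverted."""
    if not data:
        return data                       # nothing to mutate
    pos = rng.randrange(len(data) * 8)    # index of the bit to flip
    mutated = bytearray(data)
    mutated[pos // 8] ^= 1 << (pos % 8)   # XOR inverts just that bit
    return bytes(mutated)


# Example: mutate a seed input; a fixed seed makes the run repeatable.
mutant = flip_random_bit(b"hello", random.Random(0))
```

A mutation-based fuzzer repeats steps like this on a corpus of valid inputs and feeds the results to the code under test; the guided variant keeps the mutants that reach new code.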
  • How I learned to crash test a server
    • They built a programmable power outlet that you can ssh into and turn any of its sockets on/off
    • Crashing virtual machines vs crashing physical machines, both need to be done
    • Virtual machines: crashed from the host with a single command, both for KVM and VMware
    • The BIOS has a setting to restore power after AC power loss
    • On Windows there is a utility, bcdedit, with which you can suppress the prompt (Start Windows Normally, ...) shown after an abrupt Windows restart
    • They did not find a systematic way to crash Windows by an internal command (how ironic? :D)