I Know Nothing in Kotlin

When the Greek philosopher Socrates professed, “I know only one thing: that I know nothing,” he wasn’t exactly professing ignorance. It was an ancient formulation of the Dunning-Kruger effect: he had discovered that the more he learned, the wider the expanse of human knowledge seemed and the less he seemed to know.

A fool, on the other hand, might believe that they’re “not smart, but genius….and a very stable genius at that,” and that they know “only the best words.”

In this sense, it’s a good thing to know nothing. It’s also true in Kotlin.

During a recent code pairing session, I discovered that there are numerous developers who have written Kotlin for many years and yet still don’t know Nothing.

Our pairing session became a big fuss about Nothing. We were working on some Repository code in our shared Kotlin Multiplatform Mobile (KMM) module of our Meetup for Organizers app. In this code, there was a repeated block at the end of every repository call to handle network, DNS, and timeout errors. We decided it would make sense to refactor this block into a function as we used it so often.

The repeated block in question was a catch block. It had to re-throw any CancellationException to allow cooperative cancellation of coroutines. Then we had to wrap any other exception and re-throw it as a custom exception type, so that we could give iOS a complete list of every type we intend to throw in a @Throws declaration. This enables exception handling and avoids crashes when our functions are called from Swift code. In essence, this code block always threw an exception.

The pairing session became quite funny, like Abbott and Costello’s “Who’s on First?” routine, but you could tell there was Nothing more frustrating, too. It went something like this:

“Why am I getting this red text at the call site?”

“The return type is incorrect because you always throw an exception. You need to return Nothing.”

“I already am returning nothing.”

“No, you’re returning Unit. All Kotlin functions return Unit if you don’t declare a type. You need to return Nothing.”

“I don’t understand. I’m returning nothing from the function.”

“Just trust me. Add it to the return type of the function: uppercase N, lowercase o…”

“I’m so confused.”

It turns out that my colleagues, who had been coding in Kotlin for many years, had never been exposed to Nothing.

Nothing is weird. Nothing is esoteric. But Nothing is quite useful in these rare cases.

I know Nothing.

“Nothing has no instances. You can use Nothing to represent ‘a value that never exists’: for example, if a function has the return type of Nothing, it means that it never returns (always throws an exception).”

— Kotlin Documentation

The official definition only seems to raise further questions for newcomers to the language.

Why can’t I just return Unit?

Returning Unit would be lying to the compiler, and the compiler would miss opportunities to do the right thing. For example, the code in our repository wouldn’t compile because we promised to return a type other than Unit from each function. Returning Nothing from the shared error handler lets each parent function compile and return its declared type.
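
To make that concrete, here is a rough sketch of the kind of helper we extracted. The names (RepositoryException, handleRepositoryError, GroupApi) are placeholders of mine, not the actual Meetup code, but the shape matches what I described above: re-throw CancellationException, wrap everything else, and declare a return type of Nothing.

import kotlinx.coroutines.CancellationException

// Hypothetical names for illustration; the real repository code differs.
class RepositoryException(cause: Throwable) : Exception(cause)

fun handleRepositoryError(error: Throwable): Nothing {
    // Re-throw so coroutine cancellation stays cooperative.
    if (error is CancellationException) throw error
    // Wrap everything else in the one type we list in @Throws for iOS.
    throw RepositoryException(error)
}

interface GroupApi { suspend fun fetchGroups(): List<String> }

suspend fun GroupApi.fetchGroupsSafely(): List<String> = try {
    fetchGroups()
} catch (e: Throwable) {
    // Nothing is a subtype of every type, so this branch still satisfies List<String>.
    handleRepositoryError(e)
}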

What benefits do I get from Nothing?

You do, indeed, get something for Nothing. For one, the compiler and IDE gain correctness checks: any code placed after a call to a function that returns Nothing is flagged as dead and unreachable. Your IDE knows execution stops at Nothing and shows the right warnings. If we wrote the same code in Java, there would be no such warnings.
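
Here’s a minimal illustration of my own (not from our repository): because fail() is declared to return Nothing, the compiler knows it never comes back.

fun fail(message: String): Nothing = throw IllegalStateException(message)

fun describe(value: Int?): String {
    val checked = value ?: fail("value was null")
    // checked is smart-cast to a non-null Int here, and any statement placed
    // after an unconditional call to fail(...) is flagged as unreachable code.
    return "value is $checked"
}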

Generics are another area that is good for Nothing. If you want to declare a singleton object inside a generic sealed class that works for any type argument, you give it the type argument Nothing. Any other solution won’t compile.

// credit to Allan Caine @ Kotlin Academy
sealed class LinkedList<out T>  {

    data class Node<T>(val payload: T, var next: LinkedList<T> = EmptyList) : LinkedList<T>()

    object EmptyList : LinkedList<Nothing>() 
}
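
For example (my own usage of the snippet above, not part of the original), the single EmptyList object can terminate a list of any element type, because T is declared as out and Nothing is a subtype of every type:

val names: LinkedList<String> =
    LinkedList.Node("Ada", LinkedList.Node("Grace", LinkedList.EmptyList))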

Nothing doesn’t only apply to functions where you always throw. Nothing is valuable for early returns, such as replicating Swift’s guard statement in Kotlin:

// credit to Zac Sweers
inline infix fun <T> T?.guard(block: () -> Nothing): T {
  return this ?: block()
}

fun sweetNothings(input: String?) {
  val value = input guard {
    show("input is null!")
    return
  }
}

Why doesn’t Kotlin just determine automatically that my function returns Nothing and handle this for me?

As to my third and final question, automatically determining that a function returns Nothing would cause a number of problems. Both the compiler and IDE checks would take longer to run, since they would need to analyze every branch of every function. Automatic inference would also produce some surprising types that could lead to bugs later.

Finally, we should tell the creators of Kotlin, “Thanks for Nothing!”

If you enjoyed my blog post, you ain’t seen Nothing yet. Subscribe to my RSS and keep reading!

Remember, Remember, Jetpack Compose

During the first KotlinConf in 2017, I asked Google for some kind of declarative user interface (UI) framework for Android. Specifically, I approached Stephanie Cuthbertson and Yigit Boyar at that San Francisco conference. I told them I liked the idea of Anko Layouts, the now-defunct framework developed by the Kotlin team at JetBrains, but that it really needed a solution with first-party support and first-class execution.

I’m sure many Android developers asked for something similar, just as many had asked Google for first-party Kotlin support in Android before they announced it at Google I/O 2017.

Today, Jetpack Compose not only delivers on my original request. It goes far above and beyond it. It solves the fragmentation problem of many Android OS versions showing UI widgets with different appearances. It fixes the pain of writing complex Adapters with so many kinds of rows for RecyclerViews. It even promotes stateless, functional UI and unidirectional data flow.

We’ve been working full-time in Jetpack Compose for almost six months now on our new Meetup for Organizers app and we love it.

Why I can’t remember()

However, in this post I’d like to focus on one particular, easily misused part of Jetpack Compose… the remember() function.

Here’s a TL;DR (too long; didn’t read) summary of what I’m about to write about the remember() function:

  1. remember() as little as possible. Just keep the exact pieces of data you really care about in Composable functions. The rest can be passed into your function.
  2. If you don’t want to lose state on rotation, use rememberSaveable(). Write a Saver only if necessary. @Parcelize can help.
  3. If the data is so large it could grow to hit a TransactionTooLargeException, you should persist that data and rememberSaveable() the keys instead.
  4. If you care about not losing data when the user leaves and returns to that screen, you should persist the data from the ViewModel and restore it when needed.

Ever since the beginning of Android development, state and persistence have easily been among the hardest challenges. Over the years, so many developers have had their apps break, crash, or forget data on screen rotation or after memory eviction. While Jetpack Compose is amazing, it doesn’t really eliminate these challenges.

One of the first tools Android developers learn when they start to explore Compose is the remember() function. And it’s likely one of the first they should forget. remember() has value when saving state in some notable circumstances, but it’s not great if your state is user input. I’ve seen very senior developers make basic errors when it comes to using remember().

remember() exists to save state in Composable functions between recompositions. However, it has many issues.

First of all, remember() doesn’t survive configuration changes like rotation. If you rotate the screen, all remembered data on the screen is lost. There’s an easy fix for this, called out immediately in the Compose docs: use rememberSaveable() instead.
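
Here’s a minimal sketch of the difference (my own example, not code from our Meetup app): the first field forgets its text when the device rotates, while the second survives the configuration change.

import androidx.compose.material.TextField
import androidx.compose.runtime.*
import androidx.compose.runtime.saveable.rememberSaveable

@Composable
fun NameFields() {
    // Lost on rotation: remember() only survives recomposition.
    var forgetful by remember { mutableStateOf("") }
    // Survives rotation: rememberSaveable() writes the value to instance state.
    var durable by rememberSaveable { mutableStateOf("") }

    TextField(value = forgetful, onValueChange = { forgetful = it })
    TextField(value = durable, onValueChange = { durable = it })
}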

Our saving grace

Now, you’re probably thinking this whole post is advocating that you simply use rememberSaveable() and all your problems are solved, but that’s not my conclusion at all. Once you start using that new function, you can run into three new problems.

If your state is complex, like a data class, sealed class, or other custom class, then you have to write a Saver for that class to explain how to bundle it. The quickest option is the @Parcelize annotation, but that might not be available in some circumstances.
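
As a sketch, assuming the kotlin-parcelize plugin is applied and using a hypothetical state class of mine: once the class is annotated with @Parcelize and implements Parcelable, the default saver can bundle it, so no hand-written Saver is needed.

import android.os.Parcelable
import androidx.compose.runtime.*
import androidx.compose.runtime.saveable.rememberSaveable
import kotlinx.parcelize.Parcelize

// Hypothetical state class for illustration only.
@Parcelize
data class DraftEvent(val title: String, val attendeeLimit: Int) : Parcelable

@Composable
fun DraftEditor() {
    // Because DraftEvent is Parcelable, rememberSaveable() can write it
    // to instance state without a custom Saver.
    var draft by rememberSaveable { mutableStateOf(DraftEvent(title = "", attendeeLimit = 0)) }
    // ...UI that reads and updates draft...
}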

Then there’s the TransactionTooLargeException to contend with. Anyone who’s been writing Android apps for a long time knows that if you put too much into a Bundle that gets saved to instance state, you will run into this common crash. Anything you save with rememberSaveable() should be small, and the entire Bundle should be small in aggregate.

Third, if your Composable function goes off the screen, even for a moment, everything you remembered with rememberSaveable can be lost. This is easy to encounter, for example, if you use Jetpack Compose Navigation and load a new route for a sub-screen. Let’s say your settings page needs a child page. When you return, your composable forgets everything you thought you remembered.

A pretty state of affairs

The solution to the last two problems is persistence. Ideally, you should be saving to your ViewModel or to a database, such as Room or SQLDelight (for Kotlin Multiplatform projects). When the state is very complicated, the only values that belong in rememberSaveable() are the keys used to query your database and retrieve the persisted data.
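
A rough sketch of that pattern, with hypothetical stand-in types (Event, EventViewModel, observeEvent) rather than our actual repository code: only a small key lives in rememberSaveable(), and the heavy data is re-queried from persistence.

import androidx.compose.runtime.*
import androidx.compose.runtime.saveable.rememberSaveable
import kotlinx.coroutines.flow.Flow

// Hypothetical types for illustration only.
data class Event(val id: Long, val description: String)
interface EventViewModel { fun observeEvent(id: Long): Flow<Event?> }

@Composable
fun EventScreen(viewModel: EventViewModel, initialEventId: Long) {
    // Only the small key is written to instance state...
    val eventId = rememberSaveable { initialEventId }
    // ...while the full Event is re-loaded from the ViewModel or database.
    val event by viewModel.observeEvent(eventId).collectAsState(initial = null)
    // Render event here.
}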

When you think about using the remember() function, just remember…

Do you care if data is lost on rotation? If so, you shouldn’t use remember().

Is the data too large to store in an instance state? If so, you should be persisting data instead of simply using rememberSaveable().

Never forget. While Jetpack Compose is amazing and it makes our jobs as developers so much better, it doesn’t absolve us of concerns about persisting state.

Don’t be a cabron

Yesterday, I had a magical experience scuba diving the reef at La Parguera. It was gorgeous and the dive company generally did a great job on safety and running the boat.

On the other hand, not everything was peachy. I couldn’t tell what one of the guides’ hand signals meant and assumed we were doing a safety stop, when he was actually giving me the signal to head back to the boat.

When he came out of the water, I heard him refer to me to his colleague as a “cabron,” or idiot. I wonder how many times he’s made the same mistake: assuming that because someone has less experience than his 10,000-plus completed dives and isn’t as comfortable with Spanish, he can make foolish insults in their presence without being understood.

I’d be willing to bet several people did understand over the years and chose not to tell him how annoying his behavior was.

Always remember: Don’t be a “cabron” and underestimate other people. They will surprise you.

For example, I remember walking past the receptionist every day at one job. One day while we were talking, I suggested that she, too, could handle software engineering, and that it wasn’t as hard to learn as everyone said. Today, she’s a highly successful, independent software engineer.

How many engineers had walked past her and assumed that “receptionist” was an identity or her fate rather than just one temporary job that made sense in her current place in life?

Just because someone is less experienced or has an unimpressive job title, don’t be a “cabron.”

The Bug That Stole Xmas

I loved working at Mozilla for the almost two years that I spent there. But even a great job has hard days. The weeks around Christmas of 2018 were a stressful nightmare.

I’d been working as the senior engineer on the Firefox Focus browser for Android for a few months by then. Releases had been mostly uneventful. This story is about when that stopped.

Up to this point, our releases had been quiet because we used A/B experiments to roll out new features to Nightly and Beta users before anything reached our millions of Google Play users in production. We would watch dashboards to ensure that crashes, Application Not Responding (ANR) errors, page load time, reviews, and the number of active users over time were all trending in the right direction as we progressively rolled out new releases and would roll back at any sign of trouble.

Right before Christmas 2018, our long chain of pleasant releases came to an end.

As I was arriving back from a week-long vacation in Florida, we released Focus 8.0.4 to production. In fact, I believe I may have pushed the release from my economy seat on the flight using in-flight WiFi because no one else would be around to watch the rollout.

At first, everything seemed fine. This was because our two crash reporting tools, Sentry and Socorro, did not show that WebViews were crashing inside of Chrome code. These Native Development Kit (NDK) crashes were not caught by our tools, so they only appeared on the Google Play Developer Console.

To compound matters, the rest of my team went away on vacation then. I was alone and soon very bewildered.

We would’ve been fine if we’d used a crash reporting solution that reports all NDK crashes, not simply crashes in our own native code.

Lesson #1

After we rolled out the release to 1% of our users, nothing seemed amiss. We discovered these crashes only in the ANRs and Crashes section of the Google Play Console once the app had reached the first 10% of our users. This was surprising because I’d never before used a crash reporting solution that did not show data about crashes in C++ NDK code.

Having three crash reporting solutions with three separate and exclusive data sets split between them made finding our problem much harder. Google Play ended up picking up the crashes that the other two solutions did not handle. We would’ve been fine if we’d used a crash reporting solution that reports all NDK crashes, not simply crashes in our own native code.

When we became aware of the crashes, we still had very little data to go on. Google Play showed three primary crashes in C++ code. They were all inside the Chrome or WebView packages. Each of the three stack traces consisted of nothing more than the line “signal 5 (SIGTRAP), code 4 (TRAP_HWBKPT)” and another line pointing to the file name of the Chrome or WebView APK.

To be clear, there was no stack trace. There was no native crash tombstone. There were no known steps to reproduce. I was flying blind and I was alone in my investigation. There was no one online but one Quality Assurance (QA) engineer and myself to figure out what broke and how to reproduce the crash.

This wasn’t the kind of error that would produce results on StackOverflow or any of the usual sites that help young engineers appear briefly superhuman simply by searching Google and copying the first answer. Crashing on SIGTRAP clearly pointed to native C++ issues, but it wasn’t happening in our own C++ code, or the trace would’ve shown up in Socorro, our web engine’s native crash reporter. Instead, these errors happened inside the Chrome-based WebViews themselves.

To compound matters, we had just rolled out our new GeckoView engine to more users in an A/B test. This was a big deal. Firefox Focus had been built originally on Google’s WebView technology to quickly launch a prototype and to test our product hypotheses. However, we had huge plans to launch user privacy features which required a custom engine.

GeckoView was the culmination of years of work catching up to and sometimes even surpassing Google on performance as well as offering new features which could differentiate our browsers into the future. My job up to this point had been to ensure a smooth rollout of this new technology. Any potential issues had to be quickly caught and fixed.

Changing one variable at a time reduces the complexity of troubleshooting.

Lesson #2

When these errors appeared, they were affecting about 7-8% of all browser sessions. This meant we were crashing a lot and were likely losing many thousands of dedicated users. Most of the previous apps I’d worked on maintained far less than 0.5% session crash rates after a production release.

If we’d separated the rollout of the new Firefox Focus 8.0.4 release from the increase in the percentage of GeckoView users, my investigation would’ve been far easier. Rolling out our release at around the same time we expanded our A/B test meant we couldn’t tell which was responsible for the crashes. Changing one variable at a time reduces the complexity of troubleshooting.

Don’t panic. Software investigations go faster when you take your time and think methodically.

Lesson #3

There were reasons to worry that the GeckoView engine change caused these crashes. On first install, our app would always use WebView and would only switch engines after downloading a new A/B configuration file and then restarting the app from scratch after a reboot or upgrade. This made sense to avoid errors related to switching engines midstream that would invalidate our experiment. There was a lot of concern that our engine switching code might introduce issues.

There’s a tendency in software engineering for the initial gut reaction of good engineers to be that they themselves must have made a mistake somewhere. My first hypothesis was that my engine switching code was causing the WebView to crash in the process of switching new users to the GeckoView engine. This may have been slightly neurotic. There was no evidence in the code or the error to indicate that my switching code was at fault, only the knowledge that my code was finally being exercised by hundreds of thousands of users for the first time.

I should have been able to quickly abandon my initial hypothesis, even without a stack trace, simply by examining the code. When troubleshooting, it’s really important you don’t panic. Software investigations go faster when you take your time and think methodically.

To compound matters further, there had been an urgent Google Chrome security update rushed out the door in the same week. I had to scour the Internet to ensure that update was not the cause.

Sadly, it took days to get to the heart of the issue. I first had to give up on discovering steps to reproduce the crash, or on hearing from a QA engineer or end user how to do so. I knew it was unlikely there would be any quick fix without a stack trace, so that’s where my investigation led.

I was able to quickly abandon Stack Overflow and Internet searches as a dead end.

I focused on how I could obtain a stack trace. It seemed likely I’d need to contact Google, but I’d seen with past Android issues how it could sometimes take many years to get an engineer response and fix to a real bug in the Android framework. If I went through official channels, it seemed highly unlikely I would get a response within a useful timeframe.

I ended up reaching out to the #webview-wtf channel in the Android Study Group Slack, an invite-only Slack workspace for professional Android developers. It turns out that channels ending in “-wtf” are shorthand for channels where at least some of the software’s maintainers are present. This meant I could try to get the attention of Chrome developers directly.

I reached out on December 26th and felt so lucky to get a response from a Chrome developer two days later. Better yet, it was useful. Amazingly, he was able to dig up the specific crash reports for the WebViews in our app. It turns out the callbacks we registered on Chrome’s WebViews did not surface any logs or stack traces explaining why they crashed. Even though our code was written in Kotlin, the exception was caught by Chrome’s NDK code, which crashed with a SIGTRAP.

Threading is hard and concurrent access is a minefield. Functional programming isn’t just for Haskell programmers. Purity even makes a difference in imperative code.

Lesson #4

As soon as I had a stack trace for the exception that caused the crash, the fix was fairly easy. Our code for tracking blocked trackers in the WebView engine had just been migrated to Mozilla’s Android Components for a new, unified backend. That update used Kotlin’s observable property delegate (Delegates.observable), but the observer lambda operated directly on the backing property instead of its old and new parameters. This is bad because it’s not thread-safe.

The root cause of the crashes was that multiple threads were both modifying and reading the list of blocked trackers. When we tried to read the last blocked tracker, another thread had emptied the list before we could, and the call to last() threw on the empty list. Because this happened inside a Chrome callback, Chrome got the stack trace.

Here is the core part of the fix I wrote:

    var trackersBlocked: List<String> by Delegates.observable(emptyList()) { _, old, new ->
        notifyObservers(old, new) {
            if (new.isNotEmpty()) {
-                onTrackerBlocked(this@Session, trackersBlocked.last(), trackersBlocked)
+                onTrackerBlocked(this@Session, new.last(), new)
            }
        }
    }
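
To see why the original code was racy, here is a stripped-down sketch of my own, not the Focus code: the observer re-reads the backing property, so a writer on another thread can empty the list between the isNotEmpty() check and the last() call.

import kotlin.concurrent.thread
import kotlin.properties.Delegates

var trackers: List<String> by Delegates.observable(emptyList()) { _, _, new ->
    // Racy: re-reads the property, which another thread may have changed already.
    if (trackers.isNotEmpty()) {
        println(trackers.last()) // can throw NoSuchElementException
    }
    // Safe: new is a stable snapshot of the value this notification is about.
    if (new.isNotEmpty()) {
        println(new.last())
    }
}

fun main() {
    val writer = thread { repeat(10_000) { trackers = listOf("tracker-$it") } }
    val clearer = thread { repeat(10_000) { trackers = emptyList() } }
    writer.join()
    clearer.join()
}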

This was possibly the hardest bug hunt I’ve ever had to deal with. I bring it up not to shame the otherwise smart developer who made the mistake and likely learned from it, but because it makes a good Halloween horror story for Android developers everywhere. There are also some useful lessons to learn.

Threading is hard and concurrent access is a minefield. Functional programming isn’t just for Haskell programmers. Purity even makes a difference in imperative code. In this case, it even could have prevented an epic horror story that ruined my holidays.


A dispatcher of many threads

Kotlin Coroutines have been almost unanimously received with applause by the Android software development community. However, there are some significant issues I have noticed in practice that are still unknown to most teams.

This week, one major issue was finally resolved.

Dispatchers.Main

Dispatchers.Main is the Kotlin Coroutines dispatcher for accessing Android’s main thread. Until recent improvements, the very first time it was accessed, most apps experienced a delay of up to a quarter of a second. Blocking the main thread pauses drawing of the user interface and is enough to cause a noticeable lag for users.

Android has roughly 16ms to draw each frame at 60fps. Blocking the main UI thread for 250ms therefore drops about fifteen frames at 60fps, or thirty frames at 120fps. This is in addition to any other slow, blocking operations that might occur during your application’s initial startup.

This blocking operation serves no purpose. The Main Dispatcher was written using a ServiceLoader call. This choice resulted in a wasteful checksum operation being performed on the entire JAR file inside the APK.
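
If you want to see what this costs in your own app, a rough one-off measurement might look like the following sketch (the function name and log tag are mine); call it once, early in Application.onCreate(), before anything else touches coroutines.

import android.os.SystemClock
import android.util.Log
import kotlinx.coroutines.Dispatchers

// Rough, one-off measurement: the first access to Dispatchers.Main is what
// triggers the ServiceLoader lookup described above.
fun logFirstMainDispatcherAccess() {
    val start = SystemClock.elapsedRealtime()
    val main = Dispatchers.Main
    val elapsed = SystemClock.elapsedRealtime() - start
    Log.d("Startup", "First access to $main took $elapsed ms")
}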

I was able to work around this problem several months ago on Firefox Preview by using a pre-release version of Google’s R8 code stripper along with custom rules. This wasn’t ideal. We were potentially subjecting our app to early-stage bugs and were unlikely to get the same level of support for an unreleased version. We made this sacrifice because a quarter-second reduction in startup time was worth the cost.

At long last, there’s now a solution in a stable version of the Kotlin coroutines library itself. To take advantage of this fix, you must perform the following steps.

  1. Upgrade your version of Kotlin coroutines to 1.3.3 or higher in your build.gradle dependencies.
  2. Insert the required Proguard rules into your proguard-rules.pro file or whatever you’ve named it. See below.
  3. Ensure you’re using proguard-android-optimize.txt instead of the default, non-optimizing version.
  4. Ensure R8 is enabled. The R8 code stripper was introduced in Android Studio 3.3.0 and became the default in Android Studio 3.4.0 and later. It replaces Proguard’s code stripper, but should accept the same rules files. It can be configured in the gradle.properties file at the root of your application code.
// Add or replace these lines in the dependencies section
// of your build.gradle file
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-core:1.3.3'
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.3.3'
# Add these lines to proguard-rules.pro

# Allow R8 to optimize away the FastServiceLoader.
# Together with ServiceLoader optimization in R8
# this results in direct instantiation when loading Dispatchers.Main
-assumenosideeffects class kotlinx.coroutines.internal.MainDispatcherLoader {
    boolean FAST_SERVICE_LOADER_ENABLED return false;
}

-assumenosideeffects class kotlinx.coroutines.internal.FastServiceLoader {
    boolean ANDROID_DETECTED return true;
}
// Add these lines to your application build.gradle file
// or simply change the getDefaultProguardFile line

android {
   buildTypes {
      release {
        shrinkResources true
        minifyEnabled true
        proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
      }
   }
}
# Put this line somewhere in gradle.properties
android.enableR8=true

If you perform the above steps, sync Gradle, and then make a release build with optimization enabled, you should see a significant improvement in your cold start time versus previous release builds.

There are many options to quickly see this improvement with a flame chart. You could use Google’s Traceview or a dedicated, third-party tool like Nimbledroid.

Google’s new Jetpack Benchmark tool seems useful for UI code, but it is not recommended by its creators for measuring cold application startup time.

Update: I was planning a part two of this blog, but decided the second issue wasn’t as interesting, nor as clearcut as I expected for a blog post.

Colin, Now Sly As a Firefox

Colin Lee from ColinTheShots LLC just joined Mozilla as the newest Senior Android Engineer working on Firefox Android products. I’ll be changing my professional consulting site to function as a blog. Since my job is open source software and no longer involves secrecy, I’m excited to be able to speak about my experiences developing new and amazing Android products.

Initially, I’ll be writing primarily Kotlin code in projects mixed with existing Java source.

In my first weeks on the job, I built a switcher for changing browser rendering engines and a new nightly build process integrating a hot, new nightly icon for the Firefox Focus browser. You can see that icon above.

Look forward to more posts about Android development challenges and learning.