Big O Primer

This is what happens when you don’t know Big O

Introduction

Big O is all about saving time and saving space, two resources that computer algorithms consume to do repetitive jobs (like sorting strings, calculating sums, or finding primes in haystacks). Big O analysis is required when data points grow very numerous. If you need to sort thousands, millions, billions, or trillions of data points then Big O will save you time and/or space, help you pick the right algorithm, guide you on making the right trade-offs, and ultimately enable you to be more responsive to users. Imagine if Google Search told users to grab a cup of coffee while searching for the best coffee shop… madness would break out!

You could argue that Big O analysis is not required when filling a few dozen rows with fetched data in a web view. But it’s a good idea to understand why a naive user experience can create a backend scaling nightmare. Product Managers, Designers, and Frontend Devs will want to understand and be able to discuss why fetching a fresh list of every customer from the database for every page view is a bad idea. Even with caching and CDNs, Big O leads to better system design choices.

Properties

There are three points of view on algorithm complexity analysis:

  • Big O: upper bounds, worst case estimate.
  • Big Ω: lower bounds, best case estimate.
  • Big Theta: lower and upper, tightest estimate.

The distinctions are literally academic and generally when we say “let’s do a Big O analysis of algorithm complexity” we mean “let’s find the tightest, most likely estimate.”

There are two types of algorithm complexity analysis:

  • Time complexity: an estimation of how much time source code will take to do its job.
  • Space complexity: an estimation of how much space source code will take to do its job.

Most of the time we worry most about time complexity (performance) and we will happily trade memory (space) to save time. Generally, there is an inverse relationship between the speed of memory and the amount of it available. The faster a piece of memory can be accessed, the less of it your source code has to work with.

CPUs have memory registers that they can access quickly but only for small amounts of data (bytes and words). CPUs also have access to increasingly slower but larger caches for storing data (L1, L2, L3). You, as a coder, usually don’t worry about CPUs, registers, and level 1, 2, or 3 caches. Your programming language environment (virtual machine, compiler) will optimize all this very low-level time and space management on your behalf.

Modern software programs, which I will call source code, live in a virtual world where code magically runs and memory is dutifully managed. As a coder you still occasionally have to step in and personally manage memory, understand what data is allocated to the stack versus the heap, and make sure the objects your source code has created have been destroyed when no longer needed.

Modern software applications, which I will also call source code, live in a virtual world where apps magically run and storage is autoscaled. As a system designer you’ll have to choose your virtual machines and storage APIs carefully—or not if you let managed app services and containerization manage all your code and data as a service.

Usage

Big O will help you make wise choices with program design and system design. Big O, like other dialects of mathematics, is abstract and applies to a wide range of software phenomena. Using Big O to optimize your source code means faster web pages, apps, and games that are easier to maintain, more responsive to the users, and less costly to operate.

Big O reminds me of Calculus and Probability! It’s all about estimating the rate of change in how your source code processes data over time. Will that rate of change get slower with more data? That answer is almost always yes! Big O helps you identify bottlenecks in your source code, wasted work, and needless duplication. Big O does this through a formal system of estimation, showing you what your source code will do with a given algorithm and a given set of data points.

There are even more bottlenecks that will slow down your app even if your algorithms are optimized using Big O. The speed of your CPU, the number of cores it contains, the number of servers, the ability to load balance, the number of database connections, and the bandwidth and latency of your network are the usual suspects when it comes to poor performance in system design. Big O can help with these bottlenecks as well so remember to include these conditions in your circle of concern!

Big O estimates are used to:

  • Estimate how much CPU or memory source code requires, over time, to do its job given the amount of data anticipated.
  • Evaluate algorithms for bottlenecks.

Big O helps engineers find ways to improve scalability and performance of source code:

  • Cache results so they are not repeated, trading off one resource (memory) for another more limited resource (compute).
  • Use the right algorithm and/or data structure for the job! (Hash Table, Binary Tree, almost never Bubble Sort).
  • Divide the work across many servers (or cores) and filter the results into a workable data set (Map Reduce).
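The first idea in the list above (caching results) can be sketched with a memoized Fibonacci function. This is only an illustration under stated assumptions: the `Fibonacci` class and its `value(at:)` method are hypothetical names, not code from any of my projects. The cache trades a little memory for a lot of compute, because each value is calculated once instead of over and over.

```swift
// Memoization sketch: trade memory (the cache) for compute (repeated calls).
// The class and method names are hypothetical, chosen for illustration.
final class Fibonacci {
    // Seed the cache with the two base cases.
    private var cache: [Int: Int] = [0: 0, 1: 1]

    func value(at n: Int) -> Int {
        precondition(n >= 0, "n must not be negative")
        // Return the cached answer if this value was already computed.
        if let known = cache[n] {
            return known
        }
        // Otherwise compute it once and remember it.
        let result = value(at: n - 1) + value(at: n - 2)
        cache[n] = result
        return result
    }
}

let fib = Fibonacci()
print(fib.value(at: 50)) // 12586269025
```

Without the cache the same recursive function repeats work exponentially; with it, each value from 0 to n is computed once, which is O(n) time and O(n) space.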

Notation

Big O notation is expressed using a kind of mathematical shorthand that focuses on what is important in the execution of repeated operations in a loop. The number of data points should be really big (thousands, millions, billions, or trillions) for Big O to help.

Big O looks like this: O(n log n)

  • O( … ) tells us there is going to be a complexity estimate between the parentheses.
  • n log n is a particular formula that expresses the complexity estimate. It’s a very concise and simple set of terms that explain how much the algorithm, as written in source code, will most likely cost over time.
  • n represents the number of data points. You should represent unrelated, but significant, data points with their own letters. O(n log n + x^2) is an algorithm that works on two independent sets of data (and it’s probably a very slow algorithm).

Big O notation ignores:

  • Constants: values that don’t change over time or in space (memory).
  • Small terms: values that don’t contribute much to the accelerating amount of time or space required for processing.
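Here is a small sketch of why constants and small terms can be ignored. The cost formula 3n^2 + 5n + 20 is made up for illustration (it is not measured from any real algorithm) and `totalSteps` is a hypothetical name. As n grows, the total divided by n^2 settles toward the constant 3, which shows that the n^2 term alone decides how fast the work grows.

```swift
// Hypothetical cost formula: 3n^2 + 5n + 20 steps for n data points.
func totalSteps(_ n: Int) -> Int {
    return 3 * n * n + 5 * n + 20
}

for n in [10, 1_000, 100_000] {
    let total = totalSteps(n)
    let dominant = n * n
    // The ratio approaches 3 as n grows: the 5n and the 20 stop mattering,
    // and the 3 is just a constant multiplier, so Big O keeps only n^2.
    print(n, total, Double(total) / Double(dominant))
}
```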

Big O uses logarithms to express work that grows slowly:

  • A logarithm answers the question “how many times do I multiply a base by itself to reach a given value?” In Big O the base is usually 2, so log n is roughly the number of times you can cut n in half before you reach 1.
  • An easy way to think of growth is to imagine a grid with an x-axis and a y-axis with the origin (x = 0, y = 0) in the lower left. Put the number of data points on the x-axis and the amount of work on the y-axis. If you plot a line that moves up and to the right, its shape shows how quickly the work grows as the data grows.
  • On that grid a logarithmic curve rises quickly at first and then flattens out: doubling the amount of data adds only one more step of work.
  • The Big O term log n means that with each step of the operation the remaining work shrinks by a constant fraction, usually half.
  • log n is a lot easier to write than “keep cutting n in half until you reach 1”!
  • Graphing these curves makes the efficiency of an algorithm obvious. The steeper the line the slower the performance (or the more space the process will require) as the data grows.
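Binary search is the classic way to see log n in action: every step throws away half of the remaining data. The sketch below is a generic textbook version, not code from any particular library, and the `binarySearch` name and its `steps` counter are my own additions for illustration.

```swift
// Binary search over a sorted array, counting how many times the
// search range is cut in half. All names are illustrative.
func binarySearch(_ sorted: [Int], for target: Int) -> (index: Int?, steps: Int) {
    var low = 0
    var high = sorted.count - 1
    var steps = 0
    while low <= high {
        steps += 1
        // Midpoint of the range that is still in play.
        let mid = low + (high - low) / 2
        if sorted[mid] == target {
            return (index: mid, steps: steps)
        } else if sorted[mid] < target {
            low = mid + 1   // discard the lower half
        } else {
            high = mid - 1  // discard the upper half
        }
    }
    return (index: nil, steps: steps)
}

let data = Array(0..<1_000_000)
let result = binarySearch(data, for: 999_999)
// A million items never need more than 20 steps.
print(result.steps)
```

A million is a little less than 2 to the power of 20, which is why 20 halvings are always enough.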

Values

Common Big O values (from inefficient to efficient):

  • O(n!): Factorial time!
    Near vertical acceleration over a short period of time. This is code that slows down (or eats up a lot of space) almost immediately and continues to get slower rapidly. Our CPUs feel like they are stuck. Our memory chips feel bloated!
  • O(x^n): Exponential time!
    Similar to O(n!) only you have a bit more time or space before your source code hits the wall. (A tiny bit.)
  • O(n^x): Polynomial time!
    Creates a more gradual increase in complexity than O(x^n), but it is still very slow when x is large. Our CPUs are smoking and our memory chips are packed to the gills.
  • O(n^2): Quadratic time!
    Yields gradual increase in complexity by the square of n. We’re getting better but our CPUs are out of breath and our memory chips need to go on a diet.
  • O(n log n): Quasilinear time!
    Our first logarithm appears. The increase in complexity is only a little steeper than linear, and it is the best a general-purpose comparison sort (like merge sort) can do. The n means one operation for every data point. The log n means a halving-style amount of extra work for each of those operations. n log n means n multiplied by log n: for each of the n data points, do log n steps.
  • O(n): Linear time!
    Not too shabby. For every point of data we have one operation, which results in a steady, proportional increase in complexity, like a train rolling along at a constant speed.
  • O(n + x): Multiple Unrelated Terms
    Sometimes terms don’t have a relationship and you can’t reduce them. O(n + x) means O(n) + O(x). There will be many permutations of non-reducible terms in the real world: O(n + x!), O(n + x log x), etc.
  • O(log n): Logarithmic time!
    Very nice. Like O(n log n) but without the n term, so it is a very slow buildup of complexity. The log n means the remaining work gets 50% smaller with each step.
  • O(1): Constant time!
    This is the best kind of time. No matter what the value of n is, the time or space it takes to perform an operation remains the same. Very predictable!

Analysis (discovering terms, reducing terms, discarding terms):

  • Look for a loop! “For each n, do something” creates work or uses space for each data point. This potentially creates O(n).
  • Look for adjacent loops! “For each n, do something; For each x, do something” potentially creates O(n + x).
  • Look for nested loops! “For each n, do something for each x”. This creates layers of work for each data point. This potentially creates O(n * x), or O(n^2) when both loops walk the same data.
  • Look for shrinking loops! “While there is data left, do something and then throw away half of the data”. The remaining work gets 50% smaller as the input is divided in half with each step. This potentially creates O(log n).
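The loop shapes in the list above can be sketched as four small functions. Each one returns the number of “do something” steps it performed so the growth is easy to compare. The function names are hypothetical and the bodies do no real work; this is a counting exercise, not a benchmark.

```swift
// One loop: O(n)
func singleLoop(_ n: Int) -> Int {
    var steps = 0
    for _ in 0..<n { steps += 1 }
    return steps
}

// Adjacent loops: O(n + x)
func adjacentLoops(_ n: Int, _ x: Int) -> Int {
    var steps = 0
    for _ in 0..<n { steps += 1 }
    for _ in 0..<x { steps += 1 }
    return steps
}

// Nested loops: O(n * x), which is O(n^2) when x equals n
func nestedLoops(_ n: Int, _ x: Int) -> Int {
    var steps = 0
    for _ in 0..<n {
        for _ in 0..<x { steps += 1 }
    }
    return steps
}

// Loop that halves its input on every pass: O(log n)
func halvingLoop(_ n: Int) -> Int {
    var steps = 0
    var remaining = n
    while remaining > 1 {
        remaining /= 2
        steps += 1
    }
    return steps
}

let n = 1_024
print(singleLoop(n))       // 1024
print(adjacentLoops(n, n)) // 2048
print(nestedLoops(n, n))   // 1048576
print(halvingLoop(n))      // 10
```

Same input, wildly different amounts of work: that gap is what the Big O terms are describing.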

Yet Another Book Binder Update

Hey! Who remembers that comic book manager app I was writing a few months ago?

Not me! Actually I didn’t forget about Book Binder–I just smacked into my own limitations. I had to take a break and do a bunch of reading, learning, and experimenting.

And now I’m back. Look at what I did…

Xcode storyboard

My problem with Book Binder was creating a well designed navigation view hierarchy and wiring it all together. And so I dug in and figured it out (with a lot of help from Ray Wenderlich and Stack Overflow). Thanks Ray, Joel, and Jeff!

You’ll find the new codebase here: Comic Keeper

But let’s talk about view controllers, show segues, and unwind segues for a few minutes. In the image above I have a tab bar controller as my root with three tabs. The 2nd and 3rd tabs are trivial. It’s the first tab that is entertaining! I have a deep navigation view hierarchy with 8 view controllers linked by 15 show segues, 6 unwind segues, and 4 relationship segues.

What I like about storyboards and segues is Xcode’s visualization. It’s a nice document of how an iOS app works from main screen to deeply nested supporting screens. Once you learn the knack of it, control-dragging between view controllers to create segues is easy. Unfortunately creating a segue is also the least part of the effort in navigating from one iOS view to another.

Given the number of screens and bi-directional connections I have in this foolish little app, navigation management consumes too much of my coding. And these connections are fragile in spite of all the effort UIKit puts into keeping connections abstract, loose, and reusable.

Let’s take a quick look at what I have to do each time I want to connect a field from the EditComicBookViewController to one of the item picker view controllers:

  • Update EditComicBookViewController:prepare(for:sender:) method with a case for the new show segue. I need to know the name of the segue, the type of the destination view controller, and the source of the data I want to transfer into the destination. I have a giant switch statement to manage the transactions for each show segue. I did create a protocol, StandardPicker, to reduce the amount of boilerplate code generated by each show segue.
  • Update EditComicBookViewController unwind segue for each particular type of item picker I’m using. I have four reusable item pickers (edit, list, dial, and date) and four unwind segues (addItemDidEditItem, listPickerDidPickItem, dialPickerDidPickItem, datePickerDidPickDate). A function corresponding to each unwind segue is better than having one method for all show segues. But I still have conditionals that choose the EditComicBookViewController field to update based on the title I gave to the picker. This view controller/segue pattern is not really set up for reuse.
  • I have to create a show segue from the source view controller to the destination and an unwind segue from the destination back to source. This is all assuming these views are embedded in the same UINavigationController. Each segue needs its own unique ID and it’s pretty easy to confuse the spelling of that ID in the supporting code.
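One common way to take the sting out of misspelled segue IDs is to keep them all in a single enum. The sketch below is plain Swift so it runs outside of Xcode, and it is not the code in my app: the `SegueID` enum and the `pickerKind` function are hypothetical, and I’m assuming the storyboard identifiers match the four unwind segue names listed above.

```swift
// Hypothetical sketch: one enum owns every segue identifier string.
// The raw values must match the identifiers typed into the storyboard.
enum SegueID: String, CaseIterable {
    case addItemDidEditItem
    case listPickerDidPickItem
    case dialPickerDidPickItem
    case datePickerDidPickDate
}

// In a view controller the string would come from segue.identifier.
// Here it is a plain optional string so the sketch runs on its own.
func pickerKind(forSegueIdentifier identifier: String?) -> String {
    guard let raw = identifier, let id = SegueID(rawValue: raw) else {
        return "unknown"
    }
    // The compiler checks that every case is handled and spelled correctly.
    switch id {
    case .addItemDidEditItem:    return "edit"
    case .listPickerDidPickItem: return "list"
    case .dialPickerDidPickItem: return "dial"
    case .datePickerDidPickDate: return "date"
    }
}

print(pickerKind(forSegueIdentifier: "listPickerDidPickItem")) // list
print(pickerKind(forSegueIdentifier: "listPickerDidPickItme")) // unknown
```

The string still has to be typed correctly once in the storyboard, but after that the compiler catches typos in the supporting code.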

How could navigation be better supported in Xcode and UIKit?

First, I’d like to bundle together the show and unwind segues with the IDs and the data in a single object. I’m sure this exists already, as it’s pretty obvious, but Apple isn’t providing an integration for storyboards. Ideally, I would control-drag to connect two view controllers and Xcode would pop up a new file dialog box to create a subclass of UIStoryboardSegue and populate it with my data. The IDs should be auto-generated and auto-managed by Xcode. That way I can’t misspell them.

Second, I’d like the buttons in the navigation bar to each trigger separate but standard events:

  • goingForward(source:destination)
  • goingBackward(source:destination)
  • going(source:destination)

Right now you have to build your own mechanism to track the direction of navigation in viewWillDisappear(). In the navigation hierarchy of every app there is semantic meaning to moving forward and backward through the hierarchy similar to but potentially different from the show/unwind segue semantics. Tapping the back button might mean “oops! Get me out of here” while tapping a done button might mean “I’ve made my changes, commit them and get me out of here!”

Third, I’d like Xcode’s assistant editor view to show the code for a segue when I click on a segue. The navigation outline and storyboard visualization is a great way to hop from view controller to view controller. While Xcode knows about the objects that populate controllers, it doesn’t navigate you to any of them. Clicking on a connection in the connections inspector should load the code for that connection in the assistant editor. Xcode does highlight the object in the storyboard but I want more.

As my apps get more ambitious and sophisticated I’m probably going to abandon storyboards like the Jedi Master iOS developers I know. That’s sad because I feel I’ve finally figured out how to wire-up a storyboard-based view controller hierarchy and I’m already leveling out of that knowledge as the Jedi Masters smile smugly.

FizzBuzz Still Hard… and Still Useless

FizzBuzz Xcode Playground

I hear that many of the applicants we interview still have a hard time with FizzBuzz and other simple examples of looping, testing, and printing integers. This was true in 2007 when Coding Horror wrote a famous blog post on FizzBuzz, true in 2010 when DanSignerman famously asked StackOverflow about it, and true in 2017 when Hannah Ray answered the same question again on Quora. And today in 2019 I watched a video by Lets Build That App that explained how to solve FizzBuzz using fancy features of Swift 5.

So, yes, by external and internal validation FizzBuzz is still hard for software developers in an interview context to perform on demand.

In a quiet room with a clear set of requirements FizzBuzz is no problem. But under the pressure of an interview where the spotlight is on every misstep on the whiteboard FizzBuzz becomes something like the Great Filter of Software Interview Questions—Maybe we have not found intelligent life out in the stars because alien civilizations can’t pass the job interview!

Filtering numbers (so you can print “fizz” on multiples of 3 and “buzz” on multiples of 5) is great for making sure you understand how language features work and that division by zero is bad.

Filtering numbers was probably a meaningful interview question in the 1980s and 1990s, when I learned to code, because it was a major pain with Assembly, C, and C++ (but not LISP). Today’s languages like Java, C#, Swift, Kotlin, and Python take all the challenge out of FizzBuzz.

Doing FizzBuzz in Swift 5--too easy!

The new Swift 5 language feature, isMultiple(of:), doesn’t even crash when you give it a zero! Where is the fun in that? And who knows if this advanced Swift switch statement with assignments as case expressions is executing in constant time or blowing up the stack?
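For reference, here is one way FizzBuzz can be written with isMultiple(of:) and a switch over a tuple. It is a minimal sketch and not necessarily the same code as in my playground screenshot; the `fizzBuzz` function name is my own choice for this example.

```swift
// FizzBuzz using Swift 5's isMultiple(of:) and a switch over a tuple.
func fizzBuzz(_ n: Int) -> String {
    switch (n.isMultiple(of: 3), n.isMultiple(of: 5)) {
    case (true, true):  return "fizzbuzz"
    case (true, false): return "fizz"
    case (false, true): return "buzz"
    default:            return String(n)
    }
}

for n in 1...15 {
    print(fizzBuzz(n))
}

// Unlike the % operator, isMultiple(of:) does not trap on zero.
print(7.isMultiple(of: 0)) // false
print(0.isMultiple(of: 0)) // true
```

Each call does two divisibility checks and one switch, so a single number is handled in constant time and nothing here touches the stack beyond one function call.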

(Somebody knows all these things but software development is so routine now as most VMs and IDEs have airbags and baby-bumpers around your code.)

And yet many software developer interviews crash and burn on the FizzBuzz question! I think it’s because as team leads and employers we are rooted in the past. More and more code is being written from well engineered frameworks and yet we act as if knowing everything from first principles at the start of a developer’s career is critical for success. It’s not.

A good developer matures over time and has to start somewhere. There are better tests than FizzBuzz for filtering integers and engineers. It’s best to look for potential, the ability to learn, work well with others, and passion for coding, when interviewing candidates!

The Cost of Doing Internet Business

Screenshot of Netscape Navigator browser

A couple decades ago the costs of putting up a website seemed really reasonable–especially as compared to pre-Internet media like books, newspapers, and encyclopedias. A lone webmaster toiling away in the wee hours of the night could get a site up and running with HTML and CGI by the break of dawn. Pages were static, served directly from a server’s hard disk, and JavaScript and CSS were non-existent.

iOS 6 TableViewController example app

Mobile apps were in a similar state a decade ago. The costs were small as compared to full featured desktop and web applications and there wasn’t much a mobile app could do that a simple table view controller couldn’t do. A lone mobile dev toiling away in the wee hours of the night could get almost any app ready for release to any app store in a few weeks. Back then most mobile apps were under the constraints of low-powered processors, limited memory, and ephemeral battery life. These constraints required mobile apps to be more like single-purpose widgets than multifunction applications.

Today it’s a more complex world for web and mobile apps. We have seriously sophisticated tech stacks that include the Cloud, HTML5, and advanced mobile operating systems that target screens of every size and devices without any screens at all. We don’t create websites or apps, we build multi-sided networks with so many features that vast teams of developers, scrum masters, product managers, designers, data scientists, and service reliability engineers are required.

In the modern world of 2019 no developer works alone in the wee hours of the night.

This hit home to me when I met a talented Google Doodle developer with amazing JavaScript skills. This dev explained that their code was not allowed to go into production without a dozen teams looking it over, testing it, and beating it up. Because, you know, scale, performance, and security. By the time the Google Doodler’s code appeared on the Google Search home page it was unrecognizable.

In my own work as a software developer and software development leader I’ve seen the rise in complexity and cost hit web and mobile development hard. GDPR, PII, COPPA, OWASP, and other standards have added hundreds of dollars of cost to the development of iOS and Android apps. So has the proliferation in diversity of mobile devices. Are these pocket super computers with super high-resolution screens, prosumer cameras, always-hot microphones, machine learning chips, GPS, motion sensors, and unlimited cloud storage still phones? I don’t think so. Some are as big as coffee table books and others can be strapped to a wrist. When they start folding up into intelligent origami I don’t think we can call them phones anymore.

If you are not a developer, imagine trying to write a book for a new kind of paper that can change its shape, size, and orientation at a reader’s whim. Sometimes this paper has no visual existence and the user can choose to listen to your writing instead of reading. This protean piece of paper has all sorts of sensors so you, as the author, can know in real time where your reader is and what your reader is doing (if your reader accepted the EULA). How do you write a book for paper like this? It refuses to be any one thing and requires you, as the author, to imagine every possible configuration and situation. This is what software development is like today!

With 5G, Blockchain, AR/VR, and AI just around the corner the business of web and mobile development will become an even more unconstrained hot mess.

As developers we have gone from being lone wolves working independently to two-pizza teams collaborating in agile to multiple feature teams networked globally. It’s all we can do to just keep up with advances in hardware, software, operating systems, communications, and regulations.

We, as an industry, have not recognized the costs of software eating the world.

The hidden costs and ignored costs of crafting software for consumption on the Internet have grown non-linearly and I don’t think it’s stopping anytime soon. These costs are disrupting almost everybody everywhere. It’s gotten so bad that the generally favorable view of the Internet has soured.

So there are a few things we need to do!

Hold honest conversations about the costs of open and connected Internet software. We can’t keep throwing apps out into the wild and expect them to be safe and reliable. Cloud computing has solved this problem for the backend services as we now accept that servers are not pets and because we aggregate the costs and leverage expertise of giants like Google, Amazon, and Microsoft.

While we have learned to treat servers like machines, we still treat apps like pets, grooming them and agonizing over their look and feel while ignoring our duty to make sure these apps are not vectors for malware and malcontents.

We need new software development tools and services that give cloud-like benefits to the mobile side. I’d love to have a CodePen or a Glitch for native mobile apps that aggregate costs and expertise required for responsible mobile app development. Apple and Google should not only give developers open SDKs and dev kits but something like a Swift Playground that we can release enterprise and consumer apps on top of. Yes, I’m asking Apple and Google to be gatekeepers, but we’re too irresponsible to abandon gatekeepers.

Finally, I would like to see a return to physical books, newspapers, and encyclopedias (insert your favorite old-school media product here) for the public mainstream use case. Digital-only isn’t yet the best way to communicate and express ideas for the commonweal. A paper publisher can at the very least assure us media experiences without any Momo Challenges inserted between the pages. Even more importantly a paper publisher has a name, an address, a phone number, and a way for us to legally hold them responsible.

We shouldn’t put the digital genie back in the pre-Internet bottle. We should be extremely realistic about the costs and responsibilities of developing software applications that users must depend on and trust.

Book Binder Update

My comic book collection iOS app continues to evolve. I continue to strip out features and focus on the core mission: Buy a comic, snap a photo, add it to your collection.

With that in mind the UX now looks a lot like a photo app that has been preconfigured for storing comic book metadata. Here are the most recent screen shots from my iPhone XS Max:

Summary View

The Summary View displays a scrolling list of series. Each series displays the covers you have photographed. Right now I’m using placeholder covers–I don’t actually own the original Superman comics from the 1930s! This is all built with standard UIKit UIViewController and UICollectionView. I’ve added a custom UICollectionReusableView for the header of each section (series) and for the last cell of each collection I’m using a custom UICollectionViewCell.

I sort the comic book covers in each section by ID, which is a mashup of issue number and variant string. I sort these strings using localizedStandardCompare so that issue 2a comes before issue 20. I love localizedStandardCompare because I didn’t have to do any work to solve the thorny “sort strings with numbers and letters as if they are numbers” problem.
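To show the difference, here is a minimal sketch that sorts the same IDs two ways. The `issueIDs` array is made-up sample data, not the contents of my collection, and the rest is Foundation’s localizedStandardCompare doing the heavy lifting.

```swift
import Foundation

// Hypothetical issue IDs: an issue number plus an optional variant letter.
let issueIDs = ["20", "2a", "10", "1", "2"]

// Plain string sorting compares character by character,
// so "20" lands ahead of "2a" and "10" lands ahead of "2".
let plain = issueIDs.sorted()

// localizedStandardCompare treats runs of digits as numbers,
// the way file names are ordered in the Finder.
let natural = issueIDs.sorted {
    $0.localizedStandardCompare($1) == .orderedAscending
}

print(plain)
print(natural)
```

In the natural ordering issue 2a sits between issue 2 and issue 10, which is where a collector would expect to find it.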

The custom collection reusable view is mostly there to display the publisher, name, and era of a comic book series but also to host an edit button that brings up an EditSeriesPopoverView.

Popover views are cool but no longer supported as a presentation type by UIKit so you have to manually display them. I use a UIVisualEffectView to blur out the background behind the popover. I love it when I don’t have to write code!

Detail View

Each detail view displays a large image of the comic book cover photo and some metadata around it. The UISwitch sets the alpha of the cover image to 0.3 if false and 1.0 if true–this gives you a nice visualization of what you still have in your collection and what you have sold.

The Edit button brings up the EditIssuePopoverView. I’ve figured out how to pass functions so that I can reuse popover views from different buttons: add an issue vs edit an issue. That’s very cool and has ramifications for how tightly view controllers need to be coupled to views.
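The idea of passing functions can be sketched without any UIKit at all. Everything in this example is hypothetical: `IssuePopover` stands in for a real popover view, `Series` stands in for the model, and the closures are simplified versions of what an add button and an edit button might hand over.

```swift
// Hypothetical stand-in for a popover view. It does not know whether it is
// adding or editing; it just calls the function it was handed.
struct IssuePopover {
    let title: String
    let onDone: (String) -> Void

    func done(with text: String) {
        onDone(text)
    }
}

// Hypothetical model object that the two buttons act on.
final class Series {
    var issueIDs = ["1", "2"]
}

let series = Series()

// Same popover type, two different behaviors passed in as closures.
let addPopover = IssuePopover(title: "Add Issue") { series.issueIDs.append($0) }
let editPopover = IssuePopover(title: "Edit Issue") { series.issueIDs[0] = $0 }

addPopover.done(with: "3")
editPopover.done(with: "1a")
print(series.issueIDs) // ["1a", "2", "3"]
```

The popover only knows that it has a function to call when the user is done, so the same view can serve both buttons.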

Cutting Scope == More Value

This app looks nothing like my initial conception and has far less functionality than I thought I needed. I find this to be true with most apps I download. They do too much and don’t focus enough on their core use-case. Too much scope means the value of an app is diffused like the pixels behind a UIVisualEffectView.

As always you can check out my code on GitHub!

Book Binder App Update: Variant Chaos!

I’m still working on that comic book collecting app! It’s starting to look and feel like a real app but still has a long journey of test driven development ahead of it!

Here’s what it looks like so far…

On the left is the Summary View and on the right is the Detail View. The summary view displays all the series and all the issues in each series that you are tracking. The detail view displays a particular issue, selected from the summary view. As I’ve complained, comic book publishers have little or no organizing skills and important identifying metadata is driven by marketing and whims. My app has to shoehorn a seriously fuzzy world of comic book print editions into data structures that can be sorted. This requires hard choices about how to organize comic book data so the user gets a usable app.

One of the biggest puzzles is what to do with variant covers!

Purchase a recent issue of any popular comic book, especially from Marvel, and you’re just as likely to get a variant cover edition as not. Marvel, as far as I can tell, doesn’t identify variant editions with a signifier. This is probably great marketing. There is some unknown number of variant covers for Fantastic Four #1 (volume 6, started this year 2018). I’ve counted 28 so far!

Hunting variants is like hunting Pokemon! You never know what is lurking in the next comic book shop. Some variant covers are created by famous artists, others feature famous moments or particular characters, some are specific to a comic book shop, and others are created for the dramatic effect of displaying all the covers side-by-side.

Faced with no official designation and a need to figure out what you may or may not own, comic book collectors and sellers have developed their own, local, ways of identifying variants, usually a single letter or string of letters, but also potentially any printable character.

After considering all this madness here is what my super sophisticated user experience for recording variant identifiers looks like…

There’s not much I can do to help the user other than provide some examples in the prompt and make sure she doesn’t add the same variant twice. I’ll probably find a way to list the current tracked variants, if any, and a way to add a cover photo as well. But for now the simple text field does the job!
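The “don’t add the same variant twice” check can be sketched with a Set. This is an illustration and not the app’s actual model: `TrackedIssue` and `addVariant` are hypothetical names, and treating “A” and “ a ” as the same variant (by trimming and lowercasing) is my assumption about what counts as a duplicate.

```swift
import Foundation

// Hypothetical model: the variant identifiers tracked for one issue.
struct TrackedIssue {
    let number: Int
    private(set) var variants: Set<String> = []

    init(number: Int) {
        self.number = number
    }

    // Returns false (and changes nothing) if the variant is empty
    // or was already tracked.
    mutating func addVariant(_ raw: String) -> Bool {
        // Assumption: whitespace and letter case do not distinguish variants.
        let id = raw.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
        guard !id.isEmpty else { return false }
        return variants.insert(id).inserted
    }
}

var issue = TrackedIssue(number: 1)
print(issue.addVariant("a"))   // true
print(issue.addVariant(" A ")) // false, already tracked
print(issue.addVariant("b"))   // true
```

A Set makes the duplicate check a constant-time lookup on average, no matter how many variants a publisher dreams up.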

I think this is why general purpose productivity tools, that give people the work of organizing information into a structure, usually seem easier to use than specialized apps like Book Binder. A simple spreadsheet could be built in minutes to do the job that my app is taking months to figure out.

I can’t easily use a spreadsheet with my iPhone. I tried the iOS versions of Excel and Numbers and they are great for viewing but not so great for data entry or creation. Spreadsheets are famous for being so open that errors in data and formulas are hard to detect. Squishy data like variant edition covers is easy to put in a spreadsheet but hard to test, hard to verify, and hard to maintain.

My long term hope is to use Apple’s Core ML and Vision frameworks to identify variant editions on the user’s behalf. But that will only work if I work with a comic shop or collector who can tag all known variant editions and provide me with that data–or just make it available for free on a nicely scalable web server.

As always you can find the Book Binder Code on GitHub: https://github.com/jpavley/Book-Binder

Unit Tests Equal Awesome

I’m working on a hobby project iOS app that lets me track my comic book collection. I’m interested in comic books because all these super heroes from my misspent youth rule the world of popular culture. While the cool kids were playing sports and going to parties I stayed at home reading comic books. In college I stopped and found other things to do (computer programming, talking to humans, MTV). But now in the September of my life comic books are back and grip our imaginations tightly with their mutant powers.

I wanted to get back to the source. Where did all this cultural power come from? As I started buying physical comics again I realized I needed to track these objects of my affection on my phone. And I bet there are already dozens of apps that do this but I like to create my own tools.

Book Binder is the app and you’ll find the code on GitHub.

Book Binder is an iOS app with a web backend. It’s an enormously long way from finished. I have lots of parts of it to figure out. The two current big problems are that comic book publishers can’t count and the number of comic books published is huge.

Comic book publishers can’t count!

Let’s take the case of Daredevil. One of my favorites as a teen and now a big show on Netflix. For reasons that are beyond comprehension (probably marketing) Marvel has restarted the numbering of the “man without fear” 6 times! Daredevil #1 was published in 1964, 1998, 2011, 2014, 2015, and 2017–and I don’t mean republished (that happens too). Daredevil #1 from 1964 is a completely different comic book from all the other Daredevil #1s in the five succeeding years! At one point Marvel tried to fix the problem with “legacy numbering” and that’s why the current series of DD started with #598 in 2017 instead of #29. I have no doubt in my mind that Marvel will start over with Daredevil issue #1 soon.

The other counting problem created by comic book marketing is variant issues with different covers. The most recent issues of Doctor Strange may or may not be published with different covers for the same issue. Collectors apply letters to each variant but Marvel doesn’t seem to have official variant designations. I have Doctor Strange #2 variant edition, legacy #392, second printing. I’m not sure how many variant editions were published or what the variant letter for each edition should be.

This counting (really identifying) problem makes it hard to come up with a good data structure for storing a comic book collection. I’m using a combination of a URI (uniform resource identifier) and JSON (JavaScript Object Notation). This way I can easily share data between the iOS app and web server and with other comic book collectors, sellers, and buyers.
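As a rough illustration of the URI-plus-JSON idea, here is a minimal Swift sketch. The `ComicBookRecord` type, its field names, and the `publisher/series/era/issue/variant` layout are all assumptions made up for this example, not Book Binder’s actual schema.

```swift
import Foundation

// Hypothetical record; the field names and URI layout are assumptions for illustration.
struct ComicBookRecord: Codable, Equatable {
    let uri: String      // identifies the book: publisher/series/era/issue/variant
    let printing: Int    // e.g. 2 for a second printing
}

let record = ComicBookRecord(uri: "ExamplePublisher/Example Series/2018/2/a", printing: 2)

// Encode to JSON for the web server or another collector, then decode it back.
let json = try! JSONEncoder().encode(record)
let decoded = try! JSONDecoder().decode(ComicBookRecord.self, from: json)

print(decoded == record)  // true
```

Because the record is `Codable`, the same type can be written to JSON on the phone and read back on the server side without hand-written parsing.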

The number of comic books published is huge!

How many issues of Daredevil or Doctor Strange have been published since the 1960s? It’s hard to say. I estimate between 400 and 500 for Doctor Strange but I’m probably not including annuals, specials, team ups, side series, and all the variants. So let’s double that to 800 to 1000. And that’s the “master of the mystical arts” alone. If Marvel has around 200 titles and DC has the same, then at 800 to 1000 issues per title we’re looking at a lower bound of 320K and an upper bound of 400K just for the two majors. Some of DC’s and Marvel’s comic books started in the 1930s and 1940s. If we include those and all the indie publishers (like Dark Horse) and all the publishers who have disappeared (like EC) then I’m going to estimate 1.6 million to 2 million unique comic books published in the USA. It’s really hard to say because it’s hard to know where to draw the line with publishers and if certain reprints should be included.
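The back-of-the-envelope math for the two majors can be written out as a short Swift sketch. Every input is a guess taken from the paragraph above, not catalog data, and the constant names are invented for this example.

```swift
// Rough estimate only; every input is a guess, not catalog data.
let issuesPerTitle = 800...1000   // the doubled Doctor Strange estimate
let titlesPerPublisher = 200      // assumed the same for Marvel and DC
let majorPublishers = 2

let lowerBound = issuesPerTitle.lowerBound * titlesPerPublisher * majorPublishers
let upperBound = issuesPerTitle.upperBound * titlesPerPublisher * majorPublishers

print(lowerBound, upperBound)  // 320000 400000
```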

In any case I’m not going to be able to store more than a fraction of the metadata for millions of published comic books on a phone. At best I can store a slice of this data locally and use any one of the big clouds to keep a shared catalog. I just want all this info to be quick to access, cheap to store, and easy to reconcile.

Testing an app for that

Let me tell you, creating an app, on my own, as a hobby project, is fun but hard. Like climbing a rock wall (which I would never personally do) you make a lot of false starts and have to retrace your steps trying to find a path forward.

This is where my unit tests have helped. No, not helped. Made everything possible!

I started with three or four data structures. I’m testing out ideas and changing my mind as the ideas do or don’t pan out. I’m not afraid to make large scale changes to my code because every function of every class has unit tests to make sure that if I break anything I can fix it.

Today I realized I had to take a big step back. I could not instantiate a comic book collection from a list of comic book URIs. I also realized I was storing state info in the comic book URIs which would not scale with millions of books to track. I finally realized that I had to enforce consistency in the formation of my comic book URIs (they all have to have four slashes). This way I could tell if a URI was mangled or incomplete.
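The four-slash rule is easy to check in code. The sketch below is one way to do it, not the app’s actual implementation: the function name `hasFourSlashes` and the sample URIs are hypothetical, and the five-part layout is only assumed.

```swift
// Assumption: a well-formed URI has exactly four slashes,
// e.g. publisher/series/era/issue/variant.
func hasFourSlashes(_ uri: String) -> Bool {
    return uri.filter { $0 == "/" }.count == 4
}

// Build a collection only from URIs that pass the check; set the rest aside for repair.
let incoming = [
    "ExamplePublisher/Example Series/2017/598/a",  // complete
    "ExamplePublisher/Example Series/598"          // incomplete: only two slashes
]
let accepted = incoming.filter(hasFourSlashes)
let rejected = incoming.filter { !hasFourSlashes($0) }

print(accepted.count, rejected.count)  // 1 1
```

A check this small is also a natural first unit test: one well-formed URI that must pass and one mangled URI that must fail.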

I had to touch every one of the six major objects that support my app… And I did! With confidence. Once I removed state from my URIs and got all my unit tests to pass I fired up the app, and it worked fine. I had not added any bugs or broken any functionality. Whew!

If I didn’t have unit tests I’d be afraid to touch the code. I would be much more respectful of the code and once I got some part of it to work I’d leave that part alone. As this is a lonely hobby project, I’d get stuck, give up, and move on to something easier.

Even with commercial software, with large teams of expert programmers, a lack of tests and a fear of changing the code result in most software projects falling behind, being abandoned, or just being buggy.

I was sold on unit tests and Test Driven Development before and I’m resold every day I write code. I don’t care if you write the tests before or after the code that makes them pass (I do a bit of both). Just write the tests–especially if you are writing code for self-driving cars or robot military machines.

iPhone to the Max

I’m on that Apple program where you pay for an iPhone over time and you get the opportunity to upgrade immediately when a new model comes along, as it does every year.

While this is a very good deal for Apple, almost like a subscription service, it’s a good deal for me too. I hate the feeling of FOMO that comes along when a new computer, phone, or device is released. But with iPhones (and Android phones) it’s more than just a feeling. Missing out on the latest phone means missing out on important new features, security protections, and performance improvements.

FOMO used to be a big problem for personal computers as there were big performance jumps between PC models back when Moore’s Law was still in full effect. These days you can still get great results from a five-year-old PC or MacBook. Maybe you can’t play VR games but you can email and browse the web like a champ.

Smartphones are in a different place on the product evolution curve than PCs. They still have a long way to go before they settle down. Innovation in smartphones is driven by advances in displays, cameras, custom chips, and machine learning. Even incremental improvements in these technologies mean far better user experiences, security, and even more epic cat photos.

I’m super happy with the jump between the iPhone 8+ and the XS Max. It responds faster, is easier on the eyes with its 6.5″ screen, and yet is basically the same size as the 8+. The name is kind of silly. But I don’t care what Apple names their phones.

At some point phone hardware will cease to evolve and some new device will become our “primary interface” to the Internet. My guess is that it will be a watch of some sort with accessory glasses or screens. But these things are hard to predict.

I’m on the train as I type this post into my phone and one guy is still reading a paper newspaper. I guess that iPhone XS Max just didn’t excite him.