About APIs and SDKs

API vs. SDK

Early on in the life of Pathomation, it became clear that in order to tackle the variety of use cases out there for digital pathology, we needed to build a piece of veritable digital chameleon software.

Luckily there is a way to do exactly that in engineering, and that is through the establishment of an Application Programming Interface (or API for short). It all starts with an API.

Hugo Bowne-Anderson of Datacamp explains what that means: An API is a set of protocols and routines for building and interacting with software applications.

Why are APIs important? An API is a bunch of code that allows two software programs to communicate with each other. You can connect to an API, pull data from it, and subsequently parse that data.

Using APIs has become the standard way to interact with applications ranging from Wikipedia to Twitter. PMA.core (and its little brother PMA.start) has an API, and our own product suite makes heavy use of it.

So while PMA.core isn’t our main product, it is definitely where everything starts, and today I see how many of our success stories can be attributed to the API that PMA.core offers.

Version 2

We’re currently wrapping up version 2 of PMA.core. As it should be, we learned a lot in the last couple of years about how (not) to use our own interfaces.

Some good things:

Under the motto “eat your own dogfood”, we’ve successfully employed the API to our own benefit in separate projects that were built on top of PMA.core. The Willy Gepts collection exploits not only our slide visualization interface, but also our metadata engine. Pathotrainer is a great example of an end-user product built on top of PMA.core in the pharmaceutical space.
With a limited number of calls (about two dozen), we’ve been able to facilitate a very broad span of downstream consumers. Examples include Blackboard at the University of Antwerp, a completely custom HTML website for med school students in Brussels, and recently a completely new type of courseware for clinical (research) stakeholders.

Some not so good things:

We limited ourselves in terms of granularity. Here’s an example: PMA.core offers the possibility to store slide metadata in an audit-trailed, 21 CFR Part 11-compliant manner. However, the present interface only lets you retrieve data for one slide and one form at a time. So when you work on courseware projects like PathoTrainer, you need to put in additional progress bars while the underlying data is retrieved piece by piece. Of course, you’d much rather retrieve the n metadata sets for m slides in a single call. In version 2 we will be able to do just that, and more.
We need much more documentation and more sample code. Pointing people to your WSDL manifest is NOT sufficient.

SDK

We’ve always had the idea to bring out a complete Software Development Kit (or SDK for short). The idea for an SDK is simple enough: take everything your API can do and build content around every call and every combination of parameters imaginable.

But what exactly do you put in it? What languages do you support? Looking around our own environment, we chose Microsoft.Net, Java and PHP as prime candidates. We know there are more of course, but you have to pick your battles.

We thought we were doing pretty well, having Java and Microsoft.Net desktop application sample code, until one partner told us they were using Delphi. What’s next? Windev https://fr.wikipedia.org/wiki/WinDev?

Two problems then exist with the SDK the way we currently have it:

When we make a matrix with documented features and supported languages, we end up with a rather sparse matrix. This means that some things are documented in one language, but not in another. There are a couple of essential tasks that we have in any environment of course, like establishing a connection, but it’s inconsistent at best. We actually thought that we could do this in a demand-driven way, meaning that when someone asks us how to do task t in language l, we just fill in cell (t, l) of the aforesaid matrix. This works, to some extent; you can respond to the client, and be confident that the code you contribute is actually something application developers want. But the end result is messy and doesn’t look very professional.
The code examples that we’ve been adding were mostly wrappers around API calls. In Microsoft.Net, we’d make a WebRequest call and interpret the returned JSON or XML stream. In PHP, we’d do a file_get_contents of whatever API call we needed to get the job done, and again interpret the results. This got the job done, but as a result much of the code that we delivered was more a tutorial in how to read web content and interpret the returned structured data. Ideally however, these examples should focus more on what can actually be done with the software (instead of how to do something).
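To make the second point concrete, here is a minimal Python sketch of what such API-level sample code tends to look like. The server address, the method name get_slide_info and its path parameter are made-up placeholders for illustration, not PMA.core’s actual API. Note how nearly every line is about URLs and parsing rather than about slides.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical server address, for illustration only.
BASE_URL = "http://example.org/api/"


def build_url(method, **params):
    """Assemble the URL for one API call (plumbing)."""
    return BASE_URL + method + "?" + urlencode(params)


def parse_response(raw):
    """Decode the JSON payload returned by the server (more plumbing)."""
    return json.loads(raw)


def call_api(method, **params):
    """Perform the HTTP round trip and hand back the decoded result."""
    with urlopen(build_url(method, **params)) as response:
        return parse_response(response.read().decode("utf-8"))


# The only line that is actually about slides would be something like
# (hypothetical method and parameter names):
#   info = call_api("get_slide_info", path="demo/slide.svs")
```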

Wrapper libraries

For the current version 2 then, I want to be more up-front with our SDK offering. I want to be better prepared. I don’t think we should try to convince people anymore to “just” go and use our API. It’s too abstract, too inconsistent even in places (sometimes for historical reasons; who thought you could grow legacy so quickly?), and frankly probably too steep a learning curve for a great many people.

Pathomation offers a platform for the development of digital pathology software. I want to make sure people like to stand on our platform to begin with. It’s not necessarily what I want to do with it… it’s what others would want to do with it.

We already have code in Python that fetches images from PMA.core and does “something” with them. But there’s nothing fancy about such a sample Python script; you download a /region URL, then you process it like any other image. This is at API level.

A wrapper library can now add more functionality; repetitive basic tasks. What do developers not like to do? Write plumbing code. We can encapsulate all of that under the hood of a PyPI package or Java namespace. We’re already doing that for the front-end handling and representation of whole slide imaging content with our Javascript PMA.UI framework.

Here’s another idea: I can’t do a nuclear cell count on a (reduced-resolution) overview image; I need at least 20x resolution for that. But I can do a cell count on an individual tile at high resolution, and then put a dot on the overview image to at least indicate where the cell was found. What I would want to do instead is create an overview image of e.g. 1000 x 2000 pixels, then loop through x * y tiles at a zoomlevel that in reality represents 5000 x 10000 pixels (but which is too bulky to process in one go via /region; we’re not VIPS); process the individual tiles, then scale and imprint the result from each tile back onto the 1000 x 2000 pixel overview image.
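The scaling step in that idea is plain arithmetic, and a small Python sketch may help. It uses the image sizes from the paragraph above; the tile size of 500 pixels and the detected cell positions are assumptions made up for the example, and the function names are hypothetical.

```python
def to_overview(tile_x, tile_y, px, py, tile_size, full_size, overview_size):
    """Map a point found inside a tile to overview-image coordinates.

    (tile_x, tile_y) is the tile's column/row index and (px, py) the pixel
    position of the detection inside that tile.
    """
    full_x = tile_x * tile_size + px
    full_y = tile_y * tile_size + py
    # Integer arithmetic avoids floating-point rounding surprises.
    return (full_x * overview_size[0] // full_size[0],
            full_y * overview_size[1] // full_size[1])


def imprint(detections_per_tile, tile_size, full_size, overview_size):
    """Collect the overview coordinates of all detections in all tiles."""
    dots = []
    for (tile_x, tile_y), points in detections_per_tile.items():
        for px, py in points:
            dots.append(to_overview(tile_x, tile_y, px, py,
                                    tile_size, full_size, overview_size))
    return dots


# A 1000 x 2000 overview of a 5000 x 10000 pixel zoomlevel.
# Tile size and detections are invented for the example.
detections = {(0, 0): [(250, 100)], (9, 19): [(499, 499)]}
dots = imprint(detections, tile_size=500,
               full_size=(5000, 10000), overview_size=(1000, 2000))
print(dots)
```

Each returned pair is where a dot would be drawn on the overview image; the actual cell detection per tile is left out on purpose.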

I can imagine a class representing a slide that has indexer logic that retrieves tiles in real time (sort of like a programmer’s server-side version of PMA.UI), but then destroys them again once the object is not needed anymore (so the memory usage doesn’t explode).

from pathomation import pma

directory = pma.get_first_non_empty_directory()  # avoid shadowing the built-in dir()
slide = pma.get_slides(directory)[0]

max_zoomlevel = pma.get_max_zoomlevel(slide)
print ("Max. Zoomlevel: " + str(max_zoomlevel))
print ("Size in pixels: " + str(pma.get_pixel_dimensions(slide)))
print ("Resolution (PPM): " + str(pma.get_pixels_per_micrometer(slide)))
print ("Physical resolution (µm x µm): " + str(pma.get_physical_dimensions(slide)))
print ("Number of channels: " + str(pma.get_number_of_channels(slide)))
print ("Slide is fluorescent? " + str(pma.is_fluorescent(slide)))
print ("Number of tiles: " + str(pma.get_number_of_tiles(slide)))

selected_zoomlevel = 1  # do the following on zoomlevel 1 for demo purposes
# number of tiles along x and y at the selected zoomlevel
tile_counts = pma.get_number_of_tiles(slide, selected_zoomlevel)
for tile in pma.get_tiles(slide, toX = tile_counts[0], toY = tile_counts[1], zoomlevel = selected_zoomlevel):
    tile.show()
    # do something with the tile
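
The slide class with indexer logic imagined earlier could look roughly like the sketch below. This is only one possible reading of the idea: it keeps a bounded number of tiles and drops the least recently used one, which is an assumption on my part about how “destroying” tiles would work. The class name LazySlide and the fetch function are hypothetical and not part of any existing package.

```python
from collections import OrderedDict


class LazySlide:
    """Slide-like object that fetches tiles on demand and keeps only a few."""

    def __init__(self, fetch_tile, max_cached=16):
        self._fetch_tile = fetch_tile   # callable (x, y, zoomlevel) -> tile
        self._max_cached = max_cached
        self._cache = OrderedDict()

    def __getitem__(self, key):
        """Indexer: slide[x, y, zoomlevel] returns the requested tile."""
        if key in self._cache:
            self._cache.move_to_end(key)      # mark as most recently used
            return self._cache[key]
        tile = self._fetch_tile(*key)
        self._cache[key] = tile
        if len(self._cache) > self._max_cached:
            self._cache.popitem(last=False)   # drop the least recently used
        return tile

    def cached_tiles(self):
        """Number of tiles currently held in memory."""
        return len(self._cache)


# Stand-in for a real tile request; it just records what was asked for.
fetched = []


def fake_fetch(x, y, zoomlevel):
    fetched.append((x, y, zoomlevel))
    return "tile (%d, %d) at zoomlevel %d" % (x, y, zoomlevel)


lazy_slide = LazySlide(fake_fetch, max_cached=2)
lazy_slide[0, 0, 1]
lazy_slide[0, 0, 1]   # served from memory, no second request
lazy_slide[1, 0, 1]
lazy_slide[2, 0, 1]   # evicts tile (0, 0) to keep memory bounded
print(fetched, lazy_slide.cached_tiles())
```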

In closing

There’s a tremendous problem today with scientific software: published methods are described at a very high level, and are not at all easy to replicate. And when they are detailed enough, or source code is made available, it’s not in an accessible language like Python (or Java for that matter; or it has sooooo many dependencies…). Too many research papers can be concluded with “now good luck finding the one former postdoc that actually knew how to do this…”

Remember “Developers developers developers”? Yes, make fun of Steve Ballmer all you want, but he got it right. You offer people basic services, and then get those people to develop software on top of your infrastructure.

So how do we intend to address this? By continuing to make our own software as versatile and kick-ass as possible (duh 😊), but also by going just one step further and reaching out to the legion of developers and researchers out there that currently still just have to make do with what they can get their hands on. We claim to have a better mousetrap for you, and we’ll prove it to you.

March 20, 2018; September 24, 2018

How to test and benchmark a tile server?

As mentioned before, I’m the Chief Technology Officer of Pathomation. Pathomation offers a platform of software components for digital pathology. We have a YouTube video that explains the whole thing.

You can try a local desktop-bound (some say “chained”) version of our software at http://free.pathomation.com. People tell us our performance is pretty good, which is always nice to hear. The problem is: can we objectively “prove” that we’re fast, too?

The core component of our suite is aptly called “PMA.core” (we’re developers, not creative namegivers obviously). Conceptually, PMA.core is a slide tile server. Simply put, a tile server serves up data in regularised square-shaped portions called tiles. In the case of PMA.core, tiles are extracted in real time on an as-needed basis from selected whole slide images.

So how do you then test tile extraction performance?

At present, I can see three different ways:

On a systematic basis, going through all hypothetical tiles one by one, averaging the time it takes to render each one.
On a random basis
Based on a historical trail of already heavily viewed images.

Each of these methods has its pros and cons, and which one to choose depends on what kind of property of the tile server you want to test in the first place.

Systematic testing

The pseudo-code for this one is straightforward:

For x in (0..max_number_of_horizontal_tiles):
  For y in (0..max_number_of_vertical_tiles):
    Extract tile at position (x, y)

However, we’re talking about whole slide image files here, which have more than just horizontal and vertical dimensions. Images are organized as a hierarchical, pyramid-structured stack, and can also contain z-levels, fluorescent layers, or even timelapse data. So the complete loop for systematic testing goes more like this:

For t in (0..max_timeframes):
  For z in (0..max_z_stacks):
    For l in (0..max_zoomlevels):
      For c in (0..max_channels):
        For x in (0..max_number_of_horizontal_tiles):
          For y in (0..max_number_of_vertical_tiles):
            Extract tile at timeframe t, z-stack z, zoomlevel l, channel c, position (x, y)

But that’s just nested looping; nothing fancy about this, really. We’ve been using this method of testing for as long as we can remember pretty much, and even wrapped our own internal tool around this, (again very aptly) called the profiler.
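
For readers who prefer something they can run, the loop above can be sketched in Python as follows. This is not the profiler itself; extract_tile is a stand-in for whatever call fetches one tile, and the dimension names are hypothetical. Like the pseudo-code, the sketch uses the same tile counts at every zoomlevel, which is a simplification.

```python
import itertools
import time

# Nesting order follows the pseudo-code: time, z-stack, zoomlevel, channel, x, y.
DIMENSIONS = ("timeframes", "z_stacks", "zoomlevels", "channels",
              "tiles_x", "tiles_y")


def systematic_test(extract_tile, dims):
    """Extract every tile once and return the time taken per tile (seconds)."""
    timings = []
    ranges = [range(dims[name]) for name in DIMENSIONS]
    # itertools.product varies the last range fastest, like nested loops.
    for t, z, level, c, x, y in itertools.product(*ranges):
        start = time.perf_counter()
        extract_tile(t, z, level, c, x, y)
        timings.append(time.perf_counter() - start)
    return timings


def fake_extract(t, z, level, c, x, y):
    """Stand-in for a real tile request."""
    return None


dims = {"timeframes": 1, "z_stacks": 2, "zoomlevels": 3,
        "channels": 1, "tiles_x": 4, "tiles_y": 4}
timings = systematic_test(fake_extract, dims)
print("tiles:", len(timings), "mean seconds:", sum(timings) / len(timings))
```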

What’s good about this systematic tile test extraction method?

Easy to understand
Complete coverage; gives an accurate impression of what effort is needed to reconstruct the entire slide
Comparison between file formats (as long as they have similar zoomlevels, z-stacks, channels etc.) allows for benchmarking

What’s bad about this extraction method?

It’s unrealistic. Users never navigate through a slide tile by tile.
Considering the ratio of the data being extracted from different dimensions that can occur in a slide, you end up over-sampling some dimensions, while under-sampling others. Again this results in a number that, while accurate, is purely hypothetical, and doesn’t do a good job at illustrating the end-user’s experience.
In reality, end-users are only presented with a small percentage of the complete “universe” of tiles present in a slide. Ironically, the least interesting tiles will take the smallest amount of effort to send back (especially in terms of bandwidth, like “blank” tiles containing mostly whitespace on a slide or lumens within a specimen etc.)

Random testing

In random testing, we extract a predetermined number of tiles (either a fixed number or a percentage of the total number of available tiles). The pseudo-code is as follows:

Let n = predetermined number of (random) tiles that we want to extract
For i in (0.. n):
  Let t = random (0..max_timeframes)
  Let z = random (0..max_z_stacks)
  Let l = random (0..max_zoomlevels)
  Let c = random (0..max_channels)
  Let x = random (0..max_number_of_horizontal_tiles)
  Let y = random (0..max_number_of_vertical_tiles)
  Extract tile at timeframe t, z-stack z, zoomlevel l, channel c, position (x, y)

The same statistics can be reported back as with systematic testing, in addition to some coverage parameters (based on what percentage of total tiles were retrieved).
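
A runnable Python sketch of the random variant, including the coverage figure, could look like this. As before, extract_tile and the dimension names are hypothetical stand-ins, and the fixed seed is only there to make a run repeatable.

```python
import random
import time

DIMENSIONS = ("timeframes", "z_stacks", "zoomlevels", "channels",
              "tiles_x", "tiles_y")


def random_test(extract_tile, dims, n, seed=None):
    """Extract n random tiles; return (timings, fraction of tiles covered)."""
    rng = random.Random(seed)
    timings = []
    seen = set()
    for _ in range(n):
        # randrange(k) picks from 0..k-1, one value per dimension.
        position = tuple(rng.randrange(dims[name]) for name in DIMENSIONS)
        start = time.perf_counter()
        extract_tile(*position)
        timings.append(time.perf_counter() - start)
        seen.add(position)   # the same tile may be drawn more than once
    total = 1
    for name in DIMENSIONS:
        total *= dims[name]
    return timings, len(seen) / total


def fake_extract(t, z, level, c, x, y):
    """Stand-in for a real tile request."""
    return None


dims = {"timeframes": 1, "z_stacks": 2, "zoomlevels": 3,
        "channels": 1, "tiles_x": 4, "tiles_y": 4}
timings, coverage = random_test(fake_extract, dims, n=50, seed=1)
print("requests:", len(timings), "coverage: %.0f%%" % (100 * coverage))
```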

Let’s look at some of the pros and cons of this one.

Here are the pros:

Faster than systematic sampling (see also the “one in ten rule” commonly used in statistics: https://en.wikipedia.org/wiki/One_in_ten_rule)
For deeper zoomlevels that have sufficient data, a more homogeneous sampling can be performed (whereas systematic sampling can oversample the deeper zoomlevels, as each deeper zoomlevel contains four times as many tiles as the one above it).
Certain features in the underlying file format (such as storing neighboring tiles close together) that may unjustly boost the results in systematic sampling are less likely to affect results here.
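
To see how strongly the “4 times more tiles” effect weighs on a systematic run, here is a small worked example in Python. It assumes an idealised pyramid with a single tile at zoomlevel 0 and a doubling of width and height at every deeper level; real slides are rarely that tidy.

```python
def tiles_per_level(max_zoomlevel):
    """Tile count per zoomlevel for an idealised pyramid (1, 4, 16, ...)."""
    return [4 ** level for level in range(max_zoomlevel + 1)]


counts = tiles_per_level(8)
total = sum(counts)
for level, count in enumerate(counts):
    share = 100.0 * count / total
    print("zoomlevel %d: %6d tiles (%5.1f%% of all tiles)" % (level, count, share))
```

Under these assumptions the deepest zoomlevel alone accounts for about three quarters of all tiles, so a tile-by-tile pass spends most of its time there.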

What about the cons?

Smaller coverage may require bootstrapping to get satisfying aggregate results.
Random sampling is still unrealistic. Neighboring tiles have less chance of being selected in sequence, while in reality of course any field of view presented to an end-user is composed of neighboring tiles.
Less reliable to compare one file format to the next, as this may again require bootstrapping.
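
Bootstrapping, as mentioned in the cons above, means resampling the measured timings with replacement to estimate how much an aggregate such as the mean could vary. A minimal sketch using only the Python standard library follows; the timings are invented numbers in milliseconds and the percentile interval is one simple choice among several.

```python
import random
import statistics


def bootstrap_mean(samples, rounds=1000, seed=None):
    """Return (low, high), an approximate 95% interval for the mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(rounds):
        # Draw as many values as we measured, with replacement.
        resample = rng.choices(samples, k=len(samples))
        means.append(statistics.mean(resample))
    means.sort()
    low = means[int(0.025 * rounds)]
    high = means[int(0.975 * rounds) - 1]
    return low, high


# Made-up tile timings (ms) from a small random run, with two slow outliers.
timings_ms = [12, 15, 11, 40, 13, 14, 12, 90, 13, 15]
low, high = bootstrap_mean(timings_ms, seed=42)
print("mean: %.1f ms, interval: %.1f to %.1f ms"
      % (statistics.mean(timings_ms), low, high))
```

A wide interval is a hint that the sample is too small to compare two file formats reliably.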

Historic re-sampling

A third method can be devised based on historic trace information for one particular file. A file that’s included in a teaching collection and that’s been online for a while, has been viewed by hundreds or even thousands of users. We found in some of our longer running projects (like at http://histology.vub.ac.be or http://pathology.vub.ac.be) that students under such conditions typically are presented with the same tiles over and over again. This means that for a given slide that is fairly often explored, we can reconstruct the order in which the tiles for that particular slide are being served to the end-user, and that trace can be replicated in a testing scenario.

In terms of replication, this is then the most accurate way of testing. Apart from that, other advantages exist:

This is the best way to measure performance differences across different types of storage media. If for some reason a particular storage medium introduces a performance penalty because of its properties, this is the only reliable way to determine whether that penalty actually matters for whole slide image viewing.
For large enough numbers (entries in the historical tracing logs), a “natural” mixture of different tiles in different zoomlevels, channels, and z-stacks will be present. This sequence of tiles presented in the trace history automatically reflects how real users navigate a slide.

However, this method, too, has its flaws:

This type of testing and measuring cannot be used until a slide has actually been online for a certain time period and browsed by a large number of end-users.
Test results may be affected by the type of user that navigates the slide: we shouldn’t compare historical information about a slide browsed by seasoned pathologists with how novice med school students navigate a different slide. Apples and oranges, you know.
Because each slide has its own trace, it becomes really hard to compare performance between different file formats.
Setting up this type of test requires, of course, historical trace information. This means that this test is the most time-consuming to set up: IIS logfiles have to be parsed, tile requests have to be singled out, matched to the right whole slide image etc.
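
The log-parsing step could be sketched in Python as below. The field names come from the W3C extended log format that IIS writes, with its #Fields: header line. Everything about the tile request itself (the /tile path and the path, x, y and zoomlevel query parameters) is a made-up assumption for the example and should be replaced with whatever the tile server really uses.

```python
from urllib.parse import parse_qs

# Invented sample in W3C extended log format.
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status time-taken
2018-03-20 10:00:01 GET /tile path=demo/a.svs&x=3&y=5&zoomlevel=4 200 31
2018-03-20 10:00:01 GET /viewer.html - 200 12
2018-03-20 10:00:02 GET /tile path=demo/a.svs&x=4&y=5&zoomlevel=4 200 28
2018-03-20 10:00:03 GET /tile path=demo/b.svs&x=0&y=0&zoomlevel=1 200 9
"""


def tile_trace(log_text, slide_path, tile_stem="/tile"):
    """Return the (x, y, zoomlevel) requests for one slide, in log order."""
    fields = []
    trace = []
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]      # column names for later entries
            continue
        if not line or line.startswith("#"):
            continue                       # skip blanks and other directives
        entry = dict(zip(fields, line.split()))
        if entry.get("cs-uri-stem") != tile_stem:
            continue                       # not a tile request
        query = parse_qs(entry.get("cs-uri-query", ""))
        if query.get("path") != [slide_path]:
            continue                       # tile of a different slide
        trace.append((int(query["x"][0]), int(query["y"][0]),
                      int(query["zoomlevel"][0])))
    return trace


trace = tile_trace(SAMPLE_LOG, "demo/a.svs")
print(trace)
```

The resulting list can then be replayed against the tile server in the same order.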

Preliminary conclusions

This section came out of discussing the various strategies with Angelos Pappas, one of our software engineers.

The current profiler that we use was built to do the following:

Compare the performance impact of code modifications in PMA.core, for example after changing a parser class or modifying the flow in the core rendering system. We needed a way to compare versions relative to each other.
Compare the performance when rendering different slide formats. To do this, you need similar slides (dimensions, encoding method and of course pixel contents), stored in different formats. The “CMU-{N}” slides from OpenSlide are a good case, as well as the ones we bring back ourselves from various digital pathology events. This, again, allows us to do relative comparisons that will give us hints about why one format is slower than another. Is it our parser that needs improvement? Is it the nature of the format? etc.
Compare the performance of different storage sources, like local storage versus SMB.

The profiler does all of the above nicely and it’s the only way we have to do such measurements. And even though the profiler supports a “random” mode, we hardly ever use it. Pathomation test engineers usually let the profiler run up to a specific percentage or for a specific period and compare the results.

Eventually, what you want to accomplish with all this is to get objective measurements of the user experience. The profiler wasn’t really meant to measure how good the user experience will be. This is a much more complicated matter, as it involves patterns that are very hard to emulate, network issues, etc. For example, if a user zooms into a region, the browser fires simultaneous requests for neighboring tiles. If you ever want to do this kind of measurement, perhaps your best bet would be to do it by commanding a browser. Again though, your measurements would give you a relative comparison.
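
Short of commanding a real browser, the “simultaneous requests for neighboring tiles” pattern can be roughly imitated with a thread pool. The Python sketch below is only that, a rough imitation: the tile size, the viewport, the limit of six parallel requests and the extract_tile stand-in are all assumptions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def tiles_in_view(left, top, width, height, tile_size):
    """List the (x, y) tile indices that a viewport in pixels overlaps."""
    first_x, last_x = left // tile_size, (left + width - 1) // tile_size
    first_y, last_y = top // tile_size, (top + height - 1) // tile_size
    return [(x, y)
            for y in range(first_y, last_y + 1)
            for x in range(first_x, last_x + 1)]


def time_field_of_view(extract_tile, tiles, max_parallel=6):
    """Request all tiles of one field of view concurrently; return seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # list() forces all requests to complete before the clock stops.
        list(pool.map(lambda xy: extract_tile(*xy), tiles))
    return time.perf_counter() - start


def fake_extract(x, y):
    """Stand-in for a real tile request that takes about 10 ms."""
    time.sleep(0.01)


tiles = tiles_in_view(left=300, top=200, width=1024, height=768, tile_size=256)
elapsed = time_field_of_view(fake_extract, tiles)
print("%d tiles in %.3f seconds" % (len(tiles), elapsed))
```

What this measures is the time until a whole field of view is complete, which is closer to what a user perceives than the time for any single tile.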