Fingerprinting applications

A naïve way to detect duplicate data

We often copy data for a variety of reasons. While testing a new program, we can copy data any number of times so the software is able to work on a dataset instead of a single datapoint. Temporary data unfortunately is all too often forgotten about, lingers around, and unnecessarily clutters our hard disks.

There are a number of ways to solve this problem. With Python, we can create a script that retrieves all slides, inspects the file size of each slide, and reports which two slides have the same size.

Using our PMA.python SDK, the code looks like this:

from pma_python import core
all_slides = core.get_slides("C:/", recursive = True)
def get_slide_size(slide):
    info = core.get_slide_info(slide)
    return info["PhysicalSize"]

all_sizes = {}
for slide in all_slides:
    fp = get_slide_size(slide)
    if "d" in fp.keys():
        fp = fp["d"]
        # print (slide, fp)
        if fp not in all_sizes:
            all_sizes[fp] = []
        # remember which slides share this size
        all_sizes[fp].append(slide)
for (k, v) in all_sizes.items():
    if len(v) > 1:
        fn = v[0]
        if not (".png" in fn.lower() or ".jpg" in fn.lower()):
            print(k, len(v))

But wait, what if two slides are the same size? Whole slide images are typically hundreds of megabytes in size. Based on this characteristic, you could assume that it’s unlikely for two slides to end up with identical file sizes. But macroscopic images are important in pathology, too, and then we’re talking about files that are only a couple of megabytes in size at most. Now think of a scenario where tens of thousands of cases are treated annually, with multiple macroscopic photos being taken of each resection piece… Suddenly the chance of two files having the same size becomes plausible.

There’s a second, albeit somewhat more hypothetical, reason why slide size is a poor indicator here. WSI data could be stored in a container format, and the container format can have certain limitations. We observed e.g. that our PMA.start installation package has not changed in size for the last 7 releases or so. But of course our code did change. So, empirically, file size is not a good discriminant for executable files. We therefore feel that we cannot assume that it would be one for image file formats. Since re-scans are a specific concern with microscopy and WSI data, something better is needed than just the file size.

Introducing fingerprinting

We can think of a way to unambiguously distinguish slides from one another by combining a number of characteristics into a digital fingerprint. These would include:

  1. Filesize (we didn’t say this was a bad one; just an insufficient one)
  2. Pixel size
  3. Pixels per micron
  4. Number of channels
  5. Number of z-stack layers

If we had infinite real-time computing power, we could think of more:

For practicality, we define a slide’s fingerprint in the Pathomation platform as a combined hash of physical file size, as well as most of the parameters returned through the GetImageInfo method.
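To make the idea of a combined hash concrete, here is a minimal sketch. It is not the algorithm used by PMA.core: the field names, the JSON serialization, and the choice of SHA-1 are all assumptions made purely for illustration.

```python
import hashlib
import json

def sketch_fingerprint(file_size, image_info):
    """Combine file size and image parameters into one hex digest.

    Illustration only: field names and SHA-1 are assumptions,
    not the algorithm used by PMA.core.
    """
    payload = {"file_size": file_size, "info": image_info}
    # sort_keys makes the serialization independent of dictionary ordering
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

# Two made-up slides with the same file size but a different z-stack depth
info_a = {"width": 100000, "height": 200000, "microns_per_pixel": 0.25,
          "channels": 3, "z_layers": 1}
info_b = dict(info_a, z_layers=5)

fp_a = sketch_fingerprint(123456789, info_a)
fp_b = sketch_fingerprint(123456789, info_b)
print(fp_a != fp_b)
```

The point of the sketch is that two slides of identical file size still end up with different fingerprints as soon as any other characteristic differs.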

We also consider this fingerprint method to be essential, and so for stability, it is incorporated at the level of PMA.core (PMA.start) itself rather than at SDK level, so it can be transferred across programming boundaries. A fingerprint for slide [foo] yields the same result regardless of the SDK it is requested through, be it PMA.php or PMA.python.

Slide integrity

The fingerprint method is a good way to confirm the integrity of a slide itself. When a file is not what it pretends to be, the fingerprint cannot be calculated, and an error follows.

Note that the above would not be possible if we stuck to conventional CRC-like checks, since those don’t take into account the nature of a slide. Of course, you can do a CRC check on any file regardless of whether it actually is a slide or not.


We recently introduced PMA.transfer. Have you ever been frustrated by people sending you individual VSI or MRXS slides without anything else? Did you ever feel uneasy about having just transferred a gazillion bytes halfway across the world, without any reassurance that it actually worked? Then you should definitely have a look at PMA.transfer. It’s like FileZilla, but for slides. SlideZilla.

PMA.transfer uses fingerprinting to ensure data integrity in between transfers. Whether you’re moving slides from and to PMA.start, PMA.core, or My Pathomation, the same fingerprint calculation algorithm is used to compute a slide’s unique signature. This means that PMA.transfer can obtain the fingerprint of a source and a target instance and simply compare one with another to see that they’re identical.

Another application is found in our upcoming product. Of course, the actual fingerprint of a slide is shown in the slide info panel. But such a string does not say a whole lot and is purely informative at best. The product can also be used to create annotations on slides. When you make an annotation, it is stored in our back end with a reference to the original slide, as well as the slide’s fingerprint.

This serves two purposes:

  • After moving a slide to a new folder path or even physical location, you can still retrieve its annotations.
  • When you have two identical copies of slides, each annotated separately, you can use fingerprinting to combine the annotations from both in a single view.

The possibility to combine annotations from identical slides stored in different locations in a single view offers opportunities for blinded studies and validation exercises. Inter- and even intra-observer variability can be measured this way, too.
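As a rough sketch of why storing the fingerprint alongside each annotation helps, consider an index keyed by fingerprint instead of by path. The record layout, paths, and fingerprint values below are made up and do not reflect how the back end actually stores annotations.

```python
from collections import defaultdict

# Hypothetical records: (slide path, fingerprint, annotation WKT, author)
stored = [
    ("/study/reader1/case7.svs", "fp-abc", "POINT(10 20)", "reader1"),
    ("/study/reader2/case7.svs", "fp-abc", "POINT(11 19)", "reader2"),
    ("/study/reader1/case8.svs", "fp-def", "POINT(5 5)", "reader1"),
]

by_fingerprint = defaultdict(list)
for path, fingerprint, wkt, author in stored:
    by_fingerprint[fingerprint].append({"path": path, "wkt": wkt, "author": author})

def annotations_for(fingerprint):
    """Return every annotation made on any copy of the slide."""
    return by_fingerprint.get(fingerprint, [])

combined = annotations_for("fp-abc")
print(sorted(a["author"] for a in combined))
```

Because the lookup ignores the path, the annotations of both readers come back together even though each reader worked on a separate copy.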

Retrieving annotations by fingerprint is not available by default; you need to invoke this with an explicit button in the ribbon. It’s a performance thing.

Last but not least, fingerprinted annotations can be used to keep track of annotations during migration processes. As your applications for digital pathology increase, you will occasionally restructure your folder structures, or perhaps move to an entirely new storage device altogether.

Finding duplicates

Back to our original question: imagine that you’ve been managing a whole slide repository for a while, and as careful as you’ve been, you suspect that you now have ended up with a number of copies of a variety of slides in different locations. You know: you copy a slide to test something, pinky-promise yourself that you’ll remove the slide again afterwards, that for real you’re really not going to forget this time… and then… you forget about it.

Thanks to the fingerprinting method and a few lines of Python, it is easy to trace duplicates however.

Here’s the basic code to build a dictionary that has all the possible fingerprints as keys. Each entry then contains a list that specifies where the exact copies that share a particular fingerprint are located:

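# slides is a list of slide paths, e.g. as returned by core.get_slides()
slides_by_fingerprint = {}
for slide in slides:
    fp = core.get_fingerprint(slide)
    if fp not in slides_by_fingerprint:
        slides_by_fingerprint[fp] = []
    # every slide is added, so copies end up in the same list
    slides_by_fingerprint[fp].append(slide)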

When a slide is unique, then the list of the dictionary will only have one entry. Alternatively, it can have two or more entries. So the duplicated slides are detected and flagged as follows:

for fprint in slides_by_fingerprint:
    if len(slides_by_fingerprint[fprint]) > 1:
        print(slides_by_fingerprint[fprint][0], "is copied", len(slides_by_fingerprint[fprint]), "times")

If you want, you can further automate the pruning and the deletion of these duplicates. Sometimes it’s easy; sometimes it’s not. You need to make sure that you have the original copy in its intended place. And in some cases, you may actually want to keep at least a second copy of a slide around, as one may be transient in a clinical setting, whereas its copy may have just been added to a reference repository to teach students and staff.
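One cautious way to approach the pruning is to compute a plan first and delete nothing automatically. The sketch below assumes a dictionary shaped like slides_by_fingerprint; the function name, the idea of protected folder prefixes, and the sample paths are hypothetical.

```python
def plan_pruning(slides_by_fingerprint, keep_prefixes):
    """Split duplicates into copies to keep and candidates for removal.

    keep_prefixes: folder prefixes whose copies must always be kept
    (hypothetical policy; adapt to your own repository layout).
    Nothing is deleted here; the function only returns a plan.
    """
    plan = {}
    for fingerprint, paths in slides_by_fingerprint.items():
        if len(paths) < 2:
            continue  # unique slide, nothing to prune
        keep = [p for p in paths if p.startswith(tuple(keep_prefixes))]
        if not keep:
            keep = [sorted(paths)[0]]  # fall back to one deterministic copy
        plan[fingerprint] = {"keep": keep,
                             "remove": [p for p in paths if p not in keep]}
    return plan

example = {
    "fp1": ["/clinical/a.svs", "/teaching/a.svs", "/tmp/test/a.svs"],
    "fp2": ["/clinical/b.svs"],
    "fp3": ["/tmp/x/c.svs", "/tmp/y/c.svs"],
}
plan = plan_pruning(example, ["/clinical/", "/teaching/"])
for fingerprint, decision in plan.items():
    print(fingerprint, "keep:", decision["keep"], "remove:", decision["remove"])
```

Reviewing such a plan by hand before acting on it covers the case where a clinical copy and a teaching copy should both survive.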

Coming full circle

Fingerprinting serves a triple function:

  • Detect whether a dataset is a real slide or not
  • Guard data integrity when transferring from one medium to another
  • Trace slides and associated content through a complex storage hierarchy

Fingerprinting applies the concept of hash functions to slides. Like everything in the Pathomation platform, the slide itself is the key unit to interact with. There is only one fingerprint for a slide, whether it consists of a single file or multiple files. Consequently, you can only obtain a fingerprint for a slide. If the file is somehow corrupt or the file format isn’t recognized by PMA.core, you’re not going to get a fingerprint from it. Last but not least, fingerprints are invariant across storage media and instances of PMA.core (PMA.start), making them a useful feature for slide tracking.

Random sampling and ground truth annotations

Challenges and opportunities

It’s been well established that whole slide images are big. We wrote a tutorial on this ourselves.

This poses challenges for both computers and analysts alike:

Consider the pathologist that must identify x number of cells of a certain classification. How many should he aim for? How big should his field of view be to select from?…

Automation seems to be a solution, but here too limits crop up. Professional image analysis software is expensive, people need to be trained, and there are only so many pixels any GPU can process in any given time.

The solution comes in treating the image analysis process as a multi-phased project.

In the first phase, select fields of view and regions of interest can be prepared as annotations on top of a whole slide image. This pre-selection can be as simple as an automated algorithm that identifies entropy in a pixel environment, or a pathologist who carefully picks and curates regions of interest.
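As an illustration of what "identifying entropy in a pixel environment" could look like, here is a minimal sketch that computes the Shannon entropy of the grey values in a tile. The tiles are made-up lists of numbers; a real pre-selection step would read pixels from the slide and would likely be more elaborate.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy (in bits) of a flat sequence of grey values."""
    counts = Counter(pixels)
    total = len(pixels)
    # p * log2(1/p) summed over all distinct grey values
    return sum((n / total) * math.log2(total / n) for n in counts.values())

background = [255] * 64        # uniform tile, e.g. empty glass
textured = list(range(64))     # every grey value differs

print(shannon_entropy(background))
print(shannon_entropy(textured))
```

A uniform tile scores lowest and a tile with many different grey values scores higher, so a threshold on this number is one simple way to skip empty background.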

In other words: statistical (random) sampling is the name of the game. And our very own software is a great solution to make these ground-truth annotations in.

Annotations in PMA.core

Whether via scripting or manual curation, annotations end up stored in PMA.core.

Internally, we store the annotations as Well-Known Text (WKT) strings, but they can be converted to several other file formats, too, including Excel CSV, Visiopharm MLD, Leica/Aperio XML, or Halo Annotation XML.

We provide several other resources regarding annotations that can provide more background:

When your annotations are part of a random sampling exercise, chances are that you’re going to want to do more downstream operations with them.

In this article we will therefore:

  • Use Jupyter and pma_python to interact with PMA.core
  • Identify geometric (polygon) annotations and examine their properties
  • Convert annotations to rectangular snapshots at high-resolution
  • Save these extracted annotations as new separate high-resolution tiled TIFF slides


The Core module of our SDK contains a get_annotations() function already. Let’s start by examining what we get back when we invoke it on our sample slide:

from pma_python import core
core.connect("https://srv/pma.core/", "usr", "***")
slideref = "/rootdir/slide.mrxs"
annotations = core.get_annotations(slideref)

Now we can print the first element and see what it contains. We use the pprint library to make our output look pretty:
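A minimal sketch of that step follows. So that it runs without a server, it uses a made-up stand-in for the list returned by get_annotations(); only the Geometry key is taken from the text, the other field names are hypothetical.

```python
from pprint import pprint

# Made-up stand-in for annotations[0]; only "Geometry" is taken from the text
sample_annotation = {
    "Geometry": "POLYGON((100 100,400 100,400 300,100 300,100 100))",
    "CreatedBy": "usr",          # hypothetical audit-trail field
    "Classification": "tumor",   # hypothetical field
}
annotations = [sample_annotation]

# pprint lays out nested dictionaries one key per line when they get long
pprint(annotations[0])
```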

We can immediately see the audit trail, and beyond that the most obvious element is the Geometry. As you may have deduced, the geometry defines all points that make up an annotation. In our case our polygon is merely a rectangle, so we find 5 (x, y) coordinates, with the fifth one being the same as the origin. The format can be generalized and written out in a symbolic notation that looks like this:

POLYGON((x1 y1,x2 y2, x3 y3, x4 y4,…, xn yn,…, x1 y1))
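The notation above can be turned into a list of coordinate pairs with a few lines of string handling. This is a minimal sketch that only handles a single outer ring; the function name and the sample string are made up for illustration.

```python
def parse_wkt_polygon(wkt):
    """Turn 'POLYGON((x1 y1,x2 y2,...))' into a list of (x, y) floats.

    Handles only a single outer ring, which is all the text describes.
    """
    text = wkt.strip()
    if not text.upper().startswith("POLYGON"):
        raise ValueError("not a POLYGON: " + text[:20])
    inner = text[text.index("((") + 2:text.rindex("))")]
    points = []
    for pair in inner.split(","):
        x, y = pair.split()          # split() tolerates extra spaces
        points.append((float(x), float(y)))
    return points

ring = parse_wkt_polygon("POLYGON((100 100,400 100, 400 300,100 300,100 100))")
print(len(ring), ring[0] == ring[-1])
```

The last check confirms that the ring is closed: the final point repeats the first one.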

If we want to convert these annotations to snapshots, we need to determine the x y coordinates of the points that define a rectangle that contains all points of our original polygon.

In other words, for each of the x y pairs of coordinates given, we find the minimum and maximum x and y values. We can then use these to compute the width and height of the resulting (high resolution) snapshot.

Luckily this is easier to do than finding the largest rectangle within a polygon!

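def annotation_to_rect(ann):
    # ann is the WKT geometry, e.g. "POLYGON((x1 y1,x2 y2,...,x1 y1))";
    # a bare "x1 y1,x2 y2,..." coordinate list is accepted as well
    if "((" in ann:
        ann = ann[ann.index("((") + 2:ann.rindex("))")]
    points = ann.split(",")
    min_x = float("inf")
    max_x = float("-inf")
    min_y = float("inf")
    max_y = float("-inf")
    for point in points:
        # split() without arguments tolerates spaces after the commas
        (x, y) = point.split()
        x = float(x)
        y = float(y)
        if x > max_x:
            max_x = x
        if x < min_x:
            min_x = x
        if y > max_y:
            max_y = y
        if y < min_y:
            min_y = y
    # width and height of the enclosing rectangle
    w = max_x - min_x
    h = max_y - min_y
    return (min_x, min_y, max_x, max_y, w, h)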

And we can use this method to get the coordinates of the first annotation.
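To see the bounding-rectangle idea at work on something less trivial than a rectangle, here is a small worked example. The helper restates the same min/max logic in compact form so that the snippet runs standalone; its name and the polygon coordinates are made up.

```python
def bounding_rect(points):
    """Axis-aligned bounding rectangle of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    return (min_x, min_y, max_x, max_y, max_x - min_x, max_y - min_y)

# An irregular, made-up polygon (closed ring: first point repeated at the end)
polygon = [(120.0, 80.0), (300.0, 40.0), (360.0, 210.0),
           (150.0, 260.0), (120.0, 80.0)]
(min_x, min_y, max_x, max_y, w, h) = bounding_rect(polygon)
print(min_x, min_y, w, h)
```

The tuple has the same layout as the return value of annotation_to_rect(), so the origin, width, and height can be passed on in the same way.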


In earlier tutorials, we mostly stuck with extracting tiles from PMA.core. But if you want to extract arbitrary regions, you can use core.get_region() instead. The call uses the same coordinate system as used to store annotations.

Our next step then is to use these coordinates and parameters as arguments for the get_region() call.

region = core.get_region(slideref, min_x, min_y, w, h)

Without any additional parameters, get_region() automatically retrieves pixels at the deepest zoom level. While this is what you want, your environment may well be protected against such (perceived) over-zealous behavior and respond with an error:

DOS attacks are a reasonable concern of course.

The solution then is to split up the region into 4 quadrants. Like this:

region11 = core.get_region(slideref, min_x, min_y, w / 2, h / 2)
region12 = core.get_region(slideref, min_x + w/2, min_y, w / 2, h / 2)
region21 = core.get_region(slideref, min_x, min_y + h/2, w / 2, h / 2)
region22 = core.get_region(slideref, min_x + w/2, min_y + h/2, w / 2, h / 2)

Once the four quadrants are loaded, a new PIL image can be constructed, and the 4 quadrants can be pasted into the respective corners.

import math
from PIL import Image
region_combo = Image.new('RGB', (int(math.ceil(w)), int(math.ceil(h))))
region_combo.paste(region11, (0, 0))
region_combo.paste(region12, (int(math.floor(w/2)), 0))
region_combo.paste(region21, (0, int(math.floor(h/2))))
region_combo.paste(region22, (int(math.floor(w/2)), int(math.floor(h/2))))
region_combo.save("region.jpg", "JPEG", quality = 95, optimize = True, progressive = True)

By working with quadrants, you’re effectively creating a de facto 2 x 2 grid. If this still doesn’t work for you, you can create 3 x 3 grids, or go even more refined.
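Generalizing from quadrants to an arbitrary n x n grid mostly comes down to bookkeeping. The sketch below only computes the sub-rectangles and the matching paste offsets; the function name is made up, and the actual get_region() and paste() calls are left out so the snippet runs on its own.

```python
import math

def grid_regions(min_x, min_y, w, h, n):
    """Split a rectangle into an n x n grid.

    Returns (x, y, width, height, paste_x, paste_y) tuples: the first four
    describe the sub-region to request, the last two where to paste it.
    Integer boundaries avoid gaps or overlaps from rounding.
    """
    w, h = int(math.ceil(w)), int(math.ceil(h))
    x_edges = [round(i * w / n) for i in range(n + 1)]
    y_edges = [round(j * h / n) for j in range(n + 1)]
    regions = []
    for j in range(n):
        for i in range(n):
            regions.append((min_x + x_edges[i], min_y + y_edges[j],
                            x_edges[i + 1] - x_edges[i],
                            y_edges[j + 1] - y_edges[j],
                            x_edges[i], y_edges[j]))
    return regions

cells = grid_regions(1000, 2000, 1001, 700, 3)
print(len(cells), sum(c[2] * c[3] for c in cells) == 1001 * 700)
```

Deriving every cell from shared integer edges means neighbouring cells line up exactly, also when the width or height is not divisible by n.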

Pyramidal TIFF

What’s missing? Say that your resulting extracted high-resolution snapshot is 8K x 5K pixels in size. You can work with that kind of image in some programs, but it’s not ideal. And your resulting snapshot can be even larger than that.

The solution is not to save your PIL image in JPEG format, but as a pyramidal (tiled) TIFF instead. Some environments, like ASAP, even require this kind of input format.

After installing the gdal library, you can use the following method to convert any PIL image object into a pyramidal (tiled) TIFF:

import math
import numpy as np
from osgeo import gdal
from tqdm import tqdm

def PILToTiff(pilref, output_file = "pil.tif", target_quality = 80, downscale_factor = 1):
    # pilref is expected to be an RGB PIL image
    # downscale_factor is kept in the signature but not used in this version
    tileSize = 512
    tiff_drv = gdal.GetDriverByName("GTiff")
    output_filename = output_file
    (w, h) = pilref.size
    ds = tiff_drv.Create(
        output_filename, w, h, 3,
        options = ['COMPRESS=JPEG', 'JPEG_QUALITY=' + str(target_quality), 'TILED=YES',
                   'BLOCKXSIZE=' + str(tileSize), 'BLOCKYSIZE=' + str(tileSize)])

    tilesX = int(math.ceil(w / tileSize))
    tilesY = int(math.ceil(h / tileSize))
    totalTiles = tilesX * tilesY
    pbar = tqdm(total=totalTiles)
    for x in range(tilesX):
        for y in range(tilesY):
            # pixel coordinates of this tile; edge tiles may be smaller
            x1 = x * tileSize
            y1 = y * tileSize
            x2 = min((x+1)*tileSize, w)
            y2 = min((y+1)*tileSize, h)
            tile = pilref.crop((x1, y1, x2, y2))
            arr = np.array(tile, np.uint8)

            # write the red, green and blue channels to bands 1, 2 and 3
            ds.GetRasterBand(1).WriteArray(arr[..., 0], x1, y1)
            ds.GetRasterBand(2).WriteArray(arr[..., 1], x1, y1)
            ds.GetRasterBand(3).WriteArray(arr[..., 2], x1, y1)
            pbar.update(1)
    pbar.close()

    # add overview levels at 2x, 4x, 8x and 16x reduction
    ds.BuildOverviews('average', [pow(2, l) for l in range(1, 5)])
    ds = None  # releasing the dataset flushes it to disk
    print("Done; see result in ", output_filename)
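To get a feel for what the tiling and the overview levels amount to, the following sketch computes the dimensions and tile counts per level for an 8K x 5K image. It assumes overview dimensions are rounded up and that overviews use the same tile size; the function name is made up, and how GDAL actually lays out its overviews may differ in detail.

```python
import math

def pyramid_layout(width, height, tile_size=512, factors=(2, 4, 8, 16)):
    """Dimensions and tile counts for the base image and each overview."""
    layout = []
    for factor in (1,) + tuple(factors):
        # ceiling division: a partial row or column still needs a pixel
        level_w = math.ceil(width / factor)
        level_h = math.ceil(height / factor)
        tiles = math.ceil(level_w / tile_size) * math.ceil(level_h / tile_size)
        layout.append((factor, level_w, level_h, tiles))
    return layout

layout = pyramid_layout(8000, 5000)
for factor, level_w, level_h, tiles in layout:
    print(factor, level_w, level_h, tiles)
```

The factors match the list passed to BuildOverviews() above; each level holds roughly a quarter of the tiles of the level before it, which is what keeps zoomed-out views fast.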

When we now systematically want to convert all annotations from a set of slides into separate high-resolution pyramidal TIFF files, it’s just a matter of putting together the functions we’ve developed in this tutorial:

import os

for slide in core.get_slides("/root_dir/path/…"):
    annotations = core.get_annotations(slide)
    for ann_idx, annotation in enumerate(annotations):
        # AnnotationToPIL is not listed above; it is assumed to wrap the
        # bounding-box and get_region() steps and to return a PIL image
        ann_img = AnnotationToPIL(slide, ann_idx)
        tif_file = "c:/output/" + os.path.basename(slide).replace(".", "_") + "_" + str(ann_idx) + ".tif"
        PILToTiff(ann_img, tif_file)

The result can be seen in PMA.start in the c:\output folder afterwards:

Ground truth

Image Analysis (IA) comes in many shapes: Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL)… What they all need: curated data to train on. Sometimes it’s as simple as feeding them all the tiles contained in a whole slide image one by one, and we’ve done a couple of examples of this already on our blog.

At times however, supervised machine learning is the name of the game. For that, a pathologist may pre-select particularly interesting looking areas of interest. In other cases, statistical random sampling may help to make an existing algorithm more robust or fine-tune it.

The resulting regions of interest (whether annotated manually or generated automatically via an environment like OpenCV) can in turn be exported again as individual high-resolution images that represent true subsets of slides.

In this article, we showed how such a complete workflow can be facilitated by our very own PMA.python SDK. The resulting dataset is at the same high-resolution as the original, and the performance of the images is just as good as what you started with.

Of course you may not be using Python, or you may be looking for something just a bit different. Need help? Do drop us a note. We love hearing about your use case, and thinking about how we can help solve problems.

Customizing the next generation of slide viewer software

It’s coming, it’s coming, it’s coming, it’s coming…

And we’re really excited about it!

We’re in the process of wrapping up our next generation of slide viewer software. This is going to be our flagship product for the next three years. It’s got everything a microscopy enthusiast needs, ranging from powerful viewport manipulation options, over annotations, to live conferencing. Remember our earlier article about how PMA.core handles (external) slide meta-data? All of that and more is in there, too.

In this blog post, we want to give developers as well as customers in our OEM and reseller program a sneak peek, and specifically focus on the opportunity for custom add-on development and white labeling. The software is currently in a testing phase for various use cases already, ranging from comparative validation studies to enterprise-wide deployment as a central histological information cockpit. If you are interested in joining the effort and helping us track down last-minute bugs, do shoot us an email and we can see if you’re a good fit for our beta-program.

You can find more information about itself in the landing page that we’re building for the software at

Administrative console

The software has a separate interface for administrative tasks.

In the administrative console, you can register any number of PMA.core instances to retrieve slides from. Once in use, for each registered PMA.core instance, you can see which users have sought access to it.

There’s a general dialog where you can set system-wide settings, like the address to contact in case of trouble, or the company / institute logo to use. This is where white labeling starts.

Like for PMA.view 1.x, the administrative console can be used to customize the ribbon at the top of the interface. The syntax is XML. We don’t have any formal definition of the format, and typically work with customers to get the result they want.

You can play around with the XML code yourself if you want. The default toolbar provides enough example code to allow simple re-arrangements of buttons and to move features between different tabs.

If you run into trouble, there’s always the option to restore the code back to a default toolbar.

Custom annotations

The software supports annotations. We have a separate tutorial on the subject as part of our beta-program.

There are situations where you don’t want users to go off and make their own annotations at will.

You may want your pathologists to annotate invasive tumor margins or necrotic areas for a study protocol. It’s useful then that people align, and everybody uses the same color scheme. So in addition to providing standard annotation tools, it’s possible to define buttons on your toolbars with pre-set attributes and parameters.

Here’s a great example of a project we participated in recently, for which participants had to indicate six different types of tissues / cells:

The underlying XML for this extra ribbon tab is as follows:

<Tab label="NKI" hint="custom ribbon tab" name="nki_annotations" enabled="true" visible="true">
    <Ribbon label="Total TLS" width="30%">
        <Tool type="buttons" size="large">
            <Command name="preset-peri_mat" icon="an_m_peri_tls.png" hint="Total # mature TLS (peritumoral)" label="Peritumoral, mature"
                pma-classification="peri mature" />
            <Command name="preset-intra_mat" icon="an_m_intra_tls.png" hint="Total # mature TLS (intratumoral)" label="Intratumoral, mature"
                pma-classification="Intra mature" />
            <Command name="preset-peri_immat" icon="an_im_peri_tls.png" hint="Total # immature TLS (peritumoral)" label="Peritumoral, immature"
                pma-classification="Peri immature" />
            <Command name="preset-intra_immat" icon="an_im_intra_tls.png" hint="Total # immature TLS (intratumoral)" label="Intratumoral, immature"
                pma-classification="Intra immature" />
        </Tool>
    </Ribbon>
    <Ribbon label="Pre-sets" width="30%">
        <Tool type="buttons" size="large">
            <Command name="preset-necrosis" icon="an_necrosis.png" hint="Tumor" label="% necrosis" />
            <Command name="preset-fibrosis" icon="an_macro.png" hint="Tissue" label="% fibrosis" />
            <Command name="preset-via_tumor" icon="an_tumorcells.png" hint="Tissue" label="% viable tumor cells"
                pma-classification="Viable tumor cells" />
        </Tool>
    </Ribbon>
</Tab>

The pay-off of this kind of configuration is two-fold: the protocols are consistently executed as intended. In addition, the training needed for novel users to interact with the software is reduced because they only have a few options to choose from. Less can definitely be more in these scenarios.
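Since there is no formal definition of the ribbon format, a quick well-formedness check before pasting XML into the admin console can save some back and forth. The sketch below uses Python's standard XML parser on a shortened, made-up tab; it only verifies that the XML parses and lists the buttons it finds, and says nothing about whether the software accepts the attributes.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up ribbon tab in the same style as the listing above
ribbon_xml = """
<Tab label="Demo" name="demo_annotations" enabled="true" visible="true">
    <Ribbon label="Pre-sets" width="30%">
        <Tool type="buttons" size="large">
            <Command name="preset-necrosis" icon="an_necrosis.png" hint="Tumor" label="% necrosis" />
            <Command name="preset-via_tumor" icon="an_tumorcells.png" hint="Tissue"
                label="% viable tumor cells" pma-classification="Viable tumor cells" />
        </Tool>
    </Ribbon>
</Tab>
"""

def list_commands(xml_text):
    """Parse the snippet (raises ParseError if malformed) and list buttons."""
    root = ET.fromstring(xml_text.strip())
    return [(cmd.get("name"), cmd.get("label"), cmd.get("pma-classification"))
            for cmd in root.iter("Command")]

for name, label, classification in list_commands(ribbon_xml):
    print(name, "|", label, "|", classification)
```

An unclosed Command or Ribbon element makes ET.fromstring() raise a ParseError that points at the offending line, which is usually enough to find the typo.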

Configuring the panel layout

The software organizes its content across different panels. The layout is a lot more powerful and flexible than in PMA.view 1.x.

All panels can be turned on or off via the Configuration tab on the ribbon. In addition, the admin console can be used to pre-determine the layout and panel organization when people first log in.

In its simplest configuration, you’re navigating slides and looking at them one by one:

You can add various information panels as you see fit to get to something that looks more like this:

Like with the ribbon customization and pre-defined annotation controls, you can pre-set the panel layout. Again, you’re looking at XML snippets.

The layout from the latter screenshot is then defined as follows:

Iframe panels

A special type of panel is the iframe-panel. With these panels, you can load virtually any website or page that you want within a pre-defined panel. You can define your own panel through the Layout configuration button on the ribbon:

A pop-up dialog appears and lets you specify the title of the panel, and the URL you want it to load.

In our example we took a promotional video for our own PMA.slidebox product (more on that later), but you can put just about anything in there.

The result looks like this:

You can pre-define these custom iframe-panels any way you want through the admin console.

<Component name="IFrame" label="Pathomation video" url=""/>

Respective parameters that indicate the current state of the active viewport are passed along automatically, like this:

Last but not least: a product pitch

We wrote earlier about PMA.slidebox.

This week, we’ve officially started promoting this product. We updated the landing page for PMA.slidebox, and the first demonstration videos are available through our YouTube channel.


Like with everything we do, we’re very excited about being able to package a ton of useful functionality in a nevertheless compact product offering. Do have a look at our demo portal for PMA.slidebox and let us know if this is something you are interested in, too!

How to handle slide meta-data?

What is slide meta-data?

In order to know how to interact with something, we first need to know what it is. Or, at least, we need you to know what it means for us. Just so we’re clear what we’re talking about.

For the sake of this article, we distinguish three different kinds:

  • Intrinsic meta-data: information stored within a slide’s file format. A trivial example is the slide’s pixel size, and a more advanced feature is the time it took the scanner to produce the slide.
  • User-captured meta-data (forms): Pathomation’s central tile server component, PMA.core, allows for the definition of table structures. Forms are a basic data structure in PMA.core that’s picked up again and used to capture and present user-attributed data in several other Pathomation platform components, including PMA.slidebox and PMA.control.
  • External data: if you have tons of external data in a separate repository and want access to it through the Pathomation software platform for digital pathology and virtual microscopy, you can link to it in real-time.

Below you can find an overview of what kind of meta-data is supported by what component and in what capacity:

               Slide info   Forms           External data
PMA.start      Read         Not supported   Not supported
PMA.core       Read         Define / read   Read
PMA.view       Read         Not supported   Not supported
               Read         Read / write    Read
PMA.slidebox   Read         Read / write    Read
PMA.control    Read         Read / write    Read

Slide info

Intrinsic slide meta-data can be shown in PMA.start by clicking on the filename in the viewport.

In the upcoming versions of both PMA.view and, we provide a separate info-panel for this.

There can be more slide info available than is actually shown in our various front-end interfaces. Some information is really specific, too specialized, or irrelevant even to show to most users. Other fields are scanner-specific and don’t make sense to include on a systematic basis.

At the API and SDK level, we provide dedicated SlideInfo calls that return full hierarchical dictionary structures that do give exhaustive slide information and can be consumed as you see fit for your specific application or workflow.

Data with a twist

Forms are defined in PMA.core through the form editor.

PMA.core’s forms support trivial datatypes like text or numbers (of course). We also offer scientifically relevant twists on traditional data capture. An example is that numerical data fields allow the option to be recorded as “below detectable limit”, since it’s very hard to prove that something is not present in a sample. Oftentimes, a numerical zero just means that our detection apparatus lacks the sensitivity to unmask the presence of a specific phenomenon.
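The distinction between a measured zero and "below detectable limit" is easy to lose when a value is stored as a plain number. As a minimal sketch of how a consuming application could keep the two apart, consider the data structure below; the class and field names are made up and do not reflect how PMA.core stores form values.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NumericObservation:
    """A numeric form value that can also be 'below detectable limit'."""
    value: Optional[float] = None
    below_detectable_limit: bool = False

    def __post_init__(self):
        if self.below_detectable_limit and self.value is not None:
            raise ValueError("a below-limit observation carries no value")
        if not self.below_detectable_limit and self.value is None:
            raise ValueError("a measured observation needs a value")

    def __str__(self):
        if self.below_detectable_limit:
            return "below detectable limit"
        return str(self.value)

measured_zero = NumericObservation(value=0.0)
undetected = NumericObservation(below_detectable_limit=True)
print(measured_zero, "|", undetected, "|", measured_zero == undetected)
```

Keeping the flag separate from the number means downstream statistics can decide explicitly how to treat undetected values instead of silently averaging zeros.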

What can you do in forms? Pretty advanced things. We can model the CAP-recommended cancer protocols as PMA.core forms.

PMA.core offers different ways to interact with forms, but it doesn’t provide a data entry module. That’s because it doesn’t really make sense to enter data at this level. Data entry is done in the context of an application or workflow. Our upcoming will support data entry, and PMA.control offers several interaction modes. Underneath, PMA.control interaction modes rely on data stored as PMA.core forms.

You can ask PMA.core to generate a spreadsheet template based on a selected folder of slides and a particular, already defined form.

At Pathomation, we pride ourselves on being a truly open platform, so PMA.core offers several data formats to export captured form data to, including CSV, XML, and ARFF.
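For readers who have not come across ARFF before: it is a plain-text format with a header that declares attributes, followed by comma-separated data rows. The sketch below writes a minimal ARFF document from made-up form data; it illustrates the format only and is not PMA.core's export routine.

```python
def to_arff(relation, columns, rows):
    """Serialize rows to a minimal ARFF document.

    columns: list of (name, arff_type) pairs, e.g. ("score", "numeric").
    Missing values (None) are written as '?', as ARFF prescribes.
    """
    lines = ["@relation " + relation, ""]
    for name, arff_type in columns:
        lines.append("@attribute " + name + " " + arff_type)
    lines += ["", "@data"]
    for row in rows:
        cells = []
        for (name, arff_type), value in zip(columns, row):
            if value is None:
                cells.append("?")
            elif arff_type == "string":
                # quote strings and escape embedded single quotes
                cells.append("'" + str(value).replace("'", "\\'") + "'")
            else:
                cells.append(str(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"

# Made-up form data: one row per slide
columns = [("slide", "string"), ("tumor_percentage", "numeric")]
rows = [("case7.svs", 35), ("case8.svs", None)]
arff_text = to_arff("form_export", columns, rows)
print(arff_text)
```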

Use cases for external data

Imagine that you organize a toxicology experiment with a rodent population. In a separate database, you’ve been keeping data about each specimen’s vital statistics, dietary and behavior observations, as well as their phenotypic expression and genotypic make-up. Observations happen daily, so for a population of 250 animals that’s 1750 new records per week.

You take weekly biopsies of the animals to monitor their response to a new drug you’re testing. The biopsies are prepared and stained in triplicate (good for a total of 750 new slides per week), and each slide has a barcode that can be used to trace back to the original individual animal. Your slide scanner takes care that the barcode is encoded in the slide’s filename.

Both the slide population and the separate database keep evolving. Replication between your database and a PMA.core form is considered at some point, but deemed inefficient and error-prone, because we are talking about experimental and evolving data structures here.

The solution is to define an external data source in PMA.core. This goes in two steps: first a connection string is defined to connect to the database server.

Next, any number of queries can be formulated against this external connection.
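To illustrate the principle of querying external data by the barcode in a slide's filename, here is a small standalone sketch. An in-memory SQLite database stands in for the external database; the table, the column names, and the filename convention are all made up, and the sketch does not show how queries are configured in PMA.core itself.

```python
import os
import sqlite3

# Stand-in for the external database; table and column names are made up
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE observations (barcode TEXT, day TEXT, weight_g REAL)")
db.executemany("INSERT INTO observations VALUES (?, ?, ?)", [
    ("R0042", "2020-03-01", 251.5),
    ("R0042", "2020-03-02", 249.0),
    ("R0077", "2020-03-01", 301.2),
])

def barcode_from_slide(slide_path):
    """Assumes a file name like '<barcode>_<stain>.<ext>' (hypothetical)."""
    return os.path.basename(slide_path).split("_")[0]

def observations_for(slide_path):
    # Parameterized query: the barcode is never pasted into the SQL text
    cursor = db.execute(
        "SELECT day, weight_g FROM observations WHERE barcode = ? ORDER BY day",
        (barcode_from_slide(slide_path),))
    return cursor.fetchall()

print(observations_for("/tox_study/week1/R0042_HE.svs"))
```

Because the lookup happens at request time, new records in the external database show up next to the slide without any replication step.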

Data can be previewed within PMA.core, and subsequently is automatically propagated to other environments like

External data are everywhere. They can be in proprietary databases as in our example above, but they can also be in a (AP)LI(M)S, VNA, PACS, or EHR system. In all those cases, replication is hard and impractical at best, and can even lead to data inconsistencies and errors at worst.

The Pathomation platform for digital pathology and virtual microscopy allows for a more elegant solution.

What WSI data are REALLY made of

Image data types

Pathomation is concerned with any (imaging) data that is microscopy- or pathology-related. Much of the data is large: we talk about gigapixel data, or (more apt for microscopy) whole slide images (WSI).

Not all image data accessed via Pathomation need be large though:

  • Microscopic images can represent individual fields of view: specific areas of interest captured by a mounted camera for discussion or other ad hoc purposes
  • Pathology starts from physical tissue. Therefore, it is often useful to photograph the obtained tissue. These are typically referred to as “macroscopic images”.

Oftentimes the above results can be stored in common (image) file formats: JPEG and TIFF are most often encountered. You can distribute these images via any medium. But if you have them side by side with your high-resolution images representing prepared slides, it’s reassuring to know that with Pathomation software you can organize these different slides under one umbrella.

Back to the really big data. Elsewhere on this blog, we have an article about the technical challenges when wanting to store an image that contains 100,000 x 200,000 pixels.

We send said article to (potential) customers regularly. Some are helped by it, some are not. Because, understandably, many times you just want to know about your slides. When you book a flight, you don’t want to get an explanation about Newton’s or Bernoulli’s laws either… Just get me the tickets please.

So why did we write the article in the first place? Because at Pathomation, we pride ourselves on being format-agnostic. We don’t pick hardware vendors. Each scanner has its pros and cons. Each sets out to serve a specific market segment and works better with some tissues than others. It’s not our place to subjectively decide who’s better or worse (although we would appreciate better feedback from some with respect to a vendor’s specific file format).

So please do understand that mentioning specific vendors has nothing to do with any positive or negative endorsements for said vendor. We’re merely stating facts with respect to the way their techies decided, a long time ago, to organize their (giga)pixels.

Single or many files

As you already know from the previous article, virtual slides can consist of many different files. Within the files that represent MRXS slides, you find plenty of .dat files, for VSI slides you find .ets files (amongst others) etc.

Several vendors have adopted a single file format for their WSI data. These include Hamamatsu (NDPI), Aperio (SVS), Leica (SCN), and Zeiss (CZI and ZVI).

Other vendors that have adopted a multi-file approach include 3DHistech (MRXS), Olympus (VSI), and Motic (MDS).

The organization of data across multiple files for different vendors is not standardized. In the case of 3DHistech, individual files more or less represent different magnifications. Olympus’ file structure seems more organized around scanned regions of interest.
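
The vendor lists above can be captured in a small lookup table. What follows is a minimal sketch and not part of the PMA.python SDK: the FORMATS table and the describe_format helper are hypothetical names, and the table only contains the extensions mentioned in this article.

```python
from pathlib import PurePath

# Extension -> (vendor, layout), limited to the formats named in this article.
FORMATS = {
    ".ndpi": ("Hamamatsu", "single-file"),
    ".svs": ("Aperio", "single-file"),
    ".scn": ("Leica", "single-file"),
    ".czi": ("Zeiss", "single-file"),
    ".zvi": ("Zeiss", "single-file"),
    ".mrxs": ("3DHistech", "multi-file"),
    ".vsi": ("Olympus", "multi-file"),
    ".mds": ("Motic", "multi-file"),
}


def describe_format(filename):
    """Return (vendor, layout) for a slide file name, or None if unknown."""
    ext = PurePath(filename).suffix.lower()
    return FORMATS.get(ext)


print(describe_format("HE.mrxs"))
print(describe_format("CMU-1.SVS"))
```

A lookup like this only tells you what to expect; whether a given slide actually comes with companion files still has to be checked on disk.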

The single file formats can also be upgraded to multi-file formats. Hamamatsu scanners can create .ndpis files for fluorescent or z-stacked content. The “s” stands for “set”: the .ndpis file merely contains pointers to individual .ndpi files which then each contain the image for the particular layer or channel. You can open these .ndpi files by themselves by the way, but they’re only useful when you also correctly interpret their context from the .ndpis file.

Aperio SVS files can be accompanied by .xml files, which contain annotations. Ventana BIF files can be accompanied by .tifp and .bmp files.

As soon as you have more than 1 file involved, it becomes a multi-file file-format. The strict distinction is important, because some storage systems don’t support multi-file containers.


Let’s look at the 3DHistech line of scanners: In the screenshot below we scanned two slides: HE and PDL1. The software ended up creating “HE.mrxs” and “PDL1.mrxs” files, along with “HE” and “PDL1” subfolders.

Within the subfolders, a standard naming convention is used, so you won’t find any more “HE*” files in there.

The .mrxs files themselves are typically just there to allow third-parties to identify these different file-types. Case in point: the .mrxs file can be renamed into a .jpeg file, and then you can just view it as any other image (it’s a quick way for us, too, to display slide thumbnails).

Another example? You got it!

This is what the structure of a .VSI slide looks like:

Each scanned region translates into a “stack”; each stack can contain one or multiple frames.

A different approach, a more hierarchical structure.

For DICOM slides, all related files are grouped together. No subdirectory required:

The details of these don’t matter for the typical end-user, except that they do matter when you want to copy / move / transfer slides to others. The basic principle here is to always make sure you copy not only the index .vsi / .mrxs / .whatever file, but also all the accompanying data-files. Zipping them may help both you and your receiving party.
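
To make the principle concrete for the 3DHistech case, here is a minimal sketch that zips an .mrxs index file together with its data folder. It assumes the data folder sits next to the index file and carries the same base name, as in the HE example; zip_mrxs_slide is a hypothetical helper, and the file names in the demo are made up.

```python
import os
import tempfile
import zipfile


def zip_mrxs_slide(index_path, zip_path):
    """Zip an .mrxs index file together with its same-named data folder."""
    index_path = os.path.abspath(index_path)
    base, _ = os.path.splitext(index_path)      # ".../HE.mrxs" -> ".../HE"
    parent = os.path.dirname(index_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(index_path, os.path.relpath(index_path, parent))
        for root, _dirs, files in os.walk(base):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, parent))
    return zip_path


# Demo on a throwaway folder that mimics the HE example (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, "HE.mrxs")
    os.mkdir(os.path.join(tmp, "HE"))
    for path in (index, os.path.join(tmp, "HE", "Data0001.dat")):
        with open(path, "wb") as fh:
            fh.write(b"demo")
    archive = zip_mrxs_slide(index, os.path.join(tmp, "HE.zip"))
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)
```

Other multi-file formats organize their companion files differently, so a helper like this would need a variant per format.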

How Pathomation can help

Have a look at our [slides] folder in the Windows Explorer:

Confusing, right?

Now let’s have a look at the same folder through PMA.start eyes:

Since Pathomation knows slides, we systematically hide the intricacies of respective formats that you shouldn’t have to worry about. In PMA.start it becomes obvious which subfolders are true subfolders, and which ones are merely there to support vendors’ data structures.

This doesn’t help you yet to transfer slides of course, but if you are one of our commercial users, you typically want to transfer slides from your local system to PMA.core and back. If that is you, then PMA.transfer is a great free tool (it’s part of your license package) for you to look at.

This is how our [slides] folder shows in PMA.transfer:

PMA.transfer seamlessly interfaces with PMA.start, and uses it as a jumping board to transfer slides between different endpoints. The biggest benefit of PMA.transfer therefore is that it encapsulates format-specific complexities and hides them from the end-user. Through PMA.transfer, you’re truly manipulating slides instead of files.

In addition, PMA.transfer also makes sure that only correct slides are transferred (nothing more frustrating than transferring 2 GB of data over Wifi, only to discover that the source was corrupt), as well as confirming that the transfer was completed successfully (probably the second most common source of frustration in these endeavors).

Why it matters

At Pathomation, we take much care designing our components in such a way that data duplication can be avoided at all costs. In PMA.control e.g., you can create as many cases and case collections as you want, but the data always remain in the same location. In PMA.core, you can create nested root-directories, each with different ACL properties. To your end-users these end up looking like different filesystems, but they’re really not. At the API level, we provide the possibility to fingerprint a slide, so you can scan for duplicate files and possibly eliminate them. All of these measures matter when you’re talking about Terabytes of data.

But there comes a time when you do need to copy slides. Perhaps you’re moving them from one installation to another, or there’s a network upgrade, or you just want to ship off some slides to a colleague (My Pathomation can now do this for you, too).

Whatever the case, (virtual) slides will need to be moved around. By their nature, they are big. It helps to have some understanding about how they are structured at that point.

And now you know…

… everything about whole slide images, or, at least, almost everything.

Amongst other things, .mrxs-, .vsi-, and other files help companies like ourselves decide what kind of file format is being used. The alternative would be that we find a large number of subfolders, and we have to parse each and every one of those folders in a variety of ways to try to “guess” what file format it belongs to (if this is even the case at all; a subfolder can still be a regular subfolder, containing no slides at all). This would be a tremendous drag on system performance.

Whether you deal with Pathomation or another vendor, we hope this article has helped you take a peek behind the curtain of what whole slide images are made of, and how to best work with them.

Of course, if you are using the Pathomation platform, you should have a look at PMA.transfer, a no-hassle tool we developed to facilitate slide transfers.

If you’re not yet a Pathomation customer (whaaaaaaaaaat??), you can contact us for a free no-obligation demonstration.

I “just” want a slide catalog

The case for simplicity

Sometimes user requests are simple. In this particular instance, we had a customer that “just” wanted a list of all of their slides (within a particular root-directory).

We pointed them to the repository of PMA.control:

We pointed them to the tree interface that combines folder and slides in PMA.view:

But as it turned out, these were too complicated. The customer already had built a nested hierarchy of folders and subfolders, but now wanted a linear list of all slides across all folders. A thumbnail next to each slide reference would also be useful, thank you very much.

A linear list of slides

Here’s how we can create a linear list of slides:

from pma_python import core
from datetime import date
from os import mkdir
from os.path import exists


slides = core.get_slides("C:/slides", recursive=True)
print(len(slides), " slides found")     # sanity check

f = open("c:/wsi_report/cat1.html", "w+")
f.write("<html><body><h1>Market slide catalog created on " + str(date.today()) + "</h1>")
for slide in slides:
    f.write(slide + "<br/>")
f.write("</body>")
f.write("</html>")
f.close()

Want to include the thumbnail? Look no further than the get_thumbnail_url method:

f = open("c:/wsi_report/cat2.html", "w+")
f.write("<html><body><h1>Market slide catalog created on " + str(date.today()) + "</h1>")
for slide in slides:
    thumb = core.get_thumbnail_url(slide)
    f.write("<img src='" + thumb + "' />" + slide + "<br/>")
f.write("</body>")
f.write("</html>")
f.close()

Ah crap, that looks horrible!

No worries; just add some formatting to the ole’ <img> tag:

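f = open("c:/wsi_report/cat3.html", "w+")
f.write("<html><body><h1>Market slide catalog created on " + str(date.today()) + "</h1>")
for slide in slides:
    thumb = core.get_thumbnail_url(slide)
    # the height value is an illustrative choice; pick whatever suits your page
    f.write("<img src='" + thumb + "' height='100' />" + slide + "<br/>")
f.write("</body>")
f.write("</html>")
f.close()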

Yes, something like this:

Much better!

But there’s a catch here: careful observers notice that the thumbnail URLs used by the above code have a PMA.core Session ID embedded. This means that those URLs are only valid as long as the respective Session IDs remain valid.

This is fine for ad-hoc reporting, but if we want a list that we can post somewhere on a central server as a reference source for others, we need something just a little more sophisticated. Yes, the keyword in this post is “just”, just in case you’re wondering.

Creating a persistent slide catalog

We want to create a list of all slides in our repository, and we want the list to be persistent. In other words, it is not ok to walk away from our browser for a couple of hours, refresh our list, and see the following:

The solution is to not use the thumbnail URL, but retrieve each thumbnail as a binary object and save it to a subfolder. Then, we let our <img> tag point to the downloaded (or cached, if you will) files.

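dir = "c:/wsi_report/cat4/"
if not exists(dir):
    mkdir(dir)
f = open("c:/wsi_report/cat4.html", "w+")
f.write("<html><body><h1>Market slide catalog created on " + str(date.today()) + "</h1>")
for slide in slides:
    thumb = core.get_thumbnail_image(slide)
    fn = core.get_slide_file_name(slide) + ".jpg"
    thumb_fn = dir + fn
    print("Saving thumbnail as ", thumb_fn)
    thumb.save(thumb_fn)    # assumes get_thumbnail_image returns a PIL image
    # the height value is an illustrative choice
    f.write("<img src='cat4/" + fn + "' height='100' />" + slide + "<br/>")
f.write("</body>")
f.write("</html>")
f.close()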

This catalog fits our needs. We can post it anywhere, and it only relies on local data. You can store it on your hard disk. You don’t need PHP, you don’t need Python, you don’t need the underlying PMA.core (PMA.start in our example code) to be up and running.

There are cons to our approach, too:

  • The catalog takes longer to generate, as we need to download a large number of thumbnails one by one
  • The catalog takes up more space. In our case, for 265 slides, we went from 197 KB to 20+ MB. That’s still manageable to send in a zipped file package, but for larger repositories may become inconvenient.
  • The catalog is a snapshot of the repository. If you use it as a reference today and add slides to the underlying root-directory tomorrow, the catalog will not pick up the newly added (or removed, for that matter) slides, unless you re-generate the catalog and re-distribute or publish it.
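
One way to decide when the snapshot has gone stale is to compare the slide list stored with the catalog against the current one. The sketch below is only an illustration: diff_catalog is a hypothetical helper and the paths are made up. In practice the current list would come from the same get_slides call used earlier.

```python
def diff_catalog(cataloged, current):
    """Compare the slides in an old catalog with the current slide list."""
    cataloged, current = set(cataloged), set(current)
    return {
        "added": sorted(current - cataloged),
        "removed": sorted(cataloged - current),
    }


old = ["C:/slides/a.svs", "C:/slides/b.mrxs"]
new = ["C:/slides/b.mrxs", "C:/slides/c.ndpi"]
changes = diff_catalog(old, new)
print(changes)
```

If both lists come back empty, there is no need to re-generate and re-distribute the catalog.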

And finally

Every solution has trade-offs, and even for simple problems, you have to think things through to come to the right solution. Problems that have the word “just” somewhere in their formulation can be particularly tricky.

The sourcecode for this post is available through our repository as realdata 038 – simple slide catalog.ipynb. Feel free to download it and adjust it for your own needs.

As always: we encourage interaction. Do let us know what digital pathology problem or scenario you want us to work out in one of our next posts!

Optimizing your data volume

Pop quiz

Let’s start this post with a pop-quiz. Without peeking ahead: can you identify what painting this is?

The answer of course is that it’s this one. The question is relevant, because you just proved to yourself that you can get by with very little information to identify something big and important. So let’s just walk you through the way we created the picture above:

  • We took the original image from Wikipedia, provided through the WikiMedia Commons library, at 585 x 870 pixels (a total of 508,950 pixels). The physical size of this image was 957 KB.
  • We then resized this image to thumbnail size of 27 x 40 pixels (a total of 1,080 pixels, or a reduction by 99.8%). The physical size of this image was around 3 KB (a reduction by 99.7%).
  • We then blew up the thumbnail again with a factor 10 to 270 x 400 pixels (bringing the total amount of pixels again to 108,000 or about 21% of the original). Interestingly enough, the image size this time is 75 KB, which represents only about 8% (even though we supposedly offer 21% of the original pixels).
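
The percentages in the list above are easy to re-derive. This is just the arithmetic from the bullets written out; pct is a throwaway helper name.

```python
original = 585 * 870          # pixels in the Wikipedia image
thumbnail = 27 * 40           # pixels in the thumbnail
enlarged = 270 * 400          # pixels after blowing the thumbnail up again


def pct(part, whole):
    """Express part as a percentage of whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(original, thumbnail, enlarged)
print("thumbnail keeps", pct(thumbnail, original), "% of the pixels")
print("enlarged image has", pct(enlarged, original), "% of the pixels")
print("file sizes:", pct(3, 957), "% and", pct(75, 957), "% of 957 KB")
```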

The important takeaway is that the human brain is remarkable in filtering out and handling noisy, blurry data.

But what does this have to do with digital pathology or virtual microscopy?

A bottleneck

Recently we were contacted by a company that was involved in telepathology in rather rural areas. It’s a great endeavor: they provide local staff with refurbished scanners, thereby hoping to extend advanced pathology expertise to communities that have no other way of having access to these.

But as anybody working with whole slide images can attest: they’re big. Huge sometimes. And there’s no way around this. And it’s a problem for our contact because rural areas unfortunately oftentimes also still mean limited bandwidth. It can take hours to transfer a single slide, and that’s just not practical.

This is where our earlier exercise comes into play. And the question now becomes: How much information do you really need to make an accurate diagnosis?

If we scan a slide at 40X (0.25 microns per pixel according to DICOM’s definition) and bring it back down to 20X (0.5 microns per pixel), would it be sufficient for a pathologist to make an accurate diagnosis? And what about image compression? Do we need 100%? What if we transferred the image with 90% quality? 80%? We know that at 50%, compression artifacts will not result in a pretty image, but the real question is whether the receiving pathologist can still make a diagnosis.

Some research has been done on this: Yukako et al. concluded that even significant compression ratios have no impact on diagnostic accuracy.

So with that being said, how could you do it? And how would you do it with the Pathomation platform?

To Jupyter, capt’n!

A whole slide image has two properties that we can manipulate to control the physical slide size:

  • The highest zoomlevel contained in the slide (see also our blog post of WSI structure)
  • The compression ratio of individual tissue tiles

We can then create a Jupyter script that takes in these two parameters to convert any given slide to an intermediate format. We choose TIFF here, but you can adapt the output to your own needs.

The first one is target-quality, and can be given a value of 0-100 (100 being the highest quality).

The second one is the downscale factor. It can be chosen from a list of values of [1, 2, 4, 8, 16, 32, 64, 128]. That doesn’t quite translate to an optical magnification or pixels per micron reading, but we use it here for the sake of simplicity (it has to do with the way these WSI data are typically structured). The higher the downscale factor, the less magnification and detail you will end up with. If you want to, you can write your own conversion method from downscale-factor to ppm and back again.
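
Such a conversion could look like the sketch below. It rests on two assumptions that you should check against your own scanner: each doubling of the downscale factor halves the resolution, and the base scan is 40X at 0.25 microns per pixel. All three function names are hypothetical.

```python
FACTORS = [1, 2, 4, 8, 16, 32, 64, 128]


def downscale_to_mpp(factor, base_mpp=0.25):
    """Microns per pixel after downscaling; base_mpp is the scan resolution."""
    return base_mpp * factor


def downscale_to_magnification(factor, base_magnification=40):
    """Perceived magnification after downscaling a scan of base_magnification."""
    return base_magnification / factor


def mpp_to_downscale(target_mpp, base_mpp=0.25):
    """Smallest allowed factor whose resolution is at least target_mpp."""
    for factor in FACTORS:
        if base_mpp * factor >= target_mpp:
            return factor
    return FACTORS[-1]


for factor in FACTORS[:4]:
    print(factor, downscale_to_mpp(factor), downscale_to_magnification(factor))
```

Pixels per micron is simply the reciprocal of the microns-per-pixel value, so the same helpers cover both notations.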

After setting the parameters, we create a new TIFF file using the GDAL TIFF driver. The width and height of the final TIFF are based on the number of tiles horizontally and vertically.

We read each tile of the final zoomlevel (1:1 resolution) from the server and write it to the resulting TIFF file. Then we create the pyramid of the file using the BuildOverviews function of GDAL.
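
The tile bookkeeping behind this can be sketched without GDAL. The helper below is hypothetical and makes three assumptions: the dimensions at a given downscale factor are the full-resolution dimensions divided by that factor, the TIFF is padded to whole tiles, and the tile size is 512 pixels (a placeholder, not a value taken from the notebook).

```python
import math


def tile_grid(width, height, downscale=1, tile_size=512):
    """Return (tiles_x, tiles_y, tiff_width, tiff_height) for one zoomlevel."""
    level_w = math.ceil(width / downscale)
    level_h = math.ceil(height / downscale)
    tiles_x = math.ceil(level_w / tile_size)
    tiles_y = math.ceil(level_h / tile_size)
    return tiles_x, tiles_y, tiles_x * tile_size, tiles_y * tile_size


# The 100,000 x 200,000 pixel slide mentioned earlier, at downscale factor 4.
print(tile_grid(100_000, 200_000, downscale=4))
```

The number of tiles is also the number of server requests needed, which is why the downscale factor has such a large effect on conversion time.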

The complete Jupyter notebook is available from our website.

The jury is in session

Let’s see what the result of our code is for different combinations.

We downloaded CMU-1.svs from the OpenSlide sample data repository, which has a compression quality of 30%. The original file size is 169 MB.

When we run our script and ask it to generate a derived TIFF slide with 100% tile quality and the same magnification level, we find something interesting: the new slide is about 1.5 GB in size. It became bigger!

This is our first lesson then: when doing this kind of conversion, it doesn’t make any sense to transfer data at a compression quality that’s higher than the original slide’s.

The CMU-1.svs slide is an extreme case. We haven’t heard of any labs that are scanning at only 30% quality; most scanners are usually calibrated to produce data at 70%-80% compression quality.

That being said, let’s just use the 1.5 GB as a reference point and see how the data package becomes smaller as we vary both the compression quality and the downsampling parameter:

Not surprisingly, as the scaling factor goes up (a scaling factor of 4 would roughly correspond with a perceived magnification of 10X), the resulting slide size becomes significantly smaller.

So along both axes, significant savings are to be had. Similar to the Mona Lisa example that we started with, the slide at 10X and 60% quality is only 2.3% of the file size of the original 40X and 100% quality file.
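
To see what that means for the bandwidth problem, here is a back-of-the-envelope sketch. The 1.5 GB and 2.3% figures come from this article; the 1 Mbit/s link speed is a hypothetical value chosen for illustration, not a measurement, and transfer_hours is a made-up helper.

```python
def transfer_hours(size_mb, mbit_per_s):
    """Hours needed to move size_mb megabytes over a link of mbit_per_s."""
    seconds = size_mb * 8 / mbit_per_s
    return seconds / 3600


reference_mb = 1500                 # the 1.5 GB reference slide
reduced_mb = reference_mb * 0.023   # 2.3% of the reference
LINK_MBIT = 1                       # hypothetical rural uplink, not a measured value

print(round(reduced_mb, 1), "MB after reduction")
print(round(transfer_hours(reference_mb, LINK_MBIT), 2), "hours for the reference slide")
print(round(transfer_hours(reduced_mb, LINK_MBIT) * 60, 1), "minutes for the reduced slide")
```

Plug in your own link speed to see whether the reduced slide fits your workflow.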

What else?

We wrote this article to give you ideas on how you can employ PMA.start to optimize your data volume when shuttling data back and forth between two sites.

This is but one scenario to consider. Is the eventual resolution sufficient to maintain diagnostic accuracy? That’s up to you and your pathologist(s) to resolve. Whether the parameters specified here work for you is a question only you can answer.

To give you an idea; here’s the slide with a downsampling rate of 1, and 100% compression quality:

Here’s the same slide with a downsampling rate of 4, and only 60% compression quality:

There is one more interesting experiment to consider here: storage for digital pathology doesn’t scale very well, and we would be interested to see whether machine learning algorithms (ML/DL/AI) could be trained equally well on heavily compressed data, compared to uncompressed data.

Full disclosure here: There are things missing here, too. We’re not incorporating the thumbnail image in the exported TIFF, in similar fashion as certain other vendors do this. It’s possible, but again, the eventual implementation depends on your personal preference and circumstances. If you’re looking for help and advice with this, we can assist you and you may contact us.

More business intelligence

The story of the three Qs

In this article, we explain how Pathomation was recently able to assist one of its customers with Performance Qualification tests for a new slide scanner.

At our customer’s lab, each piece of equipment that they take into operation, goes through a rigorous qualification pipeline before putting it to work for day-to-day lab activities.

This goes for slide scanners as well.

The qualification pipeline (the “validation procedure”) consists of three steps:

  • Installation Qualification (IQ)
  • Operational Qualification (OQ)
  • Performance Qualification (PQ)

The first step is straightforward: you’re asking yourself if the scanner can scan your slides or not. Is the type of glass slide that your lab uses, in combination with the coverslip, compatible with the new scanner? Is the barcode on the slide label scanned properly, too? What about (automated) tissue detection?…

The second step is operational qualification: can the new equipment handle the edge parameters of your day-to-day operations? If you plan to feed the scanner up to 400 slides per day; will that work?

The third step in the validation can be somewhat subjective, and it’s here that Pathomation’s software platform came in particularly handy.

Performance Qualification

Performance Qualification (PQ) is a test procedure that takes place in order to verify if a new piece of equipment is good enough to be put to work in the daily workflow of a company. It happens after the hardware has already gone through IQ and OQ.

Our customer recently wanted to know how their various scanners compare to a new one in terms of speed. They took a representative set of slides, divided it in three groups (“requests”), and then put each scanner to work.

In the vendor’s viewer software, one of the parameters that could be seen was “scanning duration”. Considering the size of the dataset however, opening each slide individually in the viewer software, noting down the scanning time, recording it into an Excel sheet etc., would have been a very tedious task.

So they turned to Pathomation for help: Can Pathomation’s software be used to create a table with all scan duration values for all scanned slides?


Pathomation’s API offers a get_slide_info() call. We used it in our first article on business intelligence to extract specific bits of information. The method by default returns a nested hierarchy of information.

But not every scanner exports the same information. Some of the slide information we expose ourselves through ImageInfo, some not. This is because vendor 1 exposes (A, B, C, D, E), vendor 2 exposes (A, B, F, G, H), vendor 3 exposes (A, B, C, I, J, K, L) etc. Therefore, Pathomation offers the common denominator information only, something like (A, B, C).

In order to maintain some standardization across the different vendors, part of the returned information by get_slide_info is a MetaData array that contains key-value pairs that may or may not be provided by your scanner vendor.

As it turns out, we didn’t have the scanning time in there yet, but because of the already provided structure, it was straightforward to add it.

Like our first business intelligence exercise, we then write a wrapper method to extract the scanning duration as we need it:

def get_scanning_duration(slide_ref):
    info = core.get_slide_info(slide_ref)
    meta = info["MetaData"]
    for meta_el in meta:
        if (meta_el["Name"] == "ScanningDuration"):
            return meta_el["Value"]
    return -1
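
The same lookup can be generalized to any key and tried out without a server. The sketch below is an illustration only: get_metadata_value is a hypothetical helper, and the sample dictionary (including the “Objective” key and all values) is made up to mimic the Name/Value structure used above.

```python
def get_metadata_value(info, name, default=None):
    """Look up one key in the MetaData list of a slide-info dictionary."""
    for element in info.get("MetaData", []):
        if element["Name"] == name:
            return element["Value"]
    return default


# Hypothetical slide info, shaped like the structure used above.
sample_info = {
    "MetaData": [
        {"Name": "ScanningDuration", "Value": "95"},
        {"Name": "Objective", "Value": "40"},
    ]
}

print(get_metadata_value(sample_info, "ScanningDuration"))
print(get_metadata_value(sample_info, "Barcode", default="n/a"))
```

Returning a default rather than failing matters here, because a key may or may not be provided by your scanner vendor.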

The remainder of the script is straightforward:

  • Loop over the different scanner outputs
    • For each scanner (“request”), get all the slides (recursively in our case)
      • For each slide, extract the scanning duration
  • Wrap all output into a Pandas dataframe structure
  • Export the Dataframe to a spreadsheet

A word about that last step. On occasion, people have called us old-fashioned for this one. Surely Excel is spread wide and far enough by now so that it can be considered a de facto “standard” file format, too, can’t it?

I disagree. I still prefer to use csv instead of Excel. Why? Because csv files are simple, and transportable to many other platforms and applications. Our data in this case consists of a single table with three columns. It’s simple. We don’t need a complex data format to store this kind of data.

Generating fancy file format output is not part of the assignment here. Keep It Simple.

Our final code looks like this and can be downloaded as a Jupyter notebook, so you can play around with it yourself.

import pandas as pd
from pma_python import core

server = "http://***/***"
user = "***"
pwd = "***"
print("Session initiated ", core.connect(server, user, pwd))
requests = ["RQ105", "RQ204", "RQ695"]
base_folder = " Images/pq"
print(len(core.get_directories(base_folder)), " subfolders detected in root base folder", base_folder)
s_times = []
for req in requests:
    print(base_folder + "/" + req)
    for slide in core.get_slides(base_folder + "/" + req, recursive=True):
        s_times.append({"slide": str(slide).replace(base_folder, ""), "scan_time": get_scanning_duration(slide), "request": str(req)})
scan_times = pd.DataFrame(s_times, columns=["request", "slide", "scan_time"])
scan_times.to_csv("scanning duration.csv")

What about the results?

We can import the resulting CSV file in Excel and see what it looks like:

As we’re interested in comparing the different scanners to one another, one direct way to do this is with a pivot-chart.
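
If you would rather stay in Python than build the pivot in Excel, the same grouping can be sketched with the standard library alone. The rows below are made-up values shaped like the records written to the CSV file; pivot_scan_times is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, shaped like the records written to the CSV file above.
rows = [
    {"request": "RQ105", "slide": "/RQ105/a.svs", "scan_time": 120},
    {"request": "RQ105", "slide": "/RQ105/b.svs", "scan_time": 100},
    {"request": "RQ204", "slide": "/RQ204/c.svs", "scan_time": 90},
]


def pivot_scan_times(rows):
    """Group scan times per request and return count and mean for each."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["request"]].append(float(row["scan_time"]))
    return {req: {"slides": len(t), "mean": mean(t)} for req, t in grouped.items()}


summary = pivot_scan_times(rows)
for request, stats in sorted(summary.items()):
    print(request, stats)
```

A mean per request is only one possible summary; medians or minimum and maximum values may tell a fairer story when slides differ a lot in tissue area.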

These are the results. What do you think? Do all three scanners perform equally? Does the new scanner pass performance qualification (PQ)?

Your challenge here

This is how Pathomation works with its customers.

Our philosophy has been and remains to develop locally, and then scale as you handle more complex scenarios. You do not need the commercial version of PMA.core to get to work with the code in this post. The Jupyter notebook that comes with this blog post is suited for use with both PMA.start and PMA.core.

Do you use Pathomation software in your daily workflows already? Tell us your business intelligence challenge and perhaps we’ll address it in an upcoming post!

A look at PMA.slidebox

So what if you “just” want to show your slides?

Education is one of the main application domains of digital pathology. And there are many instances where you just have a couple of slides that you want people to look at. When Pathomation first became involved in deploying its software for facilitating seminars, we used PMA.view.

But while it’s possible to do this, PMA.view is not a good solution for this particular problem:

  • People still need to log in to PMA.view
  • PMA.view requires telling people where to navigate to (which root-directories / paths); your root-directories on the PMA.core side of things may not exactly reflect the content that you want people to see.
  • PMA.core is folder-based navigation, and PMA.view is too. This means that the concept of a case (a group of slides belonging to a patient or experiment) is not intuitively represented
  • Neither PMA.view nor PMA.core supports any of the visual cue-elements that we’ve all gotten accustomed to in recent years, such as avatars.
  • The learning curve of PMA.view is still too steep for people that just need to look at slides. PMA.view is overshooting for what you want people to actually experience

How did people use to do it? They traveled to conferences with a slidebox in their hand luggage. Within the slidebox: neatly organized slides, sorted by case. We don’t want to be sensational here and say that the slides got lost all the same, or that they broke all the time, or got confiscated by security in a post-9/11 world (glass slides + ninja pathologist = impromptu shuriken?).

But: things could happen when traveling with physical slides, and the most likely issue was probably still somebody forgetting to take their slides with them in the first place!

Also: when traveling with physical slides you depend heavily on the organizers’ ability to provide and calibrate multi-headed microscope equipment. As more people attend, aligning all the optics becomes ever harder, and for large groups this is just impractical.

Cue PMA.slidebox

So we got thinking… If people are used to physical slideboxes, why not just make a virtual slidebox? This is exactly what PMA.slidebox is and does!

PMA.slidebox, like all our software, relies on PMA.core. It means that you can use all the great features of PMA.core (different root-directories, access control), without having to explain it to your audience.

All your audience sees, without having to register, log in, install, or download anything (zero footprint), is this:

So how does it work? PMA.slidebox shows up to four collections in the top-left corner of the screen (screenshot only shows three). When you select a collection, you see the “cases” appear underneath it, along with the slides for each “case”.

We put the word “case” between quotes deliberately because you don’t have to set it up this way. You can have a simple list of slides without any hierarchy or structure to it, and PMA.slidebox will pick it up. Similarly, if you only have a couple of cases, you could turn those into individual collections, and present them as such.

What you want to show and how you want to show it is completely up to you. It’s just like a real slidebox: you put the slides in that you want, and you organize them the way you want them, too.

Want to give it a go yourself? Here’s an example how such a virtual slidebox works in practice:


PMA.slidebox is flexible. It is hosted on a website, somewhere (can be on your infrastructure, or on ours). If you don’t have PHP, we can configure it for you; but if you do, you can configure everything by yourself via a configuration panel.

What you first need to do is decide how you want to have everything structured. You can build a hierarchy of up to three levels deep, with the following levels:

As you have only 4 cases in the above screenshot, you could simplify your hierarchy like this:

While PMA.slidebox is flexible, we should point out that it is necessary to have some kind of hierarchy at least. You cannot just dump all your slides into a single folder and expect the software to figure it out from there.
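
To illustrate what “some kind of hierarchy” means, the sketch below groups slide paths into collections and cases. It is not how PMA.slidebox is configured: build_slidebox is a hypothetical helper, the paths are made up, and the collection / case / slide layout is our reading of the levels described above.

```python
def build_slidebox(paths):
    """Group 'collection/case/slide' paths into a nested dictionary."""
    box = {}
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) < 2:
            raise ValueError("slide needs at least one level of hierarchy: " + path)
        collection, slide = parts[0], parts[-1]
        case = "/".join(parts[1:-1])   # empty string when there is no case level
        box.setdefault(collection, {}).setdefault(case, []).append(slide)
    return box


paths = ["Breast/Case 1/HE.mrxs", "Breast/Case 1/PDL1.mrxs", "Skin/overview.svs"]
box = build_slidebox(paths)
print(box)
```

A slide that sits directly in the top folder has no collection to belong to, which is why the helper rejects it.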

Note: if you do have large repositories of slides, and you want structured case creation and organization, you should have a look at PMA.control.

Here’s what the setup looks like when you spread your cases across only two collections:

And here’s what that same group of slides looks like, but this time with each case being defined as its own separate collection:

In closing

With our PMA.slidebox product, Pathomation solves the problem of mass-distribution of slide collections. When you just want to share your slides with people in a somewhat organized (collection and cases) fashion, PMA.slidebox is the perfect solution for you. The easy to use configuration panel behind the front-end makes it a breeze to point to the exact content that you want to display, and under no circumstances does the end-user have to do anything else except click on the URL that you provide them with.

Find out more about PMA.slidebox at our website.

Pathomation on the web

Web presence

We want to talk a bit more about our different communication channels this week.

When you’re reading this article, you’ve obviously found one of them.

The RealData blog is a WordPress website that we set up a couple of years ago to allow us to communicate about or explain topics that don’t necessarily have a dedicated place yet on our “main” company website.

Pathomation is a small company, and things can move quickly. We simply don’t have time to re-do our website each month or so because our product offering changes, or because there’s a spurt in creative writing that needs to find a landing spot and reach an audience. A free-form blog then seemed like a good idea.

And we still think it is 😊

There are companies whose website is a blog, but we do think there’s still a need to offer structured information and a general product overview as well.

So while you can’t constantly rewrite your website, we did manage to rework ours this month and we’re pretty proud of the result. If you haven’t checked it out yet, go ahead and do so. It’s a lot more comprehensive than anything we’ve had up before.

And if you’ve read this blog, of course you’ve heard about PMA.start before, our free whole slide image / digital pathology viewer software that can be used by anybody for anything to manage their local slide content. Our website is the third axis of our online web-presence strategy.

PMA.start comes with no limitations, except the one that is built-in: you can only use it on local content. If you want to share data with colleagues via a network, you need to upgrade to our professional PMA.core product. If you’re not quite sure what that’s all about, you can still sign up for our beta program until the end of this month (just a few days left, so be quick).

Check out our beta landing page.

And there you have it: our three-pronged strategy to provide you, our valued customer and end-user, with background information about the Pathomation universe, and our great products.