November 2022 – Real Data Digital Pathology blog

Our PMA.core tile server is all about providing connectivity, and we talk a lot about how great we are in terms of:

Supporting more WSI file formats than anybody else
Supporting more storage media than anybody else
Being able to integrate different annotations from environments like ImageJ or QuPath
Being able to bring together a great variety of metadata

What we don’t boast about often, though, are our audit trailing capabilities. It’s our secret sauce, really: it overlays everything that takes place within the PMA.core environment (and, by extension, almost all of our other products, too).

The need for monitoring

Yes, we know that monitoring conjures all sorts of “Big Brother is watching you” memes, but there are a number of good reasons to provide this kind of service, too:

When enrolling new users, you want to keep an eye on them to make sure they can do things as intended. Some of us are introverts, and pre-emptive corrections (if needed) may actually be appreciated.
Too many audit events may be a clue that there’s a security breach in your system, requiring other actions to be initiated.
It’s a general sanity check for hard-working staff, who can make sure at the end of the day that they indeed looked at everything they needed to, and that it was recorded as such.
During an audit, or when an outlier event takes place, an audit trail can provide supporting evidence that things did indeed take place as intended.
As a professor, you assign homework cases to your students. Did the student really go look at the assignment?
In some cases, the features of an audit trail can be prescribed in the form of legal obligations (depending on your jurisdiction). One such example is the FDA’s 21 CFR Part 11.

An audit trail need not be a mystery, either. In essence, it means that for each change in any data point, you keep track of:

Who did it
When it happened
Where it happened
Why it happened

Correspondingly, if there’s no audit trail record of an operation, it didn’t happen.
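
To make the four questions concrete, here is a minimal sketch in Python of what a single audit record and an append-only trail could look like. This is not PMA.core code: the class names, the field names, and the reading of “where” as the affected entity and “why” as the operation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    """One immutable audit entry (hypothetical field names)."""
    who: str        # user who performed the operation
    when: datetime  # timestamp of the operation (UTC)
    where: str      # assumption: the entity that was touched, e.g. "user/42"
    why: str        # assumption: the operation or reason, e.g. "create"


class AuditTrail:
    """Append-only collection: records are added, never edited or removed."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def log(self, who: str, where: str, why: str,
            when: Optional[datetime] = None) -> AuditRecord:
        record = AuditRecord(who, when or datetime.now(timezone.utc), where, why)
        self._records.append(record)
        return record

    def history(self, where: str) -> List[AuditRecord]:
        """All records for one entity, in the order they were logged."""
        return [r for r in self._records if r.where == where]

    def happened(self, where: str, why: str) -> bool:
        """No matching record means the operation is treated as not having happened."""
        return any(r.where == where and r.why == why for r in self._records)
```

The record is frozen and the trail only exposes `log`, so once something is written down it cannot be quietly altered afterwards.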

Creating content

Let’s see how that is implemented in PMA.core. Let’s create a new user:

After you create the user, the “Audit trail” tab immediately becomes visible. When you click on it, you see that new data was entered.

The same audit principle applies to other information types across PMA.core. For convenience, we sometimes combine different entities in a single report. An example is a root-directory: A root-directory always consists of a symbolic name (the root-directory itself), and one or more mounting points. You can’t have one without the other. So the audit trail for both is combined and shown as follows:

Note that sensitive and private information like someone’s password is still obscured, even at this level.

Editing content

Let’s make some changes to the user’s record:

The audit trail tab shows the changes:

And the same principle applies to all entities, like the aforementioned root-directories. Note that multiple subsequent edits are shown as separate records:
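
The idea of one audit record per edit, with sensitive values obscured, can be sketched in a few lines of Python. The field names, the mask string, and the helper functions below are hypothetical and only illustrate the principle; they say nothing about how PMA.core implements it internally.

```python
from typing import Any, Dict, List, Tuple

SENSITIVE_FIELDS = {"password"}  # hypothetical list of fields to obscure
MASK = "********"


def describe_changes(before: Dict[str, Any], after: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one entry per field whose value differs between two versions."""
    changes = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue
        if name in SENSITIVE_FIELDS:
            # Record that the field changed, but never the values themselves.
            old = MASK if old is not None else None
            new = MASK if new is not None else None
        changes.append({"field": name, "old": old, "new": new})
    return changes


def apply_edits(record: Dict[str, Any],
                edits: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], List[List[Dict[str, Any]]]]:
    """Apply edits one by one; every edit produces its own audit entry."""
    trail = []
    current = dict(record)
    for edit in edits:
        updated = {**current, **edit}
        trail.append(describe_changes(current, updated))
        current = updated
    return current, trail
```

Two consecutive edits therefore give two separate entries in the trail, rather than one merged “before and after” summary.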

Deleting content

Getting rid of content is probably where the audit trail comes in the most useful.

After deleting a record, you can search for it, and the fact that no results are returned proves that as far as PMA.core is concerned, the record is indeed deleted.

It is possible to transiently see the operation in retrospect, by typing in the direct URL to the entity’s original audit trail:

The red color is used to indicate that something final happened here:

However, after logging out of PMA.core and logging back in, it’s harder to retrieve the data, as you need to remember the entity’s original identifier, and even if you do, it may have been taken over by a new entity.

We’ll show you in a minute how to get access to deleted data in a more reliable and predictable way.

Why would you want to keep track of deleted data? Do you care to find out whether the student did his homework last semester? Probably not, but at least in the context of clinical trials, as well as hospital operations, this makes sense, because:

Clinical trials can run for many years. For rare diseases, a phase I clinical trial in particular can stretch on for a long time. People switch jobs and roles in the meantime, and when final approval of the drug approaches, it’s important to still have a track record of who was involved.
Regulators at the country level often require patient data to be kept for decades. Just as important as keeping the patient data is keeping the metadata that describes the actions performed on the patient records.

Back-end database

PMA.core only offers audit trail views for the most commonly referenced data types. Whether or not you can consult the data through the PMA.core end-user interface, all operations on any data entities in PMA.core are eventually tracked through a single table structure, which is defined in our wiki:

This means that even if there is no visual interface within PMA.core, or you can’t remember the original URL to the entity’s audit trail, there’s always the possibility to go dig into the audit trail in the back-end:

The above shows how the data from our first user record creation is represented. Below is what the update looks like:

And finally, the delete event:
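
For readers without access to the back-end, the three events above (create, update, delete) can be mimicked with a small stand-alone sketch. It uses Python’s built-in sqlite3 module rather than SQL Server, and the table and column names are assumptions made up for this illustration; the actual structure is the one defined in the Pathomation wiki.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE audit_trail (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_type TEXT    NOT NULL,
    entity_id   INTEGER NOT NULL,
    operation   TEXT    NOT NULL CHECK (operation IN ('create', 'update', 'delete')),
    username    TEXT    NOT NULL,
    logged_at   TEXT    NOT NULL,
    details     TEXT
)
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one entity's full life cycle."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("user", 42, "create", "admin", "2022-11-02T09:00:00", "login=jdoe"),
        ("user", 42, "update", "admin", "2022-11-02T09:05:00", "email changed"),
        ("user", 42, "delete", "admin", "2022-11-03T14:30:00", None),
    ]
    conn.executemany(
        "INSERT INTO audit_trail "
        "(entity_type, entity_id, operation, username, logged_at, details) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def entity_history(conn: sqlite3.Connection, entity_type: str, entity_id: int):
    """Return (operation, username, logged_at) rows for one entity, oldest first."""
    cursor = conn.execute(
        "SELECT operation, username, logged_at FROM audit_trail "
        "WHERE entity_type = ? AND entity_id = ? ORDER BY id",
        (entity_type, entity_id),
    )
    return cursor.fetchall()
```

Because the delete is just another row, the history of an entity stays queryable even after the entity itself is gone from the regular tables.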

Scaling and resource allocation

All this extra data means extra storage of course. Microsoft SQL Server can definitely handle a lot of records, but there are still situations where extra care is warranted.

When a lot of data passes through the system transiently, it’s possible for the log files (tables) to grow more quickly than the rest of the database. Keep in mind that annotations and metadata (form data) are audit-trailed as well.

In order to give some guidance as to how much data there actually is, as well as when it was generated, the installation check view gives high-level statistics on this:

If the number of records in the audit trail increases rapidly, you should be able to explain why this is (many users, lots of annotation activities taking place, lifetime of the total installation…). It’s important also at that point to go through our latest recommendations on SQL Server compliance.
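
If you want to keep an eye on growth yourself, the same kind of high-level statistic is easy to compute from a list of audit timestamps. The sketch below is a generic Python example, not the installation check view itself; the function names and the default factor of 2 are arbitrary assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List


def records_per_month(timestamps: Iterable[datetime]) -> Dict[str, int]:
    """Count audit records per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    return dict(sorted(counts.items()))


def fast_growing_months(per_month: Dict[str, int], factor: float = 2.0) -> List[str]:
    """Months whose count is at least `factor` times that of the previous listed month."""
    months = list(per_month)
    return [
        current
        for previous, current in zip(months, months[1:])
        if per_month[current] >= factor * per_month[previous]
    ]
```

A month that shows up in `fast_growing_months` is not necessarily a problem, but it is one you should be able to explain.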

In closing

In an earlier article, we talked about the differences between adopting an open-source strategy versus a commercial platform like our own.

This article adds more substance to this discussion: to be truly prepared for enterprise-level deployment of digital pathology, it’s important to know who’s doing what with your system. It’s important to be able to prove that to the necessary stakeholders, including governments and regulatory agencies.

Pathomation’s PMA.core tile server has all the necessary infrastructure to get you started off on the right foot.

Just as important as having state-of-the-art digital pathology software are the tools that are built and provided around such infrastructure. Today we release PMA.transfer 2.2.3. PMA.transfer is an important component in the Pathomation software platform for digital pathology when scaling up and automating your slide manipulation capabilities.

Why even build a separate tool like PMA.transfer? Virtual slides are complex in structure. Any type of data that is stored on a computer comes down to physical files on a hard disk. Various software vendors have come up with different strategies to store virtual slides as such physical data. A single slide can be stored as a single file, but also as multiple files. This distinction is not always obvious, and often leads to confusion when sharing slides amongst colleagues and collaborators alike.

Similar in feel to FileZilla and CloudBerry Explorer, PMA.transfer helps end-users and administrators to manipulate large amounts of virtual slides. The tool allows you to manipulate virtual slides on a transaction-based scale. PMA.transfer is smart enough to figure out what physical files belong to what slides. The end-user only needs to tell the software to “transfer these slides from A to B”, and PMA.transfer takes care of it.
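
As a rough illustration of what “figuring out which files belong to which slide” involves, the Python sketch below groups a flat file listing into slides. It assumes one simplified convention only: a slide is a top-level file with a known extension, optionally accompanied by a folder of the same name that holds its data files. The extension list and function name are made up for this example, and real formats vary far more than this; it does not describe PMA.transfer’s actual logic.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Illustrative subset of slide extensions; not an exhaustive or official list.
SLIDE_EXTENSIONS = {".svs", ".ndpi", ".mrxs"}


def group_files_by_slide(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each slide file name to every file that has to travel with it."""
    parsed = [PurePosixPath(p) for p in paths]
    # Top-level files with a known slide extension each define one slide.
    slides = {
        p.stem: p.name
        for p in parsed
        if len(p.parts) == 1 and p.suffix.lower() in SLIDE_EXTENSIONS
    }
    groups = {name: [name] for name in slides.values()}
    for p in parsed:
        # Files inside a folder named after a slide belong to that slide.
        if len(p.parts) > 1 and p.parts[0] in slides:
            groups[slides[p.parts[0]]].append(str(p))
    return {name: sorted(files) for name, files in sorted(groups.items())}
```

Files that match no slide (a stray text file, a folder without a matching slide file) are simply left out of the result, which is the kind of detail an end-user should not have to think about.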

The specific nature of how virtual slides are physically represented on storage media is hidden from end-users. A typical application for PMA.transfer would be to upload recently scanned slides from a local hard disk to the PMA.core image management system (IMS). It doesn’t matter in this regard whether the final storage destination is a local on-premise RAID device, or (cloud-based) S3 storage.

In addition to providing the necessary abstraction of the underlying complex data representation, much effort went into introducing the necessary checks and balances before, during, and after a transfer operation.

PMA.transfer works in close collaboration with PMA.start, a free local version of PMA.core, our CE-IVD certified tile server. This is necessary because it is PMA.start that is responsible for interacting with the local hard disk. It is this interaction that also allows the software to perform a slide integrity check before the transfer even begins; there’s nothing more frustrating than waiting for 2 gigabytes of data to upload, only to realize afterwards that the initial slide was somehow corrupt from the beginning.

Slide transfers that are interrupted during the upload (or download) process automatically resume once a connection is re-established. Upon completion, a secondary integrity check takes place, to confirm that a transferred slide’s fingerprint corresponds to the source slide.
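
The resume-then-verify pattern described above can be sketched generically in Python. The sketch works on any pair of seekable binary streams and uses SHA-256 as the fingerprint; both choices are assumptions for illustration, since the fingerprinting method PMA.transfer actually uses is not described here.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # read in chunks so large slides never sit fully in memory


def fingerprint(stream: BinaryIO) -> str:
    """SHA-256 hex digest of the whole stream (stand-in for a slide fingerprint)."""
    stream.seek(0)
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(CHUNK_SIZE), b""):
        digest.update(chunk)
    return digest.hexdigest()


def resume_copy(source: BinaryIO, target: BinaryIO) -> int:
    """Continue copying where the target left off; return the bytes copied now."""
    already_there = target.seek(0, io.SEEK_END)  # size of the partial target
    source.seek(already_there)
    copied = 0
    for chunk in iter(lambda: source.read(CHUNK_SIZE), b""):
        target.write(chunk)
        copied += len(chunk)
    return copied


def transfer_verified(source: BinaryIO, target: BinaryIO) -> bool:
    """Finish the copy, then confirm that both sides have the same fingerprint."""
    resume_copy(source, target)
    return fingerprint(source) == fingerprint(target)
```

Resuming trusts that the bytes already on the target are intact. The final fingerprint comparison is what catches the case where they are not.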

PMA.transfer 2.2.3 was updated to interact better with My Pathomation, and is expected to be particularly liked by its users. While the My Pathomation user interface makes it very easy to upload slides one by one, it is not that well suited for larger batch-based transfers.

Upon launching PMA.transfer, users can choose to either connect to an on-premise PMA.core instance, or an institutional cloud-based My Pathomation account. For power-users, different connection profiles can be tracked as well, similar to FileZilla’s host manager.

For existing Pathomation customers (including users of My Pathomation), PMA.transfer can be downloaded free of charge from https://www.pathomation.com/pma.transfer.


Audit trailing