I have a bunch of BigTIFF (.btf) images stored in an S3 bucket (~3 TB). The images were scanned on an Olympus VS160 scanner and are uncompressed, so each image is between 7 GB and 24 GB.
I have a cloud VM with 30 OCPUs, 328 GB of RAM, and a 300 GB boot volume. I have configured the S3 assetstore and imported these files into a collection. When I click on the large image icon, it takes forever to load the image at full resolution, and if I click back, it more or less hangs. If it does load, it takes over 10-15 minutes, and even then not at full resolution.
I then changed the setting "Automatically use new items as large images" to "Automatically try to use all files as large images" in the Large Image plugin.
In this case, I can see the VM's RAM and disk space being eaten up until the disk is full and I can no longer access the UI, at which point I have to reboot. (I can view some of the images at full resolution until the disk fills up.)
How do the files get cached/stored in the case of an S3 assetstore? Does it copy all the images onto the disk? If so, where exactly are they copied (in the db/assetstore location or somewhere else)?
What's the best way to deal with this scenario? Should I add a big block volume to the VM and configure docker-compose to store the assetstore, logs, and db on the block volume? We'd like these images to load fairly quickly. Any advice would be much appreciated.
TIFF files can store images internally in a huge number of ways. Ideally, the image is divided into tiles and stored at multiple resolutions, which allows efficient reading at any location. But, based on your description, my guess is that these files are stored as a single image block.
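A quick way to check how a file is actually laid out is to inspect it with the tifffile library; here is a rough sketch (the path is hypothetical):

import tifffile

# Hypothetical local path to one of the scanned images.
path = '/opt/images/slide_0001.btf'

with tifffile.TiffFile(path) as tf:
    for idx, page in enumerate(tf.pages):
        layout = 'tiled' if page.is_tiled else 'striped'
        print(idx, page.shape, layout, page.compression)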
Internally, we use a number of different tile sources (libraries) to read different files. Probably the .btf files are being read via the tifffile library (but maybe the bioformats or GDAL library). Whichever library is reading them is reading the whole image to return just part of it, hence the memory spike.
Maybe the .btf files would read efficiently with a different tile source, in which case it would be a matter of giving that tile source higher priority for .btf files. Or, they might be inefficient to read without some explicit code.
There are a couple of ways to try things out, depending on your comfort with Python or Mongo. If you turn on showing internal metadata on the item page in Girder (there is a setting in the large_image plugin settings to do this), you can see which tile source is being used.
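In Python, a rough sketch like the following (the file path is hypothetical) would show which tile source large_image picks for a file and what metadata it reports:

import large_image

# Hypothetical local path to one of the .btf files.
path = '/opt/images/slide_0001.btf'

# Let large_image choose whichever tile source it prefers for this file.
source = large_image.getTileSource(path)
print(type(source).__name__)   # which source class was chosen
print(source.getMetadata())    # sizeX, sizeY, tileWidth, tileHeight, levels, ...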
If you can share a file (or the output of tifftools dump <path to file>), then I could state more definitively what is going on.
Looking at this, the problem is that the BTF file is untiled (stored as strips rather than tiles). All of the readers we can use (libtiff, libvips, bioformats, etc.) read the whole file even to extract a small area.
Some readers (tifffile) are confused by the tiny secondary image.
One approach would be to convert the files from their current format to a web-optimized format (the item/{id}/tiles/convert endpoint can do this, as can the large-image-converter command line tool). This will mean having two copies of your files (the original and the web-optimized version).
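The conversion can also be scripted in Python via the converter's module interface; roughly like this (paths are hypothetical, and check the large-image-converter documentation for the exact options):

import large_image_converter

src = '/opt/images/slide_0001.btf'    # hypothetical original (striped) file
dst = '/opt/images/slide_0001.tiff'   # hypothetical web-optimized output

# Produce a tiled, pyramidal TIFF that can be read efficiently at any
# location and resolution.
large_image_converter.convert(src, dst)
print('wrote', dst)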
The second approach would reduce memory usage but may not speed up initial viewing. This would be to modify our tiff reader (or write a different tile source) to better take advantage of the fact that the file is uncompressed. This will still be slow for the initial load, as the only way to produce low-resolution versions of the image for display is to read the whole image.
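As a sketch of the idea: because the data is uncompressed, a reader could map the file instead of loading it. For example, if the pixel data happens to be stored contiguously, tifffile can memory-map it so a small region is read without pulling the whole image into RAM (the path and crop coordinates are hypothetical, and this assumes the file is available locally):

import tifffile

path = '/opt/images/slide_0001.btf'   # hypothetical local path

# memmap only works for uncompressed, contiguously stored pixel data;
# it raises an error otherwise.
image = tifffile.memmap(path, mode='r')

# Slice out a 1024 x 1024 region; only the needed parts of the file are
# actually read from disk.
region = image[20000:21024, 30000:31024]
print(region.shape, region.dtype)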
Thank you for the suggestions, David; they were very helpful. I tried the item/{id}/tiles/convert endpoint to convert an image to a pyramidal tiled TIFF, and that greatly improved the speed at which the image loads.
The output of the magick identify command on the converted image:
I'm thinking of using the large-image-converter command line tool to do the conversions, but I had a lot of trouble installing its dependencies (especially GDAL), so I tried to use the Docker image with the commands below.
docker pull ghcr.io/girder/large_image:latest
docker run -v /path/to/images:/opt/images ghcr.io/girder/large_image:latest large-image-converter --help
But that didn't seem to work. Could you provide the right commands to run the Docker image for large-image conversion? I can't seem to find any documentation on how to use it via Docker. Thanks again.