A human view of computer vision at scale

Computers analysing and acting on what they see is not science fiction or even a new concept; it has been a reality of humankind's drive towards hyper-efficiency since around the time I was born.

Machine learning tools

I'm not a data scientist, but by using tools such as Custom Vision, provided by Microsoft Azure Cognitive Services, I was able to spin up a demo of a self-checkout in pretty much a day. The demo detected objects and basket behaviour, and matched barcode/EPOS data to specific product detections to flag when unscanned products entered the basket area.
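The matching step can be illustrated with a minimal sketch. This is not the demo's actual code: the function name, the `(label, confidence)` detection shape and the 0.6 threshold are all assumptions made for illustration, and no Custom Vision API calls are shown. It simply compares what the camera saw in the basket area with what the till scanned.

```python
from collections import Counter
from typing import Iterable, Tuple


def unscanned_products(
    detections: Iterable[Tuple[str, float]],
    scanned_labels: Iterable[str],
    min_confidence: float = 0.6,  # hypothetical threshold, tune per model
) -> Counter:
    """Return products seen in the basket area more often than they were scanned."""
    # Keep only detections the model is reasonably confident about.
    confident = Counter(
        label for label, confidence in detections if confidence >= min_confidence
    )
    # Counter subtraction drops zero and negative counts, so only the
    # surplus (detected but not scanned) products remain.
    return confident - Counter(scanned_labels)


# Hypothetical data: labels from the vision model and from the EPOS feed.
detected_in_basket = [("milk", 0.92), ("bread", 0.88), ("bread", 0.81), ("wine", 0.75)]
scanned_at_till = ["milk", "bread"]

missing = unscanned_products(detected_in_basket, scanned_at_till)
print(dict(missing))
```

In a real deployment the hard part is upstream of this function: mapping model labels to product codes and deciding when a detection counts as "in the basket".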

If it's relatively straightforward to implement a proof-of-concept like the above, then why aren't multiple use cases rolled out across the billions of devices that are already out there?

A wealth of content, but analysis uptake is poor

There are perhaps a billion IP cameras dotted around the globe... and a few billion more phones with cameras. Only a relatively small percentage of the images they capture are analysed, by interested parties such as social networks, video streaming services, manufacturers, governments and others.

Around 100 million photos are shared on Instagram each day, which would take a very dedicated human around a year just to look at (turns beer mat back over). Computers can do this much more quickly and efficiently but, as with real-time video streams, computer vision is rarely deployed against them at scale.

Deployment at scale

Computer vision comes at a high monetary and energy cost. Netflix claims to do computer vision at scale, but that appears to be limited to 2 million images for simple metadata extraction, using Haar classifiers for people detection and a text detection algorithm for, you guessed it, text.

Compare that with a supermarket chain with 5,000 shops, each with ten cameras running at 25 frames per second: in only 2 seconds the chain has generated 2.5 million frames, more than the 2 million images Netflix is calling computer vision at scale.
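The comparison is simple arithmetic, sketched below using only the figures quoted above. The variable names are illustrative, and the per-day total assumes every camera runs around the clock, which the text does not state.

```python
# Figures from the text; names are illustrative.
SHOPS = 5_000
CAMERAS_PER_SHOP = 10
FRAMES_PER_SECOND = 25
NETFLIX_IMAGES = 2_000_000


def frames_generated(seconds: int) -> int:
    """Total frames produced across the whole estate in the given time."""
    return SHOPS * CAMERAS_PER_SHOP * FRAMES_PER_SECOND * seconds


frames_per_second_total = frames_generated(1)
frames_in_two_seconds = frames_generated(2)
seconds_to_match_netflix = NETFLIX_IMAGES / frames_per_second_total

# Assumption: cameras run 24 hours a day.
frames_per_day = frames_generated(24 * 60 * 60)

print(f"{frames_per_second_total:,} frames per second")    # 1,250,000
print(f"{frames_in_two_seconds:,} frames in two seconds")  # 2,500,000
print(f"{seconds_to_match_netflix} s to reach 2 million")  # 1.6
print(f"{frames_per_day:,} frames per day")                # 108,000,000,000
```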

User and business value

Deploying a computer vision platform requires investment, so a business needs to consider the value computer vision would bring and optimise for costs. Training models alone can carry a high price before any analysis or prediction takes place. In our experience, when developing models, the focus needs to be placed on bringing data and environments to data teams so that they can create and iterate on models.

Deploying and hosting at scale also presents a unique set of challenges, especially if deploying to the edge, which potentially increases costs further with additional hardware.

Peripherals on the periphery

One way to reduce hosting, analysis and prediction costs is to host and run models on edge devices (such as IP cameras), especially on existing hardware. I have delivered projects where people and package detection are done entirely on a doorbell camera to reduce cloud data transfer, storage and latency. However, these edge devices generally have limited capability, and careful attention to quality of service is required to ensure there's no impact on core functionality. Detecting a burglar with a swag bag should not prevent that image from being recorded, or a motion-detection notification from being sent.
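One way to express that priority in code is to treat recording as mandatory and inference as best effort. The sketch below is an assumption-laden illustration, not how any particular doorbell works: `EdgeFramePipeline`, its method names and the queue size are all hypothetical. Frames are always recorded, while inference gets a small bounded queue and frames are skipped for analysis when it is full.

```python
import queue
from typing import Callable, List, Optional


class EdgeFramePipeline:
    """Always record frames; run detection only when there is spare capacity."""

    def __init__(self, max_pending_inference: int = 2) -> None:
        self.recorded: List[str] = []
        self.dropped_for_inference = 0
        # Bounded queue: inference work can never pile up without limit.
        self._pending: "queue.Queue[str]" = queue.Queue(maxsize=max_pending_inference)

    def on_frame(self, frame: str) -> None:
        # Core functionality first: the frame is recorded unconditionally.
        self.recorded.append(frame)
        try:
            self._pending.put_nowait(frame)
        except queue.Full:
            # Detection is falling behind; skip analysis rather than block recording.
            self.dropped_for_inference += 1

    def run_inference_once(self, detect: Callable[[str], str]) -> Optional[str]:
        """Analyse the oldest pending frame, if any."""
        try:
            frame = self._pending.get_nowait()
        except queue.Empty:
            return None
        return detect(frame)


pipeline = EdgeFramePipeline(max_pending_inference=2)
for i in range(5):
    pipeline.on_frame(f"frame-{i}")

result = pipeline.run_inference_once(lambda frame: f"analysed {frame}")
print(len(pipeline.recorded), pipeline.dropped_for_inference, result)
```

Here all five frames are recorded, three are skipped for analysis because the queue only holds two, and the oldest queued frame is analysed first. Dropping the newest frame is just one policy; dropping the oldest would favour fresher detections.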

Dedicated compute at the edge, such as that provided by Azure Stack Edge, aims to combat some of the prohibitive costs and the latency to actuate, but can be cost-prohibitive in itself. It would cost a supermarket chain with 5,000 shops a few million pounds per month just to rent the hardware. However, the hardware can be utilised for multiple use cases and become a platform for further innovation.


We are in a computer vision technology overhang. As engineers, we have the skills and ability to build transformative computer vision platforms, but the value to justify the means is yet to be unlocked. For expensive Smart Home IoT devices, it took the mass adoption of Smart Home speakers and voice assistants before they parachuted into the households of the mass market, and voice recognition technology itself was in overhang for decades. Does anyone remember Dragon Dictate? There are plenty of examples before that too.

Minimum viable computer vision

'Analysis paralysis' is a common trap for any machine learning project. Although the blocker is rarely a technical constraint, visionaries still need to ensure the technical capabilities exist by tackling the highest-risk technical aspects of a project with a proof-of-concept, whilst in tandem ensuring the value return is as anticipated. Value can be driven by involving the relevant stakeholders from day one, providing project transparency, conducting live market testing, and getting things wrong and learning from them!

It's not surprising that automation is the key to enabling rapid feedback and reducing cycle times. Still, it's crucial to orchestrate data, tools, code, testing and environments to measure, analyse and promote success.

The fashion industry

During and since the COVID-19 lockdown, there's been an apparent increase in e-commerce uptake and fashion window shopping on mobile apps. Could this be the time for computer vision and AI-powered shopping to take centre stage in fashion? Examples include browsing for clothes based on a photo, or previewing what an item of clothing could look like on the user. The technology exists, but even when we were unable to visit the shops, these apps did not gain mainstream traction.

It seems the real value for fashion retailers is in unlocking multiple sales channels and expanding visibility and audience reach, proven by the impact of Instagram-as-a-Sales-Channel in the Chinese market. For fashion retailers, investing in additional sales channels delivers higher value at a lower cost than 'overengineering' niche tech in today's market.

Testing for COVID-19

Is preventing the spread of COVID-19 and the value of human life the driver to roll out computer vision at scale? Computer vision and artificial intelligence (AI) could aid research and potentially diagnose COVID-19, in addition to assessing long-term damage and disease progression. There are now FDA-approved AI algorithms that can be iterated on and improved with the right data, environments and automation, ensuring security and privacy. I'd love to see this become a reality, with the appropriate data governance.

The autonomous vehicle pool

The first mass rollout of computer vision may have already begun, in the form of autonomous vehicles. With the UK government yesterday bringing forward its ban on the sale of new petrol/diesel cars to 2030, the electric and autonomous vehicle rollout may accelerate even further. You may not appreciate how much computer vision technology there is in even a single Tesla: there are eight coupled cameras, 50 simultaneous tasks, and a sophisticated HydraNet architecture. Essentially, the car collects data, then Tesla labels it with metadata and trains the system. It all has to run on a tiny computer, so the scope for expansion on the current hardware is limited, and it's already a complex engineering challenge. I'm sure the engineers at Tesla would have one neural network per image and task if they could.

What's next?

Can computer vision see the shape of tomorrow's world? AI could be the next generational change after the smartphone, and computer vision could play a leading role in that. However, with regulation, policy and privacy moving high up today's agenda, do we have a new challenge to computer vision at scale beyond cost? I hope so. I wouldn't want to compromise my data or privacy, and nor should anyone else. As a professional in the industry, I make sure that the appropriate security and data processing rules are in place for a project, even if it's only a proof-of-concept. When these best practices are embedded in project delivery, backed by years of experience within a consultancy known for its quality engineering, they mitigate issues further down the line and help everyone sleep at night! Requirements always evolve, and approaches may pivot, but in a project set up for success, a secure, experience-based architecture should adapt quickly without redesign or compromise.

Final comment

At San Digital, we help identify and solve real challenges that drive efficiency and value, through high-quality service design, technology-agnostic engineering and digital transformation. We take great pride in ensuring we're building the right things for the outcomes we deliver.

Like all our engineers, I love to work with the latest technologies, but providing rapid, secure and measurable value to users and businesses is even more satisfying. Sometimes all these stars do align, so please feel free to speak to me about your niche and exciting technical challenges!
