A human view of computer vision at scale

Computer vision is not new

Computers analysing and acting on what they see is not science fiction or even a new concept; it has been a reality of humankind's drive towards hyper-efficiency since around the time I was born.

Machine learning tools

I'm not a data scientist, but using tools such as Custom Vision from Microsoft Azure Cognitive Services, I was able to spin up a self-checkout demo in pretty much a day. The demo detected objects and basket behaviour, and matched barcode/EPOS data against specific product detections to flag when unscanned products entered the basket area.
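To give a feel for how little code that takes, here is a minimal sketch of a single prediction call against a published Custom Vision object detection model. The endpoint, key, project ID and file names are placeholders, not the actual demo's values:

```python
from msrest.authentication import ApiKeyCredentials
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient

# Placeholder credentials: substitute your own resource's values.
credentials = ApiKeyCredentials(in_headers={"Prediction-key": "<prediction-key>"})
predictor = CustomVisionPredictionClient(
    "https://<your-resource>.cognitiveservices.azure.com/", credentials
)

# Score a single frame from the checkout camera against the published model.
with open("basket_frame.jpg", "rb") as frame:
    results = predictor.detect_image("<project-id>", "<published-model-name>", frame.read())

# Each prediction carries a tag, a confidence and a normalised bounding box.
for p in results.predictions:
    if p.probability > 0.7:
        print(p.tag_name, round(p.probability, 2), p.bounding_box.left, p.bounding_box.top)
```

Training the model is a similar story: upload tagged images through the portal or SDK, hit train, and publish an iteration.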

If it's relatively straightforward to implement a proof-of-concept like the above, why aren't multiple use cases rolled out across the billions of devices that are already out there?

A wealth of content, but analysis uptake is poor

There are perhaps a billion IP cameras dotted around the globe... and a few billion more phones with cameras. Yet only a relatively small percentage of the images they capture is analysed by interested parties such as social networks, video streaming services, manufacturers, governments and others.

Around 100 million photos are shared on Instagram each day; at three photos a second, non-stop, a very dedicated human would need around a year just to look at a single day's uploads (turns beer mat back over). Computers can do this far more quickly and efficiently, but as with real-time video streams, deployment of computer vision at scale is scarce.

Deployment at scale

Computer vision comes at a high monetary and energy cost. Netflix claims to do computer vision at scale, but that appears to be limited to 2 million images for simple metadata extraction, using Haar classifiers for people detection and a text detection algorithm for, you guessed it, text.
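For context on what that entails: a Haar cascade is a classical, pre-deep-learning detector that ships with OpenCV. A minimal people-detection pass (not Netflix's actual pipeline, just an illustration of the technique they name) looks something like this:

```python
import cv2

# OpenCV bundles pre-trained Haar cascade XML files with the library.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_fullbody.xml")

# Haar cascades operate on greyscale images.
grey = cv2.imread("still.jpg", cv2.IMREAD_GRAYSCALE)

# Returns a list of (x, y, w, h) rectangles around detected people.
people = cascade.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=4)
print(f"Detected {len(people)} people")
```

Cheap to run, but decades old and easily confused, which is rather the point: even "at scale" deployments often lean on the most frugal tools available.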

If we compare a supermarket with 5,000 shops, each with ten cameras running at 25 frames per second, in just two seconds the supermarket has generated 2.5 million frames, comfortably more than the 2 million images Netflix are calling computer vision at scale.
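The beer mat maths, for the sceptical:

```python
shops = 5_000
cameras_per_shop = 10
fps = 25

frames_per_second = shops * cameras_per_shop * fps  # 1,250,000 frames every second
print(frames_per_second * 2)                        # 2,500,000 frames in two seconds
```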

User and business value

Deploying a computer vision platform requires investment; a business needs to weigh the value computer vision would bring and optimise for cost. Training models alone can carry a high price before any analysis or prediction takes place. In our experience, when developing models, the focus needs to be placed on bringing data and environments to data teams so that they can create and iterate on models.

Deploying and hosting at scale also presents a unique set of challenges, especially if deploying to the edge, which potentially increases costs further with additional hardware.

Peripherals on the periphery

One way to reduce hosting, analysis and prediction costs is to host and run models on edge devices (such as IP cameras), especially on existing hardware. I have delivered projects where people and package detection run entirely on a doorbell camera to reduce cloud data transfer, storage and latency. However, these edge devices generally have limited capability, and careful attention to quality of service is required to ensure there's no impact on core functionality. Detecting a burglar with a swag bag should never prevent that image from being recorded, or the motion notification from being sent.
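One way to honour that priority, sketched below with invented function names rather than any real doorbell firmware, is to put inference on a best-effort path behind a small bounded queue, so that under load the device drops a detection rather than a recording:

```python
import queue
import threading

def record(frame): ...            # placeholder: the camera's existing recording path
def notify_if_motion(frame): ...  # placeholder: the existing motion-notification path
def detect_people(frame): ...     # placeholder: on-device model inference

frames_for_inference = queue.Queue(maxsize=8)  # small and bounded: inference is best-effort

def inference_worker():
    while True:
        detect_people(frames_for_inference.get())

threading.Thread(target=inference_worker, daemon=True).start()

def on_new_frame(frame):
    record(frame)            # core functionality always runs first...
    notify_if_motion(frame)
    try:
        frames_for_inference.put_nowait(frame)
    except queue.Full:
        pass                 # ...and under load we skip a detection, never a recording
```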

Pure compute at the edge, such as that provided by Azure Stack Edge, aims to combat some of those costs and the latency to actuate, but can be cost-prohibitive in itself: it would cost a supermarket with 5,000 shops a few million pounds per month just to rent the hardware. However, the hardware can be utilised for multiple use cases and become a platform for further innovation.

Overhang

We are in a computer vision technology overhang. As engineers, we have the skills and ability to build transformative computer vision platforms, but the value to justify the means is yet to be unlocked. Expensive Smart Home IoT devices needed the mass adoption of Smart Home speakers and voice assistants before they parachuted into mass-market households, and voice recognition technology itself was in overhang for decades. Does anyone remember Dragon Dictate? There are plenty of examples before that too.

Minimum viable computer vision

'Analysis paralysis' is a common trap for any machine learning project. The blocker is rarely a technical constraint, but visionaries still need to prove the technical capabilities exist by tackling the highest-risk technical aspects of a project with a proof-of-concept, whilst in tandem ensuring the value return is as anticipated. Value can be driven by involving the relevant stakeholders from day one, providing project transparency, conducting live market testing, getting things wrong and learning from them!

It's not surprising that automation is the key to enabling rapid feedback and reducing cycle times. Still, it's crucial to orchestrate data, tools, code, testing and environments to measure, analyse and promote success.

The fashion industry

During and since the COVID-19 lockdown, there's been an apparent increase in e-commerce uptake and fashion window shopping on mobile apps. Could this be the time for computer vision and AI-powered shopping to take centre stage in fashion, for example browsing for clothes based on a photo, or previewing what an item of clothing might look like on the user? The technology exists, but even when we were unable to visit the shops, these apps still failed to gain mainstream traction.
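The building blocks really are commodity now. As an illustrative sketch (the file names and model choice below are mine, not any retailer's stack), photo-based browsing is essentially embedding images with a pre-trained backbone and ranking the catalogue by similarity:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Use a pre-trained backbone as a feature extractor rather than a classifier.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 512-d feature vector, not class logits
backbone.eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

def embed(path):
    with torch.no_grad():
        return backbone(preprocess(Image.open(path).convert("RGB")).unsqueeze(0))[0]

# Rank (placeholder) catalogue items by cosine similarity to the user's photo.
query = embed("user_photo.jpg")
catalogue = {name: embed(name) for name in ["dress.jpg", "jacket.jpg", "shoes.jpg"]}
ranked = sorted(catalogue,
                key=lambda n: -torch.cosine_similarity(query, catalogue[n], dim=0).item())
print(ranked)
```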

It seems the real value for fashion retailers is in unlocking multiple sales channels and expanding visibility and audience reach, as demonstrated by the impact of Instagram-as-a-Sales-Channel in the Chinese market. For fashion retailers, investing in additional sales channels delivers higher value at a lower cost than 'overengineering' niche tech in today's market.

Testing for COVID-19

Is preventing the spread of COVID-19, and the value of human life, the driver to roll out computer vision at scale? Computer vision and artificial intelligence (AI) could aid research and potentially diagnose COVID-19, in addition to assessing long-term damage and disease progression. There are now FDA-approved AI algorithms that can be iterated on and improved with the right data, environments and automation, ensuring security and privacy. I'd love to see this become a reality, with the appropriate data governance.

The autonomous vehicle pool

The first mass rollout of computer vision may have already begun, in the form of autonomous vehicles. With the UK government yesterday bringing forward its ban on the sale of new petrol and diesel cars to 2030, the electric and autonomous vehicle rollout may accelerate even further. You may not appreciate just how much computer vision technology there is in a single Tesla: eight coupled cameras, around 50 simultaneous tasks, and a sophisticated HydraNet architecture. Essentially, the car collects data, Tesla labels it with metadata, then retrains the system. It all has to run on a tiny in-car computer, so the scope for expansion on the current hardware is limited; it's already a complex engineering challenge. I'm sure the engineers at Tesla would have one neural network per image and task if they could.
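Tesla's HydraNet itself is proprietary, but the shape of the idea, a single shared backbone feeding many lightweight task heads so the expensive feature extraction happens once, can be sketched in a few lines of PyTorch (the layer sizes and task names below are invented for illustration):

```python
import torch
import torch.nn as nn

class HydraNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # One shared backbone does the expensive feature extraction once...
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # ...and each task gets its own cheap head on top of the shared features.
        self.heads = nn.ModuleDict({
            "lane_lines": nn.Linear(64, 8),
            "traffic_lights": nn.Linear(64, 4),
            "pedestrians": nn.Linear(64, 2),
        })

    def forward(self, images):
        features = self.backbone(images)
        return {task: head(features) for task, head in self.heads.items()}

outputs = HydraNetSketch()(torch.randn(1, 3, 224, 224))
print({task: tensor.shape for task, tensor in outputs.items()})
```

The design choice is the whole trick: fifty separate networks would never fit on a small in-car computer, but fifty small heads on one backbone just might.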

What's next?

Can computer vision see the shape of tomorrow's world? AI could be the next generational change after the smartphone, and computer vision could play a leading role in it. However, with regulation, policy and privacy moving high up today's agenda, do we have a new challenge to computer vision at scale beyond cost? I hope so. I wouldn't want to compromise my data or privacy, and nor should anyone else. As a professional in the industry, I make sure the appropriate security and data processing rules are in place for a project, even if it's only a proof-of-concept. When these best practices are embedded in project delivery, backed by years of experience within a consultancy known for its quality engineering, they mitigate issues further down the line and help everyone sleep at night! Requirements always evolve and approaches may pivot, but a secure, experience-based architecture should adapt quickly, without redesign or compromise, in a project set up for success.

Final comment

At San Digital, we help identify and solve real challenges, driving efficiency and value through high-quality service design, technology-agnostic engineering and digital transformation. We take great pride in ensuring we're building the right things for the outcomes we deliver.

Like all our engineers, I love to work with the latest technologies, but providing rapid, secure and measurable value to users and businesses is even more satisfying. Sometimes all these stars do align, so please feel free to speak to me about your niche and exciting technical challenges!
