What is computer vision?

How does computer vision work?

Computer image processing technology continues to evolve.

In recent years it has also been combined with machine learning and put to a wide variety of uses.

Among the many research fields related to image processing, "computer vision" is one that is attracting particular attention.

Its most visible applications are visual effects in TV dramas and movies; closer to everyday life, it is used for image search on the Internet and for the face recognition functions of digital cameras.

This article describes computer vision.

What is computer vision?

Computer vision is an academic field that studies "the realization of vision using computers."

Leading researchers in information engineering around the world are competing to advance it.

Research purpose of computer vision

The purpose of computer vision is to give a computer a function equivalent to that of the human eye, that is, to realize "computer vision" in the literal sense.

Specifically, the goal is for computer software to achieve capabilities close to or better than human vision, working from still image or video data.

For example, a computer processes the input images from a camera so that the camera can function as the eyes of a robot or self-driving car.

How is it different from computer graphics?

Until now, computer graphics (abbreviated CG) has been the mainstream of computer image processing technology.

The differences between computer graphics and computer vision are as follows.

■ Computer graphics

Technology for projecting a 3D solid onto a 2D display

■ Computer vision

Technology to derive 3D information from 2D image data taken in the real world
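
The two directions can be illustrated with a minimal sketch. It assumes an idealized pinhole camera and a two-camera (stereo) setup, neither of which the article specifies; the function names, focal length, and baseline are hypothetical values chosen only for illustration.

```python
def project_point(point_3d, focal_length, cx, cy):
    """Graphics direction: project a 3D point (camera coordinates) to 2D pixels."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return focal_length * x / z + cx, focal_length * y / z + cy


def depth_from_disparity(u_left, u_right, focal_length, baseline):
    """Vision direction: recover depth from the same point seen by two cameras.

    Assumes two identical cameras separated horizontally by `baseline`.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return focal_length * baseline / disparity


def backproject(u, v, depth, focal_length, cx, cy):
    """Vision direction: turn a pixel plus its depth back into a 3D point."""
    return (u - cx) * depth / focal_length, (v - cy) * depth / focal_length, depth


if __name__ == "__main__":
    f, cx, cy, baseline = 800.0, 320.0, 240.0, 0.1  # hypothetical camera values
    point = (0.5, 0.25, 4.0)
    u_left, v_left = project_point(point, f, cx, cy)
    # The right camera sits `baseline` to the right, so the point shifts left.
    u_right, _ = project_point((point[0] - baseline, point[1], point[2]), f, cx, cy)
    depth = depth_from_disparity(u_left, u_right, f, baseline)
    print(backproject(u_left, v_left, depth, f, cx, cy))
```

Projection discards depth, which is why the reverse direction needs extra information such as a second view.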

Computer graphics and computer vision are contrasting technologies, but they can complement each other and contribute to new technologies.

Image display technologies such as virtual reality (VR) and augmented reality (AR) are examples.

Research on pattern recognition and machine learning is also underway in university laboratories.

The Graduate School of Systems and Information Engineering, University of Tsukuba has a Computer Vision Laboratory (CVLAB).

This laboratory belongs to the Center for Artificial Intelligence Science and conducts research on human sensing, bioinformatics, and robot vision, as well as theoretical research on pattern recognition and machine learning.

Professor Iizuka of this laboratory took part in the digital restoration of Monet's painting "Water Lilies, Reflections of Willows" as joint research with Toppan Printing.

In this project, Professor Iizuka had an AI learn coloring patterns from Monet's other works and estimated the overall coloring of the painting by referring to some of the colors in "Water Lilies, Reflections of Willows."

SoftBank establishes a company specializing in computer vision

In May 2019, SoftBank Corp. established Japan Computer Vision Corp. with SenseTime of Hong Kong, which has world-class AI technology, as a partner.

SenseTime is a company with a reputation for face recognition technology that uses deep learning and computer vision.

SenseTime's technology is used in 70% of Android devices that have a face recognition function.

Japan Computer Vision plans to provide solutions in the fields of smart buildings and smart retail by using SenseTime's image recognition technology.

One planned area is solutions that enhance security and convenience, such as building entry and exit management and streamlined reception and guidance at stores and commercial facilities.

In addition, the company is considering solutions that record the products a shopper is considering and the shopper's in-store behavior, and then present recommended products and discount information through a smartphone based on those records.

Computer vision technology examples

Here, we will introduce specific examples of computer vision technology.

Digital camera face image recognition

The face recognition function used in digital cameras and smartphones is an example of computer vision technology.

Face recognition uses machine learning and pattern recognition to match features stored in a database, such as facial contours and the positions of the eyes, nose, and mouth, against a target image and find human faces in it.

In digital cameras, this technology is used for functions such as automatically focusing on a person's face seen through the viewfinder and releasing the shutter when a person smiles.
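
The idea of matching a stored pattern against a target image can be sketched with classical template matching. This is a deliberately simplified stand-in, not the machine-learned detectors used in real cameras: it slides a small stored pattern over a grayscale image and reports where the sum of squared differences is lowest. The function name and the toy data are hypothetical.

```python
def find_best_match(image, template):
    """Return (row, col, score) of the window that best matches the template.

    `image` and `template` are 2D lists of grayscale values. The score is the
    sum of squared differences (SSD); 0 means a pixel-perfect match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template must not be larger than the image")
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


if __name__ == "__main__":
    # Toy 5x6 "image" with the stored pattern embedded at row 2, column 3.
    image = [
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 9, 1, 0],
        [0, 0, 0, 1, 9, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    template = [[9, 1], [1, 9]]
    print(find_best_match(image, template))
```

A camera could then focus on the returned location; real systems replace the fixed template with features learned from many example faces.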

AR (Augmented Reality)

AR stands for "augmented reality," a technology that superimposes 3DCG and other content on real-world scenery.

The name refers to a reality that is "augmented" with information added by a computer.

AR is an example of fusing computer vision, which recognizes the real world, with computer graphics, which depicts a fictitious one.

An easy-to-understand example is the smartphone game "Pokemon GO," which has become popular in recent years.

With AR technology, you can feel as if the characters in the game have appeared in the real world.

In addition, AR technology is used in apps that simulate how furniture and home appliances would look once installed.

Technologies such as AR-equipped contact lenses that can be used like wearable devices are also under development.

Autonomous driving of a car

In autonomous driving, the on-board camera plays the role of the eye.

Computer vision technology analyzes the images captured by the camera, and the results are used to drive the vehicle.

For example, a "forward collision prevention system" sounds a warning to the driver when a front-mounted camera detects a nearby car or person.

Furthermore, if the vehicle closes to a dangerous distance, the brakes are applied automatically to prevent a collision.
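
The warn-then-brake behavior described above can be sketched as follows. The article does not say how such systems measure distance; this sketch assumes a single camera and an object of known real-world height, so that distance follows from the pinhole relation distance = focal length × real height / height in pixels. The function names and both thresholds are hypothetical.

```python
def estimate_distance(pixel_height, real_height_m, focal_length_px):
    """Estimate distance to an object from its apparent height in the image.

    Assumes a pinhole camera and a known real-world object height.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_length_px * real_height_m / pixel_height


def collision_response(distance_m, warn_distance_m=30.0, brake_distance_m=10.0):
    """Decide what to do about the nearest detected object.

    The thresholds are illustrative assumptions, not values from a real system.
    """
    if distance_m <= brake_distance_m:
        return "brake"  # dangerously close: apply the brakes automatically
    if distance_m <= warn_distance_m:
        return "warn"   # nearby: sound a warning to the driver
    return "none"


if __name__ == "__main__":
    focal_length_px = 1000.0  # hypothetical camera focal length in pixels
    for pixel_height in (40, 85, 200):  # a 1.7 m pedestrian at three ranges
        distance = estimate_distance(pixel_height, 1.7, focal_length_px)
        print(pixel_height, round(distance, 1), collision_response(distance))
```

A production system would also account for relative speed, for example by thresholding time to collision rather than distance alone.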

Also under development are "lane departure warning systems," which sound a warning when the vehicle unintentionally drifts out of its lane, and "parking support systems," which combine images from the front, rear, left, and right cameras during parking and display them as a single view that looks as if it were taken from above.

Surveillance camera system

Computer vision technology is also used to automatically detect a specific person in surveillance camera video.

One example is TB-eye AI Solution, an image analysis system from TB-eye Co., Ltd.

This system combines deep learning with computer vision object recognition technology to achieve high-performance face recognition and image analysis.

Approximately 10,000 faces can be registered, and the system can be set to sound an alarm when a registered person or, conversely, an unregistered person is detected.

In addition, a system called "smart search" has been developed in which AI picks out matching objects from video when conditions such as "white car" or "person in black clothes" are specified.
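
Once a detector has labeled each object, a condition-based search of this kind reduces to filtering those labels. The sketch below assumes the detections already exist as simple records; the `Detection` fields and the `smart_search` function are hypothetical and do not reflect the actual product's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object found in one video frame (hypothetical detector output)."""
    frame: int
    label: str  # e.g. "car" or "person"
    color: str  # dominant color of the object or its clothing


def smart_search(detections, label=None, color=None):
    """Return the detections that satisfy every condition that was given."""
    return [
        d for d in detections
        if (label is None or d.label == label)
        and (color is None or d.color == color)
    ]


if __name__ == "__main__":
    detections = [
        Detection(frame=10, label="car", color="white"),
        Detection(frame=12, label="person", color="black"),
        Detection(frame=30, label="car", color="red"),
        Detection(frame=41, label="person", color="white"),
    ]
    # "white car" and "person in black clothes" as search conditions
    print([d.frame for d in smart_search(detections, label="car", color="white")])
    print([d.frame for d in smart_search(detections, label="person", color="black")])
```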

This system can significantly reduce the time needed to search through footage.

Computer vision technology is also used in vein authentication, which identifies a person when they hold their palm or finger over a terminal.

Medical image processing

Research is also underway on using image recognition technology to find affected areas in images of the inside of the human body taken with CT or MRI.

A research team at the Swiss Federal Institute of Technology in Lausanne has also developed a device that can produce 3D images of living cells in minutes.

This has made it possible to study drug effects at the level of a single cell.

In Japan as well, diagnostic imaging support software that uses AI technologies such as computer vision is being developed, and it is attracting the attention of medical professionals.

Application examples of computer vision technology

From here, we will introduce application examples of computer vision technology.

Match move

"Match move" is a 3DCG synthesis technology often used in movies and dramas.

The actor's performance is shot in a blue-screen studio. The background CG to be composited later is then computed and created from the camera's viewpoint changes and 3D information so that it matches the footage of the actor.

It is similar to AR in that it composites another video or 3DCG with landscape footage, but while AR is a real-time process, match move is a video editing technique that composites the footage afterward.

Projection mapping

"Projection mapping" is a technology that projects images onto three-dimensional objects such as buildings, and it has recently become a common sight at various events.

With an ordinary projector used in an office or movie theater, the image is projected head-on onto a flat screen.

The screen is also usually a rectangle.

In projection mapping, by contrast, the 3D information of the building that will receive the image is captured, and the image is projected to fit its shape (mapped).

It is possible to project only inside a specified contour and to switch the projected image from one surface to the next.

The image can also be adjusted so that it does not look unnatural when viewed from somewhere other than the front.

This mapping work is basically done with video editing software.

By calibrating (adjusting) the positional relationship in 3D between the projector and the camera, it is possible to map images in 3D space with high accuracy.
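
For a single flat surface, one common way to express such a calibration is a 3x3 homography that maps camera pixels to projector pixels. The sketch below is a simplified illustration under that planar assumption, not the full 3D calibration the article refers to: it estimates the homography from four hypothetical point correspondences and then uses it to map a point. All function names and coordinates are assumptions.

```python
def solve_linear(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(src, dst):
    """Estimate H (with H[2][2] fixed to 1) from exactly four (x, y) -> (u, v) pairs."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, x, y):
    """Map a point through H, dividing by the homogeneous coordinate."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


if __name__ == "__main__":
    # Hypothetical corners of a wall as seen by the camera (src) and the
    # projector pixels that were observed to land on them (dst).
    src = [(0, 0), (1, 0), (1, 1), (0, 1)]
    dst = [(10, 20), (110, 30), (100, 120), (5, 100)]
    h = estimate_homography(src, dst)
    print(apply_homography(h, 0.5, 0.5))  # where the wall's center should be lit
```

Non-planar targets such as building facades need a per-surface mapping or a full 3D model, which is where calibrated projector and camera poses come in.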

In addition, this technology has evolved further, and interactive projection mapping has been created in which the image projected on a room or aisle changes in response to the movement of people in the space.

Gesture recognition

In recent years, gesture recognition technology, which lets you operate a TV or camera by gesture alone without touching it, has been attracting attention as a user interface.

"Kinect," sold as a controller for Microsoft's Xbox 360 game console, is a product that uses this technology.

It is equipped with a camera and sensors and can be operated with gestures and voice without touching anything.

It also has a 3D shape measurement sensor.

Computer vision technology underlies these capabilities as well.

Thanks to its low price and high performance, Kinect became a worldwide hit.

Since then, developer packages for building commercial applications other than games have been offered at low prices, and the device has been applied in various fields.

One such application, in the medical field, is a system that lets doctors operate a PC during surgery through contact-free gesture recognition.

With this system, a surgeon can check X-rays during surgery without touching the PC screen or asking other staff to operate the PC, which improves efficiency while maintaining hygiene.

Computer vision helps human vision and broadens horizons

It is said that more than 80% of the information that humans acquire from the outside comes from the visual sense.

If computer vision gives computers the capabilities of the human eye and beyond, it will lead to a more convenient and safer life.

Computer vision technology is indispensable not only in the glamorous field of entertainment, but also in next-generation technologies that have been attracting attention in recent years, such as autonomous driving and robot development.

There is no doubt that it will become more and more important as a technology that expands the possibilities of various fields.

If you are an engineer who wants to broaden your horizons and contribute to new technologies that are useful to the world, consider deepening your knowledge with recommended books on the subject.


The purpose of computer vision is to realize "human-like vision" on a computer.
It is a technology that is similar to, yet contrasts with, computer graphics.
Computer vision technology is used not only for entertainment, but also for medical care and autonomous driving of cars.