Interview with Li Bing: The cross-modal video search he stuck with is taking off

Author: Chinese network  Time: 2022.07.13

Li Bing said that, in his view, the most valuable thing about entrepreneurship is being able to bring research into real industrial use together with a group of like-minded partners. With cross-modal video search now taking off, many blue-ocean markets are waiting for People's Zhongke to explore, and he is full of expectation and hope for the future.

Li Bing, chairman of People's Zhongke, likes to watch "Forrest Gump". He rewatches the film several times a year; Forrest Gump's endurance and persistence have always appealed to him.

In real life, Li Bing has had a "Forrest Gump"-style entrepreneurial journey of his own. From his studies to scientific research to entrepreneurship, he has stuck with the field of video content and security for thirteen years.

The "Statistical Report on China's Internet Development" released by CNNIC showed that, as of December 2021, the usage rates of online video and short video among Chinese netizens were 94.5% and 90.5%, respectively, each corresponding to a user base in the hundreds of millions.

Nearly 1 billion video users are giving rise to a new blue-ocean market: cross-modal video search engines.

At the end of last year, People's Zhongke released "Bai Ze", a cross-modal video search engine oriented toward content security, which sparked widespread discussion in the industry.

Aimed at content security, "Bai Ze" combines retrieval of content from multiple platforms in China and abroad, and can perform cross-platform text-to-image, text-to-video, image-to-video, video-to-video, image-to-text, and video-to-text search, among other functions.

Within half a year, "Bai Ze" has been widely applied in content risk control, strategic communication, and digital government services.

The arrival of "Bai Ze" is timely, and it also embodies the countless efforts of a technical team from the Chinese Academy of Sciences.

Not a typical scientist

After the college entrance examination in 2000, Li Bing left his hometown in Anhui to study in the north. It was in the computer room of Beijing Jiaotong University that he first came into contact with a computer. "At that time, putting on shoe covers to enter the computer room was a very 'grand' affair."

Before this, Li Bing's world and computer technology had been two parallel lines.

Li Bing was born into a rural family in a remote mountainous area. Recalling the fun of his childhood, Li Bing laughed. As a child, he often helped his parents herd cattle and ducks; when the ducks ran off, his mother would chase them for miles. The mud-walled classroom in the village had long fallen into disrepair and became a dangerous building after a heavy rain, so the whole class was moved to a nearby abandoned health clinic for lessons, and he spent his elementary school years surrounded by the smell of disinfectant.

From then on, leaving his hometown through the college entrance examination became Li Bing's firm goal. In 2000, when he was filling out his college application choices, his home county town had just gotten its first Internet cafe, where he learned that computer science was the hottest major at the time.

After completing his bachelor's and master's degrees at Beijing Jiaotong University, Li Bing entered the Institute of Automation of the Chinese Academy of Sciences and began research on video content understanding and sensitive information recognition.

The Institute of Automation of the Chinese Academy of Sciences is China's largest institutionalized artificial intelligence research organization. The institute's huge publicity board carries the slogan "Building a national strategic scientific and technological force in intelligent science and technology for the new era". The institute also has a dazzling pearl: the National Laboratory of Pattern Recognition, established in 1984. As one of the first national key laboratories, it mainly studies the mechanisms of human pattern recognition as well as effective computational models and algorithms.

This year marks Li Bing's 13th year at the institute, where he is a young researcher and doctoral supervisor. In his research, he has always liked to explore topics that are both cutting-edge and practical.

In 2010, Internet content consisted mainly of text and images. Because of the high barrier to production and creation, audio and video content was not as widespread as in Europe and America, where home video cameras were common; most of it came from professional film and television production or imported overseas content. Even so, some videos containing violence and gore, terrorist activities, and incitement to crime had begun to appear. From then on, Li Bing led the team in researching terrorist video recognition and violent-terror video analysis to provide technical support for regulatory authorities. Li Bing recalled, "At that time, for the research, I watched a lot of horror films and violent, bloody videos. There are many scenes I still don't want to think back on. Sometimes I think that if our own children saw such content online, it would be hard to accept, and I feel the burden on my shoulders becomes even heavier."

Since then, as artificial intelligence has continued to develop and iterate, he has achieved world-leading research results in new areas such as multimodal recognition, cross-modal understanding, and forged-video detection.

Li Bing describes himself as an atypical scientist. He never wanted to do academic research that goes "from paper to paper"; instead, he wants research results to solve practical problems and be put into industrial use. Li Bing reflected, "We are lucky to have been born during a period of rapid development and construction of our country, which is why we have such opportunities and platforms today. I also hope to use what we have learned throughout our lives to do something for society and the country."

His actual move into business is a story of a "thousand-li horse meeting Bole", the legendary judge of horses who recognizes hidden talent.

In 2019, People's Daily, which was planning to expand into content technology, joined forces with the Chinese Academy of Sciences, which wanted to commercialize its scientific and technological achievements, and world-leading video understanding technology set out on the road to industrialization.

Entrepreneur

In 2020, People's Zhongke officially began operations.

This legendary start-up, carrying the genes of both People's Daily and the Chinese Academy of Sciences, takes next-generation cognitive intelligence as its mission and set sail amid attention from all sides. In November last year, its core product "Bai Ze" was officially launched: the first answer sheet People's Zhongke delivered within just two years of its founding.

The name "Bai Ze" alludes to a mythical beast in the Chinese classic "Shan Hai Jing" (Classic of Mountains and Seas) that "understands the nature of all things and knows the forms of all things". Bai Ze is a cross-modal video search engine that maps information from different modalities, such as text, images, speech, and video, into a unified feature representation space. With video at its core, it learns a unified distance metric across modalities, bridges the semantic gap between text, speech, video, and other multimodal content, and automatically associates key elements across modalities.
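The idea of mapping every modality into one feature space and comparing items with a single distance measure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Bai Ze's implementation: the article does not say which encoders or metric the engine uses, so the toy 3-dimensional vectors stand in for hypothetical encoder outputs, cosine similarity is assumed as the shared measure, and the names `search`, `cosine`, and the item IDs are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: the (assumed) single measure used for every modality pair."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def search(query: Vector, index: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed items of any modality by similarity to the query embedding."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings: in a real system, trained text, image, and video
# encoders would produce these vectors in one shared space.
index = {
    "video:street_scene": [0.9, 0.1, 0.0],
    "image:flood_photo": [0.1, 0.9, 0.1],
    "text:news_article": [0.8, 0.2, 0.1],
}

text_query = [1.0, 0.0, 0.0]  # hypothetical embedding of a text query
results = search(text_query, index, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

Because every item lives in the same space, the same `search` call can serve text-to-video, image-to-video, or video-to-text queries; only the encoder that produced the query vector changes. A production engine would typically replace this linear scan with an approximate nearest-neighbor index.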

"Bai Ze" is undoubtedly a pioneer among cross-modal video search engines. For Li Bing, however, it is the product of years spent on the "cold bench", working patiently out of the spotlight.

In an Internet era dominated by text and images, few people paid attention to video content security. On the one hand, there was not much video content, and Internet scenarios had never been as popular as security and industrial applications; on the other hand, research on sensitive content received little attention in academia. Under the leadership of researcher Hu Weiming, Li Bing and the research team were among the first to publish research results on pornographic and horror image/video recognition in top academic journals.

But Li Bing kept at it. In his words, whether studying or working, he always hopes to be the best in a niche. After the team had persisted in the field of video content for more than 20 years, it finally saw the track take off in full.

Cross-modal video search is taking off

The real world is multimodal: information often exists in multiple modalities such as text, sound, and images. Artificial intelligence is developing rapidly, with major breakthroughs in the respective fields of natural language processing (NLP), automatic speech recognition (ASR), and computer vision (CV). However, data from different modalities differ markedly.

Human perception of the real world is multimodal and cross-modal. To build artificial intelligence that can "fully simulate how humans understand the real world", we need multimodal neural networks capable of recognizing and responding to multiple modalities.

Today, deep learning can effectively represent the features of data from different modalities. This not only makes it possible to fuse data from different modalities, but also to convert information from one modality to another (for example, text to image, text to video, or video to text), enabling intelligent cross-modal understanding and representation.

Cross-modal understanding can therefore be seen as an advanced stage of multimodal learning. Early multimodal work aimed to fuse information across modalities, whereas cross-modal learning goes further toward a unified representation of different modalities, enabling mutual "translation" and "leaps" between information in different modalities.

Li Bing believes the human brain is extremely mysterious. For example, during his doctoral studies he researched the four constancies of human visual cognition: color constancy, size constancy, brightness constancy, and shape constancy. Take size constancy: in a photo, an adult in the distance looks shorter than a child nearby, yet people looking at the photo know that the adult is actually much taller than the child, because the brain performs secondary processing on "size".
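The size-constancy example can be made concrete with a little geometry. As a rough sketch (the heights and distances below are illustrative assumptions, not figures from the article), the angle an object of height h subtends at distance d is approximately h/d for small angles, so a distant adult can produce a smaller image than a nearby child:

```latex
\begin{aligned}
\theta &= 2\arctan\!\left(\frac{h}{2d}\right) \approx \frac{h}{d} && \text{(small-angle approximation)}\\
\theta_{\text{adult}} &\approx \frac{1.75\ \text{m}}{20\ \text{m}} = 0.0875\ \text{rad}\\
\theta_{\text{child}} &\approx \frac{1.2\ \text{m}}{3\ \text{m}} = 0.40\ \text{rad}\\
h &\approx \theta\, d
\end{aligned}
```

The adult's image is smaller (0.0875 rad versus 0.40 rad), yet viewers still judge the adult as taller. One common account of size constancy is that the visual system factors in estimated distance, in effect recovering h ≈ θd.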

Artificial intelligence can be called a crystallization of human wisdom, an exploration of and challenge to our own mysterious selves. This sense of excitement and accomplishment has always motivated everyone. Li Bing leads the team in publishing papers at top conferences and in top journals every year, and the team has won a series of domestic and international competition awards. In 2020, under the leadership of his mentor Hu Weiming, he won major honors such as the State Natural Science Award. (Editor's note: The State Natural Science Award was established by the State Council of the People's Republic of China, and the National Science and Technology Awards Committee is responsible for it. It is one of China's five national science and technology awards and is granted to citizens who have made major scientific discoveries in elucidating natural phenomena, characteristics, and laws.)

According to the Cisco VNI forecast, with the development of 8K video, VR/AR applications, and the IoT, global IP traffic will keep growing exponentially. In 2022, the IP traffic crossing global networks will exceed all the traffic of the 32 years from the Internet's first year through the end of 2016, and video, gaming, and multimedia will account for more than 85% of total traffic.

In the digital era, with the rapid growth of unstructured content such as video, connecting text, images, audio, and video has become urgent. Content analysis strategies based on single-modality content extraction and simple rule matching can no longer meet actual needs.

Achieving cross-modal video search requires overcoming many technical problems. The first is collecting cross-modal data and training on large volumes of it; the second is designing the neural network architecture; and the last is getting the whole model to run. A company must also make it run at the lowest cost and with the highest efficiency.
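One widely used way to train the kind of shared feature space that cross-modal search relies on is contrastive learning, the approach popularized by models such as CLIP. The article does not say how Bai Ze is trained, so the following is only a minimal sketch of a symmetric InfoNCE-style loss on a batch of paired text and video embeddings; the function names, the 0.07 temperature, and the toy vectors are assumptions. The loss is low when each text embedding is most similar to its own paired video and high when it matches a different one.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_cross_entropy(logits: List[List[float]]) -> float:
    """Average cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(text_emb: List[Vector], video_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Row i of text_emb is assumed to describe row i of video_emb."""
    t = [normalize(x) for x in text_emb]
    v = [normalize(x) for x in video_emb]
    logits = [[dot(ti, vj) / temperature for vj in v] for ti in t]
    transposed = [list(col) for col in zip(*logits)]
    # Average the text-to-video and video-to-text directions.
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))


texts = [[1.0, 0.0], [0.0, 1.0]]
aligned_videos = [[1.0, 0.0], [0.0, 1.0]]     # each video matches its caption
misaligned_videos = [[0.0, 1.0], [1.0, 0.0]]  # captions and videos swapped

aligned_loss = symmetric_info_nce(texts, aligned_videos)
misaligned_loss = symmetric_info_nce(texts, misaligned_videos)
print(f"aligned: {aligned_loss:.4f}, misaligned: {misaligned_loss:.4f}")
```

In a real training run, the gradient of this loss would update the text and video encoders so that matching pairs move closer in the shared space; that training loop, and the large-scale data collection the paragraph mentions, are omitted here.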

"In the past, scientific research meant leading a few dozen people on a project; now I need to work with hundreds of people." Li Bing needs to build practical products, at low cost and with few constraints, that satisfy users and that users are willing to pay for.

Over the past two and a half years, People's Zhongke's revenue has grown nearly tenfold each year, rising from the million level to the hundred-million level. The company also insists on heavy R&D investment to build a "long slope with thick snow", a lasting advantage that keeps compounding.

The transition from scientist to entrepreneur requires both ability and passion. "Only by not keeping your eyes on money can you make a lot of money." "No matter what it is, if you can stick with it for ten years, it will certainly make a difference."

Li Bing said that, in his view, the most valuable thing about entrepreneurship is being able to bring research into industrial use with a group of like-minded partners. With cross-modal video search taking off, many blue-ocean markets are waiting for People's Zhongke to explore, and he is full of expectation and hope for the future.

- END -
