Exploration and Application of Graph Databases and Knowledge Graphs in the Weicai Risk Control System

Author: Data School Thu    Time: 2022.09.29

Source: AI Frontline

This article is about 3,500 words; the recommended reading time is 7 minutes.

This article introduces the exploration and application of the graph database, a powerful tool for analyzing complex relationship networks.

In recent years, as regulation has tightened, financial institutions' businesses have grown, and transaction methods have become more convenient, the relationships among customers, accounts, and funds have become more complicated, black-market fraud has become better hidden, and the requirements for internal risk control have kept rising. Traditional relational databases are increasingly limited when applied to such complex relationship networks, and multi-dimensional queries often cannot return results in a reasonable time. The graph database is a powerful tool for complex relationship network analysis, so using it efficiently, and getting high performance, high scalability, and high stability out of it, is very important.

1. The current status and existing problems of graph databases and knowledge graphs

Graph data is closer to how relationships exist in the real world, and graph databases address the query performance problem of complex relationship networks. They can surface hidden relationships quickly and make up for gaps in existing analysis methods.

Current use in the risk control field

As the core of knowledge graph storage and display, graph databases come in many commercial and open-source options, such as Alibaba Cloud's GDB, Tencent's KonisGraph, Nebula Graph, Neo4j, and JanusGraph. Graph databases provide strong support for accumulating domain knowledge, reducing business costs, and predicting risk. As one of the broad applications of graph databases, the knowledge graph plays an extremely important role in the industry's risk control work today. Specifically:

1. Pre-loan:

Fraud gang mining: gang mining and automatic rule mining based on expert experience.

Early warning of risk events: risk events are triggered based on the impact that new customers have on risk control scenarios, which gives early warning.

2. During the loan:

Transaction transfer: real-time tracking of capital flow and online prediction of transaction information.

Risk tracking: real-time tracking of abnormal indicators and scanning of customer risk, so that risks are detected and blocked early.

3. Post-loan:

Analysis of money laundering and fraud: quickly identify suspicious transactions based on multiple transactions.

Lost-contact repair: provide intermediate contacts for reaching customers who have lost contact.

Current use at Weicai

With the continuous development of the Haofenqi financial product business, the volume of related data keeps growing. This data was originally stored only as external information and could not be turned into effective knowledge, let alone into a knowledge graph that provides reasoning and prediction for the company. To this end, the basic data of the existing credit business, historical transaction data, third-party risk data, and so on are organized into relationships in a graph database. Analysis of user characteristics and user groups, together with mining of hidden gangs based on existing lists, has become a key tool in anti-fraud. The main application scenarios are:

1. Post-loan lost-contact repair: supports dimensions such as call records, nearby addresses, and IP and device numbers.

2. Visualization and user portrait: provides subgraph display, key path prompts, and user portrait display.

3. Feature support for the business: provides various clustering features computed on the graph, for use by models and strategies.

4. Fraud gang mining: starting from existing blacklisted fraudsters, fraud gangs are mined with a multi-dimensional association clustering algorithm.

2. Some problems encountered in practice at Weicai

How to prepare the data and import it into the graph database in the early stage to achieve a cold start

For building the graph database, importing the offline base data is the prerequisite. We have about 4 TB of data in Hive that needs to be imported. With such a large volume, the difficulty lies in producing data in the format required for import.

Solution:

For preparing and importing massive data, JanusGraph provides a bulk loading method, which is based on Hadoop and supports three input data formats:

GryoInputFormat / GraphSONInputFormat / ScriptInputFormat. We chose the GraphSON format. It is similar to JSON, which makes it easy to understand and convert, but there are certain differences. To this end, we defined a custom data format, FlatGraphSON, so that after the data is extracted from the Hive table, a MapReduce task can conveniently read it and prepare the GraphSON-format data.

FlatGraphSON is a data format that sits between GraphSON and the Hive table. It is the output format of the various kinds of data processed in Hive, and it is also the input from which the GraphSON-format data is generated.

The specific definition is as follows:

The format of an edge is: 'EDGE' # from_vertex_value # to_vertex_value # [property_name : property_value]

The format of a vertex is: 'VERTEX' # vertex_value [ # property_name : property_value [: meta_property_name : meta_property_value] ]

After Hive generates the data, a MapReduce task reads the corresponding HDFS files and produces the GraphSON-format data, which is finally imported using bulk loading. One problem was that the official bulk loading could not be submitted to Spark on YARN at the time, so the source code had to be modified to improve performance.
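To make the flat edge and vertex formats above more concrete, here is a minimal Python sketch of how one such line might be parsed into a structured record before being rendered as GraphSON. It is an illustration only, not the authors' MapReduce job: the function names and the output dictionary layout are hypothetical, the output is a plain dictionary rather than real GraphSON, and it assumes that `#` and `:` never appear inside values.

```python
def parse_property(field):
    """Parse 'name : value [: meta_name : meta_value ...]' into a dict.

    Hypothetical helper: assumes ':' never occurs inside a name or value.
    """
    parts = [p.strip() for p in field.split(":")]
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError(f"malformed property: {field!r}")
    return {
        "name": parts[0],
        "value": parts[1],
        # Remaining parts are (meta_name, meta_value) pairs.
        "meta": dict(zip(parts[2::2], parts[3::2])),
    }


def parse_flat_line(line):
    """Parse one flat 'VERTEX' or 'EDGE' line into a plain dictionary.

    Assumes '#' separates fields and that the leading type literal may
    or may not be wrapped in single quotes.
    """
    fields = [f.strip() for f in line.split("#")]
    kind = fields[0].strip("'").upper()
    if kind == "VERTEX" and len(fields) >= 2:
        return {
            "type": "vertex",
            "value": fields[1],
            "properties": [parse_property(f) for f in fields[2:]],
        }
    if kind == "EDGE" and len(fields) >= 3:
        return {
            "type": "edge",
            "from": fields[1],
            "to": fields[2],
            "properties": [parse_property(f) for f in fields[3:]],
        }
    raise ValueError(f"unrecognized line: {line!r}")
```

For example, `parse_flat_line("'EDGE' # u1 # d1 # relation : uses_device")` yields an edge record from `u1` to `d1` with one property. A real job would still need to group edges under their vertices to match the adjacency layout that GraphSON bulk loading expects.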

How to switch smoothly

The risk control system sits in a core link of the overall business, so the requirements for stability and continuous service availability are high:

Online services cannot be suspended, and service requests cannot be lost;

Complete graph data must be used to provide risk calculation results (generating risk features, etc.);

When a new data source is added, the graph database schema is modified, or a new graph is introduced, how can we keep the service continuously available during upgrade and restart operations?

Solution:

1. Build two fully consistent graphs. This can be done quickly by copying tables with HBase's clone_snapshot. Deploy two identical sets of services, each reading one of the two stores. The two systems consume messages through different consumer groups.

2. Set a flag in the database, such as 0 and 1. Only one system serves external traffic at any time, and the other acts as the standby.

3. When the graph needs to be upgraded, control it through the flag. For example, if system 1 is currently the standby and system 0 is online, upgrade system 1 first. After that upgrade finishes, switch the flag so that the upgraded system serves external traffic, and then upgrade system 0.
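The switching procedure in the three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the production implementation: the flag lives in memory here rather than in a database, and `GraphSwitch`, `rolling_upgrade`, and the `upgrade` callback are hypothetical names.

```python
class GraphSwitch:
    """Holds the flag (0 or 1) that says which graph system is online."""

    def __init__(self, active=0):
        if active not in (0, 1):
            raise ValueError("flag must be 0 or 1")
        self.active = active

    @property
    def standby(self):
        # The system that is not currently serving traffic.
        return 1 - self.active

    def switch(self):
        # Flip the flag so the former standby starts serving traffic.
        self.active = self.standby
        return self.active


def rolling_upgrade(switch, upgrade):
    """Upgrade the standby, switch traffic to it, then upgrade the other.

    `upgrade` is a caller-supplied function taking a system id (0 or 1).
    Returns the order in which the systems were upgraded.
    """
    order = []
    first = switch.standby
    upgrade(first)           # the online system keeps serving meanwhile
    order.append(first)
    switch.switch()          # the upgraded system now serves traffic
    second = switch.standby
    upgrade(second)          # upgrade the system that was online before
    order.append(second)
    return order
```

With `GraphSwitch(active=0)`, the sketch upgrades system 1, flips the flag to 1, and then upgrades system 0, so one system is always serving requests.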

Super vertex handling

An unavoidable problem in graph databases is the super vertex, which brings the following problems:

Vertex degree follows a power-law distribution (for example, celebrities in an address book, or Baidu's address), so graphs generally contain super vertices

When data is cleaned under the MapReduce framework, a single key with too many values causes reducer OOM

In HBase (JanusGraph's storage backend), too many columns lead to a sharp drop in performance

Too many neighboring nodes in the graph cause queries to explode

Solution:

1. Entities that carry no practical meaning (for example, device numbers such as 00000000-0000-0000-000000000000 or 00000000) are filtered out directly

2. Entities that should not be filtered out are converted equivalently into a vertex property set (the edge becomes a property). They can then be used directly for property filtering. This reduced the original 5 billion edges to just over 1 billion.

3. In addition, GEO range queries that return too many nodes cause performance to drop sharply. A LIMIT can be used to truncate the results, according to actual business needs, so that performance requirements are met
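The three measures above can be illustrated with a small Python sketch over an in-memory edge list. It is a toy model rather than the actual MapReduce or JanusGraph code: the degree threshold `max_degree` is an assumption introduced here to decide which entities become properties, and the function names are hypothetical. The placeholder values are the ones quoted in the example above.

```python
from collections import defaultdict
from itertools import islice

# Placeholder device numbers quoted in the text; they carry no meaning.
PLACEHOLDER_IDS = {"00000000-0000-0000-000000000000", "00000000"}


def split_edges(edges, max_degree):
    """Split (user, entity) edges into kept edges and vertex properties.

    Edges to placeholder entities are dropped. Entities linked to more
    than `max_degree` users (an assumed threshold) are stored as a
    property set on each user instead of as edges.
    """
    cleaned = [(u, e) for u, e in edges if e not in PLACEHOLDER_IDS]
    degree = defaultdict(int)
    for _, entity in cleaned:
        degree[entity] += 1

    kept = []
    properties = defaultdict(set)
    for user, entity in cleaned:
        if degree[entity] > max_degree:
            properties[user].add(entity)   # edge becomes a property
        else:
            kept.append((user, entity))
    return kept, dict(properties)


def neighbors_with_limit(adjacency, vertex, limit):
    """Return at most `limit` neighbors, truncating large result sets."""
    return list(islice(adjacency.get(vertex, []), limit))
```

Users that share a high-degree entity can still be found by filtering on the property value, which is what makes the conversion equivalent for query purposes while removing the edges themselves.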

Graph data visualization

Visualization is the key to seeing the connections between entities intuitively and to helping strategy and modeling colleagues observe and discover features. Cytoscape.js is the JavaScript counterpart of the desktop application Cytoscape. We have used the desktop version in both past and recent analyses, and we have used the JavaScript version for demos in previous projects. In our experience it is convenient to use, and the graphs it produces look good. It displays the relationship network well and allows some customization.

Result: a path containing a blacklisted user

3. Results after launch

Feature support for the business

Case 1: Number of users with failed applications within 80 meters by GPS

For this kind of scenario, we would otherwise need to fetch from the database the GPS information most recently collected when the user placed the order, obtain the users within range from the GPS latitude and longitude (this step needs a MySQL spatial index or Redis GeoHash), and then join against the user application table to find the corresponding applications and screen out the users whose applications failed. Solution: use the Geoshape type in JanusGraph. JanusGraph uses Elasticsearch as an index backend for this kind of query, so the vertices within the GPS radius can be fetched easily and quickly. Among those users, count the ones who failed the lending review.

The specific code is as follows:
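The listing itself is not included in this copy of the article. As a stand-in, here is a minimal Python sketch of the counting logic only. It is not the authors' JanusGraph query: it computes great-circle distances in memory with the haversine formula instead of using a Geoshape index, and the record fields (`lat`, `lon`, `status`) and the `"failed"` status value are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_failed_nearby(center, users, radius_m=80.0):
    """Count users with a failed application within `radius_m` of center.

    `center` is a (lat, lon) tuple; each user is a dict with the
    hypothetical keys 'lat', 'lon', and 'status'.
    """
    lat0, lon0 = center
    return sum(
        1
        for u in users
        if u["status"] == "failed"
        and haversine_m(lat0, lon0, u["lat"], u["lon"]) <= radius_m
    )
```

A linear scan like this only works for small candidate sets; the point of the indexed Geoshape approach described above is to avoid scanning every user.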

- END -
