The Big Data Expansion of Quantitative Research Methods in Sociology

Author: China Social Sciences Network  Time: 2022.06.14

For a long time, quantitative research in sociology has relied on survey, census, and experimental data, analyzing and explaining social phenomena with statistical models built on regression analysis. With the development of big data and its analytical techniques, new kinds of data such as digital text, social media networks, and spatiotemporal information have come into wide use across the fields of sociology. At the level of method, the system of quantitative research methods in sociology has changed accordingly.

Research paradigm shifts to a dual drive of theory and data

Traditional quantitative research in sociology is empirical research guided by theory. For a specific research question, researchers reason deductively from relevant theories and earlier findings, propose research hypotheses, and then analyze data with statistical models to confirm or falsify those hypotheses. In this process, the formulation of hypotheses, the measurement of concepts, and the selection of variables all rest mainly on existing theory and conclusions. This dependence on theory often confines researchers to the experience of their predecessors. As a result, most quantitative research can only make minor repairs within that experience, such as adding or dropping variables, adding interaction terms, or running subgroup analyses, and theoretical breakthroughs are hard to achieve. Big data, by contrast, are high-dimensional, while the dimensions in which the human mind can think are limited, so imagination is hard to exercise in high-dimensional space. Big data analysis therefore usually takes a data-driven approach: all candidate variables (features) are fed into a machine learning model, which computes over the real and complicated relationships among the variables and cases in the data to find structural relationships among groups and among variables. This helps researchers discover knowledge and regularities in massive, high-dimensional real-world data. Today, big data are not only applied in sociological research; the data-driven idea has also been integrated into variable selection, structural analysis, the identification of heterogeneous groups, and causal inference, forming a new quantitative research paradigm driven jointly by theory and data.

The theory-driven side means that researchers draw on the professional knowledge of the relevant field to define the scope of variable selection, formulate the analysis strategy, and interpret model results, and that they engage in theory building, proposing new social theories from the study of empirical phenomena. The data-driven side shows mainly in specific research methods. (1) In variable selection, supervised machine learning models such as Lasso regression and ridge regression are used to pick out, from a large pool of candidates, the variables with an important influence on the outcome, "casting a wide net and then concentrating the catch." This guards against omitted variables and may also reveal perspectives that past theory has not touched, thereby promoting theoretical innovation. (2) In structural analysis, machine learning embedding techniques such as t-distributed stochastic neighbor embedding (t-SNE) map data from a high-dimensional space to a low-dimensional one in order to "concentrate" the data. This not only converts sparse, discrete high-dimensional data into continuous variables that can enter a statistical model, but also helps researchers discover hidden structure among variables. (3) In identifying heterogeneous groups, unsupervised clustering models partition the sample according to its characteristics. They can take many high-dimensional features into account at once and produce groups that are homogeneous within and heterogeneous between, which helps researchers explore the distinct regularities of different groups. (4) In causal inference, machine learning models for heterogeneous treatment effects, such as causal forests, can automatically estimate how the effect of a treatment variable differs across groups. Causal Bayesian networks based on causal discovery algorithms can not only infer the causal relationship between independent and dependent variables and give unbiased estimates of causal effects, but also uncover the causal relationships among the variables themselves.
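The variable-selection step in item (1) can be made concrete with a small sketch. The code below is an illustrative assumption, not the authors' procedure: it simulates eight candidate variables of which only two (indices 0 and 3) actually affect the outcome, then fits a Lasso by coordinate descent in plain Python. The function names, the simulated data, and the penalty `alpha=0.3` are all hypothetical choices made for this example; in applied work a library implementation would normally be used instead.

```python
import random


def standardize(columns):
    """Center each column and rescale it to unit variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1.

    Assumes every column is already standardized, so each coordinate
    update reduces to a single soft-thresholding step.
    """
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # residual with all coefficients at zero
    coefs = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that ignores column j.
            rho = sum(x * (r + x * coefs[j]) for x, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, alpha)
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                coefs[j] = new_coef
    return coefs


# Simulated data (hypothetical): 300 cases, 8 candidate variables,
# only variables 0 and 3 truly influence the outcome.
rng = random.Random(42)
n_cases, n_vars = 300, 8
columns = [[rng.gauss(0, 1) for _ in range(n_cases)] for _ in range(n_vars)]
y = [2.0 * columns[0][i] - 1.5 * columns[3][i] + rng.gauss(0, 1)
     for i in range(n_cases)]

coefs = lasso_coordinate_descent(standardize(columns), y, alpha=0.3)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("selected variable indices:", selected)
print("coefficients:", [round(b, 3) for b in coefs])
```

The L1 penalty sets the coefficients of weakly related variables exactly to zero, which is what lets the researcher enter many candidates and keep only a few. The surviving coefficients are shrunk toward zero, so they indicate which variables matter rather than giving unbiased effect sizes, and the choice of the penalty still calls for theoretical and substantive judgment.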

Methodology shifts to the coexistence of individualism and holism

Most data in traditional quantitative sociology are micro-level survey data on individuals, and most studies use linear regression and similar methods to uncover relationships between variables. The theoretical questions addressed by most quantitative research therefore sit at the individual level, such as how education and relational networks affect an individual's status, or how social capital affects individual health. Although some studies use macro-level indicators for communities, regions, and cities, their discussion ultimately comes down to how these macro-level factors affect micro-level individuals, for example how a region's degree of marketization affects individual returns to education. Data from scientifically sampled surveys are generally representative, so the relationships found among variables can be generalized to the population. Even so, the social theories proposed by quantitative sociology that takes micro-level individuals as its object are concentrated mainly at the micro and meso levels, and it is difficult to construct macro-level social theory from them.

The development of big data and its analytical methods has prepared the ground for a macro-level quantitative sociology based on holism, shifting quantitative social science research from individualism alone to the coexistence of individualism and holism. On the one hand, big data can provide aggregate data at different levels and link multiple data sources, filling gaps in macro-level data and supporting macro-level research. For example, register-based big data can be used to study social change, track population trends, analyze patterns of economic development, and explore the impact of policies. On the other hand, big data analysis methods, especially complex network analysis, make it possible for researchers to find macro-level characteristics and regularities in large volumes of micro data. Although a complex network is built from relationships between individuals, its analysis focuses on the characteristics, evolution, and generative mechanisms of the network's overall structure. Complex network analysis therefore concentrates on the whole and on the macro level: exploring how social groups form and differentiate from structural changes in dynamic social networks; examining changes in the structure of the labor market and their causes through occupational mobility networks; identifying patterns of population movement from migration networks; and tracing trends in the development of science through networks of paper topics, citations, and collaboration.

Analytical methods shift to a diversified method system

Constrained by the theory-driven paradigm, individualist methodology, and limited data, quantitative research in sociology has usually taken variables as its basis and used regression models to analyze correlations between variables. Faced with big data that are large in volume, high in dimension, diverse in form, fast-growing, and low in value density, researchers need to combine different methods to extract the information their research requires, and thereby produce knowledge and make scientific discoveries. First, because big data contain a large amount of unstructured data, researchers should attach great importance to descriptive analysis: extracting key information through data cleaning and well-designed statistical indicators, and then making skillful use of visualization to bring out as much of the information in the data as possible. Second, big data take many forms, such as text, audio and video, images, and networks, and researchers need to master the corresponding analysis techniques, text analysis among them, to cope with this diversity. Finally, big data are not the same as full-population data. On the contrary, most big data available to researchers are selective samples drawn from specific groups. Compared with random-sample survey data, causal inference based on big data therefore poses greater challenges. Researchers need not only to master more rigorous causal inference tools but also to design their research more carefully, so as to avoid the large errors that big data can bring and to identify true causal relationships behind complex appearances.
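The point that big data are selective samples rather than full-population data can be illustrated with a toy simulation. Everything in the sketch below is a hypothetical assumption made for illustration: a population in which 40% are "young", an outcome that differs by age group, and a platform on which young people are four times as likely to appear. It compares a random-sample survey, a naive mean over the platform data, and a post-stratified mean that reweights the platform data to the known population shares.

```python
import random

rng = random.Random(7)

# Hypothetical population: each case is (young, outcome, on_platform).
# Young people score higher on the outcome and are far more likely
# to show up in the platform ("big data") records.
population = []
for _ in range(20000):
    young = rng.random() < 0.4
    outcome = rng.gauss(5.0 if young else 2.0, 1.0)
    on_platform = rng.random() < (0.8 if young else 0.2)
    population.append((young, outcome, on_platform))


def mean(values):
    return sum(values) / len(values)


true_mean = mean([o for _, o, _ in population])

# (a) Random-sample survey of 1,000 cases.
survey = rng.sample(population, 1000)
survey_mean = mean([o for _, o, _ in survey])

# (b) Selective big data: every case that happens to be on the platform.
platform = [(yg, o) for yg, o, on in population if on]
naive_mean = mean([o for _, o in platform])

# (c) Post-stratification: reweight group means from the platform data
# by the population share of each group (assumed to be known).
share_young = mean([1.0 if yg else 0.0 for yg, _, _ in population])
mean_young = mean([o for yg, o in platform if yg])
mean_old = mean([o for yg, o in platform if not yg])
weighted_mean = share_young * mean_young + (1 - share_young) * mean_old

print(f"population mean:       {true_mean:.2f}")
print(f"random survey (n=1000): {survey_mean:.2f}")
print(f"platform data, naive:   {naive_mean:.2f} (n={len(platform)})")
print(f"platform data, weighted: {weighted_mean:.2f}")
```

In this setup the much larger platform sample is further from the population mean than the small random survey, because size does not offset selection. The reweighting recovers the population mean only because selection here depends on a single observed variable whose population share is known; real platform data rarely satisfy that condition, and the sketch addresses a descriptive estimate, not the harder problem of causal identification.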

In the era of big data, the expansion of data has injected new vitality into sociological research, and the development of big data analysis techniques has opened possibilities for innovation in the paradigm and methodology of quantitative social research. But new opportunities also bring new challenges. Massive and complex big data place higher demands on the "computing power" of both computers and researchers, and while the mechanisms for acquiring and using big data remain incomplete, obtaining big data that meet research needs has itself become a major challenge for social science research. Promoting the construction of big data platforms, improving the mechanisms for producing and opening up big data, and innovating research methods have therefore become important tasks for quantitative sociologists.

(Author affiliation: Department of Sociology and Social Work, Sun Yat-sen University)

Source: China Social Science Network-Journal of Social Sciences of China

Authors: Liang Yucheng, Jia Xiaoshuang
