Li Tian: A Methodological Reflection on Digital Humanities

Author: Chinese school  Time: 2022.07.07

Summary: From a methodological perspective, the difference between digital humanities and traditional literary research lies in their research objects. The research object of digital humanities shifts from the literary text to the material text, which in turn brings about a research approach centered on computation and results expressed through visualization. Grounded in an objective object, digital humanities shows its capacity to uncover new problems by mining relationships. At the same time, because of the nature of computation, digital humanities methods should serve as an auxiliary means in literary research, and the research itself must return to a literature-centered standpoint.

Keywords: digital humanities; material object; relationship

Author: Li Tian, Associate Professor, Shenzhen Research Institute of Xiamen University (Shenzhen 518000).

Source: "Chinese Literary Criticism", 2022, Issue 2, pp. 175-181

Editor in charge: Ma Zheng

For literary research, "digital humanities" is not a brand-new concept. It is usually regarded as the latest stage in the development of humanities computing. It is generally held that the methodology it offers literary research is "quantitative": literary works, concepts and phenomena are studied on the basis of a body of data, combined with the researcher's judgment. Early applications concentrated on stylistic analysis. In the 19th century, for example, Mendenhall tried to analyze the stylistic differences between Dickens and William Thackeray through word statistics. Chen Bingzao of the University of Wisconsin, Chen Dakang of East China Normal University, and Li Xianping of Fudan University used quantitative methods to investigate the authorship of "Dream of Red Mansions". This period is classed as the stage of humanities computing. It tried to confirm or falsify earlier conclusions by supplying a basis more objective than individual interpretation. Research at this time still relied on manual statistics, and the volume of data was small. After the rise of big data, with the proposal of concepts such as "culturomics", "distant reading" and "macroanalysis", digital humanities truly expanded in literary research.

As a new research method, what is the significance of digital humanities for literary research? To answer this question, we need to know how digital humanities differs from traditional literary research methods, and then explore what new discoveries it can bring to literary research. It is generally held that digital humanities pursues one common goal: objectivity. Stylistic analysis provides objective evidence for the style of an author or a genre. Approaches built on large bodies of historical material examine all the relevant concepts through machine reading, avoid survivorship bias, and obtain a more objective basis.

This objectivity is expressed as "verifiability", in opposition to the interpretive character of traditional research methods. The latter is often considered the essence of literary research. It is rooted in an individual researcher's reading of specific texts: working from their own theoretical framework, researchers draw corresponding conclusions, such as an interpretation of a literary phenomenon or an account of the reasons behind it.

However, as research has deepened, researchers have gradually found that objectivity and rationality do not capture the essence of digital humanities methods. Not only is the data itself shaped by the researcher's selection, but the statistical methods are also ambiguous. More importantly, digital humanities methods themselves contain interpretation. This interpretive element is in fact a prerequisite of the research: a theoretical preset. If "one and unequal" is the theoretical assumption behind constructing the concept of world literature, and small-world network theory is the theoretical preset behind reconstructing the character relationships in "Hamlet", then the theoretical assumption behind the study of the titles of 7,000 British novels is that a title can generate meaning. "A novel is a narrative, and the title, especially as a summary of the novel, is also a short narrative. It presents the main events of the story, its characters, setting and ending." Since a theoretical preset of this kind is the cornerstone of almost all new knowledge and new viewpoints, where exactly is the digital humanities method "new" compared with traditional literary research methods?

1. Material object, computation and visual expression

What is new in the digital humanities method is the three changes it brings to literary research: first, the research object changes from the literary text to a material object; second, the research approach changes, with computation as its core and driving force; third, the expression of research results changes and can be visualized. Among them, the change of the research object is the cornerstone of the entire chain of changes. It is because the literary text is transformed into a material object that quantification and computation become possible, and visual expression is one of the most fitting ways to present the results.

The reason digital humanities and traditional literary research methods look so different is not the distinction between qualitative and quantitative work, but the difference in their research objects. The research object of digital humanities is not the literary text in the traditional sense, but the "text" as an objective object. A literary text is a set of meaningful linguistic signs. If the signifiers are characters, words, sentences, paragraphs, chapters and so on, then the meaning that corresponds to them is the signified, and a relationship of "representation" holds between signified and signifier. The researcher's work is to pass through the signifier, explain what it means, dig out the deeper significance hidden in the text, and explore how that significance is produced. Digital humanities, by contrast, faces an objective object, the so-called material object: a text treated as a "thing" rather than as a linguistic sign. A "thing" involves no relationship of "representation"; its meaning points to itself. In stylistic analysis, whether the tool is DocuScope or MFW (most frequent words), a certain number of feature words are selected as the statistical basis. What is counted is not the meaning of these words but the words themselves, how often they appear, and where they appear. In other words, these words figure as "things" in their own right. In social network analysis, the "things" are characters such as Hamlet, Claudius and Ophelia. Whether Hamlet is a prince, or what relationship he has with Claudius, is of no concern; whatever their identity and status, the characters become "things" represented as nodes under the network analysis method. To borrow a metaphor from programming languages, the traditional literary research method resembles the "procedure-oriented" paradigm.
The traditional method studies the process of argument within the literary work, whereas the digital humanities method is "object-oriented". Under digital humanities methods, the "text" as research object takes the form of language, of printed or digital books, and of a range of cultural or spatial materials surrounding the work. It is closer to a literary world as a whole: a material object made up of different materials such as corpora and visual materials, rather than purely mental content, that is, a literary text understood as the expression of emotion and meaning. Only because of this materiality do datafication, computation and visualization become possible. In the mature field of stylistic analysis, for instance, researchers split the text of "Dream of Red Mansions" or of Shakespeare into a measurable corpus so that statistical methods can be used to analyze or classify authorship. It can be said that materiality determines the characteristics of digital humanities methods, such as their auxiliary function, their advantage in discovering macro-level problems and problems of correlation, and their visual expression.

The material text also expands the research object. From the perspective of digital humanities, the research object is no longer a single literary text but a multidimensional world of literature and its derivatives. Vertically, it can bridge the distance between literary works of different periods and support cultural-historical inquiry in the sense of culturomics, as with the concepts of distant reading and macroanalysis. Horizontally, it can span different art forms: it faces a literary world that includes sequels, adaptations and derivative works, so that literary research can be conducted from multiple dimensions. For example, "Canon/Archive: Large-scale Dynamics in the Literary Field" adopts a sociological method: it quantifies the number of translations of 19th-century British novels into French and German as a "popularity" indicator, and quantifies the number of mentions in the MLA (Modern Language Association) reference database together with the length of entries in the DNB (Oxford Dictionary of National Biography) as a "prestige" indicator. Using these two indicators, British novels of the 18th and 19th centuries were plotted on a corresponding map of the literary field, and the field of research was expanded both horizontally and vertically.
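The idea that stylistic analysis counts words as "things", recording how often and where they occur rather than what they mean, can be illustrated with a minimal Python sketch. It is not DocuScope or any published MFW procedure; the function names, the crude tokenizer and the sample sentence are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def mfw_profile(text, n=3):
    """Return the n most frequent words with their relative frequencies."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return [(word, count / len(tokens)) for word, count in counts.most_common(n)]


def positions(text, word):
    """Return the token positions at which a word occurs."""
    return [i for i, token in enumerate(tokenize(text)) if token == word]


# Invented sample sentence, not taken from any real corpus.
sample = "the king is dead and the queen is here and the prince is gone"
profile = mfw_profile(sample)
the_positions = positions(sample, "the")
```

Nothing in the sketch looks at what "the" or "king" means; the profile is built only from counts and positions, which is the sense in which the word is handled as an object rather than a sign.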

As a material object, the text becomes a collection of measurable elements, and on this basis computational methods can be applied. Computation is the core of digital humanities methods. It runs through every area, from the construction of literary databases to specific literary research to visual expression, and it determines both the advantages and the limitations of digital humanities. Computation in digital humanities draws mainly on various computer technologies and statistical methods. A topic model algorithm, for example, uses text mining to identify the topic information hidden in a large-scale corpus and extracts the topics of each document in the corpus in the form of a probability distribution; the extracted topics can then be used for topic clustering or text classification. The "One Leaf Story" text analysis system developed by Gu Zhen's Story Workshop adopts an emotion algorithm: by extracting the key elements of a text and the distribution and relationships of its keywords, it can plot an emotional curve for the full text.

The drive toward computation directly changes how research results are expressed. Language and written text are no longer the only form of presentation. Visualization directly expresses the objective, material character of the object and supplements verbal expression. Visual presentation as such is not exclusive to digital humanities methods: early charts and lists were already visualizations. Unlike simple charts, however, computed visualization is strong at expressing relationships among many elements. Heat maps and tag clouds can represent the systematic relationships within a group of concepts; network graphs can represent the internal structural relationships of a text; maps can represent the social networks of the objects under study; and visual platforms make interaction possible. Examples include "Mapping Emotions in Victorian London" and the "Song and Yuan Case Knowledge Map System" developed by Peking University. The relationships and changes such platforms display cannot be conveyed by an individual researcher through words alone. It is not hard to see that the expansion of the material object, the algorithms and the visualization all find expression in the study of relationships among multiple elements. Under digital humanities methods, the relationship may be a network of relations within a single text, the relationship between "popularity" and "prestige" over a long period, or the relationship between literary events in the dimensions of time and space. These relationships can span history and span types of text, as in "culturomics" and "distant reading". The key is that these relationships are discovered through machine reading, not through the researcher's close reading. In other words, they are unknown to researchers before computation intervenes. This is what makes the discovery of new problems and the generation of new knowledge possible.
As structured databases show, "the data can be reorganized arbitrarily to form new knowledge and discover new problems; it supports analysis, semantic retrieval and spatio-temporal positioning; and it can be presented visually." Researchers operate on text data in the hope of finding something new that individual reading cannot yield. One example is the discovery of Horatio's special "betweenness centrality" in "Hamlet": Horatio is not the central character of the text, but computation shows that he occupies a central position in the network of relationships. He is the only character connected to every role in the story, protagonist or supporting. Compared with confirming or falsifying existing conclusions, as in the humanities computing stage, the new problems that emerge from mining relationships may better reflect the exploratory character of digital humanities as a new method. We can examine this exploration of new problems in Chinese literary research over the past ten years.
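The kind of network measurement described above, in which a character's position is computed from links rather than read from the plot, can be sketched in Python. The example computes betweenness centrality, a standard network measure, with Brandes' algorithm for unweighted graphs; the essay does not say which algorithm or data the original study used, so the algorithm choice, the function names and the six-node network (with generic labels and invented edges, not the actual relationships in "Hamlet") are all assumptions for illustration.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency list from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def betweenness(graph):
    """Betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        stack = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}   # number of shortest paths
        dist = {node: -1 for node in graph}   # -1 means not yet visited
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                          # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        while stack:                          # accumulate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in scores.items()}


# Invented toy network: a "protagonist" inside a tight court circle and a
# "broker" who is the only link to two otherwise isolated minor characters.
edges = [
    ("protagonist", "court_1"), ("protagonist", "court_2"),
    ("court_1", "court_2"), ("protagonist", "broker"),
    ("broker", "minor_1"), ("broker", "minor_2"),
]
scores = betweenness(build_graph(edges))
```

In this toy network the protagonist and the broker have the same number of direct links, yet the broker lies on more shortest paths between other characters. The score says nothing about why that is so in the story; that remains a question for the reader of the text.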

2. Local practice of relational research

After the concept of digital humanities was introduced, researchers in classical literature, modern literature and online literature continued to explore corresponding areas of application. Besides directions that already existed in the humanities computing stage, such as statistical research and stylistic analysis of literary texts, scholars have, more importantly, turned to the history of literary thought and conceptual history through distant reading, to comparative and comprehensive research across disciplines, regions, ethnic groups and languages, and to visual research on literary data. Gao Jianping holds that "for the original construction of literary criticism theory in our country, the most important thing is that theory connect with practice." Each step in digital humanities has developed in step with computer technology and its surrounding environment. The databases of classical and modern literature are a powerful backing for digital humanities methods. Compared with the humanities computing stage, the data of the digital humanities stage are no longer limited to document repositories such as the "Four Series" and the "Chinese Basic Ancient Books", but have developed further into structured databases, such as the "Tang and Song Literature Chronological Map Platform" and, overseas, the China Biographical Database (CBDB). Without exception they involve digital media and new technologies and have a strong material character, which brings them closer to the research objects of digital humanities and gives us new methodological soil.

In the literary research of the past ten years, whether on classical, modern or online literature, the emphasis on relational research is clearly visible. If long-term, complex literary history, history of thought and conceptual history are vertical relationships, then comparative and comprehensive research across disciplines and regions is horizontal. Together, the dimensions of time and space present a three-dimensional, multidimensional perspective.

Vertical relationships mainly concern research related to literary history. As early as the 1990s, Jin Guantao and others applied natural language processing to the "Specialized Database of Modern Chinese Intellectual History (1830-1930)", using changes in term frequency over a given period to observe the history behind the evolution of concepts. Although this method, called "digital history of concepts", is not literary history in the strict sense, it shows the possibilities that distant reading opens for digital humanities in the study of literary history. For example, Ouyang Jian analyzed the changes in the image of Wu Zetian across historical documents by mining a large body of ancient texts and identified certain historical phenomena; "Research on the Influence of Song Ci in the Tang Dynasty: Centered on Six Poets" analyzed the influence of mid-Tang poetry on Song ci and found that the influence was most significant for the period from the twenty-first year of Zhenyuan (805) to the second year of Yuanhe (807) and from the tenth to the twelfth year of Yuanhe. If the study of literary history and influence is vertical and temporal, with more emphasis on macro-level observation, then social network analysis and similar methods reveal another type of relationship, the horizontal, spatial one, and show the capacity of digital humanities for micro-level analysis. Take the relationships among characters in a single text: researchers conducted a social network analysis of the "Zuo Zhuan" and found that although Confucius appears only a few times in it, he occupies a key position in the network of relationships of the Spring and Autumn Period; another study analyzed the social network of the complicated cast of characters in "Big Waves" and discussed the real value of the historical novel as a form.
In recent years, integration with GIS technology and the development of structured databases have provided more convenient platforms for discovering relationships. The "Tang and Song Literature Chronological Map Platform", for example, combines the two dimensions of time and space on a historical map and uses visualization to show the relationship between poets and the writing of poetry. It can also display all the works and events associated with a given area, allowing researchers to see at a glance the multiple relationships among periods, regions, poets and works.

Unlike research and practice in classical and modern literature, research on online literature does not emphasize the vertical background of literary history; it focuses more on horizontal, relational research. The born-digital nature of online literature, together with the large real-time data flows generated by its strong interactivity, gives it a digital form from the outset. Researchers regard this as a unique opportunity for research on Chinese online literature. In terms of the text data produced within the same period, online literature far exceeds traditional literature, whether in the size of a single work or in the number of works. Many online novels run to tens of millions of words, and their language is rough and simple, not having been repeatedly refined. These quantitative and qualitative features of the text make machine reading an effective auxiliary means. Researchers hold that the algorithm, as a way of thinking, has come to guide online evaluation and to permeate creation, producing a variety of "routines" or "modes". Besides plot routines such as the "Retirement flow", there are repeatedly applied devices such as the golden finger and the climax, and even openings and climaxes are precisely designed. Unlike the genres or patterns of traditional literature, the routines of online novels are highly repetitive and regular and lend themselves strongly to quantification. This is the "algorithm gene" that researchers speak of, and it makes it easier to bring in digital humanities methods.
For example, researchers have approached the "Ming Pai Wen" type from the three aspects of setting, type and database, using character comparison analysis tools to address questions about its development and to seek the relationship between technology and the mechanism of text generation; through the emotion algorithm of "One Leaf Story", they have shown not only that the upgrade rhythm of the "upgrade text" is closely tied to the mood of the times, but also something researchers had never realized: the most popular "upgrade texts" have only two rhythms in all; and they have used machine learning to train models that mine the "face" routines in "Resessing" texts.

These extractable, quantifiable "modes" derive in essence from the UGC-oriented mechanism by which online literature is created. Under a click-driven production mechanism, quickly seizing the "cool points" that hold readers' attention becomes a necessary condition of creation. As a result, both narrative structure and characterization tend toward a single mode. Online literature therefore displays an industrial and social character different from that of traditional literature, which also allows researchers to extend their work to business mechanisms and reader communities. For example, studies have combined Bourdieu's theory of the literary field with algorithms to discuss the logic of the literary field on free online platforms, and have used character network analysis to examine the setting and structure of the "multi-treasure text", revealing phenomena such as marriage relationships and a general disillusionment with the myth of love.

3. The limits of computation and the boundary of literary research

The foundation of the material object is number, that is, binary code, which is what allows the text to be quantified and to enter computation. It is likewise the essence and character of computation that determine the fields of application, the advantages and the limitations of digital humanities methods.

Computation means concealment. Husserl held that computation is a path of continual self-forgetting, which leaves behind a pure form by forgetting the original prototype. On the one hand, this lends research confirmation and supplies it with new propositions; on the other hand, literary research must in the end come back to interpretation, and the forgetting that is essential to computation makes the researcher's interpretive work difficult.

Husserl regards the development from geometry to algebra as the beginning of "calculation", because these numbers were originally supposed to represent definite shapes, yet "although people do not calculate 'mechanically' as in ordinary numerical calculation, although they think, invent, and even make great discoveries, the meaning of the 'symbols' has in the meantime changed without their noticing. Later, this developed into a fully conscious transformation of method."

The thinking, invention and great discoveries of which Husserl speaks correspond to the "relationships" uncovered by digital humanities methods. Iser distinguished scientific theory from literary theory as hard and soft: hard theory aims at prediction, soft theory at mapping. In traditional research, works of art and literature can be evaluated but not predicted. The creativity of computation, however, lies in the predictive power it provides: it can reveal relationships that could not otherwise be found and raise problems that could not otherwise be posed. This is a predictive power born of concealment, and it can be creative. But while concealment brings new discoveries, it may also produce greater limitations. Whether they are statistical methods such as principal component analysis (PCA) and cluster analysis, or computational methods such as emotion algorithms and topic model algorithms, the data they process originally expressed a particular story, trace, history, symbol, or real time and space; once turned into data and visual images, these gradually recede. Computation "causes the actual concepts of time and space, which in geometry originally presented themselves as 'pure intuitions', to be transformed into pure numerical structures, into algebraic structures." For digital humanities methods, the limitation of computation lies in explanation. Researchers do not expect computation itself to interpret, but the forgetting that is essential to computation conceals the algorithmic process and makes the data difficult to explain.
Take the statistical methods commonly used in literary research. Because linguistic material is the main source of data and the data-processing methods owe much to statistics, digital humanities methods have also been described as "a hybrid of statistics and sociolinguistics." Common statistical methods include principal component analysis, cluster analysis, discriminant analysis, correspondence analysis and factor analysis, among which PCA is widely used in digital humanities. The purpose of statistical methods is to "transform the salient aspects of these texts into numerical form, and then express those numbers as visualized graphs: maps, charts and trees borrowed from science as new methods for finding relationships between literature and society." Maps, charts and trees are all models. Their function is to provide, through different representations, an objective account of the internal structure and relationships of the objects under study.

However, a statistical method resembles a black box. Researchers do not need to understand how it operates: the traditional process of argument is wrapped up inside, and the analytical results simply emerge at the output end. Take PCA as an example. It combines the many different words in a group of texts into new components and extracts a set of principal components that reflect the main differences among the texts. PCA provides a graphical method for "reading" a large number of words at once, which is easier to interpret than the simple frequency lists used earlier. The method reduces the many features of the texts to two main features, but the computer only reports how the texts are distributed along these two features; it cannot explain what the two features refer to. Their meaning remains unclear, which undoubtedly challenges the researcher's interpretive work. Because computation is a technical operation, "the original thinking that truly gives meaning to this technical process and gives truth to these correct results is excluded."
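The PCA step described above, in which many word frequencies are reduced to two components, can be made concrete with a small Python sketch. To stay self-contained it finds the components by power iteration with deflation on the covariance matrix instead of calling a statistics library; that substitution, the function names and the frequency table (six invented text samples, four invented word columns) are assumptions for illustration, not the procedure of any study cited here.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract each column's mean so every word column averages zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration; assumes the start vector is not orthogonal to the result."""
    start = [float(i + 1) for i in range(len(matrix))]
    scale = math.sqrt(dot(start, start))
    vec = [x / scale for x in start]
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    eigenvalue = dot(vec, [dot(row, vec) for row in matrix])
    return vec, eigenvalue


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    size = len(cov)
    v1, l1 = leading_eigenvector(cov)
    # Deflation: subtract the first component, then search for the next one.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(size)]
                for i in range(size)]
    v2, _ = leading_eigenvector(deflated)
    return [(dot(r, v1), dot(r, v2)) for r in centered], (v1, v2)


# Invented frequencies per 1,000 words of four function words.
# Rows 0-2 imitate one hypothetical author, rows 3-5 another.
freqs = [
    [60, 30, 20, 10], [62, 29, 21, 11], [59, 31, 19, 10],
    [45, 40, 28, 18], [44, 41, 27, 19], [46, 39, 29, 17],
]
scores, (pc1, pc2) = pca_2d(freqs)
```

On this invented table the first component places the two groups of samples on opposite sides of zero. The output, however, is only a set of coordinates and a vector of word weights: the code gives no name to the component, so deciding what it stands for is left to the researcher.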

It is because of this most fundamental limitation that those who identify with digital humanities methods, while favoring the use of data, also acknowledge that this new form of knowledge inevitably abstracts or simplifies originally complex literary phenomena, so that literary works lose their richness and uniqueness. David Brewer holds that although distant reading analyzes the "accompanying process of canonization", it does so at the cost of ignoring the different aspects that literary works have presented in history.

Corresponding to the limitations of algorithms are the limitations of visual expression. As a commonly used form of expression in digital humanities, visualization is the direct expression of algorithms. For relational research in particular, visualization expresses relationships, such as the structural relationships of a text or a social network, more intuitively and conveniently than traditional language and writing. Compared with language, the advantage of the visual image is that it is intuitive and convenient. In the knowledge map of the Song and Yuan dynasties, for example, we can clearly see the evolution of different schools. Much current visualization, however, runs contrary to this. Researchers without statistical training not only fail to gain an intuitive impression but must also rely on extensive explanation to "decode the graph", and the visualization can end up marginalized.

Another important effect of algorithms on literary research is especially prominent in research on online fiction. For online literature, the algorithm has opened up the audience. Grounded in the materiality of the research object and combined with the audience orientation brought by the cultural industry, algorithmic evaluation intervenes in the reception of literature. As noted above, computation has been used effectively in the face of new phenomena in Chinese literature, and this is inseparable from the cultural-industry character of literary development. For example, the algorithms built into evaluation mechanisms, and the influence rankings that follow from them, ultimately serve to attract users more effectively for commercial ends. In "'Landscape' Literature", Kobman discussed the influence of the media on literature. He held that, as the media expand, attention (Attention) and visibility (Visibilité) have replaced the author as the most precious thing and have overtaken the symbolic capital that once played an important role in the literary field. The commercial nature of algorithmic evaluation, however, challenges literature to some extent. The material character of the research object has thus expanded the field of research, but where is its boundary? Literature is always the art of language, and the literary text is always its core. Derivative forms such as film, television and fan works cannot displace the text from its central position, and audience evaluation cannot replace the research of professionals. The boundary in question is in fact the boundary between literary research and cultural research, and especially cultural industry research. New literary types and new mechanisms of reception and evaluation may carry the commercial character of the cultural industry, but a clear boundary still separates literary research from cultural research and cultural industry research.
Foucault once described the relationship between language and literature in "What is Literature": "Literature is a distance opened up inside language. It is a language that oscillates around itself, a long-lasting vibration." Literature is a silent language, a world that is essentially indispensable. It retains the capacity to point to origins and to reveal truth. With the assistance of digital humanities methods, literary research can show a new face, but in the end it must still return to its own literature-centered standpoint.
