Why citation analysis is called secondary analysis




















Open access funding provided by Max Planck Society.

The bibliometric data used in this paper are from a locally maintained database at the Max Planck Institute for Solid State Research derived from the Microsoft Academic database.

We would like to thank Henry Small for discussing the use of hedging words for measuring uncertainty. He also supported our study by providing us with his initial list of hedging words. Correspondence to Lutz Bornmann.

The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Citation concept analysis (CCA): a new form of citation analysis revealing the usefulness of concepts for other researchers illustrated by exemplary case studies including classic books by Thomas S. Kuhn and Karl R. Popper. Scientometrics. Received: 28 May. Published: 21 December. Issue Date: February.

Abstract. In recent years, the full text of papers has become increasingly available electronically, which opens up the possibility of quantitatively investigating citation contexts in more detail.

Introduction. Recently, Bu et al.

Theoretical reflections on citation concept analysis (CCA). Since the use of citations in measuring research gained a certain level of attention, researchers have started to develop theories of the citation process.

Importance of the books published by Thomas S. Kuhn and Karl R. Popper used as examples in this study. The importance of The structure of scientific revolutions by Kuhn, The logic of scientific discovery (in German: Logik der Forschung: Zur Erkenntnistheorie der modernen Naturwissenschaft), and Conjectures and refutations: the growth of scientific knowledge by Popper cannot be overstated.

Literature review of citation context and content studies. Since the citation process is scarcely standardized (Cronin), the adoption of a multi-dimensional perspective on citations seems reasonable.

Methods. Datasets. In this study, we used citation context data from Microsoft Academic (MA).

Table 1. Concepts and corresponding search terms for Kuhn and Popper.

Table 2. Analyses of the representativeness of the sample for the population.

Results. We undertook a CCA for several landmark publications in philosophy of science: Kuhn and Popper.

Table 3. Citation concept analysis of The structure of scientific revolutions by Kuhn.

Table 5. Citation concept analysis of The logic of scientific discovery (in German: Logik der Forschung: Zur Erkenntnistheorie der modernen Naturwissenschaft) and Conjectures and refutations: the growth of scientific knowledge by Popper.

Notes. 1. In this study, we considered not only the first editions of the books, but also later editions.


Citation analysis is of great significance in regard to technological innovation and scientific decision-making. Traditional citation analysis methods and tools are overly dependent on citation databases, which have the following drawbacks. First, all citation acts are treated as equally important. Second, all kinds of statistical indicators are based on specific instances of citation, which are annotated only by the author. Third, citation databases can only reveal whether there is a reference shared between different papers but fail to reflect any deeper semantic relationships among citations.

Motivations and behaviors related to citation have been analyzed by researchers from various angles. A content-based citation analysis method [1] has also been proposed. In this chapter, we propose a new citation analysis framework based on ontology and linked data; our goal is to enhance the efficacy of citation analysis via semantic web technology.

The World Wide Web Consortium (W3C) later established a series of technical specifications that promoted the further development of the semantic web; specifications such as RDF, OWL, and SPARQL have allowed the application of the semantic web to many research fields and, further, have laid a foundation for knowledge representation, knowledge organization, and information retrieval on the Internet.

Ontology is one of the backbones of the semantic web and is widely used to specify standard concept vocabularies for exchanging data between systems, offer suggestions for answering queries, publish reusable knowledge bases, and provide services to facilitate operations across heterogeneous systems and databases [3].

Linked data builds associations between objects through the Resource Description Framework (RDF) structure, ultimately revealing the relationships and implicitly shared knowledge between heterogeneous sets of data. After more than 10 years of development, linked data has seen numerous breakthroughs in both theoretical and technical aspects.

To date, the Linking Open Data project [5] has successfully transformed billions of web data points. In recent years, researchers have begun to introduce semantic web technology to citation analysis in an effort to exploit ontology, linked data, and other technologies to improve the description of citation behaviors and motivations.

The Citation Typing Ontology (CiTO) is the SPAR ontology used to describe the relationship between citing papers and cited papers; it provides reference information such as background, method, and citation type. The current version CiTO 2. Ciancarini et al. Iorio et al. By contrast, Recupero et al. Other researchers, for example, Ding, Konidena, Sun, and Chen [10], have also explored the idea of semantic citation, suggesting that individuals can use ontology and linked data to describe bibliographic data and publish it as RDF triples.
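As a rough illustration of how typed citations can be written down as RDF-style triples, the following minimal sketch stores citing/cited pairs as plain Python tuples and serializes them in N-Triples form. The paper URIs under example.org are hypothetical placeholders, the helper names are our own, and the tuple list is only a stand-in for a real RDF library; the property names (cites, usesMethodIn, extends) come from the public CiTO vocabulary.

```python
# Namespace of the public CiTO vocabulary.
CITO = "http://purl.org/spar/cito/"

# Hypothetical paper identifiers, used only for illustration.
PAPER_A = "http://example.org/paper/A"
PAPER_B = "http://example.org/paper/B"
PAPER_C = "http://example.org/paper/C"


def make_triple(citing, cito_property, cited):
    """Build one (subject, predicate, object) triple for a typed citation."""
    return (citing, CITO + cito_property, cited)


def to_ntriples(triples):
    """Serialize URI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = [
    make_triple(PAPER_A, "cites", PAPER_B),         # generic citation
    make_triple(PAPER_A, "usesMethodIn", PAPER_B),  # citation typed by function
    make_triple(PAPER_C, "extends", PAPER_A),
]

print(to_ntriples(triples))
```

Keeping the citation type in the predicate is what lets a later query distinguish, say, method reuse from a generic mention, which a plain citation count cannot do.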

Mahmood, Qadir, and Afzal [ 11 ] combined semantic web technology with credible citation analysis to establish a framework that provides openness and reliability validation for all stages of the citation behavior lifecycle.

The framework requires the use of semantic metadata at all stages of academic publishing to annotate the citation behavior and generate machine-readable RDF triples.

This kind of annotation lets authors, publishers, database vendors, and citation analysis systems work together to build a set of reliable reference information while eliminating any false or misleading citation actions in the literature. More recently, Peroni et al.

The Open Citation Corpus [13] was created to store citation data from open access databases. It is difficult for researchers to move quickly into an unfamiliar field because of the mass of scientific articles [14] that must be reviewed without prior knowledge of their research contents. Such a method is simple, but it often ignores the semantic level of the knowledge resources, causing it to miss a significant number of semantic knowledge resources [15]. Aronson [17] argued that query refinement based on ontology is more efficient than other methods that were available at the time.

From the perspective of information organization, ontology is a new method of knowledge organization and processing, and it is also the basis of semantic webs. It can systematize and organize a large amount of relevant information. When applying ontology to information retrieval, it is necessary to apply ontological principles to the information resources, so that search reasoning is implemented by the logical rules contained in the ontology itself, and a high quality retrieval result is output.
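To make the idea of ontology-driven retrieval more concrete, here is a minimal sketch of query expansion over a concept hierarchy: a search term is expanded with its narrower terms before matching documents. The toy hierarchy, the document titles, and the function names are all assumptions made for illustration, and simple substring matching stands in for a real retrieval engine.

```python
from collections import deque

# Hypothetical toy hierarchy: each term maps to its narrower (subclass) terms.
NARROWER = {
    "citation analysis": ["citation context analysis", "co-citation analysis"],
    "citation context analysis": ["citation sentiment analysis"],
}


def expand_query(term, narrower=NARROWER):
    """Return the term plus all narrower terms, found breadth-first."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def search(documents, term, expand=True):
    """Return documents whose text contains the term or any expanded term."""
    terms = expand_query(term) if expand else [term]
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


docs = [
    "A survey of co-citation analysis",
    "Citation sentiment analysis with neural models",
    "Linked data for libraries",
]

plain_hits = search(docs, "citation analysis", expand=False)
expanded_hits = search(docs, "citation analysis")
print(len(plain_hits), len(expanded_hits))
```

In this toy case the second title is found only after expansion, because it never contains the literal phrase the user typed.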

With respect to the shortcomings of traditional citation information services, the introduction of ontology may help users to improve their searches aimed at multiple citation retrieval.

That is, ontologies deal with the interpretation of words in terms of real-world entities. In recent years, with the advance of ontology, ontology-based knowledge services have been developed in different areas, including personalized medicine [19], e-government [20], medicine [21, 22], smart homes [23], and the digital library [24, 25].

The digital library is an important application area of ontology-based knowledge service research. Patkar [26] indicated that ontology is one of the latest tools for information retrieval from libraries in this digital age. His paper discusses advances in information management tools and concludes by highlighting the applications of ontology in different fields. Koutsomitropoulos, Solomo, and Papatheodorou [27] studied the semantic search service of the DSpace digital repository system.

They argued that Semantic Search v2 introduces a structured query mechanism that makes querying easier and improves the design of the system, performance, and scalability. Queries based on the DSpace ontology were dynamically created, and DSpace was able to obtain structured knowledge from the available metadata.

Empirical and quantitative evaluation has shown that such a system can conduct semantic searches that provide better services for inexperienced users, such as the use of new query dimensions, with clear benefits. Iorio and Schaerf [28] proposed a semantic model defined by the Sapienza Digital Library to describe resource metadata. The semantic model is derived from the metadata object description model (a digital library descriptive standard).

A top-level conceptual reference model supports the implementation of semantic web technologies for digital library metadata. A citation analysis method based on ontology and linked data mainly includes the following three steps: first, building a citation ontology according to the bibliographic citation data and full-text citation information; second, using the citation ontology to normalize the reference information and publishing the data as linked data according to the RDF model; and, third, writing a specific SPARQL query for a citation analysis dimension and executing it in order to extract the required citation information.

The search results are then visualized to reach the citation analysis goals. From the perspective of citation analysis, bibliographic citation information and full-text citation information are not only two independent parts but also two important sources of data that are both necessary for citation analysis.
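The three steps can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the `ex:` identifiers and the two predicates are a hypothetical stand-in for the citation ontology, the records are invented, and the `match` function imitates a single triple pattern rather than running real SPARQL.

```python
from collections import Counter

# Step 1: a tiny, hypothetical vocabulary standing in for the citation ontology.
CITES = "ex:cites"
PUBLISHED_IN = "ex:publishedIn"


# Step 2: normalize raw reference records into (subject, predicate, object) triples.
def to_triples(records):
    triples = []
    for rec in records:
        triples.append((rec["id"], PUBLISHED_IN, rec["journal"]))
        for cited in rec["references"]:
            triples.append((rec["id"], CITES, cited))
    return triples


# Step 3: match one triple pattern; None acts as an unbound variable.
def match(triples, s=None, p=None, o=None):
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


records = [
    {"id": "ex:p1", "journal": "ex:JournalX", "references": ["ex:p3"]},
    {"id": "ex:p2", "journal": "ex:JournalY", "references": ["ex:p1", "ex:p3"]},
    {"id": "ex:p3", "journal": "ex:JournalX", "references": []},
]

graph = to_triples(records)

# One analysis dimension: how often is each work cited?
citation_counts = Counter(o for _, _, o in match(graph, p=CITES))
print(citation_counts.most_common())
```

The resulting counts are the kind of tabular output that would then be handed to a visualization step.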

Here, we construct the bibliographic citation ontology (BCO) and the full-text citation ontology (FCO) based on the bibliographic citation information and the full-text citation information, respectively. This allows us to achieve comprehensive semantic annotation of the citation information at hand. The purpose of this study was to construct a task-based ontology to describe citation information, so we chose the seven-step method developed at Stanford University.

The seven steps are (1) defining the domain and category of the ontology, (2) examining the possibility of reusing existing ontologies, (3) listing the important terms in the ontology, (4) defining the hierarchical system of classes, (5) defining the properties of the classes, (6) defining the facets of the properties, and (7) creating the instances.

The construction of BCO is based on references. From the list of references, information such as the author, periodical, document type, year, volume period, and page number are extracted as the classes of BCO.

In order to extend the dimensions of citation analysis, we extend the subclasses from the perspective of journal and author. For property definitions, we reused existing ontology properties.
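A minimal sketch of how one reference-list entry might be held in memory before it is mapped to BCO is given below. The field names mirror the elements listed above (author, periodical, document type, year, volume, pages); the `bco:` property prefix, the identifier `ex:ref1`, and the sample values are placeholders invented for illustration, not the ontology's actual terms.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BibliographicCitation:
    """One reference-list entry, with the elements extracted for BCO."""
    author: str
    periodical: str
    document_type: str
    year: int
    volume: str
    pages: str


def as_properties(ref_id, citation):
    """Flatten a record into (subject, property, value) statements.

    The 'bco:' property names are hypothetical placeholders.
    """
    return [
        (ref_id, f"bco:{field}", value)
        for field, value in asdict(citation).items()
    ]


# Invented sample values, not a real reference.
ref = BibliographicCitation(
    author="Doe, J.",
    periodical="Journal of Examples",
    document_type="article",
    year=2015,
    volume="12",
    pages="1-10",
)

props = as_properties("ex:ref1", ref)
for statement in props:
    print(statement)
```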

Example classes and properties of bibliographic citation ontology. The construction of FCO begins with three aspects: citation function, citation sentiment, and citation position. The citation function represents the role of the cited work for the citing work, such as background development, data support, methodology support, extension, or refutation. Citation sentiment expresses the emotional attitude of the citing work toward the cited work, such as positive, neutral, or negative.
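The three FCO aspects can be pictured as a small data model. The sketch below is an assumption-laden illustration: the enumeration members follow the examples named above, citation position is assumed to be the name of the section in which the citation occurs, and the identifiers and sample annotations are invented.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class CitationFunction(Enum):
    BACKGROUND = "background"
    DATA_SUPPORT = "data support"
    METHODOLOGY_SUPPORT = "methodology support"
    EXTENSION = "extension"
    REFUTATION = "refutation"


class CitationSentiment(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class FullTextCitation:
    citing: str
    cited: str
    function: CitationFunction
    sentiment: CitationSentiment
    position: str  # assumption: section of the citing paper, e.g. "Methods"


def sentiment_profile(citations, cited):
    """Count how often a cited work is mentioned with each sentiment."""
    return Counter(c.sentiment for c in citations if c.cited == cited)


# Invented annotations for illustration only.
annotations = [
    FullTextCitation("ex:p1", "ex:p9", CitationFunction.BACKGROUND,
                     CitationSentiment.NEUTRAL, "Introduction"),
    FullTextCitation("ex:p2", "ex:p9", CitationFunction.METHODOLOGY_SUPPORT,
                     CitationSentiment.POSITIVE, "Methods"),
    FullTextCitation("ex:p3", "ex:p9", CitationFunction.REFUTATION,
                     CitationSentiment.NEGATIVE, "Discussion"),
    FullTextCitation("ex:p3", "ex:p8", CitationFunction.EXTENSION,
                     CitationSentiment.POSITIVE, "Discussion"),
]

profile = sentiment_profile(annotations, "ex:p9")
print(profile)
```

Unlike a raw citation count of three for the work `ex:p9`, the profile shows that the three mentions differ in attitude.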

Scopus and Google Books citations complement each other and have little overlap. Departmental level estimates of citation impact agree reasonably well with panel committee peer review ratings of early career researcher support. In this article we present a study on the feasibility of Ph.

The context is the German national research system with its characteristics of a very high ratio of graduating Ph. The first nationwide census in Germany reported , registered active doctoral students in Germany (Vollmar). In the same year, 28, doctoral students passed their exams in Germany (Statistisches Bundesamt). Both universities and science and higher education policy attach high value to doctoral training and consider it a core task of the university system.

For this reason, doctoral student performance also plays an important role in institutional assessment systems. While there is currently no national scale research assessment implemented in Germany, all German federal states have introduced formula-based partial funding allocation systems for universities. In most of these, the number of Ph. Most universities also partially distribute funds internally by similar systems. Such implementations can be seen as incomplete as they do not take into account the actual research output of Ph.

In this contribution we investigate if citation analysis of doctoral theses is feasible on a large scale and can conceptually and practically serve as a complement to current operationalizations of ECR performance.

For this purpose we study the utility of two citation data sources, Scopus and Google Books. We analyze the obtained citation data at the level of university departments within disciplines.

The doctoral studies phase can theoretically be conceived as a status transition period. The published doctoral thesis and its public defense are manifest proof of the fulfilment of the degree criterion of independent scientific contribution, marking said transition.

The scientific community, rather than the specific organization, collectively sets the goals and standards of work in the profession, and experienced members of a community of peers judge and grade the doctoral work upon completion. Yet the specific organization also plays a very important role.

The Ph. As a rule, it is a formal requirement of doctoral studies that the Ph. The extent to which other researchers make use of these results is reflected in citations to the work and is in principle amenable to bibliometric citation analysis (Kousha and Thelwall). Citation impact of theses can be seen as a proxy of the recognition of the utility and relevance of the doctoral research results by other researchers. Theses are often not published in an established venue and are hence absent from the usual channels of communication of the research front, more so in journal-oriented fields, whereas in book-oriented fields, publication of theses through scholarly publishers is common.

In what follows, we address this challenge by investigating the presence of dissertation citations in data sources hitherto not sufficiently considered for this purpose. Almost all universities in Germany are predominantly tax-funded, and the consumption of these public resources necessitates a certain degree of transparency to establish and maintain the perceived legitimacy of the higher education and research system. Consequently, universities and their subdivisions are increasingly subjected to evaluations.

The pressure to participate in evaluation exercises, or in some cases the bureaucratic directive to do so by the responsible political actors, in turn, derives from demands of the public, which holds political actors accountable for the responsible spending of resources appropriated from net taxpayers.

Because the training of Ph. While there is no official established national-scale research evaluation exercise in Germany Hinze et al. In the following paragraphs we will show this with several examples while critically discussing some inadequacies of the extant operationalizations of the ECR performance dimensions, thereby substantiating the case for more research into the affordance of Ph. The Council of Science and Humanities (Wissenschaftsrat) has conducted four pilot studies for national-scale evaluations of disciplines in universities and research institutes (Forschungsrating).

While the exercises were utilized to test different modalities, they all followed a basic template of informed peer review by appointed expert committees along a number of prescribed performance dimensions. The evaluation results did not have any serious funding allocation or restructuring consequences for the units.

In all four exercises, the dimension was operationalized with a combination of quantitative and qualitative criteria. Yet, some of the applied indicators are more in line with a construct such as the performance, or success, of the ECRs themselves, namely, first appointments of graduates to professorships, scholarships or fellowships of ECRs (if granted externally to the assessed unit), and awards.

As for the difference between the concept of the efforts expended for ECRs and the concept of the performance of ECRs, it appears to be implied that the efforts cause the performance, but this is far from self-evident. There may well be extensive support programs without realized benefits or ECRs achieving great success despite a lack of support structures. For this implied causal connection to be accepted, its mechanism should first be worked out and articulated and then be empirically validated, which was not the case in the Forschungsrating evaluation exercises.

No bibliometric data on Ph. However, it stands to reason that citation analysis of theses might provide a valuable complementary tool if a sounder operationalization of the dimension of the performance of ECRs is to be established in similar future assessments. As for the publications of ECRs besides doctoral theses, these have been included in the other dimensions in which publications were used as criteria without special consideration.

There is a further area of university evaluation in which a performance indicator of ECRs, namely the absolute number of Ph. In these systems, universities within a state compete with one another for a modest part of the total budget based on fixed formulas relating performance to money. In direct consequence, similar systems have also found widespread application to distribute funds across departments within universities (Jaeger; Niggemann). These systems differ across universities.
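To illustrate what a fixed formula relating performance to money can look like, the following minimal sketch splits a performance-based budget in proportion to a single count indicator. It is not the formula of any particular federal state or university; the budget, the university names, and the counts are invented, and real schemes typically combine several weighted indicators.

```python
def allocate(budget, indicator_by_unit):
    """Split a budget in proportion to each unit's indicator value."""
    total = sum(indicator_by_unit.values())
    if total == 0:
        raise ValueError("indicator values sum to zero; nothing to distribute")
    return {
        unit: budget * value / total
        for unit, value in indicator_by_unit.items()
    }


# Hypothetical counts of completed doctorates per university.
completed_doctorates = {
    "University A": 120,
    "University B": 60,
    "University C": 20,
}

shares = allocate(1_000_000, completed_doctorates)
for unit, amount in shares.items():
    print(unit, round(amount))
```

Because only the count enters the formula, two universities with the same number of graduates receive the same amount regardless of what the theses contributed, which is the gap a citation-based complement would address.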

If only the number of completed Ph. It is conceivable that graduating as many Ph. A working group tasked by the Federal Ministry of Education and Research to work out an indicator model for monitoring the situation of early career researchers in Germany proposed to consider the citation impact of publications as an indicator of outcomes (Projektgruppe Indikatorenmodell). Moreover, it is stated that the literature of the social sciences and humanities is not covered well in citation indexes and theses are generally not indexed as primary documents p.

Nevertheless, this approach is not to be rejected out of hand. Rather, it is recommended that the prospects of thesis citation analysis be empirically studied to judge its suitability p. To sum up, the foregoing discussion establishes 1 that there is a theoretically underdeveloped evaluation practice in the area of ECR support and performance, and 2 that a need for better early career researcher performance indicators on the institutional level has been suggested to science policy actors.

This gives occasion to explore which, if any, contribution bibliometrics can make to a valid and practically useful assessment. There are few publications on citation analysis of Ph. Yoels studied citations to dissertations in American journals in optics, political science one journal each , and sociology two journals from the to volumes.

In each case, several hundred citations in total to all Ph. Non-US dissertations were cited only in optics. Author self-citations were very common, especially in optics and political science. While citations peaked in the periods of 1—2 or 3—5 years after the Ph. The impact of individual theses was not investigated. This study used a search approach in the cited references, based on keywords for theses and filtering, which may not be able to discover all dissertation citations.
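The limitation of a keyword-based search in cited references can be seen in a small sketch. The pattern, the function name, and the three reference strings below are our own illustrative assumptions and do not reproduce the procedure of the study discussed above.

```python
import re

# Assumed keyword pattern for thesis-like references.
THESIS_PATTERN = re.compile(r"\b(?:dissertation|thesis|ph\.?\s?d)\b", re.IGNORECASE)


def find_thesis_references(references):
    """Return the reference strings that contain a thesis keyword."""
    return [ref for ref in references if THESIS_PATTERN.search(ref)]


# Invented reference strings for illustration only.
refs = [
    "Doe, J. Light scattering in thin films. Ph.D. dissertation, Example University.",
    "Roe, R. Hypothesis testing in small samples. Journal of Examples.",
    "Poe, P. Party systems. Unpublished doctoral work, Example University.",
]

found = find_thesis_references(refs)
print(found)
```

The word boundary keeps "Hypothesis" from matching "thesis", but the third entry is a doctoral work that none of the keywords catch, so recall depends entirely on how the citing authors happened to word the reference.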

Kousha and Thelwall investigated Google Scholar citation counts and Mendeley reader counts for a large set of American dissertations from to sourced from ProQuest.

This study did not take into account Google Books. Average citation counts were comparatively high in the arts, social sciences, and humanities, and low in science, technology, and biomedical subjects.

This suggests that Google Books might be a relevant citation data source instead of, or in addition to, Google Scholar. More research has been conducted into the citation impact of thesis-related journal publications. Hay found that for the special case of a small sample from UK human geography research, papers based on Ph.

In a recent study of refereed journal publications based on US psychology Ph. The citation impact of journal articles to which Ph. The impact of journal papers with Ph. The area with a notable difference between groups was arts and humanities, in which the coverage of publication output in the database was less comprehensive because a lot of research is published in monographs, and in which presumably many papers were written in French, another reason for lower coverage.

While these papers are not concerned with citations to dissertations, they do suggest that the research of Ph. To the best of our knowledge, no large-scale study has been conducted on the citation impact of German theses on the level of individual works or on the level of university departments. So far we have scant information on the citation impact of dissertations; the current study therefore aims to fill this gap with a large-scale investigation of citations received by German Ph.

As we wish to investigate performance differences between departments of universities by discipline as reflected by thesis citations, we next consider the literature on plausible reasons for such performance differences which can result in differences in thesis citation impact.

We do not consider individual level reasons for performance differences such as ability, intrinsic motivation, perseverance, and commitment. One possible reason for cross-department performance differences is mutual selectivity of Ph. In a situation in which there is some choice between the departments at which prospective Ph. That is, applicants will opt for the most promising department for their future career while supervisors or selection committees, and thus departments, will attempt to select the most promising candidates, perhaps those who they judge most likely to contribute positively to their research agenda.

This is part of the normal, constant social process of mutual evaluation in science. However, in this case, the mutual evaluation does not take place between peers, that is, individuals of equal scientific social status. Rather, the situation is characterized by status inequality superior-inferior, i. Consequently, an applicant may well apply to her or his preferred department and supervisor, but the supervisor or the selection committee makes the acceptance decision.

In practice, however, there are many constraints on such situations. For example, as described above, the current evaluation regime rewards the sheer quantity of completed Ph.

Once the choices are made, Ph. For instance, some departments might have access to important equipment and resources, others not. There may prevail different local practices in time available for the Ph.

Experienced and engaged supervisors teach explicit and tacit knowledge and can serve as role models. Long and McGinnis found that the performance of mentors was associated with Ph. However, there are multiple professors or other supervisors at any department, which causes variation within departments if the department and not the supervisor is used as a predictive variable. Between departments it is then the concentration of highly accomplished supervisors that may cause differences.

Beyond immediate supervisors, a more or less supportive research environment can offer opportunities for learning, cooperation, or access to personal networks. For example, Kim and Karau found that support from faculty, through the development of research skills, led to higher publication productivity of management Ph.

Local work culture and local expectations of performance may elicit behavioral adjustment (Allison and Long). In summary, prior research shows that there are several reasons to expect department-level differences of Ph.

But it needs to be noted that the present study cannot shed light on which particular factors are associated with Ph. It is limited to testing if there are any department-level differences on this measure. We have argued above that citation analysis of theses could be a complementary tool for quantitative assessment of university departments in terms of the research performance of early career researchers.

Hence it needs to be established that citation counts of dissertations are in fact associated with a conception of the impact of research. This attention is interpreted as an indicator of the importance, the visibility, or the impact of the researcher or the paper in the scientific community.

Whether citation measures also express research quality is a highly debated issue. Nevertheless, work that is cited usually has some importance for the citing work. On the contrary, for a citation to persuade anyone, the content of the cited work needs to be convincing rather than ephemeral, irrelevant, or immaterial.

Citation counts are thus a direct measure of the utility, influence, and importance of publications for further research Martin and Irvine , sec. Therefore, as a measure of scientific impact, citation counts have face validity.

They are a measure of the concept itself, though a noisy one. Not so for research quality. Highly relevant for the topic of the present study are the early citation impact validation studies by Nederhof and van Raan , Nederhof and van Raan These studied the differences in citation impact of publications produced during doctoral studies of physics and chemistry Ph. In fact, differences in citation impact of papers between the groups are already apparent before graduation, that is, before the conferral of the cum laude distinction on the basis of the dissertation.

A possible scenario would be that some PhD graduates are carefully chosen by their mentors to do research on one of the usually rare, very promising, interesting, and hot research topics currently available. The present data do not offer any support for this stance. In Germany, a system of four passing marks and one failing mark is commonly used. The better the referees judge the thesis, the higher the mark. Studies investigating the association between the level of the mark and the citation impact of theses or thesis-associated publications are as yet lacking.

Oestmann et al. Their data show a longitudinal decrease in the incidence of the third-best mark and an increase in the second-best mark. For samples from 3 years for which publication data were collected, an association between the level of the mark and publication productivity was detected. Both the chance of publishing any peer-reviewed articles and the number of articles increase with the level of the mark.

The study was extended in Chuadja with publication data for graduates. It was found that the time to graduation covaries with the level of the mark. For graduates, the average 5-year Journal Impact Factors of thesis-associated publications increase with the level of the graduation mark; that is, theses awarded better marks produced publications in journals with higher Impact Factors. Although these findings say little about the real association between thesis research quality and citation impact, they suggest enough to motivate more research into this relationship.

Does Google Books contain sufficient additional citation data to warrant its inclusion as an additional data source alongside established data sources?

Can differences between universities within a discipline explain some of the variability in citation counts? Are there noteworthy differences in Ph. To test whether or not dissertation citation impact is a suitable indicator of departmental Ph. As a first step towards a better understanding of Ph. The present study is restricted to monograph form dissertations. These also include monographs that are based on material published as articles.

However, to be able to assess the complete scientific impact of a Ph. Because of this, the later results should be interpreted with due caution as we do not claim completeness of data. There is presently no central integrated source for data on dissertations from Germany. This source of dissertation data has been found useful for science studies research previously Heinisch and Buenstorf ; Heinisch et al.

We downloaded records for all Ph. Records were downloaded by subject field using the CSV format option. In this first step , records were obtained. In a second step, the author name and work title fields were cleaned, the university information was extracted and normalized, and records from non-German universities were excluded. We also excluded records with medicine as the first subject class, which had been downloaded because they were also assigned to other classes.

As the dataset often contained more than one version of a particular thesis because different formats and editions were cataloged, these were carefully de-duplicated. In this process, as far as possible the records containing the most complete data and describing the temporally earliest version were retained as the primary records.
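The selection of primary records can be illustrated with a minimal sketch. It is not the authors' procedure: the field names (`author`, `title`, `year`), the grouping key (case-folded author and title), and the tie-breaking order (earliest year first, then the record with the most non-empty fields) are all assumptions made for illustration. Non-primary records are kept as variants rather than discarded.

```python
from collections import defaultdict


def completeness(record):
    """Count non-empty fields; used as a rough proxy for 'most complete data'."""
    return sum(1 for value in record.values() if value not in (None, ""))


def deduplicate(records):
    """Group catalogue records that describe the same thesis.

    Assumption: two records describe the same thesis when their case-folded,
    stripped author and title strings are identical. Real catalogue data would
    likely need fuzzier matching.
    """
    groups = defaultdict(list)
    for record in records:
        key = (record["author"].casefold().strip(),
               record["title"].casefold().strip())
        groups[key].append(record)

    result = []
    for variants in groups.values():
        # Assumed ranking: earliest version first, ties broken by completeness.
        ranked = sorted(variants, key=lambda r: (r["year"], -completeness(r)))
        result.append({"primary": ranked[0], "variants": ranked[1:]})
    return result
```

Returning the variants alongside each primary record makes it possible to search for citations to every catalogued version of a thesis later on.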

Variant records were also kept so that citations to all variants could be collected later. This reduced the dataset to a size of , records. If more than one subject class was assigned, only the first was retained. Citation data from periodicals was obtained from a snapshot of Scopus data from May. Scopus was chosen over Web of Science as a source of citation data because full cited titles for items not contained as primary documents in Web of Science have only recently been indexed.

Before this change, titles were abbreviated so inconsistently and to such short strings as to be unusable, while this data is always available in unchanged form in Scopus if it is available in the original reference.

Cited references data was restricted to non-source citations, that is, references not matched with Scopus-indexed records. Dissertation bibliographical information (author name, publication year, and title) for primary and secondary records was compared to reference data.

Before comparison, both titles were truncated to the character length of the shorter title. If the edit distance similarity between titles was greater than 75 out of , the citation was accepted as valid and stored.
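The title comparison can be sketched as follows, under stated assumptions. The text does not say which edit distance was used or how it was turned into a similarity score, so this sketch uses Levenshtein distance and defines similarity as one minus the distance divided by the truncated length. The upper bound of the similarity scale is not legible in the text, so similarity is expressed here as a fraction between 0 and 1, and the default cutoff of 0.75 is an assumption. Only titles are compared; author name and publication year checks are omitted.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def title_similarity(title_a: str, title_b: str) -> float:
    """Similarity in [0, 1] after truncating both titles to the shorter length."""
    a = title_a.casefold().strip()
    b = title_b.casefold().strip()
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 1.0 - levenshtein(a[:n], b[:n]) / n


def is_match(title_a: str, title_b: str, threshold: float = 0.75) -> bool:
    """Accept the citation when similarity exceeds the (assumed) threshold."""
    return title_similarity(title_a, title_b) > threshold
```

Truncating to the shorter title means that a reference citing only the first part of a long title can still match, at the cost of more false positives for very short titles; this is one reason to combine the title check with author and year.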


