Objective: Acknowledging study limitations in a scientific publication is a crucial element of scientific transparency and progress. However, limitation reporting is often inadequate. Natural language processing (NLP) methods could support automated reporting checks, improving research transparency. In this study, our objective was to develop a dataset and NLP methods to detect and categorize self-acknowledged limitations (e.g., sample size, blinding) reported in randomized controlled trial (RCT) publications.

Methods: We created a data model of limitation types in RCT studies and annotated a corpus of 200 full-text RCT publications using this data model. We fine-tuned BERT-based sentence classification models to recognize limitation sentences and their types. To address the small size of the annotated corpus, we experimented with data augmentation approaches, including Easy Data Augmentation (EDA) and Prompt-Based Data Augmentation (PromDA). We applied the best-performing model to a set of about 12K RCT publications to characterize self-acknowledged limitations at larger scale.

Results: Our data model consists of 15 categories and 24 sub-categories (e.g., Population and its sub-category DiagnosticCriteria). We annotated 1090 instances of limitation types in 952 sentences (4.8 limitation sentences and 5.5 limitation types per article). A fine-tuned PubMedBERT model for limitation sentence classification improved upon our earlier model by about 1.5 absolute percentage points in F1 score (0.821 vs. 0.8), a statistically significant improvement. Our best-performing limitation type classification model, PubMedBERT fine-tuned with PromDA (Output View), achieved an F1 score of 0.7, improving upon the vanilla PubMedBERT model by 2.7 percentage points, also with statistical significance.

Conclusion: The model could support automated screening tools that journals can use to draw authors' attention to reporting issues. Automatic extraction of limitations from RCT publications could benefit peer review and evidence synthesis, and support advanced methods to search and aggregate the evidence from the clinical trial literature.
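The limitation-sentence classifier described above is, at its core, binary sentence classification with a domain-specific BERT checkpoint. The sketch below is a rough illustration of fine-tuning a PubMedBERT-style model with the HuggingFace transformers library; the checkpoint name, file names, and hyperparameters are assumptions, not the authors' published configuration.

```python
# Illustrative sketch only; assumed checkpoint name, file layout, and hyperparameters.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumption: this is the PubMedBERT checkpoint name on the HuggingFace Hub.
MODEL = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Binary task: limitation sentence vs. other sentence.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Assumption: CSV files with "sentence" and "label" columns (label 1 = limitation).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Sentences are short, so a 128-token limit is generous.
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="limitation-sentence-clf",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=data["train"], eval_dataset=data["test"])
trainer.train()
print(trainer.evaluate())
```

The limitation-type classifier would use the same setup with one label per category and, per the abstract, training data expanded with EDA- or PromDA-generated sentences.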
Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.

Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM).

Results: Annotators had good agreement in labeling limitation sentences (Krippendorff's α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs. 89.6%, 95% CI [88.1-91.1]).

Conclusions: The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.
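The semi-supervised step in the Methods can be sketched with scikit-learn's SelfTrainingClassifier, which iteratively adds unlabeled sentences to the training set whenever the base classifier predicts them with high confidence. The toy sentences, TF-IDF features, logistic regression base model (one of the two classifiers the abstract mentions), and confidence threshold below are illustrative assumptions, not the authors' feature set or tuning.

```python
# Illustrative sketch only; toy data, features, and threshold are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

# Tiny toy corpus: label 1 = limitation sentence, 0 = other, -1 = unlabeled.
sentences = [
    "A limitation of this study is the small sample size.",
    "Participants were randomized 1:1 to intervention or control.",
    "Our follow-up period may have been too short to detect rare events.",
    "The primary outcome was change in systolic blood pressure.",
    "Blinding of outcome assessors was not feasible in this trial.",
    "Recruitment took place at three academic medical centers.",
]
labels = [1, 0, 1, 0, -1, -1]  # -1 marks sentences to be pseudo-labeled

# Self-training wraps a probabilistic base classifier and keeps the pseudo-labels
# whose predicted class probability exceeds the threshold.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.8),
)
clf.fit(sentences, labels)
print(clf.predict(["Another limitation is the lack of blinding."]))
```

With a realistically sized unlabeled pool, the pseudo-labeled sentences expand the training set between iterations, which is the mechanism behind the small gain over fully supervised learning reported above.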
Background: In their research reports, scientists are expected to discuss the limitations of their studies. Previous research has shown that such discussion is often absent. Many journals also emphasize the importance of avoiding overstatement of claims. We wanted to see to what extent editorial handling and peer review affect self-acknowledgment of limitations and hedging of claims.

Methods: Using software that automatically detects limitation-acknowledging sentences and calculates the level of hedging in sentences, we compared the submitted manuscripts and the ultimate publications of all randomized trials published in 2015 in 27 BioMed Central (BMC) journals and BMJ Open. We used mixed linear and logistic regression models, accounting for clustering of manuscript-publication pairs within journals, to quantify before-after changes in the mean number of limitation-acknowledging sentences, in the probability that a manuscript with zero self-acknowledged limitations ended up as a publication with at least one, and in hedging scores.

Results: Four hundred forty-six manuscript-publication pairs were analyzed. The median number of manuscripts per journal was 10.5 (interquartile range 6-18). The average number of distinct limitation sentences increased by 1.39 (95% CI 1.09-1.76), from 2.48 in manuscripts to 3.87 in publications. Two hundred two manuscripts (45.3%) did not mention any limitations; sixty-three of these (31%, 95% CI 25-38) mentioned at least one after peer review. Changes in mean hedging scores were negligible.

Conclusions: Our findings support the idea that editorial handling and peer review lead to more self-acknowledgment of study limitations, but not to changes in linguistic nuance.
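The before-after analysis in the Methods hinges on a mixed model with journals as the clustering factor. A minimal sketch of the linear part, assuming a long-format table with hypothetical column names, could look like the following in statsmodels; the published analysis may have used different software and model terms.

```python
# Illustrative sketch only; the table layout and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: "pairs_long.csv" has one row per document version with columns
# journal, pair_id, stage ("manuscript" or "publication"), and n_limitations.
df = pd.read_csv("pairs_long.csv")

# Random intercept per journal accounts for clustering of manuscript-publication
# pairs within journals; the "stage" coefficient then estimates the mean
# before-after change in the number of limitation-acknowledging sentences.
model = smf.mixedlm("n_limitations ~ stage", data=df, groups=df["journal"])
result = model.fit()
print(result.summary())
```

The logistic part of the analysis (whether a manuscript with zero self-acknowledged limitations gains at least one after review) would use the same clustering structure with a binary outcome.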