A service of SURF
© 2025 SURF
Obtaining large-scale sequence alignments quickly and flexibly is an important step in the analysis of next-generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often not fast enough, limited to dedicated tasks, or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis.
Learning mathematical thinking and reasoning is a main goal of mathematics education. Instructional tasks play an important role in fostering this learning. We introduce a learning sequence for approaching the topic of integrals in secondary education that supports students' mathematical reasoning while they participate in collaborative dialogue about the integral-as-accumulation-function. The sequence is based on the notion of accumulation in general and the notion of the accumulative distance function in particular. Through a case-study methodology we investigate how this approach elicits 11th grade students’ mathematical thinking and reasoning. The results show that the integral-as-accumulation-function has potential, since the notions of accumulation and accumulative function can provide a strong intuition for mathematical reasoning and engage students in mathematical dialogue. Implications of these results for task design and further research are discussed.
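As an illustration of the accumulative distance function mentioned above, the following is a minimal sketch and is not taken from the learning sequence itself. It assumes speed is sampled at known time points and accumulates the distance covered using the trapezoidal rule; the function name accumulated_distance and the sample values are hypothetical.

```python
def accumulated_distance(times, speeds):
    """Return the distance covered up to each time point.

    Assumption: speed varies linearly between samples, so each interval
    contributes the area of a trapezoid (average speed times duration).
    """
    total = 0.0
    distances = [0.0]  # nothing has accumulated at the first time point
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        total += 0.5 * (speeds[k] + speeds[k - 1]) * dt
        distances.append(total)
    return distances


# Hypothetical samples: speed grows linearly, v(t) = 2t
times = [0, 1, 2, 3]
speeds = [0, 2, 4, 6]
print(accumulated_distance(times, speeds))
```

Each entry of the returned list is the value of the accumulation function at the corresponding time, which is the quantity the integral-as-accumulation-function view asks students to reason about.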
BACKGROUND: Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore less widely adopted than it could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of parallel computing in bioinformatics by making its use and extension simpler through more and better application of high-level languages commonly used in bioinformatics, such as Python. RESULTS: The novel application pyPaSWAS presents the parallel SW sequence alignment code fully packed in Python. It is a generic SW implementation that runs on several hardware platforms with multi-core systems and/or GPUs and provides accurate sequence alignments that can also be inspected for alignment details. Additionally, pyPaSWAS supports the affine gap penalty. Python libraries are used for automated system configuration, I/O and logging. In this way, the Python environment will stimulate further extension and use of pyPaSWAS. CONCLUSIONS: pyPaSWAS presents an easy Python-based environment for accurate and retrievable parallel SW sequence alignments on GPUs and multi-core systems. The strategy of integrating Python with high-performance parallel compute languages to create a developer- and user-friendly environment should be considered for other computationally intensive bioinformatics algorithms.
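To make the idea of SW alignment with an affine gap penalty and retrievable alignment details concrete, the following is a minimal single-threaded sketch in pure Python. It is not the pyPaSWAS code or its API, and it does no GPU or OpenCL work. The function name smith_waterman_affine and all scoring values are assumptions chosen for illustration; here the first position of a gap costs gap_open and each further position costs gap_extend.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-3, gap_open=-5, gap_extend=-2):
    """Local alignment with affine gaps; returns (score, aligned_a, aligned_b).

    All scoring parameters are hypothetical defaults, not pyPaSWAS settings.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best local score ending at (i, j)
    # E: best score ending with a gap in a (b[j-1] aligned to '-')
    # F: best score ending with a gap in b (a[i-1] aligned to '-')
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell, tracking which matrix we are in,
    # so the alignment itself (not only the score) is reported.
    i, j = best_pos
    state = "H"
    out_a, out_b = [], []
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break  # a local alignment starts where the score is zero
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            out_a.append("-")
            out_b.append(b[j - 1])
            if E[i][j] == H[i][j - 1] + gap_open:
                state = "H"  # this position opened the gap
            j -= 1
        else:
            out_a.append(a[i - 1])
            out_b.append("-")
            if F[i][j] == H[i - 1][j] + gap_open:
                state = "H"
            i -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, aligned_a, aligned_b = smith_waterman_affine("ACGTACGT", "ACGTTTACGT")
print(score)
print(aligned_a)
print(aligned_b)
```

The three matrices are what distinguishes affine from linear gap scoring: extending an existing gap is cheaper than opening a new one, so one long gap is preferred over several short ones.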
Each year approximately 600,000 people in the Netherlands become ill from eating contaminated food. The food-processing industry has a strong need for better control over hygiene monitoring in its factories to prevent contaminated products from reaching the shops. The completed RAAK-mkb project “Precision Food Safety” investigated the added value of applying Whole Genome Sequencing (WGS) to trace the transmission routes of the pathogenic bacterium Listeria monocytogenes at food-processing companies. A biobank was built of almost 600 L. monocytogenes strains originating from the factory environment and from products of fish-, meat- and vegetable-processing companies. These strains were sequenced using Nanopore sequencing. The relatedness between the strains was then determined with a bioinformatics pipeline developed in the project. The project proved very successful. In “Advanced Precision in Food Safety”, the research into food safety is broadened by monitoring L. monocytogenes already at the start of the food-processing chain (in raw materials and ingredients). In addition, the WGS methodology will be applied to Salmonella enterica, and the current bioinformatics pipeline will be adapted to trace the transmission routes of this other important food pathogen. To deepen the research, the pathogenic character of L. monocytogenes strains will be determined on the basis of the serotype and the presence of ~60 described virulence genes. For this purpose, data from different databases, containing sequence data of both human and non-human strains, will be compared. Both in the laboratory and in the factory environment, the effect of different cleaning agents and cleaning techniques on eliminating L. monocytogenes from surfaces will be investigated. It will also be investigated whether shotgun metagenomics analysis can be used to monitor food quickly and broadly for food pathogens.
A prototype of a web application, in which companies can view and supplement the results obtained, will be further developed and will be tested and implemented by food-processing companies.
Huntington’s disease (HD) and various spinocerebellar ataxias (SCA) are autosomal dominantly inherited neurodegenerative disorders caused by a CAG repeat expansion in the disease-related gene [1]. The impact of HD and SCA on families and individuals is enormous and far-reaching, as patients typically display their first symptoms during midlife. HD is characterized by unwanted choreatic movements, behavioral and psychiatric disturbances, and dementia. SCAs are mainly characterized by ataxia but also by other symptoms, including cognitive deficits, that similarly affect quality of life and lead to disability. These problems worsen as the disease progresses, until affected individuals are no longer able to work, drive, or care for themselves. This places an enormous burden on their family and caregivers; patients require intensive nursing home care as the disease progresses, and their lifespan is reduced. Although the clinical and pathological phenotypes are distinct for each CAG repeat expansion disorder, it is thought that similar molecular mechanisms underlie the effect of expanded CAG repeats in different genes. The predicted Age of Onset (AO) for HD, SCA1 and SCA3 (and 5 other CAG-repeat diseases) is based on the polyQ expansion, but the CAG/polyQ repeat length determines the AO for only 50% (see figure below). A large variation in AO is observed, especially for the most common range between 40 and 50 repeats [11,12].
Large differences in onset, especially in the range of 40-50 CAGs, not only imply that current individual predictions of AO are imprecise (affecting important life decisions that patients need to make and also hampering the assessment of potential onset-delaying interventions) but also offer optimism that (patient-related) factors exist that can delay the onset of disease. To address both items, we need to generate a better model, based on patient-derived cells, that yields parameters that not only mirror the CAG-repeat length dependency of these diseases but also better predict inter-patient variations in disease susceptibility and the effectiveness of interventions. To this end, we will use a staggered project design as explained in 5.1, in which we will first determine which cellular and molecular determinants (referred to as landscapes) in isogenic iPSC models are associated with increased CAG repeat lengths, using deep-learning algorithms (DLA) (WP1). For this, we will use a well-characterized control cell line in which we modify the CAG repeat length in the endogenous Ataxin-1, Ataxin-3 and Huntingtin genes from wildtype Q repeats to intermediate, adult-onset and juvenile polyQ repeats. We will next expand the model with cells from the 3 (SCA1, SCA3, and HD) existing and new cohorts of early-onset, adult-onset and late-onset/intermediate repeat patients for which, besides accurate AO information, clinical parameters (MRI scans, cerebrospinal fluid markers, etc.) will also be (made) available. These data will be used for validation and to fine-tune the molecular landscapes (again using DLA) towards the best prediction of individual patient-related clinical markers and AO (WP3).
The same models and (most relevant) landscapes will also be used to evaluate novel mutant-protein-lowering strategies as they emerge from WP4. The overall development of landscape prediction is an iterative process that involves (a) data processing (WP5), (b) unsupervised data exploration and dimensionality reduction to find patterns in the data and create “labels” for similarity, and (c) development of supervised Deep Learning (DL) models for landscape prediction based on the labels from the previous step. Each iteration starts with data that are generated and deployed according to FAIR principles, and the developed deep learning system will be instrumental in connecting these WPs. Insights into algorithm sensitivity from the predictive models will form the basis for discussion with field experts on the distinction and phenotypic consequences. While full development of accurate diagnostics might go beyond the timespan of the 5-year project, ideally our final landscapes can be used for new genetic counselling: when somebody is positive for the gene, can we take his/her cells, feed them into the generated cell-based model, and better predict the AO and severity? While this will answer questions from clinicians and patient communities, it will also generate new ones, which is why we will study the ethical implications of such improved diagnostics in advance (WP6).
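The three-step iteration described above, (a) data processing, (b) unsupervised creation of labels, and (c) supervised modelling on those labels, can be sketched as follows. This is a toy illustration under stated assumptions, not the project's pipeline: the feature values are invented, k-means stands in for the unsupervised exploration (dimensionality reduction is omitted), and a small logistic regression stands in for the deep learning model. All function names are hypothetical.

```python
import math


def standardize(rows):
    """(a) Data processing: scale each feature to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - mu) ** 2 for x in c) / len(c)) or 1.0
            for c, mu in zip(cols, means)]
    return [[(x - mu) / s for x, mu, s in zip(r, means, stds)] for r in rows]


def kmeans_labels(rows, iters=20):
    """(b) Unsupervised step: two-cluster k-means; cluster ids act as labels.

    Assumption: the first and last rows are used as deterministic seeds.
    """
    centroids = [list(rows[0]), list(rows[-1])]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [
            min((0, 1), key=lambda c: sum((x - y) ** 2
                                          for x, y in zip(r, centroids[c])))
            for r in rows
        ]
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_logistic(rows, labels, lr=0.5, epochs=200):
    """(c) Supervised step: logistic regression as a stand-in for a DL model."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for r, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, r)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log-loss with respect to z
            w = [wi - lr * grad * xi for wi, xi in zip(w, r)]
            b -= lr * grad
    return w, b


def predict(w, b, row):
    """Predict the cluster label (0 or 1) for one standardized row."""
    return int(sum(wi * xi for wi, xi in zip(w, row)) + b > 0)


# Invented toy measurements: two features per sample
raw = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [5.0, 6.0], [5.2, 5.8], [4.9, 6.1]]
data = standardize(raw)
labels = kmeans_labels(data)
w, b = train_logistic(data, labels)
print(labels, [predict(w, b, r) for r in data])
```

In a real iteration the supervised model would be evaluated on held-out data rather than on the samples that produced the labels; the sketch only shows how the output of one step feeds the next.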
CRISPR/Cas genome engineering unleashed a scientific revolution, but it entails socio-ethical dilemmas, as genetic changes might affect evolution and objections exist against genetically modified organisms. CRISPR-mediated epigenetic editing offers an alternative for reprogramming gene functioning in the long term without changing the genetic sequence. Although preclinical studies indicate effective modulation of gene expression, long-term effects are unpredictable. This limited understanding of epigenetics and transcription dynamics hampers straightforward applications and prevents full exploitation of epigenetic editing in biotechnological and health/medical applications. Epi-Guide-Edit will analyse existing and newly generated screening data to predict long-term responsiveness to epigenetic editing (cancer cells, plant protoplasts). Robust rules for achieving long-term epigenetic reprogramming will be distilled based on i) responsiveness to various epigenetic effector domains targeting selected genes, ii) (epi)genetic/chromatin composition before/after editing, and iii) transcription dynamics. Sustained reprogramming will be examined in complex systems (2/3D fibroblast/immune/cancer co-cultures; tomato plants), providing insights for improving tumor/immune responses, skin care or crop breeding. The iterative optimisation of the Epi-Guide-Edit rules, to eventually reprogram any gene of interest non-genetically, will enable exploitation of gene regulation in diverse biological models addressing major societal challenges. The optimally balanced consortium of (applied) universities and ethical and industrial experts facilitates timely socioeconomic impact. Specifically, the developed knowledge and tools will be shared with a wide spectrum of students and teachers, ensuring training of next-generation professionals. Epi-Guide-Edit will thus result in widely applicable, effective epigenetic editing tools, whilst training next-generation scientists and guiding public acceptance.