Common Fund Data Ecosystem Centers
The NIH Common Fund (CF) programs have produced transformative datasets, databases, methods, bioinformatics tools, and workflows that are significantly advancing biomedical research in the United States and worldwide. Currently, CF programs are mostly isolated from one another, yet integrating data across CF programs has the potential to enable synergistic discoveries. In addition, because CF programs are limited to 10 years, sustaining the widely used CF digital resources after the programs expire is critical. To address these challenges, the NIH established the Common Fund Data Ecosystem (CFDE) program, which was recently approved to continue into its second phase. For the second phase of the CFDE, five centers were established.
The CFDE Cloud Workspace is designed to provide researchers with an accessible and collaborative environment for data analysis. It allows users to import and integrate data with Common Fund datasets while utilizing a wide range of analysis tools, workflows, and pipelines. The Cloud Workspace Implementation Center (CWIC) streamlines deployment by leveraging key partnerships: TACC’s high-performance computing resources, Galaxy’s open-source interface for data analysis, and CloudBank’s tools for simplified cloud access and billing. This workspace supports both novice and expert users by offering training, outreach, and cost-management tools to optimize resource usage. Users will have access to a variety of tools developed by CFDE, Galaxy, and other partners, with the flexibility to incorporate custom or third-party tools. By providing free storage and compute resources, the Cloud Workspace lowers barriers to entry, enabling researchers to work with large datasets and complex analyses. By fostering data sharing, collaboration, and ease of access, CWIC accelerates biomedical research and supports a broader scientific community in tackling high-priority challenges. This initiative represents a significant step toward expanding access to advanced computational resources and empowering researchers to drive innovation in the field.

The CONNECT Integration and Coordination Center (ICC) is dedicated to advancing biomedical research within the Common Fund Data Ecosystem (CFDE) by enhancing efficiency, transparency, and innovation. Led by Prof. Jake Chen at UAB, alongside Profs. Casey Greene, Sean Davis, Peipei Ping, and Wei Wang, the ICC integrates three key cores—Administrative, Evaluation, and Sustainability—to drive its mission forward. The Administrative Core, led by Prof. Chen, ensures seamless coordination and project management across CFDE entities. Utilizing Agile methodologies and collaboration tools like U-BRITE, it optimizes communication and accelerates scientific progress. The Evaluation Core, led by Profs. Greene and Davis, focuses on continuous quality improvement, developing evaluation metrics and feedback mechanisms to assess and enhance CFDE initiatives. Meanwhile, the Sustainability Core, led by Prof. Ping with support from Prof. Wang, ensures the long-term accessibility and reusability of CF program data through strategic data management practices and repository planning. By integrating these efforts, the CONNECT ICC fosters collaboration, accelerates biomedical discoveries, and strengthens CFDE’s long-term impact. Through innovative methodologies and expert leadership, it is positioned to drive transformative advancements in biomedical research.
The CFDE Training Center (TC) will ensure that Common Fund Data Ecosystem (CFDE) data, tools, and resources are findable, accessible, interoperable, and reusable (FAIR). Serving as the central hub for training development, coordination, and evaluation, the TC will expand the CF data user base, enhance dataset usage, and increase awareness of CFDE resources. Team ORAU will achieve this through targeted training programs, outreach initiatives, and a collaborative learning community. The TC will design customized training materials for a diverse audience, with a focus on postgraduate students and early-career investigators. A CFDE Mentoring Program will pair learners with experienced bioinformaticians to provide technical and professional guidance, fostering careers in bioinformatics and data science. Additionally, the TC will ensure diversity, equity, inclusion, and accessibility (DEIA) through targeted recruitment, a Diversity Committee, and an inclusive learning environment. To support collaboration, the TC will establish a CFDE Trainers Working Group to share best practices, a CFDE Landing Page for centralized training resources, and a Virtual Community of Learners for engagement and ongoing support. Development will be informed by a comprehensive landscape analysis, including gap assessments, literature reviews, and stakeholder input. Ongoing evaluation will guide program improvements and ensure alignment with CFDE and NIH goals. Led by a skilled team from Oak Ridge Associated Universities (ORAU) and BioData Sage LLC, the TC will be managed through structured project controls, ensuring effective, flexible coordination with NIH and CFDE stakeholders. Through these efforts, the TC will build a strong, supportive ecosystem for training biomedical researchers in data-driven science.

The CFDE Data Resource Center (DRC) was tasked with developing two web-based portals: an **Information Portal** to serve information about the CFDE and a **Data Portal** to host harmonized metadata and processed data contributed by participating CF Data Coordination Centers (DCCs) and other sources. By combining the data and information portals, the **CFDE Workbench** is a comprehensive resource where users can collect both information and data from CFDE and CF resources, as well as query disease, gene, drug, and other biological entities across standardized data formats from each CF DCC. The CFDE Workbench consolidates efforts toward making resources funded by CF programs harmonized, FAIR, and AI-ready. To achieve these goals, the DRC team works collaboratively with the other newly established CFDE centers, the participating CFDE DCCs, the CFDE NIH team, and relevant external entities and potential consumers of these three software products. These interactions will take place through face-to-face meetings, virtual working group meetings, one-on-one meetings, Slack, GitHub, project management software, and e-mail exchange. Through these interactions, we will establish standards, workstreams, feedback, and mini projects toward the goal of developing a lively and productive Common Fund Data Ecosystem. The **Data Portal** of the CFDE Workbench catalogs several types of uniformly processed data and metadata files and other digital objects from each participating DCC. The **Information Portal** provides relevant information about each DCC and about overarching consortium activities, including training and outreach events, brief descriptions of CFDE partnership projects, and detailed community-established protocols.
Making NIH Common Fund (CF) datasets FAIR is just the first step in unlocking their potential in the era of big data. Scientific progress depends on accessible knowledge, yet non-computational researchers often struggle with interpreting knowledge graphs (KGs) due to their logic-based reasoning, which can overlook scientific context and uncertainty, leading to invalid inferences. Our CFDE Knowledge Center (KC) will focus on presenting scientifically valid knowledge from CF projects in a KG format aligned with CFDE and external curation efforts. To ensure accuracy, we will emphasize careful knowledge extraction—ensuring each KG edge is based on primary experimental findings or expert analysis—and thoughtful knowledge presentation, using tailored visualizations instead of general graph traversal. Leveraging our experience from four large-scale NIH-funded projects, we will develop a user-friendly portal that enhances data accessibility and scientific validity, empowering a diverse range of researchers to engage with CF-generated knowledge.