Clojure Where it Counts: Tidying Data Science Workflows
Data cleaning and organization are tedious, time-consuming parts of any data science workflow. To address these pain points at the Parker Institute for Cancer Immunotherapy, we turned to Clojure to handle our complex and semantically rich datasets, such as molecular and clinical data from patients undergoing cancer therapy with experimental treatments. Our solution is centered on Datomic, and we have built several tools to integrate this database into a data science environment. These tools include a configurable, data-driven ETL pipeline (which requires neither writing code nor knowledge of Datomic internals) and libraries for writing Datalog queries and using the results directly within R. Rather than re-implementing analysis code in Clojure, we bring Clojure’s unique strengths to an existing data science environment.
Pier Federico Gherardini
Parker Institute for Cancer Immunotherapy
@pfgherardini
Pier Federico Gherardini is Director of Informatics at the Parker Institute for Cancer Immunotherapy (PICI). His work focuses on the development of computational methods for the visualization and analysis of high-dimensional data, with particular emphasis on single-cell analysis.
Dr. Gherardini has a strong passion for the development and application of new technologies and for thinking about the challenges such technologies bring to data analysis and interpretation. At PICI, his work focuses on the implementation of advanced assay technologies for the analysis of clinical samples, from evaluating new approaches to developing computational methods to visualize and analyze the resulting data. Dr. Gherardini is also interested in tools that support human-driven analysis of complex datasets, including interactive visualization methods and information systems that can support a rapid iteration cycle of analysis and interpretation.
Prior to joining PICI, Dr. Gherardini was a postdoctoral fellow in the lab of Garry P. Nolan at Stanford, where he performed both computational and wet lab work. While at Stanford he developed a technology for highly multiplexed simultaneous measurement of proteins and RNAs in single cells by mass cytometry, as well as a method for building an interactive map of the immune system that incorporates prior knowledge about major cell types.
Dr. Gherardini holds master’s and doctoral degrees from the University of Rome Tor Vergata in Italy and a certificate from the Ignite program in innovation and entrepreneurship from the Stanford University Graduate School of Business.
Ben Kamphaus
Ben Kamphaus is a software developer and data scientist at Cognitect. He has a PhD in Geography from SUNY Buffalo, where he got his initial exposure to interdisciplinary challenges in data infrastructure, including the use of ontologies, RDF, and other approaches to finding and expressing common data representations. These days he relies heavily on Clojure and Datomic to stay sane when moving data between systems and use contexts. In his free time he hikes, runs, climbs, writes science fiction, and produces electronic music as PatternShift.