My name is Andriy. I'm currently building a company called Nomic with the goal of making AI systems more accessible and explainable to all humans.

Before Nomic, I was a machine learning Ph.D. student at NYU Courant focused on the interpretability of deep nets. Teaching computers to do things that only humans should be able to do has dominated my mind-space since about 2017.

The set of topics I am easily distracted by if brought up in conversation includes: self-supervised representation learning, pre-transformer NLP, clinical informatics, long-winded discussions about data cleaning, building infrastructure for software that involves ML models, and why you should care about helping fight for democracy in Ukraine.

Projects I am Proud of

Nomic Embed: Training a Reproducible Long Context Text Embedder
Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar
technical report
[code] [report]
MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning.
Andriy Mulyar, Ozlem Uzuner, Bridget McInnes
Journal of the American Medical Informatics Association
[code] [paper]
Clinical Concept Linking with Contextualized Neural Representations.
Elliot Schumacher, Andriy Mulyar, Mark Dredze
ACL 2020
Phenotyping of clinical notes with improved document classification models using contextualized neural language models.
Andriy Mulyar, Elliot Schumacher, Masoud Rouhizadeh, Mark Dredze
NeurIPS 2019 ML4H Workshop Extended Abstract
[code] [paper] [blog]


Things I've been up to.
April 2023 Raised $17M in venture funding to continue building Nomic.
April 2022 Left my Ph.D. to co-found Nomic - a company dedicated to building explainable and accessible AI systems.
September 2021 Started a Ph.D. in Machine Learning at NYU Courant.
September 2020 Worked as a machine learning engineer at Rad AI, digging through millions of radiology reports and training LLMs on them.
June 2020 Software Engineering (ML/NLP) intern at Costar Group.
February 2020 CRA published a write-up about me.
December 2019 Attended NeurIPS 2019 and presented a poster at ML4Health!
November 2019 Presented posters at the N2C2 2019 workshop (AMIA) and VCU Undergraduate Research Symposium.
Summer 2019 Internship at Johns Hopkins CLSP supervised by Mark Dredze.
Spring 2019 Led a team of undergraduate students to improve the software experience of medaCy. Supervised by Bridget McInnes.
Fall 2018 Co-developed a software framework, medaCy, for building and sharing statistical information extraction models. Supervised by Bridget McInnes.
Summer 2018 VCU DURI Fellowship supervised by Bartosz Krawczyk to study theoretical properties of decision trees.

Inactive Projects

Things I've done.
Multitasking Transformers
Training Transformers to perform multiple tasks with the same set of representations.
January 2020
Multi-label Document Classification with BERT
Language model powered long document classification architectures. (NeurIPS ML4Health 2019).
September 2019
Clinical Semantic Similarity
Training language models (BERT) towards associating semantic equivalence in clinical notes.
July 2019
medaCy: Medical Text Mining and NLP Framework
medaCy is a text processing and NLP research framework built over spaCy that leverages cutting-edge tools for building highly predictive models over medical text.
August 2018
Clinical Concept Normalization and Extraction
Applying neural ranking to map unstructured text in clinical notes and electronic health records to structured medical ontologies. Work accepted at ACL 2020.
June 2019
Automatic Graph Conjecturing
A service that automatically conjectures over graphs to empirically discover novel relations between graph-theoretic properties and invariants. A project under Dr. Craig Larson.
May 2019
Decision Trees: Exploiting Local Data Properties and Nested Ensembles
Trees are excellent learners: simple, interpretable, and versatile. This project explores their interaction with local data characteristics to improve predictive performance and interpretability.
March 2018
Gateway Math
Software for mathematics educators to generate dynamic worksheets.
January 2017
Reproducible Machine Learning
Effective methods to maintain replicable and reproducible research environments in computational science domains.
November 2018