Andriy Mulyar | Academic Projects and Blogs

My name is Andriy. I'm currently building a company called Nomic with the goal of making AI systems more accessible and explainable to all humans.

Before Nomic, I was a machine learning Ph.D. student at NYU Courant focused on interpretability of deep nets. Teaching computers to do things that only humans should be able to do has dominated my mind-space since about 2017.

The set of topics I am easily distracted by if brought up in conversation includes: self-supervised representation learning, pre-transformers NLP, clinical informatics, long-winded discussions about data cleaning, building infrastructure for software that involves ML models, and why you should care about helping fight for democracy in Ukraine.

Projects I am Proud of

	Nomic Embed: Training a Reproducible Long Context Text Embedder. Zach Nussbaum, John X. Morris, Brandon Duderstadt, Andriy Mulyar technical report [code] [report]
	MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning. Andriy Mulyar, Ozlem Uzuner, Bridget McInnes Journal of the American Medical Informatics Association [code] [paper]
	Clinical Concept Linking with Contextualized Neural Representations. Elliot Schumacher, Andriy Mulyar, Mark Dredze ACL 2020 [paper]
	Phenotyping of clinical notes with improved document classification models using contextualized neural language models. Andriy Mulyar, Elliot Schumacher, Masoud Rouhizadeh, Mark Dredze NeurIPS 2019 ML4H Workshop Extended Abstract [code] [paper] [blog]

Updates

Things I've been up to.

April 2023	Raised 17M in venture funding to continue building Nomic.
April 2022	Left my Ph.D. to co-found Nomic, a company dedicated to building explainable and accessible AI systems.
September 2021	Started a Ph.D. in Machine Learning at NYU Courant.
September 2020	Worked as a machine learning engineer at Rad AI digging through millions of radiology reports and training LLMs on them.
June 2020	Software Engineering (ML/NLP) intern at CoStar Group.
February 2020	CRA published a write-up about me.
December 2019	Attended NeurIPS 2019 and presented a poster at ML4Health!
November 2019	Presented posters at the N2C2 2019 workshop (AMIA) and VCU Undergraduate Research Symposium.
Summer 2019	Internship at Johns Hopkins CLSP supervised by Mark Dredze.
Spring 2019	Led a team of undergraduate students to improve the software experience of medaCy. Supervised by Bridget McInnes.
Fall 2018	Co-developed a software framework, medaCy, for building and sharing statistical information extraction models. Supervised by Bridget McInnes.
Summer 2018	VCU DURI Fellowship supervised by Bartosz Krawczyk to study theoretical properties of decision trees.

Inactive Projects

Things I've done.

	Multitasking Transformers	Training Transformers to perform multiple tasks with the same set of representations.	January 2020
	Multi-label Document Classification with BERT	Language-model-powered long document classification architectures (NeurIPS ML4Health 2019).	September 2019
	Clinical Semantic Similarity	Training language models (BERT) to recognize semantic equivalence in clinical notes.	July 2019
	medaCy: Medical Text Mining and NLP Framework	medaCy is a highly predictive text processing and NLP research framework built over spaCy that leverages cutting-edge tools for mining medical text.	August 2018
	Clinical Concept Normalization and Extraction	Applying neural ranking to map unstructured text in clinical notes and electronic health records to structured medical ontologies. Work accepted at ACL 2020.	June 2019
	Automatic Graph Conjecturing	A service for auto-conjecturing over graphs to empirically discover novel relations between graph theoretic properties and invariants. A project under Dr. Craig Larson.	May 2019
	Decision Trees: Exploiting Local Data Properties and Nested Ensembles	Trees are excellent learners: simple, interpretable and versatile. This project explores their interaction with local data characteristics to improve predictive performance and interpretability.	March 2018
	Gateway Math	Software for mathematics educators to generate dynamic worksheets.	January 2017
	Reproducible Machine Learning	Effective methods to maintain replicable and reproducible research environments in computational science domains.	November 2018