My name is Andriy Mulyar and I am a first-year Ph.D. student at NYU Courant. My research interests lie at the intersection of machine learning, statistical learning, and natural language processing. I enjoy tackling interesting problems in representation learning, language, health, and information extraction.

Update (05/2022): I have dropped out of my Ph.D. to found a start-up named Nomic. Nomic focuses on building tools that let humans interact with internet-scale datasets (e.g., Twitter, generated images). My past six years in machine learning have convinced me that the AI systems of the 2020s will be built not because of better modeling but because of larger and cleaner datasets. Nomic is now the best use of my time.

Update (09/2021): I have ended an exciting year building generative AI products for radiology at Rad AI and have begun my Ph.D.!


Selected Publications

MT-Clinical BERT: Scaling Clinical Information Extraction with Multitask Learning.
Andriy Mulyar, Ozlem Uzuner, Bridget McInnes
Journal of the American Medical Informatics Association
[code] [paper]
Clinical Concept Linking with Contextualized Neural Representations.
Elliot Schumacher, Andriy Mulyar, Mark Dredze
ACL 2020
[paper]
Phenotyping of clinical notes with improved document classification models using contextualized neural language models.
Andriy Mulyar, Elliot Schumacher, Masoud Rouhizadeh, Mark Dredze
NeurIPS 2019 ML4H Workshop Extended Abstract
[code] [paper] [blog]

Updates

Things I've been up to.
September 2021 Started a Ph.D. in Machine Learning at NYU Courant.
September 2020 Machine Learning Engineer at Rad AI.
June 2020 Software Engineering (ML/NLP) intern at Costar Group.
February 2020 CRA published a write-up about me.
December 2019 Attended NeurIPS 2019 and presented a poster at ML4Health!
November 2019 Presented posters at the N2C2 2019 workshop (AMIA) and VCU Undergraduate Research Symposium.
Summer 2019 Internship at Johns Hopkins CLSP supervised by Mark Dredze.
Spring 2019 Led a team of undergraduate students to improve the software experience of medaCy. Supervised by Bridget McInnes.
Fall 2018 Co-developed a software framework, medaCy, for building and sharing statistical information extraction models. Supervised by Bridget McInnes.
Summer 2018 VCU DURI Fellowship supervised by Bartosz Krawczyk to study theoretical properties of decision trees.

Inactive Projects

Things I've done.
Multitasking Transformers
Training Transformers to perform multiple tasks with the same set of representations.
January 2020
Multi-label Document Classification with BERT
Language model powered long document classification architectures. (NeurIPS ML4Health 2019).
September 2019
Clinical Semantic Similarity
Training language models (BERT) to recognize semantic equivalence in clinical notes.
July 2019
medaCy: Medical Text Mining and NLP Framework
medaCy is a highly predictive text processing and NLP research framework built over spaCy that leverages cutting-edge tools for mining medical text.
August 2018
Clinical Concept Normalization and Extraction
Applying neural ranking to map unstructured text in clinical notes and electronic health records to structured medical ontologies. Work accepted at ACL 2020.
June 2019
Automatic Graph Conjecturing
A service auto-conjecturing over graphs to empirically discover novel relations between graph theoretic properties and invariants. A project under Dr. Craig Larson.
May 2019
Decision Trees: Exploiting Local Data Properties and Nested Ensembles
Trees are excellent learners: simple, interpretable, and versatile. This project explores their interaction with local data characteristics to improve predictive performance and interpretability.
March 2018
Gateway Math
Software for mathematics educators to generate dynamic worksheets.
January 2017
Reproducible Machine Learning
Effective methods to maintain replicable and reproducible research environments in computational science domains.
November 2018