Projects

My more recent writing and projects can be found on Art Fish Intelligence.


LLM Tokenization

All languages are NOT created (tokenized) equal: Language models cost much more in some languages than others

Book Bans and Censorship in the United States

A vast diversity of stories is currently being banned in the US, including stories of LGBTQ+ communities, Muslim families, and women in science. LLMs such as GPT-3 refused to recommend banning books outright, for any age level.

DALLE Red-Teaming

Part of the team of AI researchers who probed ('red-teamed') OpenAI's DALLE-2 prior to its public release to detect potential harms, biases, and disinformation.

Scaling Radio Analysis with Data Science for Infodemic Monitoring

Evaluated and analyzed COVID-19 vaccine discourse in public radio transcripts for public health monitoring. Master's thesis with the Oxford Internet Institute and United Nations Global Pulse.

Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models

Probed GPT-2 with prefix templates related to gender and occupation to evaluate biases in its predictions, which were compared with ground-truth US labor data.

Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

Evaluated Facebook's Hateful Memes Challenge by comparing Facebook's carefully synthesized dataset with a collection of 'memes in the wild' gathered from Pinterest.

Covid Texting Service

Created a texting service that answers questions about the pandemic and provides COVID-19 statistics to people without Internet access. This is an ongoing project with Silicon Harlem in NYC to get the service into the hands of people in need.

Data Surveillance and Biocitizenship in the COVID-19 Pandemic: Digital Contact-tracing in South Korea, Hong Kong, Singapore, and Taiwan

Analyzed the privacy implications of digital contact tracing during the early days of the COVID-19 pandemic through topic modeling and semantic network analysis of news media from 6 countries and 3 languages.

Big Data as Historical Archive: The Challenges of Preserving Today’s Digital Artifacts

Examined the greatest challenges for the long-term preservation of big data, which differ from those posed by the mostly static, smaller-scale digital material that concerned archivists in the past. With Seoul National University Big Data Studies Lab.

Joseon Munkwa Project

Conducted named-entity recognition and disambiguation of historical figures in civil service roster data from Korea's Joseon Dynasty. With Seoul National University Big Data Studies Lab.

Virtual Coffeeshop

Created a virtual coffeeshop experience for those of us stuck at home during stay-at-home orders and missing the vibe and comradeship of a cafe.

Music Factorization

Factorized musical scales into parts that can be understood as combinations of 'symmetric' scales: an exercise in breaking down every scale into a combination of whole-tone scales.