Hardships cannot steal lofty ambitions
Who is Arno?

Hi there, this is Arno. I earned my bachelor's degree in Statistics from Michigan State University and completed my master's degree in Data Science (Computational Linguistics) at the University of British Columbia.
Multidisciplinary AI practitioner with a strong foundation in natural language processing, machine learning, and statistical analysis, backed by hands-on experience in cloud infrastructure, Transformer-based modeling, and full-stack system deployment. Proficient in applying advanced NLP techniques, including context-aware retrieval, sentiment analysis, and domain-specific generation, to real-world problems in the healthcare, legal, and e-commerce domains. Skilled in developing robust pipelines with Python, PyTorch, Hugging Face Transformers, and LangChain, and in deploying scalable solutions with FastAPI, Docker, and Nginx. Demonstrated success in competitive settings, including a Top 10 finish in the Alibaba Tianchi CCL2025 ICD Diagnosis Coding Challenge, and in real-world productization through a legal RAG system for Supreme Court of Canada criminal cases. Experienced in translating complex models into actionable business insights and in delivering end-to-end solutions across the data, model, and interface layers.
The road ahead remains full of uncertainty and challenges, yet I continue to forge ahead on this rugged and fascinating path. May we encourage each other along the way — may we face the thorns without fear, hold fast to our dreams, and keep moving forward.
Python / R: Coding skills
AI: Ability to use modern AI tools
Machine Learning: Machine learning skills
Driven: A restless spirit that refuses to settle
Deep Learning: Deep learning skills
World: The ability to stay mediocre
NLP: Natural language processing skills
MoePower: The ability to embody moe energy
Arno's Projects
SCC Criminal Cases RAG
This is a specialized retrieval-augmented generation platform that enables natural language queries about Canadian Supreme Court criminal cases. It combines web scraping, vector search, and language models to retrieve and generate accurate legal information with relevant citations. Built with FastAPI, React, and ChromaDB, the system is containerized for multi-architecture deployment (x86/AMD64 and ARM64).
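The retrieve-then-generate flow can be sketched without external services: rank stored passages by similarity to the query, then place the top hits, tagged with their ids, into a prompt that asks the model to cite them. This is a minimal sketch under stated assumptions. A bag-of-words cosine similarity stands in for the embedding search ChromaDB would perform, the passages are made-up placeholders rather than quotations from real judgments, and the function names and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: passage id -> text. Placeholder sentences, not quotes
# from real SCC judgments. The real system would store scraped text in ChromaDB.
CORPUS = {
    "case-1 para 12": "The accused has the right to counsel upon arrest or detention.",
    "case-2 para 40": "Evidence obtained in breach of the Charter may be excluded.",
    "case-3 para 7": "Sentencing must be proportionate to the gravity of the offence.",
}


def vectorize(text):
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return ids of the k most similar passages, skipping zero-similarity ones."""
    q = vectorize(query)
    scored = sorted(
        ((cosine(q, vectorize(text)), pid) for pid, text in CORPUS.items()),
        reverse=True,
    )
    return [pid for score, pid in scored[:k] if score > 0]


def build_prompt(query, k=2):
    """Assemble a grounded prompt; the generator is asked to cite passage ids."""
    context = "\n".join(f"[{pid}] {CORPUS[pid]}" for pid in retrieve(query, k))
    return (
        "Answer using only the passages below and cite them by id.\n"
        f"{context}\n\nQuestion: {query}"
    )


print(build_prompt("When can evidence be excluded under the Charter?"))
```

Keeping the passage id attached to each retrieved chunk is what lets the generated answer carry citations back to specific cases.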
Alibaba Tianchi: ICD Prediction
Ranked 9th of 174 on the final leaderboard (roughly the top 5%), outperforming most academic and industrial baselines. Developed a multi-task medical coding model with a Transformer-based architecture that automatically assigns primary and secondary ICD diagnosis codes from unstructured Chinese electronic medical records (EMRs). Built a RoBERTa-based encoder with 11 parallel input branches, extracting [CLS] token representations for structured semantic parsing.
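The multi-task output stage can be illustrated framework-free: one [CLS] vector per branch is concatenated, a softmax head picks exactly one primary code, and an independent sigmoid head selects any number of secondary codes. This is a hypothetical sketch of how such heads could be wired, not the competition model. The toy dimensions, random weights, placeholder code labels, and the rule that excludes the primary code from the secondary set are all assumptions; in practice the [CLS] vectors would come from a RoBERTa encoder in PyTorch.

```python
import math
import random

NUM_BRANCHES = 11  # from the description: 11 parallel input branches
HIDDEN = 4         # toy width; a RoBERTa-base [CLS] vector has 768 dims
CODES = ["CODE_A", "CODE_B", "CODE_C"]  # hypothetical placeholder labels


def fuse_cls(cls_vectors):
    """Concatenate one [CLS] vector per branch into a single feature vector."""
    if len(cls_vectors) != NUM_BRANCHES:
        raise ValueError(f"expected {NUM_BRANCHES} branch vectors")
    fused = []
    for vec in cls_vectors:
        fused.extend(vec)
    return fused


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(cls_vectors, primary_head, secondary_head, threshold=0.5):
    x = fuse_cls(cls_vectors)
    # Task 1: exactly one primary code (single-label: softmax + argmax).
    primary_probs = softmax(linear(x, *primary_head))
    primary = CODES[max(range(len(CODES)), key=lambda i: primary_probs[i])]
    # Task 2: zero or more secondary codes (multi-label: sigmoid + threshold).
    # Assumption: the chosen primary code is not repeated as a secondary code.
    secondary_probs = [sigmoid(z) for z in linear(x, *secondary_head)]
    secondary = [c for c, p in zip(CODES, secondary_probs) if p >= threshold and c != primary]
    return primary, secondary


# Toy usage: random weights stand in for trained parameters.
rng = random.Random(0)
dim = NUM_BRANCHES * HIDDEN


def random_head():
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in CODES]
    return weights, [0.0] * len(CODES)


cls_vectors = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(NUM_BRANCHES)]
primary, secondary = predict(cls_vectors, random_head(), random_head())
print(primary, secondary)
```

Separating the single-label and multi-label heads reflects the different shapes of the two tasks: a record has one primary diagnosis but can have several secondary ones.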
Kaggle: House Prices
Ranked 246th of 5,164 on the final leaderboard (top 5%). Predicted house sale prices from nearly 3,000 records with 79 explanatory features such as lot size, year built, condition, and geographic location. Implemented end-to-end machine learning pipelines in both Python and R, covering data preprocessing, exploratory data analysis (EDA), feature engineering, model training, and prediction.
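This competition scores submissions by the root-mean-squared error between the logarithms of predicted and observed sale prices, which is why pipelines for it commonly train on a log-transformed target. A minimal sketch of the metric and the transform follows; the function names and toy prices are illustrative assumptions, not taken from the original pipelines.

```python
import math


def rmse_log(predicted, observed):
    """RMSE between log(predicted) and log(observed) sale prices."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def to_log_target(prices):
    # Train on log(price) so ordinary squared error matches the leaderboard metric.
    return [math.log(p) for p in prices]


def from_log_target(log_preds):
    # Invert the transform before writing prices to a submission file.
    return [math.exp(z) for z in log_preds]


# Made-up prices for illustration only.
observed = [200_000, 150_000, 310_000]
predicted = [210_000, 140_000, 300_000]
print(round(rmse_log(predicted, observed), 4))
```

Working in log space means a 10% miss on an expensive house costs the same as a 10% miss on a cheap one, which matches how the leaderboard ranks models.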