| Title | Authors |
| Highlight: PocketNeRF: Fast-Converging Neural Radiance Fields for Indoor Reconstruction from Few-Shot Mobile Images |
Aaron Jin, Ryan Joonwon Suh, Lucas Emmanuel Brennan-Almaraz |
| Highlight: CURVLM: Circuit Understanding Via Group Relative Policy Optimization (GRPO) on Vision Language Models |
Diego Porras, Mike Zhao, Adhi Daiv |
| Highlight: Bridging Vision and Language for Sign Language Understanding |
Ali Sartaz Khan, Dat Tran, Sufen Fong |
| Highlight: Video Style Transfer with Reinforcement Learning |
Amelia Kuang, Coco Xu, Ariel Chen |
| Highlight: 3D Semantic Segmentation with 3D LiDAR Point Clouds and 2D Camera Images for Autonomous Driving |
Anze Liu |
| Highlight: Two-Step Deep Learning Approach for Classifying Bridges using Street Imagery |
Carmen Edna Andrade von Hillebrandt |
| Highlight: Improved Mineral Detection via Spectral Attention U-Net with a Novel Hapke Layer: Accelerating Discovery of Critical Minerals for the Green Transition |
Ryan Wang, Chandra Suda |
| Highlight: Fast Acoustic Wave Simulation with Neural Operators |
Fangjun Zhou, Kevin Liu, Hong Nhu Ngoc Vo |
| Highlight: Total Variation Loss for Compact Objects’ Segmentation: Applications to Satellite Images |
Giacomo Fraccaroli |
| Highlight: Replicate FaceApp Effect and Enable Real-time Performance Based on GANs |
Jacky Yin |
| Highlight: Enhancing Segment Anything Model (SAM) for Brain Tumor Image Segmentation |
June Zheng, Yi Jing, Komei Ryu |
| Highlight: Adaptive Contrastive Masked Autoencoders for Structured Representation Learning |
Karan Singh |
| Highlight: Chess Position (FEN) generation using Chessboard and Piece Recognition |
Matt S Yang, Satya Prakash Biswal, Samruddhi Yashwant Kahu |
| Highlight: Rock Image Super-resolution: From CT to micro-CT |
Zitong Huang, Minghui Xu |
| Highlight: Deep Learning based Segmentation of Focal Adhesions from Immunofluorescence Microscopy Images |
Xingyuan Zhang |
| Blurred Lines: Automated Video Obfuscation With Computer Vision |
James Ashton Varah, Tobias Charles Moser |
| PalmPilot: Drone Control using Live Hand Signal Detection |
Chris Kay, Carlos Daniel Joseph Hernandez, Matt Mahowald |
| Beyond Janus: Enhancing 3D Consistency in Text-to-3D Generation Through Foundation Model Reasoning |
J Yim, Soumyadeep Bhattacharjee, Antonio Llano |
| AstroDINO: Self-Supervised Learning on Astronomical Images |
Sagar Kapare |
| Video Understanding by A Sequence Model: A Case Study on Golf Swing Sequencing |
Yanming Zhu |
| Skin Cancer Detection from Smartphone Camera Images via Transfer Learning |
Darren Chan, Jack Zhang, Flora Fenghua Yuan |
| Dynamic Sparse Voxel Attention for Efficient Transformers |
Veljko Skarich, Manel Bermad |
| Exploring LoRA Merging Techniques in Virtual Try-On |
Steven Songqi Pu, Joey McCoy |
| Finding the Fit: Vision-Language Models for Clothing Retrieval |
Henry Payne Palmer, Upamanyu Dass-Vattam |
| Finetuning Pretrained Models for Compressed Dermatology Image Analysis |
Eric Cui, Sonnet H Xu |
| Investigating the Functional Role of Learned Channel Suppression in Diffusion VAEs |
Oleg Roshka, Eugenie Shi, Herry Wang |
| Reviving an Endangered Script: Optical Character Recognition for Syriac |
Erick Angelo Ramirez |
| Comparing Vision Generative Models on Talking-Head Synthesis |
Songyu Han, Mustafa Abdelrahim Haroun Fadl, Hannah Rachel Levin |
| Exploring the Integration of Gaussian Splatting in Formula 1 Racing |
Colette Bao-Quan Do, Harviel Kyle Arcilla Arcilla |
| GeoVision: Fine-Grained Urban Geolocation in San Francisco via Distribution-Aware Visual Models |
Mathijs Ammerlaan, Nils Frederik Kuhn, Raul Molina Gomez |
| Computer Vision for Financial Statement Analysis |
Ryan Samuel Samadi |
| Obstacle Detection for Autonomous Driving Using Semantic, Instance, and Transformer Models on the Waymo Open Dataset |
Carlos Henrique Aranguren |
| Improving 3D Scene Segmentation with Prior 3D Object Knowledge |
Laszlo Szilagyi |
| Beyond ViT: Accelerating ASL Recognition with Convolution-Attention Fusion |
Hengjin Tan, Kai Liu, Jingshu Liu |
| Image Segmentation For Wildfire Prediction |
Nevin George, Anthony Maltsev |
| OmniLoc: Towards Leveraging Multiple Perspectives for Probabilistic Visual Geolocalization |
Xiyan Shao, Charlie John Haywood |
| StoryCrafter: Comic-Style Storyboarding Meets 3D Camera Animation |
Alex Gu, Isaac Hongzhe Kan, Jack P Le |
| Understanding and comparing learned features in CLIP models via Sparse Autoencoders |
Logan Ivins Graves, Franklin Sheng Zhu |
| Predicting MLB Pitch Outcomes From Video Data |
Ohm Patel, Ishan Chirag Mehta |
| WEBSIGHT: A Vision-First Architecture for Robust Web Agents |
Tanvir Bhathal, Asanshay Gupta |
| Eye in the Sky: Live Blackjack Card Counting via Real-Time Video Analysis |
Mini Rawat, Cameron James Heskett, Ankur Jai Sood |
| 3D Model Reconstruction from Image(s) |
Zhen Lin |
| CS231n Project: Particle Identification and Energy Reconstruction in Calorimeter Readout Using dSiPM |
Bo Liu, Qihua Wang, Liangyu Wu |
| A Computer Vision Approach to Monitoring Beehive Activity |
Maxime de Belloy |
| Diffusion-Based Super-Resolution of Micro-CT to SEM for Porous Media |
Lisa Li |
| Gaming the Video Against Yourself |
Joshua Boisvert |
| Learning Stethoscope Placement for Heartbeat Detection Using Convolutional Neural Networks |
Bryan Jang Yit Tiang, Cristian Gabriel Galdeano |
| ProtMAE: Masked Autoencoding of Protein Distance Maps for Structure-Aware Representation Learning |
Emi Maria Mathew, Aya Aburous, Jad Bitar |
| Modeling Urban Food Insecurity with Google Street View Images |
David Li |
| A Multi-Modality End-to-End Autonomous Driving System |
Bryan Alexis Pineda, Ziyu Li, Kamal Mohammed ElMallah |
| Image Label Disambiguation for Rich Semantic Representations in Language-Prompted Segmentation Models |
Justin Anthony Hall, Walter Lopez Chavez |
| Interpretable Multimodal Deep Learning Model on MIMIC-CXR Dataset |
Zhenghui Chen |
| Exploring the Effects of Contrastive Pretraining and Test-Time Post-training on Semantic Segmentation of Waste |
Andy Ouyang, Nishikar Paruchuri, James Cheng |
| Leveraging Transfer Learning with Swin Transformer to Identify Coronary Artery Disease using Cardiac MRI |
Natasha Banga |
| ScreenShield: Computer Monitor Tracking and Blurring for Video |
Aniket Mahajan, Niall Thomas Kehoe |
| Fluorescent Neuronal Cell Counting using a modified ResUnet Model with Attention Gates |
Aanya Tashfeen |
| GeorAIn: Demystifying GeoGuessr |
Fayez Navid Anwar |
| Investigating Explanation Stability under Distribution Shifts |
Laura Paola Gomezjurado Gonzalez |
| Spatial Spectral Deep Learning for Tumor Detection in Colorectal Cancer |
Nikhiya Shamsher |
| Generalizing Discretization Representations for the Physical Solutions via Flexible Spatial Image Data Structures |
Zi Wang |
| Breaking Bots: Enhancing CAPTCHA Design with Neural Style Transfer |
Rinnara Sangpisit |
| Bridged Clustering for Computer Vision |
Ellie Tanimura, Pierre R Labroche |
| Towards A Unified Deep Learning Architecture for Extraterrestrial Surface Perception |
Brian Y Wu |
| Parameter Estimation of Digital Audio Effects from Spectrograms |
Ethan L Buck, Wesley Kavin Larlarb, Richard Thompson Lee |
| Evaluating a residual learning framework for 3D computed tomography data |
Evan Vincent Maestri |
| Hoops Radar: Player Tracking with NBA Broadcast footage |
Tomas Coghlan |
| Diffusion Board: An End to End Chess Move Prediction Pipeline Leveraging Discrete Diffusion for Long Horizon Prediction |
Prerit Choudhary, Akshay Gupta |
| Predicting Image Geolocation Using Feature-Based Fine-Tuning |
Kai Qi Wu |
| From Mode Collapse to State-of-the-Art: Engineering Robust Vision-Based 3D Hand-Object Manipulation Understanding |
Bryan Dong, Howard Ji |
| Cap4Art: Improving Image Captioning Capabilities Through Multi-Task Learning |
Sunny Yu, Willy Chan, Xiaofei Yan |
| Visual Age Estimation of Infant Photographs Using Deep Neural Networks |
Marcelo Bernardo Fernandez, Edgar Omar Leon |
| VAE Matters: Latent Compression Choices for DiT Architectures |
Sherry Xie, Eric Liu, Artur Barbosa Carneiro |
| Parameter-Efficient Fine-Tuning of BiomedCLIP for Diabetic Retinopathy Detection |
Kevina Wang, Adi Badlani |
| Trash Into Treasure: Classifying Garbage from Drone Imagery Using Image Classification Algorithms |
Alyssa Fong |
| Egocentric RGB-D Perception for High-Level Locomotion Planning in Humanoid Robots |
Tae Yang |
| AI-based Acoustic Defect Detection for Speaker Manufacturing |
Yitong Lu |
| You Only Dive Once: Real-Time Pose Scoring in Competitive Diving |
Agnes Liang, Renee Zbizika, Yoshi George Nakachi |
| Distributed 3D Reconstruction of Aerial Footage |
Igor Barakaiev |
| Glimpse Attention Models |
Victor Ng |
| report |
Harshvardhan Singh |
| Real-Time Video Segmentation for Autonomous Robotic Manipulation |
Chetan Reddy Narayanaswamy, Vakula Venkatesh |
| Video-Based Prediction of VO |
Gustavo D Martinez |
| Lightweight Model Adaptation for Mitigating Bias in Deep Learning Models for Chest X-Ray Analysis |
Clemence Marie Mottez |
| Using Transfer Learning to Adapt MobileNet for General Plant Disease Detection on Irregular Images |
Medhya Goel |
| Fair Enough to Diagnose: Reducing Gender Bias in Pneumonia Detection with Swin Transformers |
Yiting Shen, Dante Serafino Koffler |
| Enabling Rapid Disaster Response: Multimodal Remote Sensing Coregistration |
Evan John Twarog |
| Improving Adversarial Robustness of Image Classification Through Pretraining on Neural Data |
Shenghua Liu |
| ChessMates: How good are VLLMs at Chess? |
Rahul Chand |
| Video Caption Generation |
Ting Fu |
| Real2Code2Real: Articulated Full-Scene Reconstruction with 3D Asset Generation |
Eric Liang, Jacob Nathan Goldberg |
| Posterior Sampling using Diffusion Models for HDR Reconstruction |
Jamin Jia-Ming Xie |
| AI-Powered Dance Coaching via Pose Estimation, Vision Transformers and Dynamic Time Warping |
Arnold Tianyi Yang, Henry Jingsong Zhou, Roshen Sanjay Nair |
| Detecting Abnormalities in Musculoskeletal X-Rays: Project Milestone |
Adisa Kruayatidee |
| Nonrigid Motion Correction in MRI Using Neural Space-Time Modelling |
Jaehyeok Bae, Aizada Nurdinova, Yimeng Lin |
| Single-view 3D Human Reconstruction Using Generative Prior |
Zhengmao Liu |
| Addressing Class Imbalance in Deepfake Detection through ResNet-50 Ensemble with Specialist Models and Threshold Optimization |
Jiheng Zhang, Victor Chen, Madhuhaas Gottimukkala |
| Context-Aware Augmentation for Semantic Segmentation in Low-Data Regimes |
Ariel Tian Wang |
| Modeling the Margins: Edge-Aware E2E Driving |
Kevin James Selig |
| Chair Generation Model (CGM): Utilizing Fine-tuned and Multi-view diffusion with Shape Generation for text-to-3D Chair Model Generation |
Gabriel K Bo, Ian Yue-Ran Chen, Marc Bernardino |
| Language-Driven Primitive-Based 3D Scene Generation with Infinigen |
Eyrin Kim, Michelle Borg Yan Lau |
| Building An AI-Powered Fashion Application: Virtual Clothing Try-On |
Shawn Zhang |
| Robust Depth Estimation in Adverse Visual Conditions |
Wenfu Lei, Jiamin Sun |
| Gradient-Based Image and Protein Generation |
Jun Woo Kim |
| CS231N Project: Organic Waste Quantification in Public Trash Bins in Urban Areas Using Thermal Videos |
Varun Sahay, Pin Li, Seoyoung Oh |
| Transfer Learning Under the Surface: Explainable Coral Bleaching Classification Across Datasets |
Samantha Estrada |
| FloodscapeDiffuser: Low-Rank Conditioning for Diffusion-Based Post-Flood Satellite Imagery Simulation |
Martin Scott Pollack, Khadijah Anwar, Tony Yu |
| Learning 3D Structure in Irradiated Lithium Fluoride via Masked Autoencoders |
Piper Fleming, Carolyn Hellerqvist Smith |
| Multi-Modal Large Language Models for Historical Handwritten Text Recognition (HTR) and Data Augmentation |
Yuanhao Zou |
| Event Retrieval for Driving Scenarios Highlighting |
Bingqing Zu, John Ren |
| PrivacyGuard: Real-Time Detection and Redaction of Sensitive Visual Information |
Mutyala Naidu Kannuru |
| Deep Learning to Predict Lithium-Ion Battery State-of-Health from Partial Discharge Data: Comparing 1-D Temporal Models and Novel Convolutional Curve-Image CNNs |
Steven D. Liu |
| Learning Predictive Candlestick Patterns: Vision Transformers for Technical Analysis |
Arnav Gupta |
| Agentic Retrieval and Editing System for Image Generation |
Berwyn Berwyn |
| Diagnose and Defend: Lightweight Behavior-Aware Attention Gating for Robust Vision Transformers |
Natalie Si-Chi Kuo, Yanny Gao, Sara Kothari |
| Exploration of Visual Speech Recognition with LipNet |
Manan Sheth |
| What Do You See In a Poem? Image Generation from English Romantic Poetry |
Md Ahsanur Rashid, Shaoxiong Zhang, Funing Yang |
| Slippify: Parsing Super Smash Bros. Melee Frames |
Matthew George Lee, William Hu, Samuel A Do |
| NIGnets and Neural ODEs for Representing Non-Self-Intersecting Geometry |
Atharva Aalok |
| From Lyrics to Visuals: A Conditional GAN Framework for Album Cover Generation |
Laura Wu, Juliana Ma |
| Full-Page Chinese Calligraphy Generation via LoRA Fine-Tuning of Stable Diffusion |
Huici Pan, Jieshu Huang, Zhiyin Pan |
| Fast Inference for Vision-Language Model Image Captioning |
Taeuk Kang, Andrew C Shi, Nash Brown |
|
Jack Gross-Whitaker, Dev Narasimhan Gopal |
| Real-Time American Sign Language (ASL) Recognition with Visual and Pose-Based Classification |
Elisabeth A Holm, Armando Alejandro Borda |
| Accelerated MRI Reconstruction with SwinUNet: Enhancing Image Quality through Transformer-Based Architecture |
Olufeolu Oluwapelumi Kolawole, Yogesh Seenichamy-Venkatesan, Kesavan Ramakrishnan |
| Diffusion-Guided Gaussian Splatting for Autonomous Driving |
Tao Wang, Maxton Huff |
| Investigate Transfer Learning For Pre-Trained Visual Foundation Encoder on Robot Manipulation Policy |
Yu Chi Hsu, Yu Wei Lin, Wei-Lin Pai |
| High-Fidelity Traffic Simulation with Camera Embeddings |
Yina Jian, Jerry Gu, Ryan Zhijie Rong |
| BenchPRISM: Benchmarking Physical Relationship Understanding In Segmentation Models |
Leo Li, Tahmid Jamal |
| ByeBye: A Zero-Shot Human Removal and Replacement Pipeline with Stylized Character Insertion |
Ashwin Mahendran, Arihan Varanasi, Caleb Youngjae Whang Choe |
| Evaluating SubCell Foundation Vision Transformer on Yeast Cell-Cycle and Protein Localization Tasks |
Mihajlo Stojkovic |
| Skin Cancer Detection with Deep Learning |
Keyan Azbijari |
| Do Hero Images Perpetuate Gender Bias? |
Anika Fuloria |
| Self-supervised Denoising Techniques for Diffusion Tensor Imaging |
Irmak Sivgin, Kamyar Rajabali Fardi |
| Guided by Style: Fine-Grained Modulation in Multi-Style Artistic Transfer |
Christina Ba, Catherine M Zhang |
| Weakly Supervised Learning via Relational Comparisons |
Junha Lee, Sina Mollaei |
| Lightweight 3D Inpainting for Cultural Heritage Restoration Using Diffusion Models |
Aarya Sumuk |
| Deep Learning-Based Pose Estimation and Boundary-Aware Mouse Brain Slice Registration for ABBA |
Cherry Chen |
| Multi-Agent Deep Learning for Visual T Cell Behavioral Modeling |
Joseph Li, Sean Tsung, Adrian Sadik Molofsky |
| Training CoCo: Continuity and Consistency in Subject-Driven Diffusion Models |
Eric Lee |
| Why Are CNNs The Model of Choice for Simulated Robotic Picking? |
Doug Ian Fulop, Olivia Kelly Taylor, Josh James Citron |
| Bridging the Reality Gap: Synthetic Data Generation for Food Portion Estimation |
Ben Shlomo Gur |
| Structured Radiology Report Summarization with Fine-tuned BLIP-2 |
Nahome Gebremariam Hagos |
| Enhancing Visual Question Answering for Smart Glasses Using Vision-Language Models |
Xinxi Chen, Tianyang Chen |
| DETRmining the Cosmos: A Transformer-Based Approach to Galaxy Morphology Detection |
Kumar Chandra, Renn Su, Max Luis Rodriguez |
| Indoor Scene Understanding via 2D and 3D Semantic Segmentation: Integrating Depth for Geometry-Aware Reconstruction |
Karthik Pythireddi |
| Engagement-weighted and style-aware scoring for fashion compatibility |
Melissa H Liu, Sally Lee |
| 3D Human Hand Reconstruction Using Gaussian Splatting with Deep Implicit Anatomical Shape Priors |
Jonathan Hui Wen, Elijah Song, Allen Kiriroath Chau |
| DashGuard: Hierarchical Attention for Dashcam Video Accident Detection |
Kory Zifeng Yang, Luca Mondonico |
| Waste Classification and Management Using Computer Vision |
Annie Fan, Jason Sun, Sue Deng |
| Visual scrolling detection to enhance GUI agent training |
Chena Lee |
| Parking Spot Detection Using Deep Learning Computer Vision Applied to Satellite Imagery: Applications for Solar Carport Potential Estimation |
Renee Duarte White, Peiyu Li, Josh Chad Neutel |
| Laparoscopic Surgical Image Segmentation - CS231N Final Project Report |
Brian Jonathan Sutjiadi |
| L |
Onyinyechi Nichole Okoye |
| STAGED: Spatio-temporal Tracking and Analysis for Ground-level Event Detection |
Joshua Logan Shunk |
| JetVision-Mamba: Selective State Space Models for Jet Classification in High Energy Physics |
Dimitris Ntounis |
| The Not-So-Secret Life of Dogs |
Bea Lai Kuan Lim |
| Leveraging Captions for Context-Aware Image Colorization |
Sheena Lai, Haoming Song |
| A.I.R.G.T.R. – Artificial Intelligence for Real-time Gesture-based Tonal Rendering |
Jacob Alan Rubenstein, Shane Robinson Mion |
| Ultimate Vision: A System to Autonomously Track An Ultimate Frisbee in Video Frames |
Mallika Parulekar, Yash Suvidh Kankariya |
| Aligning Text-to-Image Diffusion Models using Human Utility Optimization and Low-Rank Adaptation |
Yiwen Zhang, Wendy Yin, Yicheng Zhang |
| Enhancing Bearing Quality Control: A CNN-Based Approach for Bearing Defect Classification |
Mengyuan Huang |
| MLLM-Driven Highlight Reel Generation for Ultimate Frisbee Games |
Heather Szczesniak, Megan Ja, Farah Shahbaz |
| Benchmarking ML-Based Antarctic Sea Ice Forecasting in a Data-Rich Setting |
Yuchen Li |
| Segmenting the Earth: Challenges in Land Cover Classification |
Shirley Cheng |
| Deception classification from video input |
Stephanie Vezich Tamayo, Lillian Ma |
| Fail Fast, Run Faster: Shape Safe Deep Learning in Rust on Apple Silicon |
Jai Krishna Agrawal, Taylor Dosia Tam |
| This is The Way: Vision-Based End-to-End Planning for Autonomous Driving |
Arpit Dwivedi, Purushotham Mani, Anishalakshmi Venkata Palaparthi |
| CS231N Final Project |
Baptiste Brugerolle, Nael Ghoundale, Erika MacDonald |
| FDSA-GAN: A Frequency-Domain Self-Attention GAN For Improved Line Art Generation Of Anime Faces |
Richard Wu |
| A Systematic Evaluation of Independent Strategies for Enhancing Text-to-Image Semantic Alignment in Stable Diffusion |
Xinxie Wu |
| Exploring Dance Expression Through Self-Supervised Transformer-Based Contrastive Representation Learning |
Samuel Alexander Tong |
| DeepSneak: Deepfake Video Detection |
Christine Tung, David Hung Tung |
| Detecting Pedestrian Hazards on Urban Sidewalks in Low-Visibility Conditions |
Adrian Adesola Adegbesan, Sathvik Nori |
| Story Augmentation with Generative AI (SAGANets): Investigating Multi-Image Story-Generation Pipelines |
Bradley Konane Moon, Connor William Janowiak, Sade U Ried |
| Machine Vision Based Scoring of Coronary Calcium |
Ibrahim Kecoglu |
| CustomFX: A Lightweight Hand Tracking Model for Musical Instruments |
Gayatridevi Dinar Kamat Tarcar, Sid Yu, Owen Jung |
| Toward Accessible, Lightweight, At Home Dermatological Screening |
Nikhil R Lyles |
| Zero-Shot vs. Few-Shot CLIPSeg: Efficient Urban Feature Segmentation |
Yun-Dam Ko |