Reply to Proposal: Using ARKit Body Tracking & LiDAR for Sign Language Education (Real-time Feedback)
**Update: Refining the Architecture with LLMs (Gloss-to-Text)**

I've been refining the concept to make development faster and less data-dependent. Instead of trying to solve Continuous Sign Language Recognition purely through computer vision (which is extremely hard), we can split the workload.

**The Hybrid Pipeline Proposal**

1. **Vision Layer (ARKit):** Focus strictly on Isolated Sign Recognition. The CoreML model only needs to identify individual signs (glosses) from the skeleton data, treating each gesture as a token.
   - Input: skeleton movement
   - Output: raw tokens like `[I] [WANT] [WATER] [PLEASE]`
2. **Logic Layer (LLM):** Feed those raw tokens into an on-device LLM (or an API). Since LLMs excel at context and syntax, the model reconstructs the sentence from the tokens (see the sketch below).
   - Input: `[I] [WANT] [WATER] [PLEASE]`
   - Output: "I would like some water, please."

**Why this is faster to build:** We don't need a dataset of millions of complex sentences to train the vision model; a dictionary of isolated signs is enough. The grammar work is offloaded to the LLM, which already handles this kind of sentence reconstruction well. This drastically lowers the barrier to building a functional prototype.
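To make the split concrete, here's a rough Swift sketch of the logic layer only. The `Gloss` enum, `SentenceReconstructor` protocol, and `reconstructSentence` function are hypothetical names for illustration; the actual LLM call would go through whichever on-device model or API we end up choosing, and the vision layer is stubbed out entirely.

```swift
import Foundation

// Gloss tokens emitted by the vision layer (isolated sign recognition).
// In the real pipeline these would come from a CoreML classifier running
// on ARKit skeleton data; here they are just a plain enum for illustration.
enum Gloss: String {
    case i = "I"
    case want = "WANT"
    case water = "WATER"
    case please = "PLEASE"
}

// Minimal abstraction over "some LLM" (on-device or remote API).
// `complete(prompt:)` is a hypothetical method, not a real SDK call.
protocol SentenceReconstructor {
    func complete(prompt: String) async throws -> String
}

// Logic layer: turn raw gloss tokens into a prompt the LLM can rewrite
// as a grammatical English sentence.
func reconstructSentence(
    from glosses: [Gloss],
    using llm: SentenceReconstructor
) async throws -> String {
    let tokens = glosses.map(\.rawValue).joined(separator: " ")
    let prompt = """
    The following are sign language glosses in the order they were signed: \(tokens)
    Rewrite them as a single natural English sentence. Reply with only the sentence.
    """
    return try await llm.complete(prompt: prompt)
}

// Example: [I] [WANT] [WATER] [PLEASE] -> "I would like some water, please."
// let sentence = try await reconstructSentence(
//     from: [.i, .want, .water, .please],
//     using: myLLMClient   // any type conforming to SentenceReconstructor
// )
```

The nice property of this shape is that the two layers only share an array of gloss tokens, so the vision model and the LLM can be swapped or improved independently.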