Recently, I started experimenting with a very simple idea:
Parents use WeChat to start a video call
↓
A child’s tablet receives the incoming call
↓
Both sides enter a video conversation
At first glance, it sounds like a normal video chat application.
But once I actually started building it, I realized the project was less about “making a video chat page” and more about:
connecting the WeChat ecosystem,
PWA behavior,
real-time communication,
and Android tablet lifecycle management
into a coherent system.
And eventually, the whole MVP became an experiment in:
SDK orchestration
+
AI-assisted development
+
Architecture-first Vibe Coding
Why Build This?
The original motivation was surprisingly simple.
I noticed that many “family companionship” products suffer from the same problem:
too complicated for parents,
too heavy for children.
Most communication products assume:
- both users install apps
- both users register accounts
- both users learn the interface
- both users manage permissions
But in reality:
most parents already live inside WeChat.
So the question became:
Could parents use only WeChat,
while children use only a tablet,
and still achieve a smooth video calling experience?
That became the core MVP goal.
The Initial Architecture
The first system design looked like this:
WeChat Side
↓
Backend Service
↓
Tencent Cloud IM / TRTC
↓
Child Tablet
Each layer had a different responsibility:
- WeChat: entry point
- IM: messaging and signaling
- TRTC: real-time audio/video
- Tablet: incoming call UI and video display
And this led to one important realization very early on:
IM is not the same thing as video calling.
One Important Misunderstanding
Initially, I assumed:
Tencent Cloud IM SDK
≈ video communication SDK
But once I read the documentation more carefully, the architecture became much clearer.
Tencent Cloud actually separates the system into two layers:
Tencent Cloud IM
Responsible for:
- messaging
- online status
- signaling
- session management
- push notifications
Meanwhile, the actual media layer comes from:
TUICallKit / TRTC
(see the Tencent Cloud TUIKit documentation)
Responsible for:
- WebRTC
- audio/video streams
- room management
- microphone
- camera
At that point, the mental model finally clicked:
IM is the “doorbell”
TRTC is the “conversation”
Once that distinction became clear, the overall architecture became dramatically simpler.
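The doorbell/conversation split can be sketched in a few lines. This is a hypothetical model, not the actual Tencent Cloud IM message schema: the point is only that the signaling layer decides *whether* a conversation should start, while the media layer is a separate concern it merely dispatches to.

```typescript
// Hypothetical signaling event carried over the IM layer
// (names are illustrative, not Tencent's real message format).
type SignalEvent =
  | { kind: "invite"; roomId: string; from: string }
  | { kind: "cancel"; roomId: string }
  | { kind: "hangup"; roomId: string };

// What the media layer (TRTC in the real system) would be asked to do.
type MediaAction =
  | { action: "joinRoom"; roomId: string }
  | { action: "leaveRoom"; roomId: string }
  | { action: "none" };

// The "doorbell" never touches cameras, microphones, or streams;
// it only maps a signal to a media-layer request.
function routeSignal(event: SignalEvent): MediaAction {
  switch (event.kind) {
    case "invite":
      return { action: "joinRoom", roomId: event.roomId };
    case "hangup":
      return { action: "leaveRoom", roomId: event.roomId };
    case "cancel":
      return { action: "none" };
  }
}
```

Keeping this boundary explicit is what made the rest of the system easier to reason about: either SDK could in principle be swapped without touching the other side.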
Why I Chose PWA Instead of Native Apps
Initially, I considered:
Flutter
React Native
Native Android
But later I realized:
the biggest complexity during MVP development
was not UI development.
The difficult parts were actually:
- WeChat ecosystem integration
- signaling flow
- Android lifecycle behavior
- push notifications
- permissions
- WebRTC compatibility
So eventually I chose:
PWA + APK wrapping
The reason was simple:
iteration speed.
The tablet side only needed:
- fullscreen UI
- microphone
- camera
- notifications
- WebRTC support
Modern Android Chromium environments already support most of these capabilities well enough for an MVP.
The deployment path eventually became:
PWA
↓
TWA / WebView
↓
Android APK
This dramatically reduced iteration overhead.
How the System Actually Works
The final MVP flow became:
Parent presses “Call Child”
↓
WeChat sends request
↓
Backend creates room
↓
IM sends signaling event
↓
Child tablet receives incoming call
↓
Both clients join TRTC room
↓
Video call starts
The backend mainly handles:
- UserSig generation
- device binding
- room management
- permission validation
- message routing
Meanwhile, the tablet side handles:
- incoming call UI
- camera
- microphone
- video rendering
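The backend's call-setup step can be sketched as a single function: bind a fresh room to the call, sign both users, and produce the signaling payload the IM layer should deliver. Everything here is illustrative; in particular, `signUser` is a placeholder for Tencent's real server-side UserSig generation, which must come from their library, never from a stub like this.

```typescript
interface CallSetup {
  roomId: string;
  callerSig: string;
  calleeSig: string;
  signal: { kind: "invite"; roomId: string; from: string };
}

let roomCounter = 0;

// Placeholder only: a real implementation delegates to the vendor's
// server-side UserSig library with the app's secret key.
function signUser(userId: string): string {
  return `sig-${userId}`;
}

// One call setup = one room + credentials + one signaling event.
function setupCall(caller: string, callee: string): CallSetup {
  const roomId = `room-${++roomCounter}`;
  return {
    roomId,
    callerSig: signUser(caller),
    calleeSig: signUser(callee),
    signal: { kind: "invite", roomId, from: caller },
  };
}
```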
The Hardest Part Wasn’t the UI
One thing became very obvious during development:
the hardest part was not drawing interfaces.
The difficult part was:
- Android background behavior
- PWA lifecycle limitations
- WebRTC compatibility
- notification reliability
- permission handling
For example:
Can a PWA reliably wake up for incoming calls?
The answer turned out to be:
only partially.
Different Android vendors behave differently:
- different WebView versions
- different Chromium implementations
- different battery policies
- different background restrictions
Eventually, it became clear that:
pure PWA architecture was not stable enough.
So the system gradually evolved toward:
PWA
+
APK wrapping
+
native notification bridge
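The bridge fallback above can be sketched as a two-step delivery attempt. The interface and names here are assumptions: in the wrapped APK, the native side would inject something like an `AndroidBridge` object into the WebView, and the web layer would treat it as a second notification channel.

```typescript
// Anything that can attempt to show an incoming-call notification.
// Returns true if delivery succeeded.
interface NotifyChannel {
  notify(title: string): boolean;
}

// Try the web notification path first; if the vendor's background or
// battery policy blocked it, fall back to the injected native channel.
function notifyIncomingCall(
  web: NotifyChannel,
  native: NotifyChannel,
  title: string
): "web" | "native" | "failed" {
  if (web.notify(title)) return "web";
  if (native.notify(title)) return "native";
  return "failed";
}
```

The useful property is that the call flow never needs to know which path fired; it only learns whether the doorbell rang at all.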
At that point, the project stopped feeling like “web development” and started feeling more like:
cross-system orchestration.
Codex and Architecture-First Vibe Coding
Another interesting part of this project was the development workflow itself.
Instead of building everything manually from scratch, I experimented with what I’d call:
Architecture-first Vibe Coding
The workflow looked something like this:
Read Tencent Cloud documentation
↓
Describe the system goal to Codex
↓
Generate architecture ideas
↓
Analyze SDK integration paths
↓
Generate PWA structure
↓
Debug WebRTC behavior
↓
Adjust Android lifecycle handling
↓
Iterate continuously
During this process, I realized something important:
AI’s biggest value was not “writing code.”
Its biggest value was:
accelerating architectural exploration.
Most real problems were actually about:
- whether the system design made sense
- whether SDKs could cooperate
- whether lifecycle assumptions were valid
- whether ecosystem limitations existed
rather than about syntax itself.
What the MVP Finally Achieved
At its current stage, the MVP can already:
- launch calls from WeChat
- notify the child tablet
- establish real-time video communication
- synchronize signaling through IM
- run on Android tablets
- operate through a simplified interaction flow
Most importantly:
it finally started feeling like a product,
instead of a collection of SDK demos.
Final Thoughts
This project gradually changed the way I think about modern software development.
What became interesting was no longer:
“Can AI write code?”
The more interesting question became:
“Can AI accelerate system-level reasoning?”
Because increasingly, development feels less like:
typing individual functions
and more like:
organizing ecosystems,
SDKs,
AI systems,
and human intent
into a coherent structure.
And perhaps that is becoming one of the most important skills in the AI era.