Features
Complete feature list and capabilities
This document outlines the complete feature set for the robot, organized by functional category.
1. 🤖 Core Software & System
- [S-1] Boot-up Sequence: On power-on, the robot displays a boot-up animation on the OLED "eyes" and plays an optional startup chime.
- [S-2] Wi-Fi Manager: The robot can be configured to connect to a local Wi-Fi network with automatic reconnection if the connection is lost.
- [S-3] State Machine: A central software component (the "Emotion Engine") manages the robot's current state (e.g., IDLE, LISTENING, THINKING, RESPONDING, HAPPY).
- [S-4] OTA Updates: Over-the-Air (OTA) updates allow the ESP32-S3 firmware to be updated over Wi-Fi without requiring a USB connection.
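The state names above come straight from [S-3]; the transition table and class below are an illustrative sketch of how the Emotion Engine could be prototyped in the PC simulation ([D-1]), with made-up event names.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    RESPONDING = auto()
    HAPPY = auto()


# Allowed transitions; any (state, event) pair not listed keeps the current state.
TRANSITIONS = {
    (State.IDLE, "wake_word"): State.LISTENING,
    (State.LISTENING, "query_received"): State.THINKING,
    (State.THINKING, "response_ready"): State.RESPONDING,
    (State.RESPONDING, "success"): State.HAPPY,
    (State.RESPONDING, "done"): State.IDLE,
    (State.HAPPY, "timeout"): State.IDLE,
}


class EmotionEngine:
    def __init__(self):
        self.state = State.IDLE

    def handle(self, event: str) -> State:
        """Apply one event; unknown events leave the state unchanged."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Keeping the transitions in a flat table makes it easy to audit which reactions are reachable from which states before porting the logic to C++ ([D-2]).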
2. 🗣️ AI & Voice Interaction
- [AI-1] Wake-Word Detection: The robot continuously listens on-device for a specific wake-word (e.g., "Pico"). This is the only audio processing performed while idle.
- [AI-2] Speech-to-Text (STT): Upon detecting the wake-word, the robot records audio, sends it to a free-tier STT service, and converts the speech to text.
- [AI-3] Intent Recognition: The software parses the text to understand user commands, differentiating between:
- Q&A / Chat: (e.g., "What's 5×4?", "Who is...")
- Internal Command: (e.g., "Sing a song", "Parrot me")
- IoT Command: (e.g., "Turn on the light")
- Mode Change: (e.g., "Connect to ChatGPT")
- [AI-4] Cloud AI Integration: Ability to enter "ChatGPT Mode" where all subsequent voice queries are sent to a free-tier generative AI API (e.g., Google Gemini).
- [AI-5] Sound Bank Playback: The robot plays pre-recorded sound effects (.wav files) to express emotions and reactions. There is no TTS: Pico is non-verbal and communicates like a pet through chirps, purrs, and whistles. See docs/Sound_Bank_Guide.md for the complete sound-creation guide.
- [AI-6] IoT Smart Home Control: The robot can send commands (e.g., MQTT messages, HTTP requests) to smart home platforms (e.g., Home Assistant, IFTTT) to control external devices.
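The four intent categories in [AI-3] could be prototyped with a simple keyword matcher before any real NLU model is wired in; the keyword lists below are illustrative placeholders, not the project's actual vocabulary.

```python
# Illustrative keyword lists for the [AI-3] intent categories.
INTENT_KEYWORDS = {
    "iot_command": ["turn on", "turn off", "light", "switch"],
    "internal_command": ["sing", "parrot", "dance"],
    "mode_change": ["connect to", "mode"],
}


def classify_intent(text: str) -> str:
    """Map a transcribed query to one of the [AI-3] intent categories."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return intent
    return "qa_chat"  # default: forward to Q&A / chat handling ([AI-4])
```

A real implementation would likely let the cloud model do this classification, but a local first pass like this keeps obvious IoT commands off the network entirely.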
3. 👁️ Vision & Recognition System
- [V-1] Face Detection: The robot continuously monitors its camera feed to detect human faces in its field of view.
- [V-2] Face Recognition: The robot can be trained to recognize specific individuals and provide personalized greetings and interactions.
- [V-3] Voice Recognition: The robot can learn to identify specific voices and associate them with known individuals.
- [V-4] Personal Interaction: When a known person is detected (by face or voice), the robot provides personalized responses and remembers previous interactions.
- [V-5] Stranger Detection: When an unknown face is detected, the robot enters "curious" mode and can optionally learn new faces.
- [V-6] Privacy Mode: The robot can be configured to disable camera and face recognition features when privacy is needed.
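On hardware, [V-2] would use the ESP-WHO face-recognition pipeline; the same match-by-embedding-distance idea can be simulated on a PC first ([D-3]). The embeddings, vector length, and threshold below are invented for illustration only.

```python
import math

# Enrolled users mapped to face embeddings (illustrative 4-D vectors;
# a real face-recognition model produces much longer vectors).
KNOWN_FACES = {
    "alice": [0.9, 0.1, 0.3, 0.7],
    "bob": [0.2, 0.8, 0.6, 0.1],
}

MATCH_THRESHOLD = 0.5  # assumed distance cutoff, tuned per model


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding):
    """Return the closest enrolled name, or None for a stranger ([V-5])."""
    name, dist = min(
        ((n, euclidean(embedding, e)) for n, e in KNOWN_FACES.items()),
        key=lambda t: t[1],
    )
    return name if dist < MATCH_THRESHOLD else None
```

A `None` result is what would drive the "curious" stranger behavior in [V-5]; a named result drives the personalized greeting in [V-4].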
4. 🥰 Personality & Emotion Engine
The Emotion Engine is a state machine that links triggers to reactions, creating expressive and responsive behavior.
| Feature ID | Trigger (Input) | Visual Reaction (OLED) | Audio Reaction (Speaker) |
|:-----------|:----------------|:-----------------------|:-------------------------|
| [P-1] | Power On | "Booting" animation → "Awake" | Startup chime |
| [P-2] | Idle (No activity) | Eyes "breathe" slowly or blink | Silent |
| [P-3] | Wake-Word Heard | "Listening" icon (e.g., swirl) | Affirmative "bing!" chirp |
| [P-4] | Voice Query Received | "Thinking" icon (e.g., dots) | Short "processing" sound |
| [P-5] | Command Understood | Returns to "Idle" | Plays acknowledgment sound (chirp) |
| [P-6] | IoT Command Succeeded | "Happy" eyes (e.g., ^.^) | Happy "Whoop!" sound |
| [P-7] | Command Failed (Error) | "Confused" eyes (e.g., ?_?) | Sad "womp-womp" sound |
| [P-8] | Touch Sensor (TTP223) | "Happy" eyes (e.g., ^.^) | "Purring" or "cooing" sound |
| [P-9] | Accelerometer (MPU-6050) - Picked Up | "Surprised" or "Alert" eyes (e.g., O.O) | Questioning "Hmm?" chirp |
| [P-10] | Accelerometer (MPU-6050) - Placed Down | "Sleepy" eyes (e.g., _ _) | "Yawn" sound |
| [P-11] | Accelerometer (MPU-6050) - Shaken | "Dizzy" or "Angry" eyes | "Wobbly" or "Stop!" sound |
| [P-12] | Low Battery | "Tired" or "Low Batt" icon | Low-energy "tired" chirp warning |
| [P-13] | Known Face Detected | "Happy/Recognition" eyes (e.g., ^.^) | Personalized greeting chirp for [Name] |
| [P-14] | Unknown Face Detected | "Curious" eyes (e.g., o.O) | Curious "questioning" chirp |
| [P-15] | No Face Visible | Returns to "Idle" state | Silent or soft ambient sounds |
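The table above maps cleanly onto a lookup of (animation, sound) pairs keyed by trigger. The sketch below shows that shape; the animation and file names are placeholders, not the project's real assets.

```python
# Trigger -> (OLED animation, sound file) lookup derived from the table above.
# Names are illustrative placeholders.
REACTIONS = {
    "touch": ("happy_eyes", "purr.wav"),         # [P-8]
    "picked_up": ("surprised_eyes", "hmm.wav"),  # [P-9]
    "shaken": ("dizzy_eyes", "wobble.wav"),      # [P-11]
    "low_battery": ("tired_eyes", "tired.wav"),  # [P-12]
}


def react(trigger: str):
    """Return the (animation, sound) pair for a trigger, or the idle default."""
    return REACTIONS.get(trigger, ("idle_eyes", None))  # [P-2]: idle is silent
```

Keeping reactions in data rather than code means new personality rows can be added without touching the state machine.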
5. ⚙️ Physical Hardware Features
- [H-1] Enclosure: A 3D-printable desktop design that houses all components:
- Contains the ESP32-S3-EYE (with integrated camera and microphone), battery, sensors, speaker, OLED screen, and servos.
- Designed to sit on a desk with a stable base.
- [H-2] Head Movement System: A 2-axis mechanism driven by two SG90 micro servos lets the head pan (left/right) and tilt (up/down) for expressive movements.
- [H-3] Desktop Design: Stationary robot designed to sit on your desk, not wearable or portable.
- [H-4] Charging System: A TP4056 module charges the LiPo battery via USB-C connection.
- [H-5] Touch Sensor: A TTP223 capacitive touch sensor integrated into the enclosure (e.g., on the "forehead") for petting interactions.
- [H-6] Camera System: A 2MP camera integrated into the ESP32-S3-EYE board provides face detection and recognition capabilities.
6. 🖥️ Development Features
- [D-1] PC Simulation: Complete robot personality and AI can be developed and tested on a PC using Python before hardware implementation.
- [D-2] Hardware Abstraction: The software architecture allows easy porting from Python simulation to C++ hardware implementation.
- [D-3] Modular AI Training: Face recognition and voice recognition models can be trained and tested independently on PC before deployment.
- [D-4] Cross-Platform Development: Development tools work on Windows, macOS, and Linux for maximum accessibility.
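The porting story in [D-2] usually rests on a thin hardware interface: the personality code talks to abstract display/servo objects, and only the backing class changes between the Python simulation and the C++ firmware. A minimal sketch of that pattern, with invented class names:

```python
from abc import ABC, abstractmethod


class Display(ABC):
    """Abstract OLED interface; the firmware and the simulator each implement it."""

    @abstractmethod
    def show(self, animation: str) -> None: ...


class ConsoleDisplay(Display):
    """PC-simulation backend ([D-1]): just prints what the OLED would show."""

    def __init__(self):
        self.last = None

    def show(self, animation: str) -> None:
        self.last = animation
        print(f"[OLED] {animation}")
```

On hardware, a `Display` implementation backed by the real 128×64 OLED driver would replace `ConsoleDisplay` without any change to the Emotion Engine.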
7. 🔧 Technical Specifications
7.1 Performance Specifications
- [T-1] Response Time: <2 seconds for voice query processing (cloud-dependent)
- [T-2] Face Detection Speed: <500ms detection latency in good lighting conditions
- [T-3] Face Recognition Accuracy: >95% for trained individuals under normal conditions
- [T-4] Wake Word Detection: <100ms latency, <1% false positive rate
- [T-5] Battery Life: 6–8 hours continuous operation, 24+ hours standby
- [T-6] Boot Time: <5 seconds from power-on to operational state
7.2 Hardware Specifications
- [H-7] Processing Power: ESP32-S3 dual-core @ 240MHz with AI acceleration
- [H-8] Memory: 512KB SRAM + 8MB PSRAM + 16MB Flash storage
- [H-9] Camera: 2MP OV2640 with JPEG compression and face detection optimization
- [H-10] Audio: Digital I2S microphone with noise cancellation + I2S amplifier
- [H-11] Display: 0.96" OLED, 128×64 pixels for expressive animations
- [H-12] Connectivity: Wi-Fi 802.11 b/g/n + Bluetooth 5.0 LE
- [H-13] Power: 3.7V 1000mAh LiPo with USB-C fast charging
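The 6–8 hour figure in [T-5] is consistent with the 1000 mAh pack in [H-13] if the average draw sits around 125–165 mA. A back-of-the-envelope check (the current figures are assumptions, not measurements):

```python
CAPACITY_MAH = 1000  # [H-13] battery capacity


def runtime_hours(avg_current_ma: float) -> float:
    """Ideal runtime; ignores converter losses and the discharge cutoff."""
    return CAPACITY_MAH / avg_current_ma


# Assumed average draws for active use vs. a light standby duty cycle.
print(round(runtime_hours(140), 1))  # ~7.1 h, inside the 6-8 h [T-5] range
print(round(runtime_hours(40), 1))   # 25.0 h, matching the 24+ h standby claim
```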
7.3 Software Architecture
- [S-5] Real-time OS: FreeRTOS for multitasking and real-time response
- [S-6] AI Framework: ESP-WHO for computer vision, ESP-SR for speech recognition
- [S-7] Communication: HTTP/HTTPS for cloud APIs, WebSocket for real-time data
- [S-8] Security: WPA2/WPA3 Wi-Fi encryption, API key protection
- [S-9] Memory Management: Careful dynamic allocation with heap monitoring (the ESP32's C/C++ runtime has no garbage collector)
- [S-10] Error Handling: Comprehensive exception handling with recovery mechanisms
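The recovery behavior in [S-10] is often expressed as a retry wrapper around cloud calls. An illustrative sketch (attempt count and backoff values are arbitrary):

```python
import time


def with_retries(fn, attempts=3, delay_s=0.0):
    """Call fn, retrying on failure; re-raise after the last attempt ([S-10])."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # simple linear backoff
```

On the robot, the final re-raise would trigger the [P-7] "Command Failed" reaction rather than crash the firmware.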
8. 🌐 Cloud Integration Specifications
8.1 API Integration Details
- [C-1] Google Speech-to-Text: 16kHz audio with multiple language support
- [C-2] Google Gemini API: 1M token context with multimodal input support
- [C-3] Sound Bank System: Pre-recorded .wav files for pet-like communication (no TTS)
- [C-4] Fallback Systems: Offline processing when cloud services are unavailable
- [C-5] Rate Limiting: Intelligent request throttling to stay within free tiers
- [C-6] Caching: Local caching of frequent responses for faster interaction
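[C-5] and [C-6] can be combined in one small gatekeeper: answer from the cache when possible, otherwise spend one unit of the free-tier budget. The window size below is a placeholder.

```python
import time


class CloudGate:
    """Caches responses ([C-6]) and throttles fresh requests ([C-5])."""

    def __init__(self, max_per_minute=10):
        self.max_per_minute = max_per_minute
        self.timestamps = []
        self.cache = {}

    def request(self, query, fetch):
        if query in self.cache:  # cached: free and instant
            return self.cache[query]
        now = time.monotonic()
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.max_per_minute:
            return None  # over budget: caller falls back to offline mode ([C-4])
        self.timestamps.append(now)
        self.cache[query] = fetch(query)
        return self.cache[query]
```

Returning `None` when throttled gives the caller a clean hook into the [C-4] offline fallback path.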
8.2 Privacy & Security Features
- [P-16] Local Processing: Face recognition can run entirely on-device
- [P-17] Data Encryption: All cloud communications use TLS encryption
- [P-18] Privacy Mode: Disable camera and microphone recording when requested
- [P-19] Data Retention: No permanent storage of audio or video data
- [P-20] User Control: Complete control over what data is shared with cloud services
9. 🎯 Advanced AI Capabilities
9.1 Machine Learning Features
- [ML-1] Adaptive Learning: Personality adjusts based on user interaction patterns
- [ML-2] Context Memory: Remembers conversation context within sessions
- [ML-3] Emotion Recognition: Detects user emotions from voice tone and facial expressions
- [ML-4] Behavioral Prediction: Anticipates user needs based on patterns
- [ML-5] Multi-User Support: Recognizes and adapts to different family members
- [ML-6] Continuous Improvement: Model updates through over-the-air updates
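The session memory in [ML-2] can be as simple as a bounded history that is prepended to each cloud prompt; the window size and prompt format below are arbitrary choices for illustration.

```python
from collections import deque


class SessionMemory:
    """Keeps the last few exchanges for [ML-2] context within one session."""

    def __init__(self, max_turns=5):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop automatically

    def add(self, user_text: str, reply: str) -> None:
        self.turns.append((user_text, reply))

    def as_prompt_prefix(self) -> str:
        return "\n".join(f"User: {u}\nPico: {r}" for u, r in self.turns)
```

Because nothing persists past the session, this also lines up with the [P-19] no-permanent-storage policy.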
9.2 Natural Language Understanding
- [NL-1] Intent Classification: Accurately categorizes user requests and commands
- [NL-2] Entity Extraction: Identifies important information from speech (names, numbers, etc.)
- [NL-3] Context Awareness: Maintains conversation context across multiple exchanges
- [NL-4] Sentiment Analysis: Understands emotional tone of user communications
- [NL-5] Multi-Language Support: Expandable to support regional languages
- [NL-6] Conversational Flow: Natural back-and-forth dialogue capabilities
10. 🔌 IoT & Smart Home Integration
10.1 Supported Protocols
- [IoT-1] MQTT: Lightweight messaging for IoT device communication
- [IoT-2] HTTP/REST: Standard web API integration
- [IoT-3] WebSocket: Real-time bidirectional communication
- [IoT-4] Bluetooth LE: Direct connection to nearby smart devices
- [IoT-5] IR Blaster: Control traditional appliances (optional hardware add-on)
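For [IoT-1], the robot would publish to topics that the home-automation platform subscribes to. The topic scheme below mirrors a common Home Assistant-style convention but is an assumption, and the sketch only builds the message rather than opening a network connection.

```python
import json


def build_mqtt_message(device: str, state: str):
    """Build an (illustrative) MQTT topic and JSON payload for an IoT command."""
    topic = f"home/{device}/set"  # assumed topic scheme, not a platform standard
    payload = json.dumps({"state": state})
    return topic, payload


topic, payload = build_mqtt_message("living_room_light", "ON")
# With a real client (e.g. paho-mqtt): client.publish(topic, payload)
```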
10.2 Platform Compatibility
- [IoT-6] Home Assistant: Full integration with open-source home automation
- [IoT-7] Google Home: Compatible with Google Assistant ecosystem
- [IoT-8] Amazon Alexa: Integration through IFTTT or direct API
- [IoT-9] Apple HomeKit: Bridge compatibility for iOS users
- [IoT-10] Custom APIs: Support for proprietary smart home systems
11. 📱 Mobile App Integration (Future Enhancement)
11.1 Companion App Features
- [M-1] Remote Control: Control robot functions from smartphone
- [M-2] Status Monitoring: Real-time status and health monitoring
- [M-3] Configuration: Easy setup and customization interface
- [M-4] Training Interface: Simplified face and voice training process
- [M-5] Analytics: Usage statistics and interaction history
- [M-6] Updates: Over-the-air firmware and model updates
11.2 Cross-Platform Support
- [M-7] iOS App: Native iOS application with full feature support
- [M-8] Android App: Native Android application with full feature support
- [M-9] Web Interface: Browser-based control panel for any device
- [M-10] API Access: RESTful API for third-party integrations