Features
Complete feature list and capabilities
This document outlines the complete feature set for the robot, organized by functional category.
1. 🤖 Core Software & System
- [S-1] Boot-up Sequence: On power-on, the robot displays a boot-up animation on the OLED "eyes" and plays an optional startup chime.
- [S-2] Wi-Fi Manager: The robot can be configured to connect to a local Wi-Fi network with automatic reconnection if the connection is lost.
- [S-3] State Machine: A central software component (the "Emotion Engine") manages the robot's current state (e.g., IDLE, LISTENING, THINKING, RESPONDING, HAPPY).
- [S-4] OTA Updates: Over-the-Air (OTA) updates allow the ESP32-S3 firmware to be updated over Wi-Fi without requiring a USB connection.
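The state names above come straight from [S-3]; the transition table and class below are an illustrative sketch of how the Emotion Engine could be prototyped in the PC simulation ([D-1]), with made-up event names.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    RESPONDING = auto()
    HAPPY = auto()


# Allowed transitions; any (state, event) pair not listed keeps the current state.
TRANSITIONS = {
    (State.IDLE, "wake_word"): State.LISTENING,
    (State.LISTENING, "query_received"): State.THINKING,
    (State.THINKING, "response_ready"): State.RESPONDING,
    (State.RESPONDING, "success"): State.HAPPY,
    (State.RESPONDING, "done"): State.IDLE,
    (State.HAPPY, "timeout"): State.IDLE,
}


class EmotionEngine:
    def __init__(self):
        self.state = State.IDLE

    def handle(self, event: str) -> State:
        """Apply one event; unknown events leave the state unchanged."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Keeping the transitions in a flat table makes it easy to audit which reactions are reachable from which states before porting the logic to C++ ([D-2]).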
2. 🗣️ AI & Voice Interaction
- [AI-1] Wake-Word Detection: The robot continuously listens on-device for a specific wake-word (e.g., "Pico"). This is the only audio processing performed while idle.
- [AI-2] Speech-to-Text (STT): Upon detecting the wake-word, the robot records audio, sends it to a free-tier STT service, and converts the speech to text.
- [AI-3] Intent Recognition: The software parses the text to understand user commands, differentiating between:
- Q&A / Chat: (e.g., "What's 5×4?", "Who is...")
- Internal Command: (e.g., "Sing a song", "Parrot me")
- IoT Command: (e.g., "Turn on the light")
- Mode Change: (e.g., "Connect to ChatGPT")
- [AI-4] Cloud AI Integration: Ability to enter "ChatGPT Mode" where all subsequent voice queries are sent to a free-tier generative AI API (e.g., Google Gemini).
- [AI-5] Sound Bank Playback: The robot plays pre-recorded sound effects (.wav files) to express emotions and reactions. There is no TTS: Pico is non-verbal and communicates like a pet through chirps, purrs, and whistles. See docs/Sound_Bank_Guide.md for the complete sound-creation guide.
- [AI-6] IoT Smart Home Control: The robot can send commands (e.g., MQTT messages, HTTP requests) to smart home platforms (e.g., Home Assistant, IFTTT) to control external devices.
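The four intent categories in [AI-3] could be prototyped with a simple keyword matcher before any real NLU model is wired in; the keyword lists below are illustrative placeholders, not the project's actual vocabulary.

```python
# Illustrative keyword lists for the [AI-3] intent categories.
INTENT_KEYWORDS = {
    "iot_command": ["turn on", "turn off", "light", "switch"],
    "internal_command": ["sing", "parrot", "dance"],
    "mode_change": ["connect to", "mode"],
}


def classify_intent(text: str) -> str:
    """Map a transcribed query to one of the [AI-3] intent categories."""
    lowered = text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in lowered for kw in keywords):
            return intent
    return "qa_chat"  # default: forward to Q&A / chat handling ([AI-4])
```

A real implementation would likely let the cloud model do this classification, but a local first pass like this keeps obvious IoT commands off the network entirely.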
3. 👁️ Vision & Recognition System
- [V-1] Face Detection: The robot continuously monitors its camera feed to detect human faces in its field of view.
- [V-2] Face Recognition: The robot can be trained to recognize specific individuals and provide personalized greetings and interactions.
- [V-3] Voice Recognition: The robot can learn to identify specific voices and associate them with known individuals.
- [V-4] Personal Interaction: When a known person is detected (by face or voice), the robot provides personalized responses and remembers previous interactions.
- [V-5] Stranger Detection: When an unknown face is detected, the robot enters "curious" mode and can optionally learn new faces.
- [V-6] Privacy Mode: The robot can be configured to disable camera and face recognition features when privacy is needed.
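On hardware, [V-2] would use the ESP-WHO face-recognition pipeline; the same match-by-embedding-distance idea can be simulated on a PC first ([D-3]). The embeddings, vector length, and threshold below are invented for illustration only.

```python
import math

# Enrolled users mapped to face embeddings (illustrative 4-D vectors;
# a real face-recognition model produces much longer vectors).
KNOWN_FACES = {
    "alice": [0.9, 0.1, 0.3, 0.7],
    "bob": [0.2, 0.8, 0.6, 0.1],
}

MATCH_THRESHOLD = 0.5  # assumed distance cutoff, tuned per model


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding):
    """Return the closest enrolled name, or None for a stranger ([V-5])."""
    name, dist = min(
        ((n, euclidean(embedding, e)) for n, e in KNOWN_FACES.items()),
        key=lambda t: t[1],
    )
    return name if dist < MATCH_THRESHOLD else None
```

A `None` result is what would drive the "curious" stranger behavior in [V-5]; a named result drives the personalized greeting in [V-4].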
4. 🥰 Personality & Emotion Engine
The Emotion Engine is a state machine that links triggers to reactions, creating expressive and responsive behavior.
| Feature ID | Trigger (Input) | Visual Reaction (OLED) | Audio Reaction (Speaker) |
|:-----------|:----------------|:-----------------------|:-------------------------|
| [P-1] | Power On | "Booting" animation → "Awake" | Startup chime |
| [P-2] | Idle (No activity) | Eyes "breathe" slowly or blink | Silent |
| [P-3] | Wake-Word Heard | "Listening" icon (e.g., swirl) | Affirmative "bing!" chirp |
| [P-4] | Voice Query Received | "Thinking" icon (e.g., dots) | Short "processing" sound |
| [P-5] | Command Understood | Returns to "Idle" | Plays acknowledgment sound (chirp) |
| [P-6] | IoT Command Succeeded | "Happy" eyes (e.g., ^.^) | Happy "Whoop!" sound |
| [P-7] | Command Failed (Error) | "Confused" eyes (e.g., ?_?) | Sad "womp-womp" sound |
| [P-8] | Touch Sensor (TTP223) | "Happy" eyes (e.g., ^.^) | "Purring" or "cooing" sound |
| [P-9] | Accelerometer (MPU-6050) - Picked Up | "Surprised" or "Alert" eyes (e.g., O.O) | Questioning "Hmm?" chirp |
| [P-10] | Accelerometer (MPU-6050) - Placed Down | "Sleepy" eyes (e.g., _ _) | "Yawn" sound |
| [P-11] | Accelerometer (MPU-6050) - Shaken | "Dizzy" or "Angry" eyes | "Wobbly" or "Stop!" sound |
| [P-12] | Low Battery | "Tired" or "Low Batt" icon | Low-energy "tired" chirp warning |
| [P-13] | Known Face Detected | "Happy/Recognition" eyes (e.g., ^.^) | Personalized greeting chirp for [Name] |
| [P-14] | Unknown Face Detected | "Curious" eyes (e.g., o.O) | Curious "questioning" chirp |
| [P-15] | No Face Visible | Returns to "Idle" state | Silent or soft ambient sounds |
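The table above maps cleanly onto a lookup of (animation, sound) pairs keyed by trigger. The sketch below shows that shape; the animation and file names are placeholders, not the project's real assets.

```python
# Trigger -> (OLED animation, sound file) lookup derived from the table above.
# Names are illustrative placeholders.
REACTIONS = {
    "touch": ("happy_eyes", "purr.wav"),         # [P-8]
    "picked_up": ("surprised_eyes", "hmm.wav"),  # [P-9]
    "shaken": ("dizzy_eyes", "wobble.wav"),      # [P-11]
    "low_battery": ("tired_eyes", "tired.wav"),  # [P-12]
}


def react(trigger: str):
    """Return the (animation, sound) pair for a trigger, or the idle default."""
    return REACTIONS.get(trigger, ("idle_eyes", None))  # [P-2]: idle is silent
```

Keeping reactions in data rather than code means new personality rows can be added without touching the state machine.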
5. ⚙️ Physical Hardware Features
- [H-1] Enclosure: A 3D-printable desktop design that houses all components:
- Contains the ESP32-S3-EYE (with integrated camera and microphone), battery, sensors, speaker, OLED screen, and servos.
- Designed to sit on a desk with a stable base.
- [H-2] Head Movement System: A 2-axis mechanism driven by two SG90 micro servos lets the head pan (left/right) and tilt (up/down) for expressive movements.
- [H-3] Desktop Design: Stationary robot designed to sit on your desk, not wearable or portable.
- [H-4] Charging System: A TP4056 module charges the LiPo battery via USB-C connection.
- [H-5] Touch Sensor: A TTP223 capacitive touch sensor integrated into the enclosure (e.g., on the "forehead") for petting interactions.
- [H-6] Camera System: A 2MP camera integrated into the ESP32-S3-EYE board provides face detection and recognition capabilities.
6. 🖥️ Development Features
- [D-1] PC Simulation: Complete robot personality and AI can be developed and tested on a PC using Python before hardware implementation.
- [D-2] Hardware Abstraction: The software architecture allows easy porting from Python simulation to C++ hardware implementation.
- [D-3] Modular AI Training: Face recognition and voice recognition models can be trained and tested independently on PC before deployment.
- [D-4] Cross-Platform Development: Development tools work on Windows, macOS, and Linux for maximum accessibility.
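The porting story in [D-2] usually rests on a thin hardware interface: the personality code talks to abstract display/servo objects, and only the backing class changes between the Python simulation and the C++ firmware. A minimal sketch of that pattern, with invented class names:

```python
from abc import ABC, abstractmethod


class Display(ABC):
    """Abstract OLED interface; the firmware and the simulator each implement it."""

    @abstractmethod
    def show(self, animation: str) -> None: ...


class ConsoleDisplay(Display):
    """PC-simulation backend ([D-1]): just prints what the OLED would show."""

    def __init__(self):
        self.last = None

    def show(self, animation: str) -> None:
        self.last = animation
        print(f"[OLED] {animation}")
```

On hardware, a `Display` implementation backed by the real 128×64 OLED driver would replace `ConsoleDisplay` without any change to the Emotion Engine.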
7. 🔧 Technical Specifications
7.1 Performance Specifications
- [T-1] Response Time: <2 seconds for voice query processing (cloud-dependent)
- [T-2] Face Detection Speed: <500ms detection latency in good lighting conditions
- [T-3] Face Recognition Accuracy: >95% for trained individuals under normal conditions
- [T-4] Wake Word Detection: <100ms latency, <1% false positive rate
- [T-5] Battery Life: 6–8 hours continuous operation, 24+ hours standby
- [T-6] Boot Time: <5 seconds from power-on to operational state
7.2 Hardware Specifications
- [H-7] Processing Power: ESP32-S3 dual-core @ 240MHz with AI acceleration
- [H-8] Memory: 512KB SRAM + 8MB PSRAM + 16MB Flash storage
- [H-9] Camera: 2MP OV2640 with JPEG compression and face detection optimization
- [H-10] Audio: Digital I2S microphone with noise cancellation + I2S amplifier
- [H-11] Display: 0.96" OLED, 128×64 pixels for expressive animations
- [H-12] Connectivity: Wi-Fi 802.11 b/g/n + Bluetooth 5.0 LE
- [H-13] Power: 3.7V 1000mAh LiPo with USB-C fast charging
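The 6–8 hour figure in [T-5] is consistent with the 1000 mAh pack in [H-13] if the average draw sits around 125–165 mA. A back-of-the-envelope check (the current figures are assumptions, not measurements):

```python
CAPACITY_MAH = 1000  # [H-13] battery capacity


def runtime_hours(avg_current_ma: float) -> float:
    """Ideal runtime; ignores converter losses and the discharge cutoff."""
    return CAPACITY_MAH / avg_current_ma


# Assumed average draws for active use vs. a light standby duty cycle.
print(round(runtime_hours(140), 1))  # ~7.1 h, inside the 6-8 h [T-5] range
print(round(runtime_hours(40), 1))   # 25.0 h, matching the 24+ h standby claim
```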
7.3 Software Architecture
- [S-5] Real-time OS: FreeRTOS for multitasking and real-time response
- [S-6] AI Framework: ESP-WHO for computer vision, ESP-SR for speech recognition
- [S-7] Communication: HTTP/HTTPS for cloud APIs, WebSocket for real-time data
- [S-8] Security: WPA2/WPA3 Wi-Fi encryption, API key protection
- [S-9] Memory Management: Careful dynamic allocation with heap monitoring (the ESP32's C/C++ runtime has no garbage collector)
- [S-10] Error Handling: Comprehensive exception handling with recovery mechanisms
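The recovery behavior in [S-10] is often expressed as a retry wrapper around cloud calls. An illustrative sketch (attempt count and backoff values are arbitrary):

```python
import time


def with_retries(fn, attempts=3, delay_s=0.0):
    """Call fn, retrying on failure; re-raise after the last attempt ([S-10])."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # simple linear backoff
```

On the robot, the final re-raise would trigger the [P-7] "Command Failed" reaction rather than crash the firmware.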
8. 🌐 Cloud Integration Specifications
8.1 API Integration Details
- [C-1] Google Speech-to-Text: 16kHz audio with multiple language support
- [C-2] Google Gemini API: 1M token context with multimodal input support
- [C-3] Sound Bank System: Pre-recorded .wav files for pet-like communication (no TTS)
- [C-4] Fallback Systems: Offline processing when cloud services are unavailable
- [C-5] Rate Limiting: Intelligent request throttling to stay within free tiers
- [C-6] Caching: Local caching of frequent responses for faster interaction
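[C-5] and [C-6] can be combined in one small gatekeeper: answer from the cache when possible, otherwise spend one unit of the free-tier budget. The window size below is a placeholder.

```python
import time


class CloudGate:
    """Caches responses ([C-6]) and throttles fresh requests ([C-5])."""

    def __init__(self, max_per_minute=10):
        self.max_per_minute = max_per_minute
        self.timestamps = []
        self.cache = {}

    def request(self, query, fetch):
        if query in self.cache:  # cached: free and instant
            return self.cache[query]
        now = time.monotonic()
        self.timestamps = [t for t in self.timestamps if now - t < 60]
        if len(self.timestamps) >= self.max_per_minute:
            return None  # over budget: caller falls back to offline mode ([C-4])
        self.timestamps.append(now)
        self.cache[query] = fetch(query)
        return self.cache[query]
```

Returning `None` when throttled gives the caller a clean hook into the [C-4] offline fallback path.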
8.2 Privacy & Security Features
- [P-16] Local Processing: Face recognition can run entirely on-device
- [P-17] Data Encryption: All cloud communications use TLS encryption
- [P-18] Privacy Mode: Disable camera and microphone recording when requested
- [P-19] Data Retention: No permanent storage of audio or video data
- [P-20] User Control: Complete control over what data is shared with cloud services
9. 🎯 Advanced AI Capabilities
9.1 Machine Learning Features
- [ML-1] Adaptive Learning: Personality adjusts based on user interaction patterns
- [ML-2] Context Memory: Remembers conversation context within sessions
- [ML-3] Emotion Recognition: Detects user emotions from voice tone and facial expressions
- [ML-4] Behavioral Prediction: Anticipates user needs based on patterns
- [ML-5] Multi-User Support: Recognizes and adapts to different family members
- [ML-6] Continuous Improvement: Model updates through over-the-air updates
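The session memory in [ML-2] can be as simple as a bounded history that is prepended to each cloud prompt; the window size and prompt format below are arbitrary choices for illustration.

```python
from collections import deque


class SessionMemory:
    """Keeps the last few exchanges for [ML-2] context within one session."""

    def __init__(self, max_turns=5):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop automatically

    def add(self, user_text: str, reply: str) -> None:
        self.turns.append((user_text, reply))

    def as_prompt_prefix(self) -> str:
        return "\n".join(f"User: {u}\nPico: {r}" for u, r in self.turns)
```

Because nothing persists past the session, this also lines up with the [P-19] no-permanent-storage policy.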
9.2 Natural Language Understanding
- [NL-1] Intent Classification: Accurately categorizes user requests and commands
- [NL-2] Entity Extraction: Identifies important information from speech (names, numbers, etc.)
- [NL-3] Context Awareness: Maintains conversation context across multiple exchanges
- [NL-4] Sentiment Analysis: Understands emotional tone of user communications
- [NL-5] Multi-Language Support: Expandable to support regional languages
- [NL-6] Conversational Flow: Natural back-and-forth dialogue capabilities
10. 🔌 IoT & Smart Home Integration
10.1 Supported Protocols
- [IoT-1] MQTT: Lightweight messaging for IoT device communication
- [IoT-2] HTTP/REST: Standard web API integration
- [IoT-3] WebSocket: Real-time bidirectional communication
- [IoT-4] Bluetooth LE: Direct connection to nearby smart devices
- [IoT-5] IR Blaster: Control traditional appliances (optional hardware add-on)
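For [IoT-1], the robot would publish to topics that the home-automation platform subscribes to. The topic scheme below mirrors a common Home Assistant-style convention but is an assumption, and the sketch only builds the message rather than opening a network connection.

```python
import json


def build_mqtt_message(device: str, state: str):
    """Build an (illustrative) MQTT topic and JSON payload for an IoT command."""
    topic = f"home/{device}/set"  # assumed topic scheme, not a platform standard
    payload = json.dumps({"state": state})
    return topic, payload


topic, payload = build_mqtt_message("living_room_light", "ON")
# With a real client (e.g. paho-mqtt): client.publish(topic, payload)
```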
10.2 Platform Compatibility
- [IoT-6] Home Assistant: Full integration with open-source home automation
- [IoT-7] Google Home: Compatible with Google Assistant ecosystem
- [IoT-8] Amazon Alexa: Integration through IFTTT or direct API
- [IoT-9] Apple HomeKit: Bridge compatibility for iOS users
- [IoT-10] Custom APIs: Support for proprietary smart home systems
11. 📱 Mobile App Integration (Future Enhancement)
11.1 Companion App Features
- [M-1] Remote Control: Control robot functions from smartphone
- [M-2] Status Monitoring: Real-time status and health monitoring
- [M-3] Configuration: Easy setup and customization interface
- [M-4] Training Interface: Simplified face and voice training process
- [M-5] Analytics: Usage statistics and interaction history
- [M-6] Updates: Over-the-air firmware and model updates
11.2 Cross-Platform Support
- [M-7] iOS App: Native iOS application with full feature support
- [M-8] Android App: Native Android application with full feature support
- [M-9] Web Interface: Browser-based control panel for any device
- [M-10] API Access: RESTful API for third-party integrations