Comprehensive Testing: Text Output And Column Selection

by Alex Johnson

This article details the comprehensive testing strategy for the text output format and column selection features implemented across several issues. The goal is to ensure robust functionality and prevent regressions by thoroughly validating the changes.

References

  • Parent Issue: #584 (529.2-cli-output-implementation)
  • Testing Standards: Milestone #14 (ls-test-improvement)
  • Testing Principles: #556 (Toyota Andon Cord, CRUD lifecycle pattern)
  • Research: docs/research/581-cli-output-research.md

Testing Philosophy

Our testing philosophy is guided by the principles outlined in #556. These principles emphasize preventing faulty code from being merged, verifying the CRUD lifecycle end to end, and avoiding superficial tests that don't validate actual behavior.

  1. Toyota Andon Cord: Any failing test immediately halts the merge process. This prevents potentially unstable code from entering the main codebase.
  2. CRUD Lifecycle Pattern: We exercise the full lifecycle: create test data, run the command against it, verify the actual output, and clean up afterwards. This ensures the system functions as expected throughout its lifecycle, not just at a single step.
  3. Avoid Anemic Tests: We move beyond simply checking $? == 0. We verify that the output content matches our expectations. This helps prevent situations where tests pass despite underlying issues.
  4. Root Cause Prevention: We learn from past mistakes, such as #536, where tests passed even though prompt list returned empty results, and design tests comprehensive enough to detect that class of failure.

Testing Strategy

Our testing strategy is divided into phases, each focusing on a specific aspect of the text output and column selection features. Each phase includes unit and integration tests designed to validate the functionality and ensure it meets our quality standards.

Unit Tests (Phase 1 - #585)

These tests focus on the core logic within the cli/src/output.rs file. The goal is to validate the parsing and rendering of output formats, as well as the handling of column selections.

  • [ ] Test OutputFormat::from_str("text") parsing: Ensure that the text format string is correctly parsed into the corresponding enum variant. This is crucial for handling user input and configuration settings.
  • [ ] Test OutputFormat::from_str("records") parsing: Verify that the records format string is also correctly parsed. This ensures that all supported output formats are properly recognized.
  • [ ] Test invalid format strings return errors: Confirm that providing an invalid format string results in an appropriate error being returned. This prevents unexpected behavior and provides helpful feedback to the user.
  • [ ] Test TSV rendering with tab separators: Validate that the TSV output format uses tab characters as separators between columns. This ensures that the output is correctly formatted for downstream processing.
  • [ ] Test TSV rendering escapes special characters correctly: Ensure that special characters within the data are properly escaped when rendering TSV output. This prevents data corruption and ensures that the output is correctly interpreted.
  • [ ] Test column metadata trait implementation: Verify that the column metadata trait is correctly implemented for all relevant data structures. This allows for consistent handling of column information throughout the system.
  • [ ] Test --columns parsing (comma-separated list): Validate that the --columns option is correctly parsed, allowing users to specify a comma-separated list of columns to include in the output. This provides flexibility in selecting the desired data fields.
  • [ ] Test --columns validation (reject unknown column names): Ensure that the --columns option validates the provided column names, rejecting any unknown or invalid names. This prevents errors and ensures that users only request valid data fields.
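The unit-test targets above can be sketched against a standalone model of the logic. Everything here (the OutputFormat variants, escape_tsv, parse_columns) uses hypothetical names for illustration, not the real cli/src/output.rs API:

```rust
// Hypothetical sketch of the parsing/validation logic under test; names and
// escaping rules are assumptions, not the actual implementation.
#[derive(Debug, PartialEq)]
enum OutputFormat {
    Json,
    Text,
    Records,
}

impl std::str::FromStr for OutputFormat {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "json" => Ok(OutputFormat::Json),
            "text" => Ok(OutputFormat::Text),
            "records" => Ok(OutputFormat::Records),
            other => Err(format!("unknown output format: {other}")),
        }
    }
}

/// Escape backslashes, tabs, and newlines so one record stays on one TSV line.
fn escape_tsv(field: &str) -> String {
    field.replace('\\', "\\\\").replace('\t', "\\t").replace('\n', "\\n")
}

/// Parse a comma-separated --columns value, rejecting unknown column names.
fn parse_columns<'a>(spec: &'a str, known: &[&str]) -> Result<Vec<&'a str>, String> {
    spec.split(',')
        .map(str::trim)
        .map(|c| {
            if known.contains(&c) {
                Ok(c)
            } else {
                Err(format!("unknown column: {c}"))
            }
        })
        .collect()
}

fn main() {
    assert_eq!("text".parse::<OutputFormat>(), Ok(OutputFormat::Text));
    assert_eq!("records".parse::<OutputFormat>(), Ok(OutputFormat::Records));
    assert!("csv".parse::<OutputFormat>().is_err());
    assert_eq!(escape_tsv("a\tb\nc"), "a\\tb\\nc");
    let known = ["handle", "likes", "downloads"];
    assert_eq!(
        parse_columns("handle,downloads", &known).unwrap(),
        vec!["handle", "downloads"]
    );
    assert!(parse_columns("handle,bogus", &known).is_err());
}
```

The real tests would import these items from the crate instead of redefining them; the checklist items above map one-to-one onto the assertions in main.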

Integration Tests (Phase 2 - #587)

These tests focus on the langstar prompt list command and verify the CRUD lifecycle. We set up test data, run the command with various flags, check the output content, and clean up the test data. Integration tests are crucial for verifying the interaction between different components of the system.

  • [ ] prompt list -o text outputs TSV format with default columns: Verify that the command outputs data in TSV format when the -o text option is specified, using the default set of columns. This ensures that the basic text output functionality is working correctly.
  • [ ] prompt list -o text --columns handle outputs only handle column: Confirm that the command outputs only the handle column when the --columns handle option is specified along with -o text. This validates the column selection feature.
  • [ ] prompt list -o text --columns handle,downloads outputs two columns tab-separated: Ensure that the command outputs both the handle and downloads columns, separated by a tab character, when the --columns handle,downloads option is specified along with -o text. This verifies the ability to select multiple columns.
  • [ ] prompt list --show-columns lists handle, likes, downloads, public, description, created_at: Validate that the --show-columns option lists the available columns for the prompt list command, helping users discover the available data fields.
  • [ ] Verify tab separators (not spaces) between columns: Confirm that tab characters are consistently used as separators between columns in the TSV output format. This ensures the output is machine-readable and parsable.
  • [ ] Verify newlines between rows: Ensure that each row of data is separated by a newline character. This provides clear separation between records in the output.
  • [ ] Verify no header row by default: Validate that the TSV output format does not include a header row by default. This simplifies parsing and processing of the output.
  • [ ] Handle empty result sets gracefully: Ensure that the command handles empty result sets gracefully, without producing errors or unexpected output. This is important for scenarios where no data matches the query.
  • [ ] Handle very long field values (description truncation/escaping): Verify that very long field values, such as descriptions, are handled correctly, either by truncating them or escaping special characters. This prevents display issues and data corruption.
  • [ ] Test with LANGSTAR_OUTPUT_FORMAT=text environment variable: Confirm that the LANGSTAR_OUTPUT_FORMAT environment variable can be used to set the default output format to text. This provides a convenient way to configure the output format globally.
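The environment-variable case can be sketched as a simple precedence rule. The resolve_format helper and the "json" built-in fallback are assumptions for illustration, not the CLI's documented behavior:

```rust
/// Hypothetical precedence: CLI flag > LANGSTAR_OUTPUT_FORMAT > built-in
/// default. The "json" fallback and the function name are assumptions.
fn resolve_format(cli_flag: Option<&str>, env_value: Option<&str>) -> String {
    cli_flag.or(env_value).unwrap_or("json").to_owned()
}

fn main() {
    // A CLI flag always wins over the environment.
    assert_eq!(resolve_format(Some("text"), Some("records")), "text");
    // With no flag, the environment variable applies.
    assert_eq!(resolve_format(None, Some("text")), "text");
    // With neither, fall back to the built-in default.
    assert_eq!(resolve_format(None, None), "json");
}
```

An integration test would set the variable on the spawned process (e.g. via Command::env) rather than in the test process itself, to avoid cross-test interference.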

Output verification pattern:

let output = cmd.output()?;
assert!(output.status.success());
let stdout = String::from_utf8(output.stdout)?;

// ✅ GOOD: Verify actual content
assert!(stdout.contains("expected-handle\t123"));
assert_eq!(stdout.lines().count(), expected_row_count);

// ❌ BAD: Only checking exit code (anemic test)
// assert!(output.status.success()); // Not enough!

(Here cmd is a std::process::Command configured to invoke the CLI under test; the ? propagation implies the surrounding test returns a Result.)

Integration Tests (Phase 3 - #589)

This phase extends the integration tests to eight additional commands. We apply the same testing pattern used in Phase 2 to ensure consistency and thorough coverage across the entire application.

  • [ ] assistant list -o text --columns <fields>
  • [ ] graph list -o text --columns <fields>
  • [ ] runs query -o text --columns <fields>
  • [ ] queue list -o text --columns <fields>
  • [ ] dataset list -o text --columns <fields>
  • [ ] eval list -o text --columns <fields>
  • [ ] secrets list -o text --columns <fields>
  • [ ] model-config list -o text --columns <fields>

For each command:

  1. Test -o text with default columns
  2. Test --columns with single field
  3. Test --columns with multiple fields
  4. Test --show-columns discovery
  5. Verify actual output content matches expected data
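The five steps above can be driven from a table of the eight commands rather than eight hand-written copies. The argument builder below and the choice of "handle" as a sample column are assumptions for illustration:

```rust
/// Hypothetical builder for one per-command invocation's argument vector.
/// An empty `columns` slice means "use default columns" (step 1).
fn columns_args(cmd: &[&str], columns: &[&str]) -> Vec<String> {
    let mut args: Vec<String> = cmd.iter().map(|s| s.to_string()).collect();
    args.extend(["-o".to_string(), "text".to_string()]);
    if !columns.is_empty() {
        args.extend(["--columns".to_string(), columns.join(",")]);
    }
    args
}

fn main() {
    let commands: [&[&str]; 8] = [
        &["assistant", "list"],
        &["graph", "list"],
        &["runs", "query"],
        &["queue", "list"],
        &["dataset", "list"],
        &["eval", "list"],
        &["secrets", "list"],
        &["model-config", "list"],
    ];
    for cmd in commands {
        // Each entry would drive one `langstar <cmd> ...` run in the real suite.
        let args = columns_args(cmd, &["handle"]);
        assert!(args.contains(&"--columns".to_string()));
        assert_eq!(args.last().unwrap(), "handle");
    }
    // Step 1 (default columns) omits --columns entirely.
    assert!(!columns_args(&["prompt", "list"], &[]).contains(&"--columns".to_string()));
}
```

A table-driven loop keeps the Phase 2 pattern identical across all eight commands, so a new command only adds one table entry.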

Integration Tests (Phase 4 - #591)

This phase focuses on the records format (psql \x style) and config file support. We ensure that the records format displays data correctly and that config file settings are properly applied.

  • [ ] prompt list -o records displays vertical format: Verify that the command displays data in a vertical format when the -o records option is specified. This ensures that the basic records output functionality is working correctly.
  • [ ] Records format shows one record per section: Confirm that the records format displays one record per section, with each field on a separate line. This improves readability and allows for easier data interpretation.
  • [ ] Field alignment is correct: Ensure that the fields in the records format are correctly aligned, making the output visually appealing and easy to scan.
  • [ ] Multi-record output includes separators (-[ RECORD N ]): Validate that multi-record output includes separators between records, such as -[ RECORD N ]. This helps distinguish between different records in the output.
  • [ ] Empty result sets handled gracefully: Ensure that the command handles empty result sets gracefully, without producing errors or unexpected output. This is important for scenarios where no data matches the query.
  • [ ] Long field values display without truncation: Verify that long field values are displayed without truncation in the records format. This ensures that all data is fully visible and accessible.
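The layout the bullets above describe can be sketched in a few lines. This is a minimal model of the psql \x style, not the actual renderer; the separator text and padding rule are assumptions based on the checklist:

```rust
/// Sketch of records-style rendering: one "-[ RECORD N ]-" header per record,
/// field names left-padded to the widest name so values line up.
fn render_records(columns: &[&str], rows: &[Vec<String>]) -> String {
    let width = columns.iter().map(|c| c.len()).max().unwrap_or(0);
    let mut out = String::new();
    for (i, row) in rows.iter().enumerate() {
        out.push_str(&format!("-[ RECORD {} ]-\n", i + 1));
        for (name, value) in columns.iter().zip(row) {
            // No truncation: long values are emitted in full.
            out.push_str(&format!("{name:<width$} | {value}\n"));
        }
    }
    out
}

fn main() {
    let cols = ["handle", "downloads"];
    let rows = vec![
        vec!["my-prompt".to_string(), "123".to_string()],
        vec!["other".to_string(), "7".to_string()],
    ];
    let text = render_records(&cols, &rows);
    assert!(text.contains("-[ RECORD 1 ]-"));
    assert!(text.contains("-[ RECORD 2 ]-"));
    // "handle" is padded to the width of "downloads".
    assert!(text.contains("handle    | my-prompt"));
    // Empty result sets render as an empty string, not an error.
    assert!(render_records(&cols, &[]).is_empty());
}
```

The integration tests would assert on these same properties in the command's stdout: separators present, fields aligned, no truncation.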

Config file support:

  • [ ] ~/.config/langstar/config.toml sets default format: Verify that the config file can be used to set the default output format. This provides a convenient way to configure the output format globally.
  • [ ] [output.columns.prompt] sets default columns for prompt list: Confirm that the config file can be used to set the default columns for the prompt list command. This allows users to customize the output for specific commands.
  • [ ] CLI flags override config file defaults: Ensure that CLI flags override config file defaults, allowing users to override the configured settings when needed. This provides flexibility and control over the output format.
  • [ ] Invalid config values produce helpful error messages: Validate that invalid config values produce helpful error messages, guiding users to correct their configuration settings. This improves the user experience and prevents configuration errors.
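A config file exercising these cases might look like the sketch below. The key names inside each table are assumptions inferred from the bullets, not a documented schema:

```toml
# Hypothetical ~/.config/langstar/config.toml
[output]
format = "text"  # default output format; CLI flags such as -o override this

[output.columns.prompt]
# the key name inside this table is an assumption
default = ["handle", "downloads"]
```

A fixture like this, written to a temporary directory with the config path overridden, would drive all four checklist items, including a variant with an invalid value (e.g. format = "bogus") for the error-message test.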

Test Data Requirements

We have three options for test data management, each with its own advantages and disadvantages. The chosen approach should balance test execution speed, reliability, and maintainability.

Option 1: Use Existing Test Deployment

  • Use deployment from tests/fixtures/test-graph-deployment/
  • Verify test data exists before running tests
  • Document required test data shape

Option 2: Mock API Responses

  • Use HTTP mocking for SDK responses
  • Faster test execution
  • No external dependencies

Option 3: Hybrid Approach

  • Unit tests use mocks
  • Integration tests use real test deployment
  • Document when each approach applies
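Option 2 can be prototyped without external crates by binding a throwaway TCP listener that serves a canned response. Everything here (the endpoint path, the JSON shape) is invented for illustration:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Build a minimal canned HTTP response around a JSON body.
fn canned_response(body: &str) -> String {
    format!(
        "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() {
    // Bind an ephemeral port so parallel tests never collide.
    let listener = TcpListener::bind("127.0.0.1:0").unwrap();
    let addr = listener.local_addr().unwrap();

    // Serve one canned response, standing in for the real API.
    let server = thread::spawn(move || {
        let (mut stream, _) = listener.accept().unwrap();
        let mut req = [0u8; 1024];
        let _ = stream.read(&mut req).unwrap();
        let body = r#"[{"handle":"example-prompt","downloads":123}]"#;
        stream.write_all(canned_response(body).as_bytes()).unwrap();
    });

    // A real test would point the SDK's base URL at `addr`; here we just fetch.
    let mut client = TcpStream::connect(addr).unwrap();
    client
        .write_all(b"GET /prompts HTTP/1.1\r\nHost: localhost\r\n\r\n")
        .unwrap();
    let mut response = String::new();
    client.read_to_string(&mut response).unwrap();
    server.join().unwrap();

    // Verify content, not just "the request succeeded".
    assert!(response.contains("example-prompt"));
}
```

In practice a mocking crate would be less fragile than raw sockets, but the shape is the same: point the CLI at a local address and assert on the rendered output.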

Success Criteria

To be successful, the testing process must meet the following criteria:

  • [ ] All unit tests pass for OutputFormat parsing and rendering
  • [ ] Integration tests verify actual output content (not just exit codes)
  • [ ] All 9 list commands have integration test coverage
  • [ ] Records format has integration test coverage
  • [ ] Config file functionality has integration test coverage
  • [ ] CI pipeline fails on any test failure (Toyota Andon Cord)
  • [ ] Test documentation added to docs/dev/testing/
  • [ ] Tests prevent regression of #536-style issues (empty output passing tests)

Dependencies

This testing effort is dependent on the completion of the following issues:

  • Depends on: #585 (Phase 1 - Core Infrastructure)
  • Depends on: #587 (Phase 2 - Pilot Command)
  • Depends on: #589 (Phase 3 - Rollout)
  • Depends on: #591 (Phase 4 - Polish)

Effort Estimate

  • Unit tests: ~2 hours
  • Integration tests (pilot): ~3 hours
  • Integration tests (rollout): ~4 hours
  • Integration tests (polish): ~2 hours
  • Documentation: ~1 hour
  • Total: ~12 hours

Notes

This issue consolidates testing tasks previously scattered across #585, #587, #589, and #591. By creating a dedicated testing phase, we ensure comprehensive test coverage following the principles from milestone #14 (ls-test-improvement) and prevent anemic tests that only verify exit codes.