API Documentation

Integrate ScreenGrasp into your applications

Access the world's most powerful click position prediction models through our unified API.

Note: Upon signing up (via Sign In With Google), you receive 1,000 free API credits to get started. Once these credits are used, a Pro or Enterprise subscription plan is required to purchase additional credits and continue using the API.

Your API Key

This API key is used for both the ScreenGrasp API and the Smooth Operator Agent Tools.

API Endpoints

Get Coordinate From Description (Recommended)

POST https://screengrasp.onrender.com/api/getCoordinateFromDescription

Analyzes an image and returns the coordinate in a single, synchronous request. This is the recommended approach for most applications.

Important: This endpoint waits for the analysis to complete before responding, which may take up to a few seconds, depending on the complexity of the task and the model used.

Parameters

  • image OR imageBase64 - The screenshot to analyze (see details below)
    Option 1: File Upload

    Upload the image as a file using multipart/form-data:

    // Using form data
    const formData = new FormData();
    formData.append('image', imageFile);  // imageFile is your File or Blob object
    formData.append('taskDescription', 'Find the login button');

    Option 2: Base64 String

    Send the image as a base64-encoded string in the request body:

    // Using JSON with base64
    const requestBody = {
      imageBase64: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEU...', // Your base64 image
      taskDescription: 'Find the login button'
    };

    The base64 string can be provided in two formats:

    • Data URL format: data:image/png;base64,iVBORw0KGgoA... (includes MIME type)
    • Raw base64: iVBORw0KGgoA... (without MIME type prefix, assumed to be PNG)
  • taskDescription - Description of what to find/click in the image
  • mechanism (optional) - The model to use. Options:
    • "screengrasp2" (default) - Reasoning Click Prediction Model using an ensemble approach. You can also use "screengrasp2-low", "screengrasp2-medium", or "screengrasp2-high" as a shortcut to specify reasoning effort (see parameters below).
    • "llabs" - CUA-NAV model by LLABS; great for most tasks
    • "anthropic-computer-use" - Advanced computer interaction model
    • "openai-computer-use" - OpenAI's Computer Use model
    • "qwen25-vl-72b" - Qwen2.5-VL 72B model
  • parameters (optional) - Additional configuration parameters:
    • reasoningEffort - Only in effect for ScreenGrasp2. Controls the balance between speed and accuracy:
      • "low" (default) - Fastest result with minimal token consumption (typically 50 tokens); often faster than OpenAI or Anthropic Computer Use
      • "medium" - Higher reliability at reasonable speed. Spends more time confirming results and may consume up to 200 tokens (most often still only 50)
      • "high" - Highest reliability. Uses advanced image preprocessing to analyze separate image sections in greater detail and consumes 200-600 tokens; slightly slower but more thorough. Especially recommended for high-resolution screenshots.
      You can also use the "-low", "-medium", or "-high" suffixes with the "screengrasp2" mechanism identifier instead of using this parameter to specify reasoning effort.

    Example:

    // Including parameters in your request
    const requestBody = {
      imageBase64: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEU...', 
      taskDescription: 'Find the login button',
      mechanism: 'screengrasp2',
      parameters: {
        reasoningEffort: 'high'  // For highest accuracy
      }
    };
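
The parameter rules above can be collected into a small request builder. This is only a sketch: `toDataUrl` and `buildRequestBody` are hypothetical helper names, not part of the API. It covers the base64 option only, and it relies on the documented rule that raw base64 is treated as PNG.

```javascript
// Reasoning-effort values listed in the parameter reference above.
const REASONING_EFFORTS = ['low', 'medium', 'high'];

// Hypothetical helper: adds a data-URL prefix to a raw base64 string.
// Strings that already start with "data:" are returned unchanged.
function toDataUrl(base64, mimeType = 'image/png') {
  return base64.startsWith('data:') ? base64 : `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: assembles the JSON body for a base64 request.
function buildRequestBody(imageBase64, taskDescription, options = {}) {
  if (!imageBase64 || !taskDescription) {
    throw new Error('imageBase64 and taskDescription are required');
  }
  const { mechanism, reasoningEffort } = options;
  const body = { imageBase64: toDataUrl(imageBase64), taskDescription };
  if (mechanism !== undefined) {
    body.mechanism = mechanism; // omitted: the API falls back to its default
  }
  if (reasoningEffort !== undefined) {
    if (!REASONING_EFFORTS.includes(reasoningEffort)) {
      throw new Error(`Unknown reasoningEffort: ${reasoningEffort}`);
    }
    body.parameters = { reasoningEffort };
  }
  return body;
}
```

Calling `buildRequestBody(rawBase64, 'Find the login button', { mechanism: 'screengrasp2', reasoningEffort: 'high' })` produces an object of the same shape as the example above.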

Response

  • On success:
    • x, y - Coordinates of the predicted click position
    • confidence - Confidence score (0-1)
    • analysisId - ID of the analysis
    • mechanism - The model used for the prediction
    • explanation (sometimes included) - The reasoning behind the prediction
  • On error: error message and status code
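
As a rough illustration of the request and response handling described above, the following sketch posts a JSON (base64) body and reads the documented success fields. Two things are assumptions: this section does not say how the API key is transmitted, so any authentication headers must be passed in by the caller, and the error body format is not specified, so it is surfaced as raw text. The `fetchImpl` option is a hypothetical hook that makes the function easy to test.

```javascript
const COORDINATE_URL = 'https://screengrasp.onrender.com/api/getCoordinateFromDescription';

// Sends a JSON (base64) request and returns the documented success fields.
// ASSUMPTION: authentication headers are supplied by the caller via options.headers.
async function getCoordinateFromDescription(requestBody, options = {}) {
  const { fetchImpl = globalThis.fetch, headers = {} } = options;
  const response = await fetchImpl(COORDINATE_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', ...headers },
    body: JSON.stringify(requestBody),
  });
  if (!response.ok) {
    // ASSUMPTION: error body format is undocumented here, so it is reported as text.
    const detail = await response.text();
    throw new Error(`Request failed with HTTP ${response.status}: ${detail}`);
  }
  const result = await response.json();
  return {
    x: result.x,
    y: result.y,
    confidence: result.confidence,
    analysisId: result.analysisId,
    mechanism: result.mechanism,
    explanation: result.explanation, // undefined when the API omits it
  };
}
```

Because the endpoint is synchronous, a single `await getCoordinateFromDescription(body, { headers })` call is enough; no polling is involved.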

Create Analysis Task

POST https://screengrasp.onrender.com/api/createAnalysisTask

Creates a new image analysis task and returns a task ID. Use this if you prefer to handle long-running tasks asynchronously.

Important: Our servers automatically enter sleep mode during periods of inactivity. The first API call after such a period may take several minutes while the server boots up. Subsequent calls will be significantly faster as long as the server remains active.

Parameters

  • image OR imageBase64 - The screenshot to analyze, using the same options as described above (file upload or base64 string)
  • taskDescription - Description of what to find/click in the image
  • mechanism (optional) - The model to use (same options as above)
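
A task could be created with the file upload option roughly as follows. This is a sketch under stated assumptions: the response field holding the task ID is assumed to be named `taskId` (the section only says a task ID is returned), `mechanism` is assumed to be accepted as a plain form field, and authentication headers are left to the caller because this section does not specify them.

```javascript
const CREATE_TASK_URL = 'https://screengrasp.onrender.com/api/createAnalysisTask';

// Uploads a screenshot via multipart/form-data and returns the new task ID.
async function createAnalysisTask(imageFile, taskDescription, options = {}) {
  const { fetchImpl = globalThis.fetch, headers = {}, mechanism } = options;
  const formData = new FormData();
  formData.append('image', imageFile); // File or Blob object
  formData.append('taskDescription', taskDescription);
  if (mechanism !== undefined) {
    formData.append('mechanism', mechanism); // ASSUMPTION: sent as a form field
  }
  // No Content-Type header is set: fetch adds the multipart boundary itself.
  const response = await fetchImpl(CREATE_TASK_URL, {
    method: 'POST',
    headers,
    body: formData,
  });
  if (!response.ok) {
    throw new Error(`Task creation failed with HTTP ${response.status}`);
  }
  const result = await response.json();
  // ASSUMPTION: the task ID is returned in a field named "taskId".
  if (!result.taskId) {
    throw new Error('Response did not contain a taskId field');
  }
  return result.taskId;
}
```

Given the cold-start delay noted above, callers may want a generous timeout on the first request after a period of inactivity.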

Get Task Status

GET https://screengrasp.onrender.com/api/getTaskStatus/:taskId

Retrieves the current status and results of an analysis task.

Response

  • status - Current task status:
    • QUEUED - Task is waiting in queue
    • COMPLETED - Analysis finished successfully
    • FAILED - Analysis failed
  • predictedClickPosition - When completed, contains one of the following:
    • If element is found: An object with x and y coordinates
    • If element is not found: An object with { status: "no_point_found", x: null, y: null } indicating the requested UI element couldn't be located in the image
  • queuePosition - When status is QUEUED, indicates position in queue
  • error - Error message if status is FAILED

Implementation Note: Poll this endpoint regularly (every 500 ms) until either:
  • The status is COMPLETED or FAILED
  • A timeout is reached (recommended: 5 minutes)
When the status is COMPLETED, be sure to check the predictedClickPosition.status property to determine whether the element was found.
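
The polling rules in the implementation note can be sketched as a loop like the one below. The function name `waitForTask` and its options are hypothetical. Returning `null` for a `no_point_found` result is a design choice of this sketch, not API behavior, and authentication headers are again left to the caller.

```javascript
const TASK_STATUS_URL = 'https://screengrasp.onrender.com/api/getTaskStatus/';

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls the task status until it is COMPLETED or FAILED, or until the timeout.
// Resolves to { x, y } when the element was found and to null when it was not.
async function waitForTask(taskId, options = {}) {
  const {
    fetchImpl = globalThis.fetch,
    headers = {},
    intervalMs = 500, // polling interval from the implementation note
    timeoutMs = 5 * 60 * 1000, // recommended timeout: 5 minutes
  } = options;
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const response = await fetchImpl(TASK_STATUS_URL + encodeURIComponent(taskId), { headers });
    if (!response.ok) {
      throw new Error(`Status request failed with HTTP ${response.status}`);
    }
    const task = await response.json();
    if (task.status === 'FAILED') {
      throw new Error(task.error || 'Analysis failed');
    }
    if (task.status === 'COMPLETED') {
      const point = task.predictedClickPosition;
      if (!point || point.status === 'no_point_found') {
        return null; // the requested element was not located in the image
      }
      return { x: point.x, y: point.y };
    }
    await sleep(intervalMs); // any other status (e.g. QUEUED): wait and retry
  }
  throw new Error(`Task ${taskId} did not finish within ${timeoutMs} ms`);
}
```

Treating every status other than COMPLETED and FAILED as "keep waiting" keeps the loop tolerant of intermediate states that are not listed above.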

Token Usage and Billing

Each API request consumes tokens based on the model used and the complexity of the task. Your account is billed according to this token usage.

Looking for advanced Windows automation?

While ScreenGrasp specializes in predicting click positions on screenshots, Smooth Operator Agent Tools offers an expanded toolkit for Windows automation agents.

Licensing benefit: ScreenGrasp and Smooth Operator share a unified license model; a ScreenGrasp Pro account automatically grants API access to all Smooth Operator Agent Tools.

Explore Smooth Operator Agent Tools →