API Documentation

Integrate screengrasp into your applications

Access the world's most powerful click position prediction models through our unified API.

Choose between:

ScreenGrasp 2 - Recommended. Reasoning click prediction ensemble model that dynamically adapts its processing time to task complexity. For typical tasks it is faster than Anthropic or OpenAI Computer Use; for more challenging scenarios it engages in deeper reasoning, using multiple leading AI models with intelligent verification. When required, ScreenGrasp 2 considers the predictions of different models, starting with CUA-NAV by LLABS (exclusively available through screengrasp.com) as the preferred model.
CUA-NAV by LLABS - Strongest non-ensemble model, exclusively available through screengrasp.com
Anthropic Computer Use - Advanced computer interaction model from Anthropic (a bit slow)
OpenAI Computer Use - OpenAI's model for computer interaction tasks (a bit slow)

(and more)

Note: Upon signing up (via Sign In With Google), you receive 1,000 free API credits to get started. Once these credits are used, a Pro or Enterprise subscription plan is required to purchase additional credits and continue using the API.

Your API Key

This API key is used for both the screengrasp API and the Smooth Operator Agent Tools.

API Endpoints

Get Coordinate From Description (Recommended)

POST https://screengrasp.onrender.com/api/getCoordinateFromDescription

Analyzes an image and returns the coordinate in a single, synchronous request. This is the recommended approach for most applications.

Important: This endpoint waits for the analysis to complete before responding, which may take up to a few seconds, depending on the complexity of the task and the model used.

Parameters

image OR imageBase64 - The screenshot to analyze (see details below)
Option 1: File Upload

Upload the image as a file using multipart/form-data:
```
// Using form data
const formData = new FormData();
formData.append('image', imageFile);  // imageFile is your File or Blob object
formData.append('taskDescription', 'Find the login button');
```
Option 2: Base64 String

Send the image as a base64-encoded string in the request body:
```
// Using JSON with base64
const requestBody = {
  imageBase64: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEU...', // Your base64 image
  taskDescription: 'Find the login button'
};
```
The base64 string can be provided in two formats:
- Data URL format: data:image/png;base64,iVBORw0KGgoA... (includes MIME type)
- Raw base64: iVBORw0KGgoA... (without MIME type prefix, assumed to be PNG)
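The two accepted formats can be told apart with a small helper. The following is a minimal sketch; `parseImageBase64` and `toDataUrl` are illustrative names, not part of the screengrasp API.

```javascript
// Matches "data:<mime type>;base64,<payload>".
const DATA_URL_PATTERN = /^data:([\w.+-]+\/[\w.+-]+);base64,(.*)$/s;

// Split an input string into its MIME type and raw base64 payload.
// Raw base64 without a prefix is treated as PNG, matching the rule above.
function parseImageBase64(input) {
  const match = DATA_URL_PATTERN.exec(input);
  if (match) {
    return { mimeType: match[1], base64: match[2] };
  }
  return { mimeType: 'image/png', base64: input };
}

// Return the explicit data URL form. Input that is already a data URL is
// returned unchanged; raw base64 gets the given MIME type as its prefix.
function toDataUrl(input, fallbackMimeType = 'image/png') {
  if (DATA_URL_PATTERN.test(input)) {
    return input;
  }
  return `data:${fallbackMimeType};base64,${input}`;
}
```

For a JPEG screenshot, `toDataUrl(rawBase64, 'image/jpeg')` makes the MIME type explicit instead of relying on the PNG default for raw base64.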
taskDescription - Description of what to find/click in the image
mechanism (optional) - The model to use. Options:
- "screengrasp2" (default) - Reasoning Click Prediction Model using an ensemble approach. You can also use "screengrasp2-low", "screengrasp2-medium", or "screengrasp2-high" as a shortcut to specify reasoning effort (see parameters below).
- "llabs" - CUA-NAV by LLABS model - Great for most tasks
- "anthropic-computer-use" - Advanced computer interaction model
- "openai-computer-use" - OpenAI's Computer Use model
- "qwen25-vl-72b" - Qwen2.5-VL 72B model
parameters (optional) - Additional configuration parameters:
- reasoningEffort - Only in effect for ScreenGrasp2. Controls the balance between speed and accuracy:
  - "low" (default) - Fastest result, minimal token consumption (typically 50 tokens), often faster than OpenAI or Anthropic Computer Use
  - "medium" - Higher reliability with reasonable speed, takes more time to confirm correct results, may consume up to 200 tokens (most often still only 50 tokens)
  - "high" - Highest reliability, uses advanced image preprocessing to analyze separate image sections in greater detail, consumes 200-600 tokens, slightly slower but more thorough. Especially recommended for high resolution screenshots.
  You can also use the "-low", "-medium", or "-high" suffixes with the "screengrasp2" mechanism identifier instead of using this parameter to specify reasoning effort.
Example
```
// Including parameters in your request
const requestBody = {
  imageBase64: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEU...', 
  taskDescription: 'Find the login button',
  mechanism: 'screengrasp2',
  parameters: {
    reasoningEffort: 'high'  // For highest accuracy
  }
};
```
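The suffixed mechanism names and the reasoningEffort parameter describe the same setting. The sketch below maps both spellings to one form on the client side. `normalizeMechanism` is a hypothetical helper. The documentation does not say which value wins when a suffix and a parameter disagree, so this sketch rejects that combination rather than guessing.

```javascript
const EFFORT_LEVELS = ['low', 'medium', 'high'];

// Normalize "screengrasp2-high" and { reasoningEffort: 'high' } to the same
// { mechanism, parameters } pair. Other mechanisms pass through unchanged,
// since reasoningEffort only takes effect for ScreenGrasp2.
function normalizeMechanism(mechanism = 'screengrasp2', parameters = {}) {
  const suffix = /^screengrasp2-(low|medium|high)$/.exec(mechanism);
  if (suffix) {
    const given = parameters.reasoningEffort;
    if (given !== undefined && given !== suffix[1]) {
      // Precedence is not documented, so refuse the ambiguous request.
      throw new RangeError(`"${mechanism}" conflicts with reasoningEffort "${given}"`);
    }
    return {
      mechanism: 'screengrasp2',
      parameters: { ...parameters, reasoningEffort: suffix[1] },
    };
  }
  if (mechanism === 'screengrasp2') {
    const effort = parameters.reasoningEffort ?? 'low'; // documented default
    if (!EFFORT_LEVELS.includes(effort)) {
      throw new RangeError(`Unknown reasoningEffort "${effort}"`);
    }
    return { mechanism, parameters: { ...parameters, reasoningEffort: effort } };
  }
  return { mechanism, parameters };
}
```

With this helper, `normalizeMechanism('screengrasp2-high')` and `normalizeMechanism('screengrasp2', { reasoningEffort: 'high' })` produce the same result.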

Response

On success:
- x, y - Coordinates of the predicted click position
- confidence - Confidence score (0-1)
- analysisId - ID of the analysis
- mechanism - The model used for the prediction
- explanation (sometimes included) - The reasoning behind the prediction
On error: error message and status code
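Putting the request and response together, a call to this endpoint might look like the sketch below. It assumes a runtime with a global `fetch` (a browser or Node 18+) and a JSON success response carrying the fields listed above. This section does not state how the API key is transmitted, so `extraHeaders` is only a placeholder for whatever credentials your account requires. The function names are illustrative.

```javascript
const GET_COORDINATE_URL =
  'https://screengrasp.onrender.com/api/getCoordinateFromDescription';

// Build the fetch() arguments for a JSON request with a base64 image.
// mechanism and parameters are optional and omitted from the body when unset.
function buildCoordinateRequest(imageBase64, taskDescription, options = {}) {
  const { mechanism, parameters, extraHeaders = {} } = options;
  const body = { imageBase64, taskDescription };
  if (mechanism !== undefined) body.mechanism = mechanism;
  if (parameters !== undefined) body.parameters = parameters;
  return {
    url: GET_COORDINATE_URL,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', ...extraHeaders },
      body: JSON.stringify(body),
    },
  };
}

// Send the request and return the predicted click position.
// The error body format is not documented here, so it is surfaced as raw text.
async function getCoordinate(imageBase64, taskDescription, options = {}) {
  const { url, init } = buildCoordinateRequest(imageBase64, taskDescription, options);
  const response = await fetch(url, init);
  const text = await response.text();
  if (!response.ok) {
    throw new Error(`screengrasp request failed (${response.status}): ${text}`);
  }
  // explanation is only sometimes included, so it may be undefined.
  const { x, y, confidence, analysisId, mechanism, explanation } = JSON.parse(text);
  return { x, y, confidence, analysisId, mechanism, explanation };
}
```

Because the endpoint is synchronous, `await getCoordinate(...)` resolves only once the analysis has finished.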

Create Analysis Task

POST https://screengrasp.onrender.com/api/createAnalysisTask

Creates a new image analysis task and returns a task ID. Use this if you prefer to handle long-running tasks asynchronously.

Important: Our servers automatically enter sleep mode during periods of inactivity. The first API call after such a period may take several minutes while the server boots up. Subsequent calls will be significantly faster as long as the server remains active.

Parameters

image OR imageBase64 - The screenshot to analyze, using the same options as described above (file upload or base64 string)
taskDescription - Description of what to find/click in the image
mechanism (optional) - The model to use (same options as above)
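A task can be created with the file upload option as in the sketch below, which assumes a runtime with global `fetch`, `FormData`, and `Blob` (a browser or Node 18+). The file name `screenshot.png` is an arbitrary placeholder, `extraHeaders` stands in for credentials that this section does not specify, and the name of the task ID field in the response is not documented here, so the parsed response is returned as-is.

```javascript
const CREATE_TASK_URL =
  'https://screengrasp.onrender.com/api/createAnalysisTask';

// Assemble the multipart form with the documented field names.
function buildCreateTaskForm(imageBlob, taskDescription, mechanism) {
  const form = new FormData();
  form.append('image', imageBlob, 'screenshot.png');
  form.append('taskDescription', taskDescription);
  if (mechanism !== undefined) {
    form.append('mechanism', mechanism);
  }
  return form;
}

// Submit the task and return the parsed response, which contains the task ID.
async function createAnalysisTask(imageBlob, taskDescription, mechanism, extraHeaders = {}) {
  const response = await fetch(CREATE_TASK_URL, {
    method: 'POST',
    // Content-Type is left unset so fetch can add the multipart boundary.
    headers: extraHeaders,
    body: buildCreateTaskForm(imageBlob, taskDescription, mechanism),
  });
  if (!response.ok) {
    throw new Error(`createAnalysisTask failed (${response.status}): ${await response.text()}`);
  }
  return response.json();
}
```

Given the sleep-mode note above, callers may want a generous timeout on the first request after a period of inactivity.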

Get Task Status

GET https://screengrasp.onrender.com/api/getTaskStatus/:taskId

Retrieves the current status and results of an analysis task.

Response

status - Current task status:
- QUEUED - Task is waiting in queue
- COMPLETED - Analysis finished successfully
- FAILED - Analysis failed
predictedClickPosition - When completed, contains one of the following:
- If element is found: An object with x and y coordinates
- If element is not found: An object with { status: "no_point_found", x: null, y: null } indicating the requested UI element couldn't be located in the image
queuePosition - When status is QUEUED, indicates position in queue
error - Error message if status is FAILED

Implementation Note: Poll this endpoint regularly (every 500 ms) until either:
- The status is COMPLETED or FAILED
- A timeout is reached (recommended: 5 minutes)

When the status is COMPLETED, be sure to check the predictedClickPosition.status property to determine if the element was found or not.
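The polling procedure described above can be sketched as follows. The function names are illustrative, `fetchStatus` is an optional hook for supplying your own request logic (for example, to add credentials, which this section does not specify), and any status other than COMPLETED or FAILED is treated as "keep waiting".

```javascript
const TASK_STATUS_URL = 'https://screengrasp.onrender.com/api/getTaskStatus/';

// Turn one status response into a decision for the polling loop.
function interpretTaskStatus(body) {
  if (body.status === 'FAILED') {
    return { done: true, found: false, error: body.error ?? 'unknown error' };
  }
  if (body.status === 'COMPLETED') {
    const point = body.predictedClickPosition;
    // A completed task can still report that no element was located.
    if (!point || point.status === 'no_point_found' || point.x == null || point.y == null) {
      return { done: true, found: false };
    }
    return { done: true, found: true, x: point.x, y: point.y };
  }
  return { done: false, queuePosition: body.queuePosition };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll every intervalMs until the task finishes or timeoutMs has elapsed.
async function waitForTask(taskId, options = {}) {
  const { intervalMs = 500, timeoutMs = 5 * 60 * 1000, fetchStatus } = options;
  const load = fetchStatus ?? (async (id) => {
    const response = await fetch(TASK_STATUS_URL + encodeURIComponent(id));
    return response.json();
  });
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const outcome = interpretTaskStatus(await load(taskId));
    if (outcome.done) return outcome;
    await sleep(intervalMs);
  }
  throw new Error(`Timed out waiting for task ${taskId}`);
}
```

Keeping `interpretTaskStatus` separate from the loop lets the found and not-found handling be checked without any network access.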

Token Usage and Billing

Each API request consumes tokens based on the specific model used and complexity of the task. Your account is billed according to the token usage.

Model Token Consumption

screengrasp2 - low effort (default): 50 tokens per request
screengrasp2 - medium effort: 50-200 tokens per request (easy tasks often only consume 50 tokens)
screengrasp2 - high effort: 200-600 tokens per request, depending on complexity of the task
all other models: 50 tokens per request
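For budgeting, the figures above translate into a simple range calculation. This is a rough sketch with illustrative function names. It uses only the per-request numbers listed in this section and makes no assumption about how tokens map to credits or prices.

```javascript
// Per-request token ranges for screengrasp2, taken from the list above.
const SCREENGRASP2_TOKEN_RANGES = {
  low: { min: 50, max: 50 },
  medium: { min: 50, max: 200 },
  high: { min: 200, max: 600 },
};
const OTHER_MODEL_TOKEN_RANGE = { min: 50, max: 50 };

// Look up the token range for one request. Accepts either the suffixed
// mechanism name ("screengrasp2-high") or a separate reasoningEffort value.
function tokenRangePerRequest(mechanism = 'screengrasp2', reasoningEffort = 'low') {
  const suffix = /^screengrasp2-(low|medium|high)$/.exec(mechanism);
  if (suffix) {
    return SCREENGRASP2_TOKEN_RANGES[suffix[1]];
  }
  if (mechanism === 'screengrasp2') {
    const range = SCREENGRASP2_TOKEN_RANGES[reasoningEffort];
    if (!range) {
      throw new RangeError(`Unknown reasoningEffort "${reasoningEffort}"`);
    }
    return range;
  }
  return OTHER_MODEL_TOKEN_RANGE;
}

// Estimate the total token range for a batch of requests.
function estimateTokens(requestCount, mechanism, reasoningEffort) {
  const { min, max } = tokenRangePerRequest(mechanism, reasoningEffort);
  return { min: min * requestCount, max: max * requestCount };
}
```

For example, 100 requests at medium effort fall between 5,000 and 20,000 tokens, while the same batch at low effort costs a flat 5,000.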
