Integrate ScreenGrasp into your applications
Access the world's most powerful click position prediction models through our unified API.
Choose between several models, listed under the mechanism parameter below.
POST https://screengrasp.onrender.com/api/getCoordinateFromDescription
Analyzes an image and returns the coordinate in a single, synchronous request. This is the recommended approach for most applications.
image OR imageBase64 - The screenshot to analyze (see details below)
Upload the image as a file using multipart/form-data:
// Using form data
const formData = new FormData();
formData.append('image', imageFile); // imageFile is your File or Blob object
formData.append('taskDescription', 'Find the login button');
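The form data above still has to be sent. A minimal sketch of one way to do that with fetch follows. The helper names buildMultipartRequest and getCoordinateFromFile are hypothetical, and any authentication your account requires is not shown because this page does not describe it.

```javascript
// Endpoint as given on this page.
const ENDPOINT = 'https://screengrasp.onrender.com/api/getCoordinateFromDescription';

// Build fetch options for a multipart/form-data upload.
// imageFile is a File or Blob holding the screenshot.
function buildMultipartRequest(imageFile, taskDescription) {
  const formData = new FormData();
  formData.append('image', imageFile);
  formData.append('taskDescription', taskDescription);
  // No Content-Type header is set here: fetch adds the multipart
  // boundary itself when the body is a FormData object.
  return { method: 'POST', body: formData };
}

// Send the upload and return the parsed JSON response.
// fetchImpl is injectable (an assumption of this sketch) so the function
// can be exercised without network access.
async function getCoordinateFromFile(imageFile, taskDescription, fetchImpl = fetch) {
  const response = await fetchImpl(ENDPOINT, buildMultipartRequest(imageFile, taskDescription));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Leaving the Content-Type header unset is deliberate: setting it by hand would omit the boundary the server needs to parse the parts.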
Send the image as a base64-encoded string in the request body:
// Using JSON with base64
const requestBody = {
  imageBase64: '...', // Your base64 image
  taskDescription: 'Find the login button'
};
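As a rough illustration of sending this body, the sketch below posts it as JSON and returns the parsed response. The helper names buildJsonRequest and getCoordinate are hypothetical, the Content-Type header is the usual choice for JSON bodies rather than something this page specifies, and authentication is left out because it is not described here.

```javascript
// Endpoint as given on this page.
const COORDINATE_ENDPOINT = 'https://screengrasp.onrender.com/api/getCoordinateFromDescription';

// Build fetch options for a JSON request carrying a base64 image.
function buildJsonRequest(imageBase64, taskDescription) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ imageBase64, taskDescription }),
  };
}

// Send the request and return the parsed JSON response.
// fetchImpl is injectable (an assumption of this sketch) so the function
// can be exercised without network access.
async function getCoordinate(imageBase64, taskDescription, fetchImpl = fetch) {
  const response = await fetchImpl(COORDINATE_ENDPOINT, buildJsonRequest(imageBase64, taskDescription));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

In Node.js, a base64 string can be produced from a file with `fs.readFileSync(path).toString('base64')`.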
The base64 string can be provided in two formats:
... (includes MIME type)
iVBORw0KGgoA... (without MIME type prefix, assumed to be PNG)
taskDescription - Description of what to find/click in the image
mechanism (optional) - The model to use. Options:
  "screengrasp2" (default) - Reasoning Click Prediction Model using an ensemble approach. You can also use "screengrasp2-low", "screengrasp2-medium", or "screengrasp2-high" as a shortcut to specify reasoning effort (see parameters below).
  "llabs" - CUA-NAV by LLABS model - Great for most tasks
  "anthropic-computer-use" - Advanced computer interaction model
  "openai-computer-use" - OpenAI's Computer Use model
  "qwen25-vl-72b" - Qwen2.5-VL 72B model
parameters (optional) - Additional configuration parameters:
reasoningEffort - Only in effect for ScreenGrasp2. Controls the balance between speed and accuracy:
  "low" (default) - Fastest result, minimal token consumption (typically 50 tokens), often faster than OpenAI or Anthropic Computer Use
  "medium" - Higher reliability with reasonable speed; takes more time to confirm correct results and may consume up to 200 tokens (most often still only 50)
  "high" - Highest reliability; uses advanced image preprocessing to analyze separate image sections in greater detail and consumes 200-600 tokens. Slightly slower but more thorough. Especially recommended for high-resolution screenshots.
You can also use the "-low", "-medium", or "-high" suffixes with the "screengrasp2" mechanism identifier instead of using this parameter to specify reasoning effort.
// Including parameters in your request
const requestBody = {
  imageBase64: '...',
  taskDescription: 'Find the login button',
  mechanism: 'screengrasp2',
  parameters: {
    reasoningEffort: 'high' // For highest accuracy
  }
};
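The two ways of choosing a reasoning effort described above can be sketched side by side. The helper names below are hypothetical; the mechanism identifiers and the reasoningEffort values are the ones listed on this page.

```javascript
const REASONING_EFFORTS = ['low', 'medium', 'high'];

// Turn a reasoning effort into the shortcut mechanism identifier,
// for example 'high' becomes 'screengrasp2-high'.
function toShortcutMechanism(reasoningEffort) {
  if (!REASONING_EFFORTS.includes(reasoningEffort)) {
    throw new RangeError(`Unknown reasoning effort: ${reasoningEffort}`);
  }
  return `screengrasp2-${reasoningEffort}`;
}

// Variant 1: keep mechanism 'screengrasp2' and pass the effort as a parameter.
function withReasoningParameter(baseBody, reasoningEffort) {
  return { ...baseBody, mechanism: 'screengrasp2', parameters: { reasoningEffort } };
}

// Variant 2: fold the effort into the mechanism identifier and omit parameters.
function withShortcutMechanism(baseBody, reasoningEffort) {
  return { ...baseBody, mechanism: toShortcutMechanism(reasoningEffort) };
}

const baseBody = { imageBase64: '...', taskDescription: 'Find the login button' };
const viaParameter = withReasoningParameter(baseBody, 'high');
const viaShortcut = withShortcutMechanism(baseBody, 'high');
```

Both request bodies ask for the same reasoning effort; the shortcut form is simply shorter when no other parameters are needed.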
Response fields:
x, y - Coordinates of the predicted click position
confidence - Confidence score (0-1)
analysisId - ID of the analysis
mechanism - The model used for the prediction
explanation (sometimes included) - The reasoning behind the prediction
Error responses: error message and status code
POST https://screengrasp.onrender.com/api/createAnalysisTask
Creates a new image analysis task and returns a task ID. Use this if you prefer to handle long-running tasks asynchronously.
image OR imageBase64 - The screenshot to analyze, using the same options as described above (file upload or base64 string)
taskDescription - Description of what to find/click in the image
mechanism (optional) - The model to use (same options as above)
GET https://screengrasp.onrender.com/api/getTaskStatus/:taskId
Retrieves the current status and results of an analysis task.
status - Current task status:
  QUEUED - Task is waiting in queue
  COMPLETED - Analysis finished successfully
  FAILED - Analysis failed
predictedClickPosition - When completed, contains one of the following:
  x and y coordinates
  { status: "no_point_found", x: null, y: null }, indicating the requested UI element couldn't be located in the image
queuePosition - When status is QUEUED, indicates position in queue
error - Error message if status is FAILED
Poll this endpoint until the status is COMPLETED or FAILED. When the status is COMPLETED, be sure to check the predictedClickPosition.status property to determine if the element was found or not.
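The status handling described above could be sketched as a small polling loop. The function names, the one-second interval, and the attempt limit are assumptions of this sketch, as is treating every status other than COMPLETED or FAILED as still pending. The task ID is whatever createAnalysisTask returned; the name of the response field that carries it is not specified on this page.

```javascript
const API_BASE = 'https://screengrasp.onrender.com/api';

// Interpret one getTaskStatus response body.
function interpretTaskStatus(body) {
  if (body.status === 'FAILED') {
    return { done: true, found: false, error: body.error };
  }
  if (body.status === 'COMPLETED') {
    const position = body.predictedClickPosition;
    // A completed task can still report that no matching element was found.
    if (!position || position.status === 'no_point_found') {
      return { done: true, found: false };
    }
    return { done: true, found: true, x: position.x, y: position.y };
  }
  // Anything else (such as QUEUED) is treated as still pending.
  return { done: false, queuePosition: body.queuePosition };
}

// Poll until the task is COMPLETED or FAILED, or give up after maxAttempts.
// fetchImpl, intervalMs and maxAttempts are hypothetical options of this sketch.
async function waitForTask(taskId, { fetchImpl = fetch, intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetchImpl(`${API_BASE}/getTaskStatus/${encodeURIComponent(taskId)}`);
    const result = interpretTaskStatus(await response.json());
    if (result.done) {
      return result;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} did not finish after ${maxAttempts} polls`);
}
```

Keeping the interpretation separate from the loop makes the COMPLETED-but-not-found case easy to check on its own.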
Each API request consumes tokens based on the specific model used and the complexity of the task. Your account is billed according to token usage.
While ScreenGrasp specializes in predicting click positions on screenshots, Smooth Operator Agent Tools offers an expanded toolkit for Windows automation agents.
Licensing benefit: ScreenGrasp and Smooth Operator share a unified license model. A ScreenGrasp Pro account automatically grants API access to all Smooth Operator Agent Tools.