Integrate screengrasp into your applications
Access the world's most powerful click position prediction models through our unified API.
Choose between several models, listed under the mechanism parameter below.
POST https://screengrasp.onrender.com/api/getCoordinateFromDescription
Analyzes an image and returns the coordinate in a single, synchronous request. This is the recommended approach for most applications.
image OR imageBase64 - The screenshot to analyze (see details below)
Upload the image as a file using multipart/form-data:
// Using form data
const formData = new FormData();
formData.append('image', imageFile); // imageFile is your File or Blob object
formData.append('taskDescription', 'Find the login button');
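The snippet above builds the form data but stops short of sending it. The following is a minimal sketch of the complete request, assuming a runtime with global `fetch`, `FormData`, and `Blob` (modern browsers or Node.js 18+) and assuming the endpoint replies with JSON. The names `buildFormData`, `getCoordinateFromFile`, and `fetchImpl` are hypothetical and used only for illustration. Any authentication the service may require is not covered here.

```javascript
const ENDPOINT = 'https://screengrasp.onrender.com/api/getCoordinateFromDescription';

// Build the multipart body. Do not set Content-Type by hand:
// fetch adds the multipart boundary itself when given a FormData body.
function buildFormData(imageFile, taskDescription) {
  const formData = new FormData();
  formData.append('image', imageFile); // File or Blob
  formData.append('taskDescription', taskDescription);
  return formData;
}

// fetchImpl is injectable (hypothetical parameter) so the function
// can be exercised without network access.
async function getCoordinateFromFile(imageFile, taskDescription, fetchImpl = fetch) {
  const response = await fetchImpl(ENDPOINT, {
    method: 'POST',
    body: buildFormData(imageFile, taskDescription),
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json(); // ASSUMPTION: the response body is JSON
}
```

A call might look like `const result = await getCoordinateFromFile(imageFile, 'Find the login button');`.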
Send the image as a base64-encoded string in the request body:
// Using JSON with base64
const requestBody = {
imageBase64: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEU...', // Your base64 image
taskDescription: 'Find the login button'
};
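A sketch of how the `imageBase64` value could be produced from raw image bytes and posted as JSON. It assumes global `fetch` and `btoa` (modern browsers or Node.js 18+) and a JSON response. `toDataUrl`, `getCoordinateFromBase64`, and `fetchImpl` are hypothetical names, not part of the API.

```javascript
const API_URL = 'https://screengrasp.onrender.com/api/getCoordinateFromDescription';

// Encode raw image bytes (Uint8Array or Node Buffer) as a base64 data URL.
// The loop avoids spreading a large array into String.fromCharCode.
function toDataUrl(bytes, mimeType = 'image/png') {
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// fetchImpl is injectable (hypothetical parameter) for testing without a network.
async function getCoordinateFromBase64(imageBase64, taskDescription, fetchImpl = fetch) {
  const response = await fetchImpl(API_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ imageBase64, taskDescription }),
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json(); // ASSUMPTION: the response body is JSON
}
```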
The base64 string can be provided in two formats:
  data:image/png;base64,iVBORw0KGgoA... (includes MIME type)
  iVBORw0KGgoA... (without MIME type prefix, assumed to be PNG)
taskDescription - Description of what to find/click in the image
mechanism (optional) - The model to use. Options:
  "screengrasp2" (default) - Reasoning Click Prediction Model using an ensemble approach. You can also use "screengrasp2-low", "screengrasp2-medium", or "screengrasp2-high" as a shortcut to specify reasoning effort (see parameters below).
  "llabs" - CUA-NAV by LLABS model - Great for most tasks
  "anthropic-computer-use" - Advanced computer interaction model
  "openai-computer-use" - OpenAI's Computer Use model
  "qwen25-vl-72b" - Qwen2.5-VL 72B model
parameters (optional) - Additional configuration parameters:
  reasoningEffort - Only in effect for ScreenGrasp2. Controls the balance between speed and accuracy:
    "low" (default) - Fastest result, minimal token consumption (typically 50 tokens), often faster than OpenAI or Anthropic Computer Use
    "medium" - Higher reliability with reasonable speed; takes more time to confirm correct results and may consume up to 200 tokens (most often still only 50 tokens)
    "high" - Highest reliability; uses advanced image preprocessing to analyze separate image sections in greater detail and consumes 200-600 tokens. Slightly slower but more thorough. Especially recommended for high-resolution screenshots.
  You can also use the "-low", "-medium", or "-high" suffixes with the "screengrasp2" mechanism identifier instead of using this parameter to specify reasoning effort.
// Including parameters in your request
const requestBody = {
imageBase64: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEU...',
taskDescription: 'Find the login button',
mechanism: 'screengrasp2',
parameters: {
reasoningEffort: 'high' // For highest accuracy
}
};
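Reasoning effort can be expressed either through `parameters.reasoningEffort` or through a suffix on the mechanism name. A small sketch of both forms follows. `withEffortParameter`, `withEffortSuffix`, and `assertValidEffort` are hypothetical helper names, and treating the two forms as interchangeable follows the description in this document rather than verified server behavior.

```javascript
const EFFORT_LEVELS = ['low', 'medium', 'high'];

// Reject values the documentation does not list.
function assertValidEffort(effort) {
  if (!EFFORT_LEVELS.includes(effort)) {
    throw new RangeError(`reasoningEffort must be one of: ${EFFORT_LEVELS.join(', ')}`);
  }
}

// Form 1: base mechanism name plus an explicit parameters object.
function withEffortParameter(baseRequest, effort) {
  assertValidEffort(effort);
  return {
    ...baseRequest,
    mechanism: 'screengrasp2',
    parameters: { reasoningEffort: effort },
  };
}

// Form 2: suffix shortcut on the mechanism name, no parameters object.
function withEffortSuffix(baseRequest, effort) {
  assertValidEffort(effort);
  return { ...baseRequest, mechanism: `screengrasp2-${effort}` };
}
```

Either result can be passed to `JSON.stringify` and sent as the request body.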
x, y - Coordinates of the predicted click position
confidence - Confidence score (0-1)
analysisId - ID of the analysis
mechanism - The model used for the prediction
explanation (sometimes included) - The reasoning behind the prediction
On error, the response contains an error message and status code.
POST https://screengrasp.onrender.com/api/createAnalysisTask
Creates a new image analysis task and returns a task ID. Use this if you prefer to handle long-running tasks asynchronously.
image OR imageBase64 - The screenshot to analyze, using the same options as described above (file upload or base64 string)
taskDescription - Description of what to find/click in the image
mechanism (optional) - The model to use (same options as above)
GET https://screengrasp.onrender.com/api/getTaskStatus/:taskId
Retrieves the current status and results of an analysis task.
status - Current task status:
  QUEUED - Task is waiting in queue
  COMPLETED - Analysis finished successfully
  FAILED - Analysis failed
predictedClickPosition - When completed, contains one of the following:
  x and y coordinates
  { status: "no_point_found", x: null, y: null } indicating the requested UI element couldn't be located in the image
queuePosition - When status is QUEUED, indicates position in queue
error - Error message if status is FAILED
A task is finished once its status is COMPLETED or FAILED. When the status is COMPLETED, be sure to check the predictedClickPosition.status property to determine whether the element was found.
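The asynchronous flow (create a task, then poll its status) could be sketched as below. Several details are assumptions: the source only says that createAnalysisTask "returns a task ID", so the `taskId` field name is a guess; the one-second interval and 60-attempt limit are arbitrary; and any status other than COMPLETED or FAILED is treated as still pending. `interpretTaskStatus`, `createTask`, and `waitForResult` are hypothetical names.

```javascript
const BASE_URL = 'https://screengrasp.onrender.com/api';

// Map a getTaskStatus payload to one of: pending, found, not_found, failed.
function interpretTaskStatus(task) {
  if (task.status === 'FAILED') {
    return { state: 'failed', error: task.error };
  }
  if (task.status === 'COMPLETED') {
    const position = task.predictedClickPosition;
    const missing = !position || position.status === 'no_point_found' ||
      position.x == null || position.y == null;
    return missing ? { state: 'not_found' } : { state: 'found', x: position.x, y: position.y };
  }
  return { state: 'pending', queuePosition: task.queuePosition };
}

async function createTask(imageBase64, taskDescription, fetchImpl = fetch) {
  const response = await fetchImpl(`${BASE_URL}/createAnalysisTask`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ imageBase64, taskDescription }),
  });
  if (!response.ok) throw new Error(`Task creation failed with status ${response.status}`);
  const body = await response.json();
  return body.taskId; // ASSUMPTION: the field name is not given in the source
}

// Poll until the task leaves the pending state or the attempt limit is reached.
async function waitForResult(taskId, { fetchImpl = fetch, intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetchImpl(`${BASE_URL}/getTaskStatus/${encodeURIComponent(taskId)}`);
    if (!response.ok) throw new Error(`Status check failed with status ${response.status}`);
    const result = interpretTaskStatus(await response.json());
    if (result.state !== 'pending') return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} did not finish after ${maxAttempts} polls`);
}
```

Keeping `interpretTaskStatus` separate from the polling loop makes the "completed but nothing found" case explicit, which is easy to overlook when only checking for COMPLETED.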
Each API request consumes tokens based on the model used and the complexity of the task. Your account is billed according to token usage.
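For rough budgeting, the per-request token figures quoted for ScreenGrasp2 in this document can be turned into a range estimate. This is only a sketch: the numbers are the documentation's approximations, not billing guarantees, no figures are given for the other mechanisms, and `estimateTokenRange` is a hypothetical helper.

```javascript
// Approximate per-request token figures quoted for ScreenGrasp2 in this document.
const TOKENS_PER_REQUEST = {
  low: { min: 50, max: 50 },     // "typically 50 tokens"
  medium: { min: 50, max: 200 }, // "up to 200", most often 50
  high: { min: 200, max: 600 },  // "200-600 tokens"
};

// Estimate the total token range for a batch of requests at one effort level.
function estimateTokenRange(effort, requestCount) {
  if (!Object.hasOwn(TOKENS_PER_REQUEST, effort)) {
    throw new RangeError(`Unknown reasoning effort: ${effort}`);
  }
  if (!Number.isInteger(requestCount) || requestCount < 0) {
    throw new RangeError('requestCount must be a non-negative integer');
  }
  const perRequest = TOKENS_PER_REQUEST[effort];
  return {
    min: perRequest.min * requestCount,
    max: perRequest.max * requestCount,
  };
}
```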