Extract
POST /extract
curl --request POST \
  --url https://api.scaledown.xyz/extract \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "text": "<string>",
  "entities": {},
  "threshold": 0.5,
  "top_n": 0
}
'
{
  "entities": [
    {
      "text": "<string>",
      "type": "<string>",
      "confidence": 123,
      "start": 123,
      "end": 123,
      "context": "<string>"
    }
  ]
}

Overview

The /extract endpoint runs Named Entity Recognition (NER) over a block of text. Standard NER models recognize a fixed set of labels. This endpoint instead lets you define the entity types you want in plain English. The model uses your entity names and descriptions to find matching spans, and returns each match with a confidence score and its surrounding context. The context includes up to 500 characters of text on each side of the match, so you can validate or use the extracted value without going back to the source.
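The sketch below illustrates the "up to 500 characters on each side" rule. It assumes the context is a plain character slice of the input, clamped at the start and end of the text. context_window is a hypothetical helper, not part of the API, and the service's exact windowing rule may differ.

```python
def context_window(text: str, start: int, end: int, radius: int = 500) -> str:
    """Return text[start:end] plus up to `radius` characters on each side.

    Hypothetical helper: assumes the service slices characters and clamps
    at the text boundaries.
    """
    lo = max(0, start - radius)        # never slice before the first character
    hi = min(len(text), end + radius)  # never slice past the last character
    return text[lo:hi]


sample = ("Henry Wang is a CS student from the SF Bay Area. "
          "You can find him on Twitter at @henryw and Instagram at @b0i.")

# The sample is far shorter than 500 characters, so the window covers all of it.
print(context_window(sample, 0, 10) == sample)  # True
```

For short inputs like the basic example below, the context is therefore the entire text.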

Request

text
string
required
The input text to extract entities from. Can be a full document, web page content, article, or any plain text string.
entities
object
required
A mapping of entity type names to their definitions. Each value can be either:
  • A plain string: a description of what to look for
  • An object with optional description, threshold, and top_n fields; threshold and top_n override the global values for that entity type only
threshold
number
default:0.5
Global confidence threshold (0–1). Entities scoring below this value are filtered out. Can be overridden per entity type.
top_n
number
default:0
Global limit on how many results to return per entity type, ranked by confidence descending. 0 returns all results above the threshold. Can be overridden per entity type.
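The interaction between threshold and top_n can be modeled on the client side. The sketch below follows the documented order: filter by threshold, rank by descending confidence, then cut to top_n, where 0 means no limit. It is not the service's code. Keeping scores exactly equal to the threshold is an assumption, and the candidates are hypothetical data.

```python
from typing import Dict, List


def filter_and_rank(results: List[Dict], threshold: float = 0.5, top_n: int = 0) -> List[Dict]:
    """Model of threshold + top_n for a single entity type (not the service's code)."""
    # Filter: drop candidates scoring below the threshold (ties are kept).
    kept = [r for r in results if r["confidence"] >= threshold]
    # Rank: highest confidence first.
    kept.sort(key=lambda r: r["confidence"], reverse=True)
    # Limit: top_n == 0 means return everything above the threshold.
    return kept if top_n == 0 else kept[:top_n]


# Hypothetical candidates for a "Company" entity type.
candidates = [
    {"text": "Acme", "confidence": 0.62},
    {"text": "Globex", "confidence": 0.91},
    {"text": "Initech", "confidence": 0.41},
]
print([r["text"] for r in filter_and_rank(candidates, threshold=0.5)])           # ['Globex', 'Acme']
print([r["text"] for r in filter_and_rank(candidates, threshold=0.5, top_n=1)])  # ['Globex']
```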

Response

entities
array
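List of extracted entities, sorted by descending confidence within each type. Each item contains the matched text, its entity type, a confidence score (0–1), start and end offsets into the input, and the surrounding context.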
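If you consume the response from Python, a small dataclass keeps the six fields from the response schema typed. This sketch assumes the response contains exactly those fields. Entity(**item) raises TypeError if the service adds new ones, so relax it if you need forward compatibility. Entity and parse_entities are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    text: str          # the matched span
    type: str          # entity type name, as given in the request
    confidence: float  # score between 0 and 1
    start: int         # offset where the span begins
    end: int           # offset just past the span (exclusive)
    context: str       # surrounding text copied from the input


def parse_entities(body: str) -> List[Entity]:
    """Turn a raw /extract response body into Entity objects."""
    return [Entity(**item) for item in json.loads(body)["entities"]]


body = ('{"entities": [{"text": "Henry Wang", "type": "Name", "confidence": 0.994, '
        '"start": 0, "end": 10, "context": "Henry Wang is a CS student from the SF Bay Area. '
        'You can find him on Twitter at @henryw and Instagram at @b0i."}]}')
print(parse_entities(body)[0].type)  # Name
```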

Error responses

  • 400 Bad Request: Malformed request body, missing required fields, or empty entities map.
  • 401 Unauthorized: Missing or invalid x-api-key.
  • 429 Too Many Requests: Rate limit exceeded. Back off and retry.
  • 500 Internal Server Error: Inference service unavailable.
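One common way to back off and retry after a 429 is exponential backoff with jitter. The sketch below assumes a hypothetical send callable that performs one request and returns its HTTP status code. The retry count, base delay, and 10% jitter are illustrative choices, not documented limits. Only 429 is retried, since it is the only status the error responses above mark as retryable.

```python
import random
import time
from typing import Callable


def call_with_backoff(send: Callable[[], int], max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send()` and retry with exponential backoff while it returns 429.

    `send` is a hypothetical callable returning the HTTP status code.
    Any other status is returned to the caller unchanged.
    """
    status = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Wait base_delay * 2^attempt seconds plus up to 10% random jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
        status = send()
    return status
```

Injecting sleep lets you test the retry logic without real delays. Pass a function that records the requested waits instead of sleeping.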

Authentication

Include your API key in every request using the x-api-key header.
-H "x-api-key: <your-api-key>"
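The same authenticated request can be built with Python's standard library. build_extract_request is a hypothetical helper. It omits threshold and top_n unless you pass them, so the server-side defaults apply.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

EXTRACT_URL = "https://api.scaledown.xyz/extract"


def build_extract_request(api_key: str, text: str, entities: Dict[str, Any],
                          threshold: Optional[float] = None,
                          top_n: Optional[int] = None) -> urllib.request.Request:
    """Build, but do not send, an authenticated POST /extract request."""
    payload: Dict[str, Any] = {"text": text, "entities": entities}
    # Send the optional globals only when set, so the server defaults apply otherwise.
    if threshold is not None:
        payload["threshold"] = threshold
    if top_n is not None:
        payload["top_n"] = top_n
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_extract_request("<your-api-key>", "Henry Wang is a CS student.",
                            {"Name": "Full name of the person"})
print(req.get_method(), req.full_url)  # POST https://api.scaledown.xyz/extract
```

To send it, call urllib.request.urlopen(req) and read the JSON body. Non-2xx statuses such as 401 or 429 raise urllib.error.HTTPError, and its code attribute holds the status.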

Examples

Basic extraction

curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i.",
    "entities": {
      "Name": "Full name of the person",
      "Twitter": "Twitter or X handle",
      "Instagram": "Instagram username"
    }
  }'
Response:
{
  "entities": [
    {
      "text": "Henry Wang",
      "type": "Name",
      "confidence": 0.994,
      "start": 0,
      "end": 10,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    },
    {
      "text": "@henryw",
      "type": "Twitter",
      "confidence": 0.976,
      "start": 80,
      "end": 87,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    },
    {
      "text": "@b0i",
      "type": "Instagram",
      "confidence": 0.978,
      "start": 105,
      "end": 109,
      "context": "Henry Wang is a CS student from the SF Bay Area. You can find him on Twitter at @henryw and Instagram at @b0i."
    }
  ]
}
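A common next step is to collapse the response into one value per entity type. The sketch below picks the highest-confidence match for each type without relying on the server's sort order. best_by_type is a hypothetical helper, and its input reuses the types, texts, and confidences from the response above. Setting top_n to 1 per entity type achieves a similar result on the server side.

```python
from typing import Dict, List


def best_by_type(entities: List[Dict]) -> Dict[str, str]:
    """Map each entity type to the text of its highest-confidence match."""
    best: Dict[str, Dict] = {}
    for ent in entities:
        current = best.get(ent["type"])
        # Keep this match if it is the first of its type or scores higher.
        if current is None or ent["confidence"] > current["confidence"]:
            best[ent["type"]] = ent
    return {etype: ent["text"] for etype, ent in best.items()}


response_entities = [
    {"text": "Henry Wang", "type": "Name", "confidence": 0.994},
    {"text": "@henryw", "type": "Twitter", "confidence": 0.976},
    {"text": "@b0i", "type": "Instagram", "confidence": 0.978},
]
print(best_by_type(response_entities))
# {'Name': 'Henry Wang', 'Twitter': '@henryw', 'Instagram': '@b0i'}
```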

With per-entity overrides

Use per-entity threshold and top_n when different entity types need different precision, or when you only want the single best match for a given type.
curl -X POST https://api.scaledown.xyz/extract \
  -H "Content-Type: application/json" \
  -H "x-api-key: <your-api-key>" \
  -d '{
    "text": "...",
    "entities": {
      "Name": {
        "description": "Full name of a person",
        "threshold": 0.3,
        "top_n": 1
      },
      "Company": {
        "description": "Company or organization name",
        "threshold": 0.7
      },
      "Email": "Email address"
    },
    "threshold": 0.5,
    "top_n": 5
  }'
In this example:
  • Name uses threshold 0.3 and returns at most 1 result
  • Company uses threshold 0.7 and returns up to 5 results (global top_n)
  • Email uses the global threshold 0.5 and the global top_n, returning up to 5 results
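The override rules in this list can be expressed as a small resolution function. This sketch models the documented behavior. It assumes the schema defaults of threshold 0.5 and top_n 0 apply when no global value is sent. resolve_settings is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict, Tuple


def resolve_settings(entities: Dict[str, Any], threshold: float = 0.5,
                     top_n: int = 0) -> Dict[str, Tuple[float, int]]:
    """Return the effective (threshold, top_n) pair for each entity type."""
    resolved: Dict[str, Tuple[float, int]] = {}
    for name, spec in entities.items():
        if isinstance(spec, str):
            # Plain-string form is only a description: the globals apply.
            resolved[name] = (threshold, top_n)
        else:
            # Object form: any per-entity field present overrides the global value.
            resolved[name] = (spec.get("threshold", threshold), spec.get("top_n", top_n))
    return resolved


entities = {
    "Name": {"description": "Full name of a person", "threshold": 0.3, "top_n": 1},
    "Company": {"description": "Company or organization name", "threshold": 0.7},
    "Email": "Email address",
}
print(resolve_settings(entities, threshold=0.5, top_n=5))
# {'Name': (0.3, 1), 'Company': (0.7, 5), 'Email': (0.5, 5)}
```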

Writing good entity labels

The entity name and description are both part of the model's search criteria, so wording them well is the biggest lever you have on extraction quality.
  • Use lowercase or Title Case. The model was trained with lowercase labels. Lowercase names (e.g. person, company) or Title Case names (e.g. Person, Company) produce better results than ALL_CAPS or other conventions.
  • Be specific with names, and test synonyms. The entity name itself influences what the model looks for: person and full name will find slightly different things. If results are missing or noisy, try rephrasing the name; person name, individual, and full name may all behave differently on your data.
  • Labels can be descriptive phrases, not just single words. If you want capitals specifically, use capital city and population center instead of city. The extra context helps the model distinguish between entity types that might otherwise overlap.
  • Descriptions can be full instructions. Rather than "Name of the person", write "Find the first and last name of the person mentioned in the text". Instruction-style descriptions consistently outperform short noun phrases on complex or ambiguous entities.
  • Avoid mixing overlapping granularities in the same call. If you include both location and city, the model has to decide which label to assign to each city and will often split results unpredictably between them. Pick one level of granularity per concept.

Examples:
  • Instead of CITY, use city or City
  • Instead of city + location in the same call, use just city or just location
  • Instead of the description "Name", use "Find the first and last name of the person in the text"
  • Instead of "city" (when you want capitals specifically), use "capital city and population center"

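Two of these guidelines are mechanical enough to check before sending a request. The sketch below flags ALL_CAPS names and pairs of labels that you consider overlapping. lint_labels is hypothetical, and its overlap list defaults to the city/location pair from the examples, so extend it for your own domain. Acronym labels such as URL will also be flagged.

```python
from typing import Dict, Iterable, List, Tuple


def lint_labels(entities: Dict[str, object],
                overlapping: Iterable[Tuple[str, str]] = (("city", "location"),)) -> List[str]:
    """Return warnings for entity names that break the labeling guidelines."""
    warnings: List[str] = []
    for name in entities:
        # str.isupper() is True for names like "CITY" or "EMAIL_ADDRESS".
        if name.isupper():
            warnings.append(f"{name}: prefer lowercase or Title Case, e.g. {name.lower()!r}")
    lowered = {name.lower() for name in entities}
    for a, b in overlapping:
        # Both granularities in one call tends to split results between them.
        if a in lowered and b in lowered:
            warnings.append(f"{a!r} and {b!r} overlap: keep only one of them")
    return warnings


for warning in lint_labels({"CITY": "City name", "location": "Any place"}):
    print(warning)
```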
Notes

  • Results within each entity type are ranked by confidence descending before top_n is applied.
  • The context field is always derived from the original text input — it is not generated by the model.
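  • The start and end offsets refer to byte positions in the original text string, and end is exclusive: in the basic example, "Henry Wang" spans start 0 to end 10. For ASCII-only text, byte and character positions are identical.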
  • There is no fixed limit on the number of entity types you can define in a single request.
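Because the offsets are documented as byte positions, they can differ from Python str indices once the text contains multi-byte characters. The sketch below converts between the two. It assumes the text is encoded as UTF-8, which the notes do not state, and byte_to_char_offset is a hypothetical helper.

```python
def byte_to_char_offset(text: str, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset into `text` (as encoded bytes) into a str index."""
    prefix = text.encode(encoding)[:byte_offset]
    # Decoding is strict, so an offset that lands inside a multi-byte
    # character raises UnicodeDecodeError instead of returning a wrong index.
    return len(prefix.decode(encoding))


text = "Café owner Henry Wang"
start = byte_to_char_offset(text, 12)  # "Café owner " is 12 bytes but 11 characters
end = byte_to_char_offset(text, 22)
print(text[start:end])  # Henry Wang
```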

Authorizations

x-api-key
string
header
required

Body

application/json
text
string
required

The input text to extract entities from.

entities
object
required

A mapping of entity type names to their definitions.

threshold
number
default:0.5

Global confidence threshold (0–1).

top_n
number
default:0

Global limit on results per entity type. 0 returns all above threshold.

Response

Successful extraction

entities
object[]