Automate Entity Extraction in Google Sheets with GENAI Custom Functions
Entity extraction is the process of identifying and classifying key elements (entities) in text.
If you're a business professional such as an Analyst, Marketer, Salesperson, or Product Manager, you're likely drowning in text: news articles, customer feedback, industry reports, competitor updates, and more. Buried in all that unstructured text is valuable, structured information: company names, people, locations, product mentions, and other key 'entities.' Extracting these entities manually can be time-consuming and error-prone.
Imagine needing to quickly analyze hundreds of customer interviews to identify competitor mentions or track frequently discussed features in support tickets. Manually reviewing these documents would take days of tedious work and likely cause you to miss important insights hidden within the text. This inefficient process not only wastes valuable time but also introduces inconsistencies and errors, as your attention naturally wavers during such repetitive tasks. The result is incomplete product intelligence and potentially misguided decisions based on partial data.
But what if you could automate this process? With the GENAI() custom function in Google Sheets, you can! This tutorial will demonstrate how to transform your Google Sheets spreadsheet into a powerful entity extraction tool, saving you significant time, improving data accuracy, and enabling you to focus on strategic analysis rather than manual data entry.
Don't worry if you're not a coding expert! If you can write a spreadsheet formula, you can leverage the power of AI for entity extraction.
Benefits of Automated Entity Extraction
Automating entity extraction using GENAI()
in Google Sheets offers several benefits:
Save Massive Amounts of Time: Reduce manual research and data entry by hours or even days. For example, processing 100 articles for company mentions might take a human researcher 5-10 hours. With GENAI(), this could be done in minutes.
Improve Data Accuracy and Consistency: Minimize human error and ensure a consistent approach to extracting information.
Enhance Decision Making: Get faster access to structured data from unstructured sources, enabling quicker insights for competitive analysis, lead generation, compliance monitoring, and more.
PrerequisitesGENAI() and GENAI_ARRAY() Custom Functions: This tutorial assumes you have already set up the GENAI() and GENAI_ARRAY() custom functions in your Google Sheets environment. If you haven't, please refer to our previous tutorial: "Supercharge Your Google Sheets with AI: GENAI Functions" for detailed setup instructions. These functions are your gateway to using generative AI directly within your sheets.
API Keys: The GENAI() functions are configured to use AI model providers (like Google, OpenAI, Anthropic). Ensure your API keys are correctly set up in the script's properties as described in the setup tutorial. For this example, we'll primarily focus on using Google's models.
Understanding the Example: From Text to Structured Data
GENAI() and GENAI_ARRAY() Custom Functions: This tutorial assumes you have already set up the GENAI() and GENAI_ARRAY() custom functions in your Google Sheets environment. If you haven't, please refer to our previous tutorial: "Supercharge Your Google Sheets with AI: GENAI Functions" for detailed setup instructions. These functions are your gateway to using generative AI directly within your sheets.
API Keys: The GENAI() functions are configured to use AI model providers (like Google, OpenAI, Anthropic). Ensure your API keys are correctly set up in the script's properties as described in the setup tutorial. For this example, we'll primarily focus on using Google's models.
Let's look at a typical business scenario: you have a block of text, and you want to extract specific information about companies mentioned in it.

In the example sheet shown above, we have:
Column A (Input Text): This contains the raw, unstructured text you want to analyze.
Example Text: "Apple recently announced a groundbreaking collaboration with NVIDIA to integrate advanced graphics processing units into their next generation of Mac computers. This strategic move is seen by analysts as a direct response to Microsoft's recent advancements in their Surface line."
Column B (Prompt): This is where the magic happens! You provide a detailed instruction to the AI about what entities to find and what information to return for each. A well-crafted prompt is key to getting accurate and structured results.
Example Prompt Snippet: "Your task is to analyze the text provided below and extract all mentioned company entities. For each distinct company found, determine its official name, current public/private status, current stock ticker symbol (if public, otherwise null), and current CEO name... CRITICAL INSTRUCTION: Your response MUST contain ONLY the raw JSON output..."
Column C (Output): This column will display the structured data extracted by the
GENAI()
function, based on your prompt and input text. In this case, it's a JSON (JavaScript Object Notation) string, which is a standard format for structured data.Example Output Snippet (JSON format):
[
{
"name": "Apple",
"status": "public",
"ticker": "AAPL",
"ceo": "Tim Cook"
},
{
"name": "NVIDIA",
"status": "public",
"ticker": "NVDA",
"ceo": "Jensen Huang"
},
{
"name": "Microsoft",
"status": "public",
"ticker": "MSFT",
"ceo": "Satya Nadella"
}
]
How it Works Together:
The GENAI() function takes the detailed Prompt from Column B, applies it to the Input Text from Column A, and sends this information to the specified AI model (in this case, a Google model). The AI then processes the request and returns the extracted entities formatted as a JSON string in Column C.
The core problem is transforming unstructured text into structured, actionable data efficiently and at scale. Manually reading the input text, identifying each company, researching its status, ticker, and CEO, and then transcribing that into a table is very time-consuming and doesn't scale if you have hundreds or thousands of such texts to process.
Step-by-Step Implementation
Let's recreate this entity extraction setup in your own Google Sheet.
Step 1: Set Up Your SheetOpen a new Google Sheet or use an existing one.
Label three columns:
Column A: Input Text
Column B: Prompt
Column C: Output
Step 2: Add Input Text
Open a new Google Sheet or use an existing one.
Label three columns:
Column A: Input Text
Column B: Prompt
Column C: Output
In cell A2, paste the following example text (or any text you want to analyze):
Apple recently announced a groundbreaking collaboration with NVIDIA to integrate advanced graphics processing units into their next generation of Mac computers. This strategic move is seen by analysts as a direct response to Microsoft's recent advancements in their Surface line.
Step 3: Craft Your Prompt for Entity Extraction
In cell B2, paste the following detailed prompt. This prompt is specifically designed to extract company information and return it in a clean JSON format.
Your task is to analyze the text provided below and extract all mentioned company entities. For each distinct company found, determine its official name, current public/private status, current stock ticker symbol (if public, otherwise null), and current CEO name (use null if unknown). Use information current as of April 2025.
CRITICAL INSTRUCTION: Your response MUST contain ONLY the raw JSON output. It must be a single JSON array of objects, where each object represents one company and includes the fields: name, status, ticker, and ceo.
Do NOT include any text before the opening bracket [ of the JSON array.
Do NOT include any text after the closing bracket ] of the JSON array.
Do NOT use markdown formatting.
Your entire response MUST start directly with [ and end directly with ]. DO NOT WRAP YOUR RESPONSE IN ```json ... etc.
Text to Analyze:
Understanding the Prompt Components:
Task Definition: "Your task is to analyze the text provided below and extract all mentioned company entities." - Clearly tells the AI what to do.
Information to Extract: "...determine its official name, current public/private status, current stock ticker symbol (if public, otherwise null), and current CEO name (use null if unknown)." - Specifies the exact data points you need.
CRITICAL INSTRUCTION (JSON Output): This entire section is important since it forces the AI to return only the JSON array. This makes the output predictable and easy to parse later. Without this, the AI might add conversational text like "Sure, here's the JSON you requested: ...", which would break automated parsing.
Step 4: Apply the GENAI() Function
In cell C2, enter the following formula:
=GENAI(B2, A2, "google")
Let's break down this formula:
GENAI: This is the name of our custom AI function.
B2: This is the cell containing your detailed prompt. The
GENAI()
function will append the content of A2 to the prompt in B2.A2: This is the cell containing the input text you want to analyze.
"google": This (optional) parameter specifies the AI model provider. If omitted, it will default to "google". You can also specify particular models like "gemini-2.5-flash-preview-04-17" if your
GENAI()
function and API access support it, by adding it as the fourth parameter:=GENAI(B2, A2, "google", "gemini-2.5-flash-preview-04-17")
.
Step 5: Review the Output
After a few seconds, cell C2 should populate with a JSON string similar to this:
[
{
"name": "Apple",
"status": "public",
"ticker": "AAPL",
"ceo": "Tim Cook"
},
{
"name": "NVIDIA",
"status": "public",
"ticker": "NVDA",
"ceo": "Jensen Huang"
},
{
"name": "Microsoft",
"status": "public",
"ticker": "MSFT",
"ceo": "Satya Nadella"
}
]
Note
The exact output might vary slightly in formatting, like spacing, but the structure should be a JSON array of objects
Congratulations! You've just extracted structured company data from raw text using AI in Google Sheets!
Customization Options
The real power comes from adapting this to your specific business needs:
Extract Different Entities:
People: Modify the prompt to ask for names of individuals, their roles, and organizations.
Products: Ask for product names, manufacturers, and features.
Locations: Extract addresses, cities, or countries.
Example Prompt Change for People: "Extract all mentioned person entities and their job titles from the text..."
Change Extracted Information: Need the founding year instead of the CEO? Or the industry of the company? Just change the fields requested in your prompt (e.g., "...and includes the fields: name, industry, founding_year."). Remember to update the JSON structure instructions in the prompt too!
Output Format (Advanced): While JSON is a best practice, you could experiment with prompts asking for CSV (Comma Separated Values) or even try to get the AI to directly output values that could fit into separate columns (though this is harder to make reliable). For multiple data points per entity, JSON is generally the best starting point.
Different AI Models/Providers: The
GENAI()
function is designed to work with multiple providers (Google, OpenAI, Anthropic) and their respective models. You can specify these in theGENAI()
formula:=GENAI(B2, A2, "openai", "gpt-4o")
=GENAI(B2, A2, "anthropic", "claude-3-haiku-20240307")
(Ensure the model names are correct and your API keys are set up for these providers).Process Multiple Texts with GENAI_ARRAY(): If you have input texts in A2, A3, A4, etc., and your prompt in B2 is designed to work with a single text input at a time, you can use
GENAI_ARRAY()
in C2 and drag it down, or use it as an array formula if your prompt is generic enough. For example, if the prompt in B2 is generic and you want to apply it to each text in A2:A10:Modify the prompt in B2 to be a template, e.g., "Extract company names from the following text: ".
Then in C2, you could use:
=GENAI_ARRAY(B$2, A2:A10, "google")
The
GENAI_ARRAY
function is designed to take a prompt template and apply it to each cell in the input range.
Troubleshooting Tips#ERROR! or Incorrect Output:
Prompt is Key: The most common issue is a prompt that isn't specific enough or doesn't clearly define the desired output format. Review the "CRITICAL INSTRUCTION" section in the example prompt. Ensure you're telling the AI exactly how to format the JSON.
Invalid JSON: If the output isn't valid JSON, your downstream parsing will fail. Try simplifying the prompt or being even more explicit about the JSON structure. You can use a JSON validator to check the AI's output.
ERROR: Prompt must be a non-empty string: This means the cell referenced for the prompt (e.g., B2) is empty or not a valid text string.
ERROR: ... API key not found...: This points to an issue with your API key setup in the Apps Script properties. Refer to the GENAI() setup tutorial.
AI Hallucinations/Incorrect Information:
AI models can sometimes "hallucinate" or provide plausible but incorrect information, especially if the requested data isn't in the input text or its training data.
Mitigation: Always critically review AI-generated data, especially for important business decisions. Always cross-verify with reliable sources.
Slow Processing: Complex prompts or long input texts can take longer. Processing many rows with individual GENAI()
calls can also be slow. GENAI_ARRAY()
is generally more efficient for batch processing as it's designed to make fewer, larger API calls.
Quota Limits: AI APIs have usage quotas. If you're processing very large datasets, you might hit these limits. Check your AI provider's dashboard for quota information. The GENAI()and GENAI_ARRAY()
custom functions use caching to help mitigate this issue by reusing responses for identical prompts and data.
Key GENAI Concepts for Entity ExtractionPrompt Engineering: This is the art and science of crafting effective prompts. For entity extraction:
Be specific about the entity types.
Clearly define the attributes for each entity.
Dictate the exact output format (JSON is highly recommended for structured data). Use strong directives like "MUST," "ONLY," "DO NOT."
JSON Output: Requesting JSON is a best practice because it's a machine-readable format that can be easily parsed into separate columns or used in other applications.
Model Selection: Different models have different strengths, knowledge cutoffs, and costs. The GENAI function allows you to experiment. For tasks like structured data extraction from text, models good at following instructions and generating formatted output are preferred. "Gemini Flash" or "Claude Haiku" models are often faster and more cost-effective for such tasks if their quality is sufficient.
Hallucinations & Verification: AI can make things up. For business-critical data, always have a human review step or cross-validation process. Don't assume AI output is 100% factual.
Token Usage & Cost: Every request to an AI model consumes "tokens" (pieces of words). Longer texts and longer responses use more tokens, which can impact cost. Optimize prompts to be concise yet clear.
Business Use Cases
#ERROR! or Incorrect Output:
Prompt is Key: The most common issue is a prompt that isn't specific enough or doesn't clearly define the desired output format. Review the "CRITICAL INSTRUCTION" section in the example prompt. Ensure you're telling the AI exactly how to format the JSON.
Invalid JSON: If the output isn't valid JSON, your downstream parsing will fail. Try simplifying the prompt or being even more explicit about the JSON structure. You can use a JSON validator to check the AI's output.
ERROR: Prompt must be a non-empty string: This means the cell referenced for the prompt (e.g., B2) is empty or not a valid text string.
ERROR: ... API key not found...: This points to an issue with your API key setup in the Apps Script properties. Refer to the GENAI() setup tutorial.
AI Hallucinations/Incorrect Information:
AI models can sometimes "hallucinate" or provide plausible but incorrect information, especially if the requested data isn't in the input text or its training data.
Mitigation: Always critically review AI-generated data, especially for important business decisions. Always cross-verify with reliable sources.
Slow Processing: Complex prompts or long input texts can take longer. Processing many rows with individual GENAI()
calls can also be slow. GENAI_ARRAY()
is generally more efficient for batch processing as it's designed to make fewer, larger API calls.
Quota Limits: AI APIs have usage quotas. If you're processing very large datasets, you might hit these limits. Check your AI provider's dashboard for quota information. The GENAI()and GENAI_ARRAY()
custom functions use caching to help mitigate this issue by reusing responses for identical prompts and data.
Prompt Engineering: This is the art and science of crafting effective prompts. For entity extraction:
Be specific about the entity types.
Clearly define the attributes for each entity.
Dictate the exact output format (JSON is highly recommended for structured data). Use strong directives like "MUST," "ONLY," "DO NOT."
JSON Output: Requesting JSON is a best practice because it's a machine-readable format that can be easily parsed into separate columns or used in other applications.
Model Selection: Different models have different strengths, knowledge cutoffs, and costs. The GENAI function allows you to experiment. For tasks like structured data extraction from text, models good at following instructions and generating formatted output are preferred. "Gemini Flash" or "Claude Haiku" models are often faster and more cost-effective for such tasks if their quality is sufficient.
Hallucinations & Verification: AI can make things up. For business-critical data, always have a human review step or cross-validation process. Don't assume AI output is 100% factual.
Token Usage & Cost: Every request to an AI model consumes "tokens" (pieces of words). Longer texts and longer responses use more tokens, which can impact cost. Optimize prompts to be concise yet clear.
Business Use Cases
Automated entity extraction can be a powerful tool for several enterprise use cases:
Competitive Intelligence: Automatically extract competitor names, product launches, and executive changes from news feeds and industry reports.
Sales & Lead Generation: Identify potential leads by extracting company names and contact details from articles, forum discussions, or website text.
Compliance & Risk Management: Scan documents for mentions of specific organizations, individuals, or regulated terms.
Knowledge Management: Automatically tag and categorize internal documents by extracting key entities, making information easier to find.
Supply Chain Management: Extract supplier names, locations, and reported issues from industry news or reports.
⚠️ Important Warnings and Security ConsiderationsAPI Key Security
The API keys required by these functions provide access to paid services. Keep your API keys strictly confidential and never share them publicly. Only generate keys for AI model providers you actively need.
Important Note on Sharing
If you share your script or spreadsheet with others, they may be able to access your embedded API keys. You will be billed for any model usage incurred through your keys, regardless of who initiated the usage.
Usage Costs
Using these functions will incur charges based on the API keys you provide:
Each API call costs money according to your provider's pricing structure.
Batch processing many cells or running frequent operations can quickly increase costs.
We recommend testing with smaller data ranges first to estimate potential costs.
Importantly, set up billing alerts or usage quotas with your API provider to avoid unexpected charges.
Data Privacy and AI LimitationsData Privacy
Any data you send to these AI providers through these functions:
Is processed on their servers according to their privacy policies
May be stored or used to improve their models (depending on their terms)
Should not include confidential, sensitive, or personally identifiable information
AI LimitationsAI models can make mistakes, hallucinate facts, or provide incorrect information
Always review AI-generated content before using it for important decisions
Use appropriate checks and balances for critical applications
Consider adding human review steps for important workflows
AI models can make mistakes, hallucinate facts, or provide incorrect information
Always review AI-generated content before using it for important decisions
Use appropriate checks and balances for critical applications
Consider adding human review steps for important workflows
Conclusion
By mastering entity extraction with GENAI()
in Google Sheets, you're not just automating a task; you're unlocking a more efficient and insightful way to work with the vast amounts of text data that businesses encounter daily. Start experimenting, refine your prompts, and see how this powerful technique can transform your workflows!
Thanks for reading.
How was this tutorial?
Your feedback helps me create better content
DISCLAIMER: This content is provided for educational purposes only. All code, templates, and information should be thoroughly reviewed and tested before use. Use at your own risk. Full Terms of Service apply.
Small Scripts, Big Impact
Join 1,500+ professionals who are supercharging their productivity with Google Sheets automation
By subscribing, you agree to our Privacy Policy and Terms of Service