Automate Entity Extraction in Google Sheets with GENAI Custom Functions

Last updated: May 23, 2025

Entity extraction is the process of identifying and classifying key elements (entities) in text.

If you're a business professional such as an Analyst, Marketer, Salesperson, or Product Manager, you're likely drowning in text: news articles, customer feedback, industry reports, competitor updates, and more. Buried in all that unstructured text is valuable, structured information: company names, people, locations, product mentions, and other key 'entities.' Extracting these entities manually can be time-consuming and error-prone.

Imagine needing to quickly analyze hundreds of customer interviews to identify competitor mentions or track frequently discussed features in support tickets. Manually reviewing these documents would take days of tedious work and likely cause you to miss important insights hidden within the text. This inefficient process not only wastes valuable time but also introduces inconsistencies and errors, as your attention naturally wavers during such repetitive tasks. The result is incomplete product intelligence and potentially misguided decisions based on partial data.

But what if you could automate this process? With the GENAI() custom function in Google Sheets, you can! This tutorial will demonstrate how to transform your Google Sheets spreadsheet into a powerful entity extraction tool, saving you significant time, improving data accuracy, and enabling you to focus on strategic analysis rather than manual data entry.

Don't worry if you're not a coding expert! If you can write a spreadsheet formula, you can leverage the power of AI for entity extraction.

Benefits of Automated Entity Extraction

Automating entity extraction using GENAI() in Google Sheets offers several benefits:

  • Save Massive Amounts of Time: Reduce manual research and data entry by hours or even days. For example, processing 100 articles for company mentions might take a human researcher 5-10 hours. With GENAI(), this could be done in minutes.

  • Improve Data Accuracy and Consistency: Minimize human error and ensure a consistent approach to extracting information.

  • Enhance Decision Making: Get faster access to structured data from unstructured sources, enabling quicker insights for competitive analysis, lead generation, compliance monitoring, and more.

Prerequisites

  • GENAI() and GENAI_ARRAY() Custom Functions: This tutorial assumes you have already set up the GENAI() and GENAI_ARRAY() custom functions in your Google Sheets environment. If you haven't, please refer to our previous tutorial: "Supercharge Your Google Sheets with AI: GENAI Functions" for detailed setup instructions. These functions are your gateway to using generative AI directly within your sheets.

  • API Keys: The GENAI() functions are configured to use AI model providers (like Google, OpenAI, Anthropic). Ensure your API keys are correctly set up in the script's properties as described in the setup tutorial. For this example, we'll primarily focus on using Google's models.

Understanding the Example: From Text to Structured Data

Let's look at a typical business scenario: you have a block of text, and you want to extract specific information about companies mentioned in it.

A spreadsheet showing an AI model extracting company information from text and outputting it as a JSON array.

In the example sheet shown above, we have:

  • Column A (Input Text): This contains the raw, unstructured text you want to analyze.

  • Example Text: "Apple recently announced a groundbreaking collaboration with NVIDIA to integrate advanced graphics processing units into their next generation of Mac computers. This strategic move is seen by analysts as a direct response to Microsoft's recent advancements in their Surface line."

  • Column B (Prompt): This is where the magic happens! You provide a detailed instruction to the AI about what entities to find and what information to return for each. A well-crafted prompt is key to getting accurate and structured results.

  • Example Prompt Snippet: "Your task is to analyze the text provided below and extract all mentioned company entities. For each distinct company found, determine its official name, current public/private status, current stock ticker symbol (if public, otherwise null), and current CEO name... CRITICAL INSTRUCTION: Your response MUST contain ONLY the raw JSON output..."

  • Column C (Output): This column will display the structured data extracted by the GENAI() function, based on your prompt and input text. In this case, it's a JSON (JavaScript Object Notation) string, which is a standard format for structured data.

  • Example Output Snippet (JSON format):

[
  {
    "name": "Apple",
    "status": "public",
    "ticker": "AAPL",
    "ceo": "Tim Cook"
  },
  {
    "name": "NVIDIA",
    "status": "public",
    "ticker": "NVDA",
    "ceo": "Jensen Huang"
  },
  {
    "name": "Microsoft",
    "status": "public",
    "ticker": "MSFT",
    "ceo": "Satya Nadella"
  }
]

How it Works Together:

The GENAI() function takes the detailed Prompt from Column B, applies it to the Input Text from Column A, and sends this information to the specified AI model (in this case, a Google model). The AI then processes the request and returns the extracted entities formatted as a JSON string in Column C.

The core problem is transforming unstructured text into structured, actionable data efficiently and at scale. Manually reading the input text, identifying each company, researching its status, ticker, and CEO, and then transcribing that into a table is very time-consuming and doesn't scale if you have hundreds or thousands of such texts to process.

Step-by-Step Implementation

Let's recreate this entity extraction setup in your own Google Sheet.

Step 1: Set Up Your Sheet

  • Open a new Google Sheet or use an existing one.

  • Label three columns:

  • Column A: Input Text

  • Column B: Prompt

  • Column C: Output

Step 2: Add Input Text

In cell A2, paste the following example text (or any text you want to analyze):

Apple recently announced a groundbreaking collaboration with NVIDIA to integrate advanced graphics processing units into their next generation of Mac computers. This strategic move is seen by analysts as a direct response to Microsoft's recent advancements in their Surface line.

Step 3: Craft Your Prompt for Entity Extraction

In cell B2, paste the following detailed prompt. This prompt is specifically designed to extract company information and return it in a clean JSON format.

Your task is to analyze the text provided below and extract all mentioned company entities. For each distinct company found, determine its official name, current public/private status, current stock ticker symbol (if public, otherwise null), and current CEO name (use null if unknown). Use information current as of April 2025.

CRITICAL INSTRUCTION: Your response MUST contain ONLY the raw JSON output. It must be a single JSON array of objects, where each object represents one company and includes the fields: name, status, ticker, and ceo.
Do NOT include any text before the opening bracket [ of the JSON array.
Do NOT include any text after the closing bracket ] of the JSON array.
Do NOT use markdown formatting.
Your entire response MUST start directly with [ and end directly with ]. DO NOT WRAP YOUR RESPONSE IN ```json ... etc.

Text to Analyze:

Understanding the Prompt Components:

  • Task Definition: "Your task is to analyze the text provided below and extract all mentioned company entities." - Clearly tells the AI what to do.

  • Information to Extract: "...determine its official name, current public/private status, current stock ticker symbol (if public, otherwise null), and current CEO name (use null if unknown)." - Specifies the exact data points you need.

  • CRITICAL INSTRUCTION (JSON Output): This entire section is important since it forces the AI to return only the JSON array. This makes the output predictable and easy to parse later. Without this, the AI might add conversational text like "Sure, here's the JSON you requested: ...", which would break automated parsing.

Step 4: Apply the GENAI() Function

In cell C2, enter the following formula:

=GENAI(B2, A2, "google")

Let's break down this formula:

  • GENAI: This is the name of our custom AI function.

  • B2: This is the cell containing your detailed prompt. The GENAI() function will append the content of A2 to the prompt in B2.

  • A2: This is the cell containing the input text you want to analyze.

  • "google": This (optional) parameter specifies the AI model provider. If omitted, it will default to "google". You can also specify particular models like "gemini-2.5-flash-preview-04-17" if your GENAI() function and API access support it, by adding it as the fourth parameter: =GENAI(B2, A2, "google", "gemini-2.5-flash-preview-04-17").

Step 5: Review the Output

After a few seconds, cell C2 should populate with a JSON string similar to this:

[
  {
    "name": "Apple",
    "status": "public",
    "ticker": "AAPL",
    "ceo": "Tim Cook"
  },
  {
    "name": "NVIDIA",
    "status": "public",
    "ticker": "NVDA",
    "ceo": "Jensen Huang"
  },
  {
    "name": "Microsoft",
    "status": "public",
    "ticker": "MSFT",
    "ceo": "Satya Nadella"
  }
]

Congratulations! You've just extracted structured company data from raw text using AI in Google Sheets!

Customization Options

The real power comes from adapting this to your specific business needs:

  • Extract Different Entities:

  • People: Modify the prompt to ask for names of individuals, their roles, and organizations.

  • Products: Ask for product names, manufacturers, and features.

  • Locations: Extract addresses, cities, or countries.

  • Example Prompt Change for People: "Extract all mentioned person entities and their job titles from the text..."

  • Change Extracted Information: Need the founding year instead of the CEO? Or the industry of the company? Just change the fields requested in your prompt (e.g., "...and includes the fields: name, industry, founding_year."). Remember to update the JSON structure instructions in the prompt too!

  • Output Format (Advanced): While JSON is a best practice, you could experiment with prompts asking for CSV (Comma Separated Values) or even try to get the AI to directly output values that could fit into separate columns (though this is harder to make reliable). For multiple data points per entity, JSON is generally the best starting point.

  • Different AI Models/Providers: The GENAI() function is designed to work with multiple providers (Google, OpenAI, Anthropic) and their respective models. You can specify these in the GENAI() formula:

  • =GENAI(B2, A2, "openai", "gpt-4o")

  • =GENAI(B2, A2, "anthropic", "claude-3-haiku-20240307") (Ensure the model names are correct and your API keys are set up for these providers).

  • Process Multiple Texts with GENAI_ARRAY(): If you have input texts in A2, A3, A4, etc., and your prompt in B2 is designed to work with a single text input at a time, you can use GENAI_ARRAY() in C2 and drag it down, or use it as an array formula if your prompt is generic enough. For example, if the prompt in B2 is generic and you want to apply it to each text in A2:A10:

  • Modify the prompt in B2 to be a template, e.g., "Extract company names from the following text: ".

  • Then in C2, you could use: =GENAI_ARRAY(B$2, A2:A10, "google")

  • The GENAI_ARRAY function is designed to take a prompt template and apply it to each cell in the input range.

Troubleshooting Tips

  • #ERROR! or Incorrect Output:

  • Prompt is Key: The most common issue is a prompt that isn't specific enough or doesn't clearly define the desired output format. Review the "CRITICAL INSTRUCTION" section in the example prompt. Ensure you're telling the AI exactly how to format the JSON.

  • Invalid JSON: If the output isn't valid JSON, your downstream parsing will fail. Try simplifying the prompt or being even more explicit about the JSON structure. You can use a JSON validator to check the AI's output.

  • ERROR: Prompt must be a non-empty string: This means the cell referenced for the prompt (e.g., B2) is empty or not a valid text string.

  • ERROR: ... API key not found...: This points to an issue with your API key setup in the Apps Script properties. Refer to the GENAI() setup tutorial.

  • AI Hallucinations/Incorrect Information:

  • AI models can sometimes "hallucinate" or provide plausible but incorrect information, especially if the requested data isn't in the input text or its training data.

  • Mitigation: Always critically review AI-generated data, especially for important business decisions. Always cross-verify with reliable sources.

  • Slow Processing: Complex prompts or long input texts can take longer. Processing many rows with individual GENAI() calls can also be slow. GENAI_ARRAY() is generally more efficient for batch processing as it's designed to make fewer, larger API calls.

  • Quota Limits: AI APIs have usage quotas. If you're processing very large datasets, you might hit these limits. Check your AI provider's dashboard for quota information. The GENAI()and GENAI_ARRAY() custom functions use caching to help mitigate this issue by reusing responses for identical prompts and data.

Key GENAI Concepts for Entity Extraction

  • Prompt Engineering: This is the art and science of crafting effective prompts. For entity extraction:

  • Be specific about the entity types.

  • Clearly define the attributes for each entity.

  • Dictate the exact output format (JSON is highly recommended for structured data). Use strong directives like "MUST," "ONLY," "DO NOT."

  • JSON Output: Requesting JSON is a best practice because it's a machine-readable format that can be easily parsed into separate columns or used in other applications.

  • Model Selection: Different models have different strengths, knowledge cutoffs, and costs. The GENAI function allows you to experiment. For tasks like structured data extraction from text, models good at following instructions and generating formatted output are preferred. "Gemini Flash" or "Claude Haiku" models are often faster and more cost-effective for such tasks if their quality is sufficient.

  • Hallucinations & Verification: AI can make things up. For business-critical data, always have a human review step or cross-validation process. Don't assume AI output is 100% factual.

  • Token Usage & Cost: Every request to an AI model consumes "tokens" (pieces of words). Longer texts and longer responses use more tokens, which can impact cost. Optimize prompts to be concise yet clear.

Business Use Cases

Automated entity extraction can be a powerful tool for several enterprise use cases:

  • Competitive Intelligence: Automatically extract competitor names, product launches, and executive changes from news feeds and industry reports.

  • Sales & Lead Generation: Identify potential leads by extracting company names and contact details from articles, forum discussions, or website text.

  • Compliance & Risk Management: Scan documents for mentions of specific organizations, individuals, or regulated terms.

  • Knowledge Management: Automatically tag and categorize internal documents by extracting key entities, making information easier to find.

  • Supply Chain Management: Extract supplier names, locations, and reported issues from industry news or reports.

Conclusion

By mastering entity extraction with GENAI() in Google Sheets, you're not just automating a task; you're unlocking a more efficient and insightful way to work with the vast amounts of text data that businesses encounter daily. Start experimenting, refine your prompts, and see how this powerful technique can transform your workflows!

Thanks for reading.

How was this tutorial?

Your feedback helps me create better content

DISCLAIMER: This content is provided for educational purposes only. All code, templates, and information should be thoroughly reviewed and tested before use. Use at your own risk. Full Terms of Service apply.

Small Scripts, Big Impact

Join 1,500+ professionals who are supercharging their productivity with Google Sheets automation

Exclusive Google Sheets automation tutorials and hands-on exercises
Ready-to-use scripts and templates that transform hours of manual work into seconds
Email updates with new automation tips and time-saving workflows

By subscribing, you agree to our Privacy Policy and Terms of Service