Using LLMs for Descriptions of Plants, Insects, and Fungi

Background
Kindwise’s APIs initially relied on Wikipedia to provide more information on plants, insects, and fungi in our knowledge base. This information is important for Kindwise clients who want to display general descriptions in their products. Wikipedia entries, while multilingual, vary significantly in coverage across languages: English entries cover up to 82.7% of the entities in our insect.id database, while coverage in less common languages drops to only a few per cent.
To fill these gaps, we tested several public large language models (LLMs) for their accuracy, language correctness, and relevance in generating information. Here, we share insights from our experiments, which may help others considering LLMs for similar purposes.
Models Tested
We tested the following LLMs:
- Mistral Pixtral 12B
- GPT-4o mini
- GPT-4o
We used temperature=0 and top_p=1 to prioritize factual accuracy and to get consistent results.
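As a point of reference, here is a minimal sketch of how these sampling settings can be applied when calling the GPT models through the OpenAI Python SDK (the Mistral model would be called through its own client); the helper function is illustrative rather than our production code.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_description(prompt: str, model: str = "gpt-4o") -> str:
    """Generate one description with deterministic-leaning sampling settings."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # prioritize factual accuracy
        top_p=1,        # keep results consistent across runs
    )
    return response.choices[0].message.content
```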
Prompt Design
Our base prompt:
Please write a concise, factually accurate Wikipedia-style paragraph about [plant name] (family [family name]) in [language]. Use scientific names only, avoiding any common names or colloquial descriptions. Omit unnecessary details like awards. Do not mention flower arrangement or excessive morphological details.
We experimented with English and with Czech, a less common language (and our native one). We included taxonomic placement (e.g., family name) to reduce confusion. The prompt was modified slightly based on organism type and whether the entity was common or rare.
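For illustration, the base prompt can be parameterized with a simple template; the function and variable names below are hypothetical and only show how the placeholders would be filled per entity and language, with the result then passed to the generation call sketched above.

```python
BASE_PROMPT = (
    "Please write a concise, factually accurate Wikipedia-style paragraph about "
    "{name} (family {family}) in {language}. Use scientific names only, avoiding "
    "any common names or colloquial descriptions. Omit unnecessary details like "
    "awards. Do not mention flower arrangement or excessive morphological details."
)

def build_prompt(name: str, family: str, language: str) -> str:
    """Fill the base prompt for one entity; real prompts also vary by organism type and rarity."""
    return BASE_PROMPT.format(name=name, family=family, language=language)

prompt = build_prompt("Combretum coccineum", "Combretaceae", "Czech")
```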
Findings
1. GPT-4o vs GPT-4o mini
- Not surprisingly, GPT-4o clearly outperformed GPT-4o mini, especially in grammar.
2. Performance on common vs. rare organisms
- Models generally performed well on common organisms but hallucinated heavily for rare species. They also padded descriptions with generic statements of little informational value, such as "this species contributes to biodiversity," reflecting the uneven availability of information across species.
- Allowing the model to decide what information to include (instead of requiring specific information like distribution or uses) reduced error rates.
3. Language-specific challenges
- Descriptions in less common languages often contained inaccuracies. For example, here’s the output for Combretum coccineum:

English (GPT-4o mini):
Combretum coccineum is a species in the family Combretaceae, native to tropical regions of Africa. It is characterized by its climbing habit and can reach significant heights, often utilizing surrounding vegetation for support. The leaves are opposite, simple, and entire, with a glossy appearance. The species is known for its distinctive red to orange inflorescences, which attract various pollinators. Combretum coccineum is often found in disturbed areas, forest edges, and along riverbanks, thriving in well-drained soils and requiring full sunlight for optimal growth.
Czech (GPT-4o mini):
Combretum coccineum je druh rostliny patřící do čeledi Combretaceae. Tento druh je rozšířen v tropických oblastech Afriky, zejména v savanách a lesních okrajích. Combretum coccineum je vytrvalá dřevina, která dosahuje výšky až 10 metrů. Listy jsou jednoduché, střídavé a mají celokrajné okraje. Plody jsou dřevnaté tobolky, které obsahují semena s křídly, což usnadňuje jejich šíření větrem.
(English translation: Combretum coccineum is a plant species belonging to the family Combretaceae. The species is distributed in tropical regions of Africa, especially in savannas and at forest edges. Combretum coccineum is a perennial woody plant reaching heights of up to 10 metres. The leaves are simple, alternate, and have entire margins. The fruits are woody capsules containing winged seeds, which facilitates their dispersal by wind.)
- The Czech output wrongly states that the leaves are alternate, and it describes the fruits as woody capsules (“dřevnaté tobolky”) when they are actually achenes (“nažky”).
- Smaller models (Pixtral 12B and GPT-4o mini) frequently made language errors in Czech, whereas GPT-4o made almost no mistakes.
The performance gap we observed between English and Czech can likely be extrapolated to other less common languages. The gap is narrower for widely spoken languages such as German, French, or Spanish, and wider for languages with limited online resources.
Implementation in the API
Finally, we decided to proceed with generating descriptions in 28 major languages using GPT-4o, despite its higher price. For less common organisms, where the error rate was highest, we improved accuracy by feeding the English Wikipedia description into the prompt, although the model tended to translate it literally (even when asked not to). Descriptions of the rarest organisms were omitted in less common languages, as these produced the highest error rates. Here is the description coverage for plant.id and insect.id.
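The grounding step for rare organisms can be sketched as follows; the instruction wording and the function name are illustrative assumptions, not our exact production prompt.

```python
GROUNDING_SUFFIX = (
    "\n\nUse the following English reference text as the factual basis, but write an "
    "original paragraph in {language} rather than a literal translation:\n{reference}"
)

def build_grounded_prompt(base_prompt: str, language: str, english_description: str) -> str:
    """Append the English Wikipedia text as factual grounding for a rare species."""
    return base_prompt + GROUNDING_SUFFIX.format(
        language=language, reference=english_description
    )
```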
The error rate is relatively low. In a sample of 30 plants and 30 insects, we found only one factually incorrect statement in both plant.id and insect.id, one partially incorrect statement in insect.id, and two grammatically inaccurate cases in both plant.id and insect.id. You can see the specific cases highlighted in this spreadsheet.
For the implementation (plant.id, insect.id), use the description_gpt parameter. For combined descriptions from Wikipedia and GPT-generated descriptions, use the description_all parameter.
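Below is a hedged sketch of requesting these descriptions; it assumes the plant.id v3 identification endpoint, its Api-Key header, and a details query parameter that accepts these keys, so check the Kindwise API documentation for the exact request shape.

```python
import base64
import requests

API_KEY = "your-api-key"  # placeholder

def identify_with_descriptions(image_path: str) -> dict:
    """Identify a plant and request GPT-generated and combined descriptions (assumed request shape)."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    response = requests.post(
        "https://plant.id/api/v3/identification",  # assumed endpoint
        params={"details": "description_gpt,description_all"},  # assumed detail keys
        headers={"Api-Key": API_KEY},
        json={"images": [image_b64]},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```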