AI Prompting Guidelines

In the age of generative AI, professionals across industries are using tools such as ChatGPT and Midjourney to boost productivity and creativity. A recent survey found that 78% of organizations use AI in at least one business function, which illustrates how widespread this trend has become. Below, we provide an industry-by-industry overview of how prompting can be applied, guidelines for writing effective prompts, and example prompts (each with a recommended AI tool, a difficulty level, and the expected outcome).

Data Science AI Prompting Guidelines

Industry Overview

Data scientists use prompting to aid in data analysis, generate code for data manipulation, and interpret statistical results. They might ask AI to help write a Python script for cleaning data, to explain what an analysis result means, or to outline an approach to a machine learning problem. This is part of a broader trend in which many organizations now integrate AI into their data workflows: McKinsey, for instance, reports a significant increase in companies using AI for analytics and other functions. Prompting can speed up tasks such as brainstorming features for a model or summarizing findings for stakeholders.

Prompting Guidelines

  • Set the Data Context: Clearly describe the dataset or problem context (for example, the dataset's columns and the type of data, such as time-series or text) so the AI can tailor its response.
  • Specify the Task Type: Indicate if you need data cleaning code, an explanation of a concept, a statistical test recommendation, etc.
  • Encourage Step-by-Step: For complex analyses, ask the AI to break the process into steps or an outline (this can later be translated into code or actions).
  • Safety with Code: Review AI-generated code carefully before running it on real data, especially for correctness on edge cases and performance on large datasets.
  • Use Domain Language: Include relevant technical terms (e.g., "confusion matrix", "regression", "outliers") in prompts to get more precise and knowledgeable answers.
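The guidelines above can be applied mechanically when prompts are assembled in code. The following is a minimal sketch, not a prescribed method: the function name build_data_prompt, its parameters, and the wording of each prompt section are all assumptions made for illustration, and the column types in the usage example are guesses about the housing dataset mentioned later in this section.

```python
def build_data_prompt(columns, data_kind, task, domain_terms=(), step_by_step=False):
    """Assemble a prompt string from the pieces the guidelines recommend.

    columns      -- dict mapping column name to a short type description
    data_kind    -- kind of data, e.g. "tabular" or "time-series"
    task         -- the specific request (code, explanation, test recommendation, ...)
    domain_terms -- optional technical terms to steer the answer
    step_by_step -- if True, ask for a numbered outline first
    """
    # Set the data context: name the columns and the kind of data.
    column_list = ", ".join(f"{name} ({dtype})" for name, dtype in columns.items())
    parts = [
        f"Context: I have a {data_kind} dataset with columns: {column_list}.",
        # Specify the task type explicitly.
        f"Task: {task}",
    ]
    # Use domain language when the caller supplies relevant terms.
    if domain_terms:
        parts.append("Relevant concepts: " + ", ".join(domain_terms) + ".")
    # Encourage a step-by-step answer for complex analyses.
    if step_by_step:
        parts.append("Please break the approach into numbered steps before giving any code.")
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical usage; the column types here are illustrative assumptions.
    print(build_data_prompt(
        columns={"size": "float", "location": "text", "year built": "int", "price": "float"},
        data_kind="tabular",
        task="Suggest 3 potential features to improve a price prediction model.",
        domain_terms=["regression", "outliers"],
        step_by_step=True,
    ))
```

Keeping each guideline as a separate, optional part of the prompt makes it easy to see which piece of context a given request is missing.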

Example Prompts

Difficulty: Beginner
Tool: ChatGPT
Prompt: "Write a Python Pandas code snippet to load a CSV file named sales.csv and compute the total sales per region, outputting a new DataFrame."
Expected outcome: Data manipulation code with Pandas

Difficulty: Beginner
Tool: ChatGPT
Prompt: "Explain in simple terms what a confusion matrix is, and how to interpret it in a classification problem (provide an example)."
Expected outcome: Explanation of a data science concept

Difficulty: Advanced
Tool: ChatGPT
Prompt: "Suggest 3 potential features to add to a dataset of housing prices (columns: size, location, year built, price) to improve a price prediction model."
Expected outcome: Feature engineering ideas for an ML model

Difficulty: Beginner
Tool: ChatGPT
Prompt: "Draft an SQL query to find the top 5 customers by total purchase amount from a table orders(customer_id, amount) and a table customers(id, name)."
Expected outcome: SQL query generation for a specific task

Difficulty: Advanced
Tool: ChatGPT
Prompt: "Summarize these analysis results for a non-technical stakeholder: [insert brief analysis results or statistics]. Focus on insights and avoid jargon."
Expected outcome: Interpretation of data findings in plain English
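
To make the "Safety with Code" guideline concrete, here is a hedged sketch of the kind of query the SQL prompt above might produce, together with a quick check against a tiny in-memory SQLite database. The query is one plausible answer rather than the output of any particular tool; the sample rows and the helper names top_customers and make_sample_db are invented for illustration; and LIMIT is SQLite/PostgreSQL/MySQL syntax, so other databases may need a different form.

```python
import sqlite3

# One plausible query for "top 5 customers by total purchase amount",
# using the table shapes given in the prompt:
# orders(customer_id, amount) and customers(id, name).
TOP_CUSTOMERS_SQL = """
SELECT c.name, SUM(o.amount) AS total_amount
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY total_amount DESC
LIMIT 5;
"""


def top_customers(conn):
    """Run the query and return a list of (name, total_amount) tuples."""
    return conn.execute(TOP_CUSTOMERS_SQL).fetchall()


def make_sample_db():
    """Build a small in-memory database with made-up rows for checking the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ana"), (2, "Ben"), (3, "Chi")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 50.0), (1, 25.0), (2, 100.0), (3, 10.0)],
    )
    return conn


if __name__ == "__main__":
    # Ana has two orders, so her total should be the sum of both.
    for name, total in top_customers(make_sample_db()):
        print(name, total)
```

Running a generated query on a handful of rows whose totals are easy to compute by hand is a cheap way to catch mistakes such as a missing GROUP BY or a join on the wrong column before the query touches real data.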