AI/LLMs for Engineering Teams - Getting started
A beginner's guide to introducing AI and LLMs into your development team's workflows.
AI/LLMs are a powerful tool for engineering, but it's often difficult for an engineering team to 'get started', particularly if they aren't already familiar with the tooling.
Around late 2025, LLM coding reached a threshold of 'consistently good enough'. With the latest models and harnesses, it became good enough for the day-to-day work of a brownfield system rather than a finicky curiosity or a greenfield-only accelerator.
I wrote this guide specifically to help engineering teams ramp up adoption over time in a safe, (relatively) secure manner. This is not a guide for vibe coding - it's a guide for integrating AI into the day-to-day work of teams working on existing systems. It's about engineering operations, process, and the work rather than the output.
Getting started
Get your team (or certain team members) set up with Claude, Cursor, or Codex on a team or enterprise account. Two key things:
Opt out of using your company data for training
Make sure it’s on a work enterprise account, not a personal account
I recommend Cursor for a team - it's easy to set up, lets you switch models, and has an IDE as well as CLI capability. Its harness is quite decent. With the rate at which models are improving, having a bit of flexibility is quite useful.
At the same time - you really can’t go wrong with any of the above if you’re just starting out.
Concepts
AI/LLM Vendor - e.g. Anthropic (Claude), OpenAI (ChatGPT) - the companies that create AI models
Agent Model - the specific model version provided by the vendor (e.g. Opus 4.6)
Harness - the program/interface through which the model interacts with the user (e.g. CLI, UI, IDE)
Thinking vs. Not Thinking - different models target different use cases at different costs. A thinking model takes longer and costs more, but comes up with better answers.
If you’re in doubt, start with Opus 4.6 from Anthropic.
Reseller - a vendor that provides access to models for different purposes; they may also provide a different harness
e.g. Cursor (IDE + CLI), AWS Bedrock (hosting + infra)
The first use cases to address
Use AI to Explore Code
Use AI to Review Code
Use AI to Write Code
Use AI to Explore Code
Engineers often ask a lot of questions about the code-base:
How does feature flagging work?
Where is the OAuth token for the account stored?
Do we already have a rate limiting library?
Can I get an explanation of how background requests work?
To begin
Train your team to ask these questions to the LLM. With codebase access, the LLM is actually quite good at finding the answer, or at least pointing the developer in the right direction. This will help reduce the number of random interruptions and wait time for questions, and helps promote self-directed learning.
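For example, a developer can ask straight from the terminal. A minimal sketch using Claude Code's non-interactive print mode (Cursor's CLI has an equivalent):

```bash
# Ask a one-off question about the codebase from the repo root.
# -p runs the agent non-interactively and prints the answer.
claude -p "How does feature flagging work in this codebase?"
```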
To advance
Connect the LLM via MCP to data sources like Confluence or Notion so it can search not just the code but the context around the code - product requirements, meeting notes, definitions, specifications, etc.
Ask the LLM to then explore these:
How does feature flagging work, and why did we build it this way?
Where is the OAuth token for the account stored, and what is the history of security reviews for it?
Do we already have a rate limiting library, and did we previously explore other options?
Can I get an explanation of how background requests work, and have there been any incidents related to it?
This will help provide much better answers - just don't forget to also add: Explore Confluence if needed. This lets you go from What? questions to Why? questions.
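How you wire up MCP depends on your harness. As a sketch, Claude Code registers MCP servers from the command line - the server name and `<MCP_SERVER_URL>` below are placeholders for whatever Confluence/Notion connector your team uses:

```bash
# Register a remote MCP server with Claude Code (name and URL are placeholders).
# Once added, the LLM can search that source alongside the code.
claude mcp add --transport http confluence <MCP_SERVER_URL>

# Confirm the server is registered and connected.
claude mcp list
```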
Use AI to Review Code
To begin
Create and share a Skill to review code. A Skill is a repeatable prompt that engineers can invoke for the LLM to follow.
Skills are super easy to make - you can actually just ask the LLM to make one for you.
Write me a skill to review code. The skill should be triggered when I enter /review-code and review the currently modified code (check via git) for these factors: correctness, security, performance issues. Output a summary and recommendations.
Review it, add your own thoughts and notes, and try it out. Tailor it to your needs - conciseness, other -ilities, etc.
It should generate a file that you can then share with your team - the vendor dashboard usually has the ability to add team-wide skill commands available to everyone.
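For reference, the generated artifact is just a markdown file of instructions. A sketch assuming Claude Code's custom slash-command layout under `.claude/commands/` (other harnesses have equivalent mechanisms):

```bash
# Create a /review-code command; Claude Code reads these from .claude/commands/.
mkdir -p .claude/commands
cat > .claude/commands/review-code.md <<'EOF'
Review the currently modified code (check via `git status` and `git diff`) for:
correctness, security, and performance issues.

Output a summary followed by concrete recommendations, ordered by severity.
EOF
```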
Then, teach your team to run it on their code.
To advance
Getting everyone to do something all the time is difficult. Making it automatic is even better. Some tools, like Graphite or Cursor, can run skills automatically when a Pull Request is created and add a comment.
Even if your tool doesn't - ask the LLM to write YOU a script that runs the review Skill against every new PR in a repository:
Write me a bash script that runs /review-code against every new PR in the repository `<ORGANIZATION>/<REPOSITORY>`.
For every new PR, it should launch a new agent with <AGENT_LAUNCH_COMMAND> using the model Opus 4.6 that reviews the code.
The agent should return its output, and the script should post that output as a comment to the PR.
If the PR already has a comment from the user `jgefroh`, that means it was reviewed already and should be skipped.
Make it check for new PRs every hour from 9am to 5pm.
Technical notes:
* It can check and pull for PRs using the `gh` CLI tool.
* It can write comments using the `gh` CLI tool.
* It should create a new worktree in a peer folder when pulling the branch so that the current working tree is not affected.
* It should ONLY run against repositories in the Github organization `<ORGANIZATION>`.
* It should print out links to the PR comments at the end of every run.
* It should have a dry-run mode that outputs what it would have done without actually writing the PR comment to Github.

Something like the above should produce a tweakable script that you can then run from your machine automatically on a schedule.
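For reference, the heart of what the LLM should produce is a loop like this. A sketch only: `<AGENT_LAUNCH_COMMAND>` stays whatever your harness uses, and the dry-run and scheduling pieces are omitted for brevity:

```bash
#!/usr/bin/env bash
set -euo pipefail

REPO="<ORGANIZATION>/<REPOSITORY>"
REVIEWER="jgefroh"

# Walk every open PR; skip anything the reviewer account already commented on.
for pr in $(gh pr list --repo "$REPO" --state open --json number --jq '.[].number'); do
  if gh pr view "$pr" --repo "$REPO" --json comments \
       --jq '.comments[].author.login' | grep -qx "$REVIEWER"; then
    continue  # already reviewed
  fi

  # Pull the PR branch into a throwaway worktree so the working tree is untouched.
  branch=$(gh pr view "$pr" --repo "$REPO" --json headRefName --jq '.headRefName')
  git fetch origin "$branch"
  git worktree add "../review-pr-$pr" "origin/$branch"

  # Launch the review agent in that worktree and capture its output.
  review=$(cd "../review-pr-$pr" && <AGENT_LAUNCH_COMMAND> "/review-code")

  # Post the review back to the PR, then clean up.
  gh pr comment "$pr" --repo "$REPO" --body "$review"
  git worktree remove "../review-pr-$pr"
done
```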
To excel
Once you have the LLM infrastructure, it's a matter of improving the agent's actual prompt over time to make the reviews deeper and more valuable.
If it misses something: tweak the prompt. If it is overly nitpicky: tweak the prompt. If you want a specific format: tweak the prompt.
Treat the review prompt almost like a Growth product.
I found it very valuable to ask the LLM to write me a quick script that pulls every single PR comment I ever wrote on the repository, extract review principles from them, and update the review skill with those principles.
That enabled it to keep an eye on the things I like to keep an eye on:
Write me a bash script that uses the Github API (or `gh` cli) to pull every PR comment written by my user `jgefroh`.
It should put all of these in a file called comments.txt.
It must contain ALL PR comments - do not stop at just the most recent. Get ALL comments from all time.

There is a file called comments.txt that contains PR comments I wrote in a repository.
Extract a set of PR Review Principles from it that I can use to enrich a PR review skill.

Potential gotchas:
Review the scripts LLMs produce like you would real production-bound code - I had a bug in mine where it accidentally reviewed a random PR in a random repository because the repository name was missing!
Having everything in one skill creates shallowness. It's good for a broad summary, but don't expect it to catch everything. Uses where depth matters, like security audits, deserve dedicated passes of their own (touched on at the end).
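For reference, the comment-pulling half of the script above is small enough to sketch and sanity-check by hand. This uses the repository-wide review-comments endpoint; issue-style PR comments live under a different endpoint if you want those too:

```bash
# Pull every PR review comment ever written by a given user into comments.txt.
# --paginate walks all pages, so you get ALL comments rather than just recent ones.
gh api --paginate "repos/<ORGANIZATION>/<REPOSITORY>/pulls/comments" \
  --jq '.[] | select(.user.login == "jgefroh") | .body' > comments.txt
```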
Use AI to Write Code
This is the topic that gets written about to death nowadays - when people say AI is going to take over engineering's job, they are typically referring to this piece (never mind the fact that it's maybe 10% of the actual work).
But it actually is quite good at writing code nowadays. Not perfect, but much faster and definitely better than starting from zero.
To begin
Take a ticket, paste it into your LLM, and ask the LLM to create a plan to implement it. Read the plan, tweak and adjust, and once you feel confident in it, ask it to implement.
Read every single line of code it is writing. You are still responsible for the output.
Tweak the result over time. Make sure to keep an eye on common AI gotchas:
Placement and naming inconsistencies in files, folders, and classes
Localized tweaks vs. using systematically available tools (e.g. re-implementing feature flagging vs. using a library)
Lack of production, deployment, or migration considerations
Cross-cutting concern failures (e.g. lack of authentication, authorization)
Incomplete work (it’ll tell you it’s done, but it’s not)
Be sure you’re using a good thinking model vs. one of the fast ones.
To advance
Here's where the expertise comes in. All of the weird errors and gotchas above? AI will repeat them over and over and over.
Truth is - it’s not automatic. Issues will occur and rework will be needed. Consider this a natural part of the process. When an issue occurs, there will be two paths:
Fix the issue in the output and move on
Document the issue in an agent prompt and re-run
9/10 times, you should err on the side of fixing the prompt.
I do this by collecting the issues in an LLM-README.md placed in the codebase and instructing the LLM to always read it before doing any work. Your LLM-README.md should spell out exactly how you want it to make decisions in the areas it gets wrong:
Does it use the wrong global variable to look up a common constant? Tell the LLM to use the right, specific one in the LLM-README.
Is scaling important in your context? Tell the LLM to always consider performance under load of 5000 RPS.
Should it be using specific libraries or global subsystems you have? Give the LLM the list and tell it when to use them.
Over time, this will lead to a natural decrease in the number of micro-corrections you have to make. Your LLM-README.md is a guidance document that saves you the headaches of rework.
Note: some tools already have a built-in mechanism for this - e.g. CLAUDE.md.
Example:
LLM-README.md
# Reusable systems
We have the following domain-independent reusable subsystems that MUST be used when implementing any of the below functionality:
* Feature Flagging - /feature_flags
* Exporting - /exports
* User-level authorization - /authorization
* Rate limiting - /security/rate_limits
Do not re-create the above. Always use the available subsystems. Do not make modifications to the above subsystems.
# Tech stack
Our front-end is VueJS 3 with the Options API. Do not use the Composition API.
## Javascript Rules
We use `import`, not `require`.
# Convention rules
All page components that are routable must be named with a suffix `-page.vue`.
To excel
I’m not going to lie. You’ll need other resources to excel in this regard.
While I read about people using fleets of dozens of agents across kanban boards to build features simultaneously, or developing a team of self-correcting agents, I've not yet found how to make any of that work for my use cases. The outputs are too unreliable, or I find them shallow, or I just don't have the attention span to manage more than 2 streams of real development work at a time.
If it works for them - great. It just hasn't for me, so I can't turn around and tell you how to do that. I can only speak to what I've done and how it worked for me.
The one use case I did find useful was to have the AI read a pre-existing step-by-step auditing document and iterate through it to create a set of automated end-to-end tests that emulate that process using Cypress. That worked out pretty well, but it took the LLM a lot of trial and error, and me stepping in to get it 'unstuck' when it fell into various loops. It saved me time because I could do something else while it worked, but it was a 4-5 hour process for a couple of end-to-end tests.
Other considerations
Project AI/LLMs budget and cost for engineers
AI is typically billed by usage. It really depends on how much your team uses and adopts it, but at current prices, I'd estimate:
Early adoption - under $50 / engineer on average, with a couple of spikes
Wide, consistent usage - ~$200 / engineer
AI-native usage - $1000+ / engineer
Prices will differ based on usage and optimization.
Don't let the prices turn you off. You CAN get much higher ROI than the initial input costs, and it'll take a while, if it happens at all, to reach AI-native usage if you're reading this guide. You'll likely end up closer to the $100 - $200 / engineer range by the time you reach full, consistent adoption.
During the adoption phase, you don't really want anyone on the team to worry about costs unless money is super tight. What you want is for people to explore without anxiety over limits. You can teach optimization later.
A key note: AI prices are likely to increase. It’s heavily subsidized right now. I wouldn’t be surprised if it increases 10x in the future, but take advantage of low prices while you can.
Secure AI/LLMs for developer machines
AI opens up a massive pathway of potential attacks if used without guardrails. Ensure you and your team understand the potential consequences:
Destructive actions - it can randomly delete things it has access to, including production databases and files
Malicious actors can convince it to do things like send your credentials to them, or even download files and run arbitrary commands (prompt injection)
The key thing to note from a developer security side is the Deadly Triad. If the LLM has all three simultaneously, it should be considered a fairly high risk environment for prompt injection:
The LLM has access to send out communications
The LLM has access to user input
The LLM has access to sensitive information
The problem, of course, is a developer environment usually has all 3 by default:
Developer machines are connected to the internet, have access to curl, DNS, and other tools, and usually run under an elevated-access account. [send out communications]
Developers often like to connect to ticket systems and error reporting (e.g. Jira, Sentry), which carry user input. [user input]
Developers have at minimum access to the codebase and local credentials, as well as potentially production access. [sensitive information]
This makes securing a developer machine particularly tricky. While LLM vendors try their best to prevent these attacks, they still succeed at a 1-8% rate depending on the study.
The easy go-to is to prohibit connection to systems with direct user input (much to the dismay of your team) as a first-line defense. That means no automatic ingestion of Sentry, Jira tickets, etc.
This isn't fool-proof, but it at least decreases risk levels. If you have the resources, you can also have a separate AI-specific laptop with hard controls against any infrastructure connectivity as a second layer.
The risk here is non-zero, but also relatively low - use a context-appropriate risk assessment.
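One concrete guardrail: most harnesses let you deny specific tool capabilities. A sketch using Claude Code's permission settings - the rule strings below are examples, not a complete policy, and the file format is harness-specific:

```bash
# Deny outbound-communication tools at the harness level (Claude Code settings).
# Blocking curl/wget and web fetching cuts off the easiest exfiltration paths.
mkdir -p .claude
cat > .claude/settings.json <<'EOF'
{
  "permissions": {
    "deny": [
      "Bash(curl:*)",
      "Bash(wget:*)",
      "WebFetch"
    ]
  }
}
EOF
```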
Guiding AI agents with prompts
Writing an effective prompt is an entirely different article in its own right, and it also differs per model and model family.
If your team is new to AI, it’s helpful to go over the fundamentals. I have deeper guidance on the fundamentals of prompting in another article: AI/LLM Prompting for Beginners.
Generally though, as you write prompts for guiding the LLM for coding purposes:
Don’t describe what the code is doing. Describe the context of how you want the architecture and code to be used and created. The AI can find out what the code is doing quite well. It can’t interpret the context or intent.
Keep your prompts relatively precise. Try to avoid mixing 100 different requests into a single prompt. Asking an LLM to create a dashboard for one feature AND a new API endpoint for another feature AND a report for a third will just confuse it.
Ask your LLM to plan. Coding LLMs are quite good at planning, and spending 10 minutes in ‘plan mode’ shaping the plan by going back and forth conversationally with the AI will save you tons of headaches during implementation.
Apply global corrections globally. If an LLM is messing up consistently for you in one area, it probably is for others - that's a good signal to raise it to the team and add it to the LLM-README.md for everyone.
Adoption advice
Start small. Don't try to make your codebase LLM-friendly overnight. Start with just a single LLM-README.md document and add to the rules over time.
Minimize infrastructure. If you’re just getting started, don’t try going through hoops setting up a bunch of infrastructure. You can get a lot done with just a single developer running Cursor.
e.g. a developer can just run the `gh` GitHub CLI locally and post as themselves vs. trying to get their admin to approve an organization-wide integration.
Use what you already have. You don’t have to reorganize your entire knowledge store to make it usable by LLMs. LLMs can follow links - if you have documentation like Confluence, point it at the docs. Even if you just ‘copy-paste’ it, it’s better than nothing.
Introduce the basics. You really have to help people along sometimes, and that’s OK - especially for a new technology.
Make a PowerPoint with step-by-step instructions for setting up Claude, ChatGPT, or Cursor (or better yet, ask AI to make you one)
Do a team-wide demo of various use cases like completing simple tickets or asking Cursor questions about the code. Do it in real time.
Start an AI knowledge share channel for folks to ask questions in real-time (you can pivot them to ask the AI later)
Show people what it can do to spark ideas.
Share skills and prompts with the team.
This is just the start. You’ll open up a world of other opportunities as you increase adoption. Think about things like:
Can you use AI to do security audits against your codebase? (yes)
Can you use AI to answer questions from your team like you would? (yes)
Can you use AI to give daily updates to stakeholders? (yes)
Can you use AI to create internal-use tools? (yes)
Can you use AI to automate other processes? (yes)
Just remember: AI is a power tool. You don’t want people running around with chainsaws! Move forward safely.
Gefroh is a product and engineering executive in Kirkland, Washington currently leading various AI adoption efforts. He’s created AI versions of himself at his current company, and has done all of the above.


