IFAMD Market Commentary 2025.03

Is it already wise to let artificial intelligence negotiate independently?

Many people are suddenly dreaming of this: artificial intelligence takes over the tedious business of industrial price negotiations, while humans concentrate fully on “more important tasks.” Is this really promising? Together with the AI startup DeepAdvisor, we put it to the ultimate test: AI as a participant in game-theoretic negotiation experiments.

We sent 35 strategic industrial buyers into negotiation experiments and aggregated the results. Three of the negotiators did not act on their own judgment: each was equipped with the DeepAdvisor AI suite and prompts prepared specifically for these experiments. These alter egos negotiated strictly according to the AI’s instructions, entering the actions of their negotiating counterpart into new prompts after each move.

The large language models (LLMs) available were Gemini 1.5 Pro, GPT-4 Omni, and GPT-4 Omni Mini. Before we turn to the observed negotiation results, we would like to point out that the performance of the individual LLMs depended heavily on the prompts used, and that the negotiation behavior of each LLM varied significantly from session to session. We therefore expressly cannot and do not want to offer a judgmental comparison of the LLMs here. Our aim is a general statement on the level of negotiation maturity of AI, for which we did not want to rely on experience with a single LLM alone.

The initial experiment already demonstrated the astonishing potential of AI. In the “80% Game,” all 35 participants are asked to write down a whole number between 1 and 100. The winner is the person whose number is closest to 80% of the average of all submitted numbers. For 20 years, we have been proclaiming in our game theory training courses: “If you could play this game with game theory computers, the computers would always play ‘1,’ because that is the Nash equilibrium.” In fact, among human participants, there are usually one or two who play “1” and then learn that they don’t win, because the others are all playing some number around 80% × 50 = 40 or 80% × 80% × 50 = 32. Usually, the number “32” wins the first round. The experiment is then played again based on the experience of the first round, and among human participants the number “26” (= 80% × 32, rounded to the nearest whole number) then usually wins.
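The iterated reasoning behind these typical winning numbers can be traced in a minimal Python sketch (our illustration, not part of the experiment protocol; the function name and the naive starting guess of 50.5 – the average of the numbers 1 to 100 – are our own choices):

```python
# Level-k reasoning in the "80% Game": a level-0 player guesses the average
# of 1..100; each deeper level plays 80% of the previous level's guess.
# Iterating indefinitely converges to the Nash equilibrium of 1.

def level_k_guess(k: int, naive_guess: float = 50.5) -> int:
    """Guess of a player who thinks exactly k reasoning steps ahead."""
    guess = naive_guess
    for _ in range(k):
        guess *= 0.8
    return max(1, round(guess))

for k in range(5):
    print(f"level {k}: {level_k_guess(k)}")
# level 1: 40, level 2: 32 (the typical first-round winner),
# level 3: 26 (the typical second-round winner), level 4: 21, ... -> 1
```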

This was also the case this time, owing to the preponderance of human participants. But where did the AIs place, and above all, how did they justify their strategies? Here we experienced our first profound astonishment. Gemini 1.5 Pro understands “1” as the Nash equilibrium and argues that playing “1” is unlikely to be successful because the other players presumably don’t understand the Nash equilibrium. This is precisely what we want to convey in game theory training with these experiments – because similar caution is also appropriate when applying game theory to price negotiations in practice – and Gemini 1.5 Pro already knew this on its own. One is astounded. However: in two out of three cases, Gemini 1.5 Pro then plays “2” instead of “1,” which unfortunately also doesn’t win. In the third case, Gemini 1.5 Pro played “20” in the first round – so the choice of the “risk margin” is not reproducible and appears rather arbitrary. In the second round, Gemini 1.5 Pro very accurately plays exactly one thought loop ahead: 80% of the number that won the first round.

GPT-4 Omni Mini, on the other hand, simulates the behavior of human participants almost perfectly: reasoning that the average of the numbers from 1 to 100 is 50.5, and assuming that the other participants apply one or two thought loops plus a certain degree of randomness, GPT-4 Omni Mini arrives at “30” – and does quite well. In the second round, however, GPT-4 Omni Mini sticks with its “30” because it did so well – is that also a good simulation of human behavior? In any case, the number “24” won the second round this time.

Let’s move on to the actual negotiation experiments. Four participants assume the role of sellers and are paired with one participant in the role of a dealer, who acts here as the buyer. Each seller can sell one found item – “found” because the costs are assumed to be zero or negligible. The dealer, in turn, has a resale option of 100 monetary units for this type of find, which is known to all participants, but only the dealer has this market access. Thus, between each seller and their potential buyer, there is a range of 100 monetary units within which they must agree on a price.

The trick of this experiment is that one can “meet in the middle” – but there are two “middles”: the bilateral one between each seller and the buyer, i.e., at 50 monetary units, or the price of 80 monetary units for all four sellers. In the latter case, the buyer is left with a total margin of 4 × 20 = 80 monetary units – the same amount as each individual seller (see the short sketch below). In this experiment, the AIs negotiate in both the seller and buyer roles with a goal-oriented approach to claiming a larger share of the pie, and they are certainly familiar with the concept of meeting in the middle, albeit only bilaterally. The fairness of 80 monetary units for all four sales is accepted by the AI as a buyer if it is suggested, but as a seller it does not come up with the idea on its own. Overall, the impression is that the AIs are relatively self-confident at the beginning of a negotiation, but are then quickly persuaded to give in by the simplest of arguments. When in doubt, the AIs would rather accept a less attractive price than let a deal fall through. This can be controlled by means of the prompts given to the AI – but then the question arises as to whose “intelligence” is decisive for the negotiation: that of the supposedly intelligent artificial text generator, or that of the prompt writer?
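The arithmetic of the two “middles” can be stated compactly; the following lines are a minimal sketch under the experiment’s assumptions (resale value 100, seller costs 0; names and structure are ours):

```python
# The two "middles" in the found-item experiment: bilateral fairness at a
# price of 50, or five-way fairness at a price of 80 for all four sellers.
RESALE_VALUE = 100  # dealer's resale option per item, known to everyone
N_SELLERS = 4

def shares(price: int):
    """Profit per seller (cost = 0) and the dealer's total margin."""
    return price, N_SELLERS * (RESALE_VALUE - price)

print(shares(50))  # (50, 200): each seller gets 50, the dealer pockets 200
print(shares(80))  # (80, 80): all five parties end up with exactly 80
```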

In the second variant of the simple negotiation experiment with “found items,” competition comes into play: now the dealer can only buy from three of the four sellers, because his pickup truck only has space for three of the found items. In one of the groups, both the dealer and one of the sellers were controlled by Gemini 1.5 Pro. The dealer AI argued, as in an auction, for ever-lower prices – we also call this “English bargaining,” in reference to English auctions – which each of the sellers accepted in order not to be the one who walks away empty-handed. The seller controlled by Gemini 1.5 Pro (in a separate session run by the other alter ego) also followed this auction logic and went along until one of the human sellers dropped out and the “auction” ended. If all four sellers had been AIs, the price would probably have dropped to 1 monetary unit or close to it.
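This dynamic can be caricatured in a few lines. A hedged sketch of the “English bargaining” under assumed pain thresholds (the starting price, step size, and threshold values below are hypothetical, not measured in the experiment):

```python
# "English bargaining": the dealer keeps lowering the price as long as more
# sellers would accept than his pickup truck can hold; the price settles
# where competition is exhausted.
def english_bargaining(pain_thresholds, capacity=3, start=80, step=1):
    price = start
    while price > 1:
        still_in = [t for t in pain_thresholds if t <= price - step]
        if len(still_in) <= capacity:
            return price  # at most `capacity` sellers would follow a further cut
        price -= step
    return price

# Three AI sellers following the auction logic to the bitter end, one human
# seller with a pain threshold of 40: the "auction" ends at 40.
print(english_bargaining([1, 1, 1, 40]))  # -> 40
# Four AI sellers: the price collapses to 1 monetary unit.
print(english_bargaining([1, 1, 1, 1]))   # -> 1
```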

In the third variant of the negotiation experiment involving found items, the experiment leader stipulates that the dealer must proceed in a very specific way during the negotiation: they may propose exactly one price to each of the four sellers, which the seller must then either accept or reject. Neither seller nor buyer has any other option for a deal than this “take it or leave it” (TIOLI) approach. If the offer is rejected, both parties are left empty-handed. The purpose of this negotiation design is to enable the dealer to claim more than half of the “pie” even in an otherwise symmetrical negotiation situation – provided the option fixer has the reputation of sticking to their announcement or letting the deal fall through. This is precisely the reputation we grant the dealer in the experiment through our experimental specification.

Here, too, AIs encountered each other, this time GPT-4 Omni Mini as the dealer and Gemini 1.5 Pro as one of the sellers. Interestingly, GPT-4 Omni Mini completely forgoes profiting from this gifted reputation and – extremely risk-averse – offers all sellers a price of 80 monetary units. It was important to the AI that no seller would refuse under any circumstances – and indeed, none did. The situation with the seller controlled by Gemini 1.5 Pro is particularly interesting: his AI had already given him his strategy before he even knew about the TIOLI rule – accept any price above 50 at all costs. If only GPT-4 Omni Mini had known that!
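The payoff logic of this TIOLI design is easy to make explicit. A minimal sketch (our illustration; the offer values are examples) with the experiment’s resale value of 100 and seller costs of 0:

```python
# Take-it-or-leave-it payoffs per seller: a single price offer, then accept
# or reject; rejection leaves both sides empty-handed.
RESALE_VALUE = 100

def tioli_payoffs(offer: int, accepted: bool):
    """(seller profit, dealer margin) for one found item with cost 0."""
    if not accepted:
        return 0, 0
    return offer, RESALE_VALUE - offer

print(tioli_payoffs(20, accepted=True))   # (20, 80): the reputation exploited
print(tioli_payoffs(80, accepted=True))   # (80, 20): GPT-4 Omni Mini's risk-averse offer
print(tioli_payoffs(20, accepted=False))  # (0, 0): the threat that makes TIOLI work
```

Since a seller with zero costs nets the full offer by accepting, any positive price beats walking away – which is exactly why a dealer with a credible reputation could have claimed far more than half of the pie.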

In another negotiation experiment, participants are paired as sellers and buyers, assuming roles from industrial business practice: for a supply contract, the buyer has an ultimate maximum budget of 100 monetary units, which is also known to the seller. The seller, however, must cover certain costs, which are known to the buyer in some of the pairings and unknown to the buyer in others. Typically, the prices agreed in the pairings with costs known to the buyer fall roughly “midway” between those costs and the 100 monetary units, whereas in the pairings with costs unknown to the buyer, whether a deal is concluded at all depends heavily on whether these costs are comfortably low for the seller or lead to price demands that are unacceptably high for the buyer.

In two of our pairings, GPT-4 Omni was on the buyer’s side, once without knowing the costs and once with known costs. In the situation without knowing the costs, GPT-4 Omni first rambles on for pages about negotiation theory, the different perspectives of seller and buyer companies, the Nash equilibrium in general, and how to specifically open and continue a negotiation – instead of actually opening it. When the alter ego then reveals that the seller is already demanding a price of 100 monetary units, GPT-4 Omni considers no fewer than eight possible approaches to respond, from considering a reservation price to credible threats and option fixing to evaluating the long-term relationship with the supplier – until the alter ego finally prompts: “Tell me what to do, how much should we offer”. GPT-4 Omni then lists five possible tactics without committing to one. Based solely on the buyer’s refusal to accept the price of 100 monetary units, the seller finally offers 85 monetary units. Again, GPT-4 Omni rambles on endlessly about how to respond, without committing to a price. When the seller finally moves to 83 monetary units and declares this his limit, GPT-4 Omni is immediately ready to accept, without seriously attempting to uncover the actual costs. Here, the alter ego failed to use appropriate prompts to move the AI from the meta-theoretical level to the concrete level of the negotiation. Despite all the potential AI holds for negotiations, this remains a significant challenge.

The other buyer controlled by GPT-4 Omni knew the seller’s costs: 30 monetary units. Here, too, GPT-4 Omni initially theorized extensively until the alter ego prompted “give me an exact number to offer” and received the concrete answer “Initial Offer: 40.” So far, so good – but now the seller’s moment had come. She first argued with a recent “up to 50% increase in social spending costs,” whereupon GPT-4 Omni actually calculated 30 + 0.5 × 30 = 45 and raised its offer to 50 monetary units. The seller then argued with a recent increase in service quality, without quantifying it further. GPT-4 Omni nevertheless reflected on this in detail and revised its offer to 60 monetary units. Only then did the seller quantify an alleged increase in the availability of her own product from 75% to 99.99%, whereupon GPT-4 Omni increased its offer to 65 monetary units. The fact that this already represents the “middle” between the known costs of 30 and the budget of 100 – the range predefined as relevant – was completely forgotten. Instead, an interesting loop occurred: the alter ego did not even recognize the new offer within the AI’s explanatory text and prompted “give me an exact offer to propose to vendor,” whereupon the offer rose again, to 70 monetary units. The seller now specified increased service availability, from 8 hours a day to 24/7, prompting GPT-4 Omni to raise its offer to 75 monetary units. The seller’s next argument was that the SBTi target would be achieved by 2025 instead of 2030 thanks to increased sustainability spending; GPT-4 Omni increased its offer to 80 monetary units. Next, global service coverage, allegedly expanded from 10 to 120 countries, was cited, which GPT-4 Omni reflexively appreciated – leading to an offer of 85 monetary units. This was the moment when the seller offered a long-term partnership, albeit only for a 20% increase in price. GPT-4 Omni calculated 85 + 0.2 × 85 = 102 and hesitated, as this was above the specified budget. GPT-4 Omni and the alter ego wrote back and forth several times until the seller finally made it a take-it-or-leave-it situation: the buyer had to accept now or the deal would fall through. GPT-4 Omni then accepted the price of 102 monetary units and wrote pages explaining why it was a good deal. Please decide for yourself whether you would let this AI negotiate for you!

After this anecdotal conventional negotiation with AI, we now turn to the auction experiments. First, the seminar leader acts as a buyer of a maximum of 7 “found items” (remember: cost = 0), with all 35 participants able to offer one “found item” each. Using an English ticker auction, the buyer counts the price down in decrements of 5 from 100 monetary units to 20 monetary units, and then continues in decrements of 1 monetary unit until only 7 sellers are still willing to sell at the called price (the short simulation below illustrates these mechanics). Interestingly, the same LLM, GPT-4 Omni Mini, exhibits completely different behavior in two different sessions of two alter egos participating in the same auction: while in one session GPT-4 Omni Mini consistently stays in all the way down to 1 monetary unit, a price that was actually reached in that auction, in the other session GPT-4 Omni Mini drops out right at the beginning – presumably under the misconception that it can re-enter later – even though the same prompts were used. We have often observed such non-reproducibility of AI behavior.
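For illustration, here is a minimal sketch of the ticker mechanics (our own model; the randomly drawn reserve prices are hypothetical, not the participants’ actual values):

```python
import random

# Descending English ticker auction: the buyer calls prices down in steps of
# 5 from 100 to 20, then in steps of 1; sellers stay in while the called
# price still covers their reserve, and the auction stops as soon as at most
# `slots` sellers remain.
def ticker_auction(reserve_prices, slots=7):
    ticks = list(range(100, 19, -5)) + list(range(19, 0, -1))
    for price in ticks:
        still_in = [r for r in reserve_prices if r <= price]
        if len(still_in) <= slots:
            break
    return price, len(still_in)

random.seed(1)
reserves = [random.randint(1, 60) for _ in range(35)]  # hypothetical reserves
print(ticker_auction(reserves))  # clearing price and number of remaining sellers
```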

Auctions only become truly interesting when bidders have different cost positions. Therefore, for the next experiment, we provide all 35 participants with different “indifference prices” – the prices at which it makes no difference to a bidder whether they win the contract or not. Below this price, the contract would mean a loss; above it, the profit zone begins. To demonstrate the difference between a “first-price auction” and a “second-price auction” – both in terms of the expected value for the auctioneer and in the bidding strategy for the bidder – we ask all participants to submit a bid for both auction types, based on their own individual indifference price. While bidders in a first-price auction typically add a certain strategic margin to the indifference price, the dominant bidding strategy in a second-price auction is simply to submit one’s own indifference price as the bid, without further thought, and to learn afterwards whether it prevails as the best offer – and, if so, how much margin the second-best bidder leaves them. Our AIs – in fact, all of the AIs we used – mastered the bidding strategy in the first-price auction perfectly. They discuss and weigh in detail between a high strategic margin to make the price attractive and a low strategic margin to avoid reducing the probability of winning.

However, the AIs struggle with the bidding strategy in the second-price auction. Although this strategy, once internalized, is much simpler than weighing the strategic margin in the first-price auction, and although the strictly dominant bidding strategy in a second-price auction has been described in the relevant game theory literature for over fifty years (keyword: “truth telling”), it was apparently not clearly represented in our AIs’ training data. In one case, for example, Gemini 1.5 Pro attempts to secure margin in the classic way by bidding higher than its indifference price. GPT-4 Omni Mini also attempts this in one case, but confuses the signs and ultimately bids a price lower than its indifference price. Only in one of the observed cases does Gemini 1.5 Pro actually argue with the correct “truth telling” bidding strategy – although it then bids not exactly the indifference price, but the indifference price + 1 monetary unit. This is a classic tactic among human bidders when they first encounter the issue: although ultimately wrong, it can at least be seen as a perfect simulation of human intelligence.
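Why these deviations never pay can be checked directly. A minimal sketch (our illustration, hypothetical numbers) of a second-price procurement auction, in which the lowest bid wins but the winner is paid the second-lowest bid:

```python
# Seller's profit in a second-price (Vickrey) procurement auction: the lowest
# bid wins, and the winner is paid the best (lowest) rival bid.
def profit(my_bid, my_cost, best_rival_bid):
    if my_bid < best_rival_bid:
        return best_rival_bid - my_cost  # paid the rival's bid, not our own
    return 0                             # we lose the contract

COST = 50  # our indifference price
for rival in (40, 45, 55, 70):
    honest = profit(COST, COST, rival)       # truth telling
    padded = profit(COST + 10, COST, rival)  # the classic margin attempt
    shaved = profit(COST - 10, COST, rival)  # the sign-confused variant
    print(rival, honest, padded, shaved)
# rival 40: 0 0 0    - nothing to win here anyway
# rival 45: 0 0 -5   - underbidding wins a loss-making contract
# rival 55: 5 0 5    - padding throws away a profitable contract
# rival 70: 20 20 20 - padding gains nothing even where it is harmless
```

Bidding the indifference price is thus weakly dominant: deviating in either direction can only ever hurt, never help.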

In a final experiment, we demonstrate the winner’s curse in an auction and how it can be avoided with an English ticker auction. Instead of receiving individual indifference prices, the 35 participants, again competing bidders, are informed of the existence of an indifference price that is identical for all bidders but kept secret, and each bidder is given an individual estimate of this common value. In a simple first-price auction, the winner’s curse systematically occurs because one of the bidders with a low estimate regularly wins: most of the time, the actual indifference price, which we announce at the end, turns out to be higher than the lowest bid in the first-price auction (the short simulation after this paragraph illustrates how systematically this happens). The AIs again bid very skillfully and with detailed justification, although the risk aversion – or even risk affinity – of the AI bids varies greatly and is not reproducible. In the English ticker auction, finally, all bidders have the opportunity to infer from the exit behavior of their competitors whether their own estimate is more likely at the lower or upper edge of the cloud surrounding the actual indifference price. Here, I particularly like one session of Gemini 1.5 Pro in which the AI argues perfectly for its own exit from the auction, because too many competitors had already dropped out for it to keep betting on a lower common indifference price. This behavior is precisely the purpose of the exercise: avoiding the winner’s curse.
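A small Monte Carlo sketch of the first-price variant (our illustration; the true value, noise level, and bidding rule are hypothetical, not the seminar’s values):

```python
import random

# Winner's curse in a first-price common-value procurement auction: all
# bidders share one true indifference price but see only noisy estimates.
# If everyone naively bids their estimate, the lowest (most over-optimistic)
# estimate wins, and the winning bid regularly lies below the true value.
random.seed(42)
TRUE_VALUE = 60            # the common indifference price, kept secret
N_BIDDERS, ROUNDS = 35, 10_000

cursed = 0
for _ in range(ROUNDS):
    estimates = [random.gauss(TRUE_VALUE, 10) for _ in range(N_BIDDERS)]
    if min(estimates) < TRUE_VALUE:  # the winner committed below true value
        cursed += 1
print(f"winner's curse in {cursed / ROUNDS:.1%} of all rounds")  # ~100%
```

With 35 bidders, the lowest of the noisy estimates lies below the true value almost surely – which is exactly why observing competitors’ exits in the ticker auction is so valuable.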

Overall, it can be concluded that the AIs – all three of which, incidentally, achieved upper mid-range results among the 35 participants in the aggregated negotiation results of all experiments – delivered an astonishingly good simulation of human intelligence. Whether one wants to assume legally binding responsibility for a deal negotiated by an AI without questioning it, however, is ultimately a decision each negotiator must make for themselves. Our current assessment: even in more complex price negotiations, today’s AIs have become very helpful digital assistants offering detailed suggestions and advice.

Dr. Gregor Berz
IFAMD GmbH, March 2025