Zero-Shot Chain of Thought Prompting - Overview

Captions
Hi everyone, today we will be looking at the paper "Large Language Models are Zero-Shot Reasoners." Prompting is the way in which you elicit responses from large language models. In few-shot prompting, you include certain examples or demonstrations as part of the prompt. Here, for example, you have a reasoning task with a demonstration question: "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?" together with its answer, 11. Then comes the actual question for which we want a response from the language model: "A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?" You expect the language model to produce the answer.

Then there is zero-shot prompting, where you do not provide the language model with any demonstrations or examples. You just have the question, followed by a string like "The answer is", and you expect the language model to come up with the answer.

Recently there was another paper on Chain of Thought prompting, which showed that you can improve the performance of large language models on reasoning tasks by including the chain of thought in the demonstrations. This is few-shot Chain of Thought: you still have examples or demonstrations showing what is expected of the language model, but for the same question as before, instead of just giving the answer, 11, the demonstration now contains the chain of thought, i.e. the series of reasoning steps: "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11."
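To make the distinction concrete, here is a minimal sketch, not taken from the paper, of how these three prompt styles might be assembled as plain strings before being sent to some language model; the question and demonstration text are the examples quoted above.

```python
# Sketch of the three prompting styles discussed above.
# The model call itself is omitted; these are only the prompt strings.

QUESTION = (
    "Q: A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?\n"
    "A:"
)

# Zero-shot: only the question, no demonstrations.
zero_shot_prompt = QUESTION + " The answer is"

# Few-shot: one demonstration question with its final answer only.
DEMO_QUESTION = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
)
few_shot_prompt = DEMO_QUESTION + "A: The answer is 11.\n\n" + QUESTION

# Few-shot Chain of Thought: the demonstration also spells out the reasoning steps.
few_shot_cot_prompt = (
    DEMO_QUESTION
    + "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis "
    "balls. 5 + 6 = 11. The answer is 11.\n\n"
    + QUESTION
)
```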
Given your question, the language model, instead of just spitting out the answer like before, also gives you the chain of thought, i.e. the reasoning steps. In this paper they propose zero-shot CoT. Zero-shot means you don't have any demonstrations or examples, so in zero-shot CoT you just have the question for which you want a response from the language model, and you insert a reasoning prompt; in this example the reasoning prompt is "Let's think step by step." They find that the language model then produces a series of reasoning steps when answering the question.

The method works in two stages. For a reasoning task you take the question, without any demonstrations or examples, append the reasoning prompt "Let's think step by step," and feed this to the language model, which returns a series of reasoning steps, the chain of thought. The answer is embedded inside this chain of thought, so it needs to be extracted. They therefore feed the question together with the text generated by the language model back into the model a second time, along with an answer extraction prompt, here "Therefore, the answer is," and the language model then extracts the answer from the reasoning steps it produced in the first stage.

So it differs from the original few-shot Chain of Thought paper in that there are no demonstrations or examples. Crafting a prompt for a new dataset becomes very simple: it's as simple as having a reasoning prompt and an answer extraction prompt. You do not have to select any examples from the dataset, and you do not have to write out the chain-of-thought reasoning steps for those selected examples, which makes it much easier to write the prompt for a new dataset.

In one table they compare zero-shot and zero-shot CoT on reasoning tasks. The tasks used here are all reasoning tasks, be it arithmetic, commonsense, symbolic, or other reasoning, and they find that zero-shot Chain of Thought is better than vanilla zero-shot prompting most of the time. They also use different models for these experiments, such as Instruct-GPT-3, GPT-3, PaLM, GPT-Neo, GPT-2, GPT-J, T0, and OPT. So this basically shows that for reasoning, zero-shot CoT performance is better than zero-shot performance.

Then, on arithmetic datasets that involve reasoning, they compare zero-shot, few-shot, and their method. You can see that few-shot performs better than zero-shot, but their method, zero-shot CoT, shows a big jump in performance over vanilla zero-shot and also performs better than few-shot prompting. It doesn't perform as well as few-shot Chain of Thought, though; few-shot Chain of Thought still beats zero-shot Chain of Thought. They also try Zero-shot + Few-shot CoT, where before specifying the reasoning steps for each example or demonstration they append "Let's think step by step," and they see some performance improvement when they do that, at least for the GSM8K dataset.
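As a rough illustration of the two-stage procedure described above (a sketch, not code from the paper), the pipeline could look like the following; `call_model` is a hypothetical placeholder for whatever language-model API is being used.

```python
def call_model(prompt: str) -> str:
    """Hypothetical placeholder for a call to a large language model API."""
    raise NotImplementedError


def zero_shot_cot(question: str,
                  reasoning_prompt: str = "Let's think step by step.",
                  extraction_prompt: str = "Therefore, the answer is") -> str:
    # Stage 1: reasoning extraction.
    # The question plus the reasoning prompt elicits a chain of thought.
    stage1_prompt = f"Q: {question}\nA: {reasoning_prompt}"
    chain_of_thought = call_model(stage1_prompt)

    # Stage 2: answer extraction.
    # Feed the question and the generated reasoning back in, followed by the
    # answer extraction prompt, so the model states just the final answer.
    stage2_prompt = f"{stage1_prompt} {chain_of_thought}\n{extraction_prompt}"
    return call_model(stage2_prompt)
```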
So the takeaway here is that zero-shot CoT is better than zero-shot prompting and few-shot prompting; it may not be as good as few-shot Chain of Thought, but it's much easier to craft the prompt, and you don't have to manually write the chain-of-thought steps for any new dataset you are using. Their figures also show how zero-shot CoT scales compared to zero-shot: there is a certain threshold, I think around 7 billion parameters, beyond which zero-shot CoT scales better than zero-shot prompting.

Then they do error analysis, where they study the questions for which zero-shot CoT does not give the right answer and look at the different logical steps reasoned by the model, and there are some interesting observations. Here, for example, the question is where a valued toy car is likely to be found, and it is multiple choice: (a) a child's room, (b) a boy's bedroom, (c) own home, (d) toy store, (e) house. They ask the language model with the reasoning prompt "Let's think step by step," and the response from the model is: a toy car is likely to be found in a child's room; a child's room is likely to be found in a house; therefore, a toy car is likely to be found in a house. Those are the reasoning steps; you then feed in the question, this generated answer, and the extraction prompt "Therefore, among A through E, the answer is," and the language model returns (e), house, whereas the ground truth is (a), a child's room. The logical steps are right, but there is some confusion from the extra reasoning at the end, and that's why it got it wrong.

Let's look at another misclassified example, where the question is what would be necessary for getting back in shape, and the answer choices are good health, exercise, muscle tone, sweat, and feel better. With "Let's think step by step," the language model generates the chain of thought: in order to get in shape, you need to be able to do three things: exercise regularly, eat a healthy diet, and get enough sleep; if you can do all three of those things, then you will be on your way to getting in shape. Then again you feed both of these plus the answer extraction prompt "Therefore, among A through E, the answer is," and it gives multiple answers, (b), (c), and (d), which are exercise, muscle tone, and sweat. The right answer is included, but it gives multiple answers: the reasoning steps are good, but when it tries to extract the answer, those reasoning steps basically point to multiple different answers.

They also experiment with different kinds of templates for the reasoning prompt. The one we saw before, "Let's think step by step," seems to perform really well, but there could be other templates as well, like "Let's be realistic and think step by step." Broadly, they categorize the templates as instructive, which aid reasoning by asking the language model to reason or list the steps; misleading, which are not really prompts related to reasoning or to listing steps; and irrelevant, which are simply not relevant to the task. The baseline is zero-shot, and they find that instructive prompts perform better than misleading and irrelevant prompts, and they definitely perform better than plain zero-shot.
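Purely to illustrate how such a template comparison might be organized (the instructive examples are the two quoted above; the misleading and irrelevant entries are assumed stand-ins, not necessarily the paper's exact wording), one could group the candidate reasoning prompts like this:

```python
# Illustrative grouping of reasoning-prompt templates by category,
# mirroring the instructive / misleading / irrelevant split discussed above.
reasoning_prompt_templates = {
    "instructive": [
        "Let's think step by step.",
        "Let's be realistic and think step by step.",
    ],
    "misleading": [
        "Don't think. Just feel.",  # assumed example of a misleading prompt
    ],
    "irrelevant": [
        "It's a beautiful day.",    # assumed example of an irrelevant prompt
    ],
}

# Each template would be swapped in as the stage-1 reasoning prompt of the
# zero-shot CoT pipeline and compared against a plain zero-shot baseline.
```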
In another table they compare zero-shot CoT against few-shot Chain of Thought with examples drawn from matched and mismatched datasets. In few-shot Chain of Thought, if you recall from the previous Chain of Thought paper, you need to select certain examples from the dataset: if it's a math reasoning dataset, you need to select relevant examples from it, whereas in zero-shot Chain of Thought you don't really need to do anything; you just use a string like "Let's think step by step," which helps the model list out the different reasoning steps. What they do here is take a multiple-choice task, pick the examples from the CommonsenseQA dataset, and then use them on two other datasets, and they see that the performance is not as good as when the examples are picked from the same dataset. So there is a degradation in performance, and it is more pronounced on the arithmetic dataset. Even with this degradation, though, providing these Chain of Thought examples is better than vanilla zero-shot prompting. And if you look at zero-shot Chain of Thought, where you don't have to pick any examples at all, its performance is better than few-shot CoT with mismatched examples, but not as good as few-shot CoT with examples from the same dataset. So it can be favored for the ease of crafting prompts for a new dataset.

And that's the paper. The major contribution of this paper is zero-shot Chain of Thought, a reasoning prompt that elicits a response from the language model in which the different reasoning steps are listed. It performs better than zero-shot prompting and few-shot prompting, maybe not as good as few-shot Chain of Thought, but it still saves the hassle of picking examples from the dataset and specifying the reasoning steps for each of those examples.
Info
Channel: Sherin Muckatira
Views: 672
Id: th1tr9eLIwc
Length: 14min 34sec (874 seconds)
Published: Mon Jan 23 2023