|
Post by Porscheguy on Aug 2, 2015 16:41:30 GMT -5
|
|
|
Post by knucklehead on Aug 2, 2015 16:44:53 GMT -5
I know... But I ain't sayin...
|
|
klinemj
Emo VIPs
Honorary Emofest Scribe
Posts: 14,756
|
Post by klinemj on Aug 2, 2015 17:18:00 GMT -5
I've elaborated on this topic many times here. I adhere to blind testing but NOT ABX testing. And my #1 reason that I like "blind testing" but do NOT like ABX is not addressed in the article.
My reason for not liking ABX comes from 29 years doing consumer research on consumer preference. I have seen again and again that a 3-way comparison ends up being, statistically and in human-response terms, a blunt tool. A far better tool is a randomly assigned, blind A-B test.
ABX requires the listener to "sample" three signals...an A, a B, and a third signal (X). Then, they must decide and declare whether X was A or B. This is hugely challenging for the human brain (which is binary coded). This adds confusion to the brain, which lessens the chance of finding a statistical difference.
A FAR better way to judge differences is a blind A-B test, which is much more accurate. What is this blind A-B test (vs. an ABX test), and why should I care (you may ask)?
Simple...in a blind A-B test, the subject does not know which signal/stimulus they are hearing (so it is blind). But, they only hear a first signal and a second. Then, they are asked whether they have a preference, and if so, which one they preferred. This is still "blind" BUT has a much better ability to discern differences.
Most do not understand the differences between "blind" testing and the very specific ABX testing.
I am all for blind testing but very against ABX testing.
Let the debates begin...but please, separate blind A-B tests from the specific use of the ABX method and the specific audio evaluation tool. My prediction...folks will not separate the tool from the technique because they do not understand the difference. The ABX technique, and thus also the tool, are fundamentally flawed. But, techniques for proper blind testing exist AND should be used.
Mark
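[Editor's note: for readers who want to see the mechanics, the blind A/B preference procedure Mark describes can be sketched in a few lines of Python. This is a hypothetical sketch, not anything from the thread: audio is abstracted as hidden labels, and the listener is a stub callable.]

```python
import random

def blind_ab_trial(rng, preference_of):
    """One blind A/B trial: present two unlabeled signals in random
    order, then ask the subject for a preference.

    `preference_of(first, second)` stands in for the listener; in a real
    test the subject hears audio, but here the hidden labels are passed
    directly for the sake of the sketch. It returns 'first', 'second',
    or None ("no preference" is an allowed answer).
    """
    order = ['A', 'B']
    rng.shuffle(order)                  # the subject never sees the labels
    answer = preference_of(order[0], order[1])
    if answer is None:
        return None
    return order[0] if answer == 'first' else order[1]

# A stub listener who always slightly prefers signal 'A':
rng = random.Random(7)
votes = [blind_ab_trial(rng, lambda s1, s2: 'first' if s1 == 'A' else 'second')
         for _ in range(20)]
print(votes.count('A'))  # 20 -- every trial resolves to 'A', whatever the order
```

Because the presentation order is randomized per trial, a consistent winner across many trials reflects the signal, not the slot it was played in.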
|
|
|
Post by Boomzilla on Aug 2, 2015 17:30:45 GMT -5
Mark & I are in complete agreement (a unique experience - certainly we must BOTH be wrong!? ). What he says makes perfect sense. After all, your optometrist doesn't show you a chart and then ask which of two lenses looks most like the chart (the ABX method). It's either A or B - which is better. If you can't tell, or seem mixed up, the test is easily repeated until a consistent result is found. Boomzilla
|
|
harsh
Minor Hero
Posts: 40
|
Post by harsh on Aug 2, 2015 17:40:21 GMT -5
@boom and mark The whole purpose of ABX testing is not to determine which of A and B sounds better, just to determine whether there is an audible difference between the two. If there is an audible difference, the next step is to find which one you prefer. If there is no audible difference, then... no need for a next step.
Edit: and just for the mental-buffering argument, you don't have to remember more in ABX than in blind A/B. You hear A, then B. You notice a difference (or not). Then, after hearing B, you hear X. If there's no difference from B, it's B. If it's not like B, it's A. If there's no difference between A and B, or between B and X, try again.
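[Editor's note: the ABX trial harsh describes maps directly onto a few lines of code. This is a sketch with hypothetical names; the listener is again a stand-in callable, and X is abstracted as the hidden label it re-presents.]

```python
import random

def abx_trial(rng, guess_x):
    """One ABX trial: X is secretly a re-presentation of A or B, and the
    listener must declare which one it was. Returns True on a correct call."""
    x = rng.choice(['A', 'B'])
    return guess_x(x) == x

def abx_score(n_trials, guess_x, seed=0):
    """Total correct identifications over a run of trials."""
    rng = random.Random(seed)
    return sum(abx_trial(rng, guess_x) for _ in range(n_trials))

# A listener who genuinely hears no difference can only guess:
rng = random.Random(1)
guesser = lambda x: rng.choice(['A', 'B'])
print(abx_score(16, guesser), "of 16 correct")  # hovers around chance (8)
```

The point of the protocol is visible in the scoring: only whether X was correctly identified is recorded; no preference question is ever asked.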
|
|
|
Post by pedrocols on Aug 2, 2015 17:46:11 GMT -5
@boom and mark The whole purpose of ABX testing is not to determine which of A and B sounds better, just to determine whether there is an audible difference between the two. If there is an audible difference, the next step is to find which one you prefer. If there is no audible difference, then... no need for a next step. Precisely!
|
|
DYohn
Emo VIPs
Posts: 18,352
|
Post by DYohn on Aug 2, 2015 18:08:54 GMT -5
I also agree with Mark, and will add that blind testing of any sort can be a very valuable tool when trying to ascribe cause to preference: when trying to decide why one person or group of people prefers one thing over another, it is important to eliminate as many variables as possible. However, I do think blind testing is the wrong tool when making personal preference decisions. I believe that all of our senses must be satisfied when we make personal choices. So if the color, smell, size, brand name, or price of something makes a difference to you, then those factors should be given weight. You can never say A was BETTER than B because of those factors when discussing audio gear, but you might absolutely prefer A over B because of those factors.
|
|
|
Post by garym on Aug 2, 2015 18:40:44 GMT -5
This is hugely challenging for the human brain (which is binary coded) . . . . This adds confusion to the brain which lessens the chance of finding a statistical difference. What evidence do you have for that? On its face it seems questionable. Suppose the listener is presented with a middle C note, a middle A note, and then an X note, which is either the A or the C. Would he have trouble saying which note X was? Or two similar words, such as "cap" and "cat," and then an X word, which is one of the first two. Or the same word uttered by two different speakers.
If there is a problem with ABX testing, it is not that humans get confused when trying to remember two different stimuli and compare them with a third. It will be because the differences between the two are hard to distinguish in the first place. There is probably some minimum difference two signals must have to be distinguishable by a given listener. That, of course, is the point of the technophile/audiophile debate --- the technophiles contend that the differences between a (decent) $300 receiver and a pair of $1000 monoblocks are too tiny to be perceived by most (or any) listeners.
ABX and blind A/B testing (as you characterized it) are attempts to measure two different things. The first seeks to detect perceptible differences between two components; the second to reveal preferences between two components. Preferences can depend upon many factors, including idiosyncratic, psychological ones, that have nothing to do with the actual (measurable) quality of the signal presented. E.g., people who prefer a "warmer" sound may prefer speakers that emphasize mid-bass and suppress treble to speakers that are perfectly flat.
Blind A/B testing can yield misleading results. Preferences may be affected by the order in which the signals are presented (the more recent one, being fresher in the mind than the first, may be preferred). Or something distracting may have occurred while listening to one of the samples (including some thought bubbling up in the listener's mind). Before deciding he prefers component A to component B based on blind A/B tests, a buyer should be sure to listen, blindly, to each component playing the same signal several times, in different orders, and to different materials presented several times in different orders.
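[Editor's note: garym's prescription in the last paragraph, the same material several times, in both presentation orders, shuffled, amounts to building a counterbalanced trial schedule. A sketch with hypothetical names:]

```python
import random

def make_schedule(tracks, reps=2, seed=42):
    """Build a blind A/B session: each track appears `reps` times in
    A-then-B order and `reps` times in B-then-A order, all shuffled
    together, so recency/order effects average out over the session."""
    rng = random.Random(seed)
    trials = [(track, order)
              for track in tracks
              for order in (('A', 'B'), ('B', 'A'))
              for _ in range(reps)]
    rng.shuffle(trials)
    return trials

schedule = make_schedule(['jazz cut', 'female vocal'], reps=2)
print(len(schedule))  # 2 tracks x 2 orders x 2 reps = 8 trials
```

Counterbalancing does not remove order bias from any single trial; it only ensures the bias cannot systematically favor one component across the whole session.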
|
|
|
Post by garym on Aug 2, 2015 18:59:05 GMT -5
After all, your optometrist doesn't show you a chart and then ask which of two lenses looks most like the chart (the ABX method). It's either A or B - which is better. If you can't tell, or seem mixed up, the test is easily repeated until a consistent result is found. Bad analogy! ABX tests attempt to confirm a perceptible difference between two presented signals. The analogy would be for the optometrist to give you two lenses, let you read the chart with both, then present you with a third lens (which is really one of the first two) and have you say which of the first two it is. Whether you like one of them better than the other is a separate question. The first question is a test of the lenses, the second a test of your quirks --- maybe you prefer "warm," fuzzy images to "harsh," sharp ones! (Photographers sometimes use soft-focus lenses to achieve that very result).
|
|
klinemj
Emo VIPs
Honorary Emofest Scribe
Posts: 14,756
|
Post by klinemj on Aug 3, 2015 6:42:40 GMT -5
harsh I fully understand about measuring for difference vs. for preference. And while A/B testing can be done for preference, it can also be used to measure whether differences exist or not. I have done it in my career many times. And, I disagree that the brain does not have to hold any more info in an A/B/X...it truly does.
garym I read a paper on the topic once at work. Our team was debating test techniques for some research and someone had proposed what was essentially an ABX. A colleague whose field of study in college was tied to human decision making argued against it, then shared a paper with us that supported his point.
FYI to garym and harsh...it is not that the human brain cannot make three-way comparisons...it is just that the brain is more efficient and more accurate comparing two. I have seen this play out when we use sort-and-rate tests where we ask consumers to sort/rate 10 or more things on a scale. The subject does an initial sorting/rating, and they invariably go back and re-sort by comparing 1:1 to finalize and optimize their choices.
Mark
|
|
bootman
Emo VIPs
Typing useless posts on internet forums....
Posts: 9,358
|
Post by bootman on Aug 3, 2015 7:51:03 GMT -5
It is because of the constant battle between the id, ego and super ego.
|
|
DYohn
Emo VIPs
Posts: 18,352
|
Post by DYohn on Aug 3, 2015 8:14:22 GMT -5
A/B/X relies on memory, not on comparison.
|
|
|
Post by yves on Aug 3, 2015 9:07:17 GMT -5
The more important question IMO... why do self-defined objectivists try to discredit audiophiles by touting that it is easy for just about anyone equipped with a pair of normal ears and an ABX tool to obtain reliable evidence in support of the hypothesis that two sounds are audibly the same?
Here's what I am referring to:
From ITU BS.1116-1 "METHODS FOR THE SUBJECTIVE ASSESSMENT OF SMALL IMPAIRMENTS IN AUDIO SYSTEMS INCLUDING MULTICHANNEL SOUND SYSTEMS"
6. Programme material
Only critical material is to be used in order to reveal differences among systems under test. Critical material is that which stresses the systems under test. There is no universally “suitable” programme material that can be used to assess all systems under all conditions. Accordingly, critical programme material must be sought explicitly for each system to be tested in each experiment. The search for good material is usually time-consuming; however, unless truly critical material is found for each system, experiments will fail to reveal differences among systems and will be inconclusive. It must be empirically and statistically shown that any failure to find differences among systems is not due to experimental insensitivity because of poor choices of audio material, or any other weak aspects of the experiment, before a “null” finding can be accepted as valid.
In the extreme case where several or all systems are found to be fully transparent, then it may be necessary to program special trials with low or medium anchors for the explicit purpose of examining subject expertise (see Appendix 1). These anchors must be known, (e.g. from previous research), to be detectable to expert listeners but not to inexpert listeners. These anchors are introduced as test items to check not only for listener expertise but also for the sensitivity of all other aspects of the experimental situation. If these anchors, either embedded unpredictably within the context of apparently transparent items or else in a separate test, are correctly identified by all listeners in a standard test method (§ 3 of this Annex) by applying the statistical considerations outlined in Appendix 1, this may be used as evidence that the listener’s expertise was acceptable and that there were no sensitivity problems in other aspects of the experimental situation.
In this case, then, findings of apparent transparency by these listeners is evidence for “true transparency”, for items or systems where those listeners cannot differentiate coded from uncoded versions.
|
|
novisnick
EmoPhile
CEO Secret Monoblock Society
Posts: 27,230
|
Post by novisnick on Aug 3, 2015 9:11:15 GMT -5
A/B/X relies on memory, not on comparison. What was the question?
|
|
DYohn
Emo VIPs
Posts: 18,352
|
Post by DYohn on Aug 3, 2015 9:41:43 GMT -5
In my experience, A/B/X testing results are almost always statistically equivalent to flipping a coin.
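[Editor's note: the coin-flip comparison can be made precise with the binomial distribution. Out of n trials at chance p = 0.5, a pure guesser scores k or more correct with probability equal to the upper binomial tail; this quick standard-library check illustrates the thresholds involved.]

```python
from math import comb

def guesser_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance that a subject who
    hears no difference at all still scores k or more correct out of n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 12 of 16 correct would happen by luck only ~3.8% of the time,
# while 10 of 16 is entirely unremarkable (~22.7%):
print(round(guesser_tail(16, 12), 4))  # 0.0384
print(round(guesser_tail(16, 10), 4))  # 0.2272
```

So "statistically equivalent to flipping a coin" has a concrete meaning: scores that stay inside the fat middle of this distribution carry no evidence either way.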
|
|
KeithL
Administrator
Posts: 9,958
|
Post by KeithL on Aug 3, 2015 10:18:15 GMT -5
A/B/X relies on memory, not on comparison. Indeed it does. An ABX test relies on several things: 1) You must be able to hear a difference. 2) You must be able to "quantify" that difference in such a way that it allows you to recognize one or the other, which allows you to... 3) Identify which reference sample your "unknown" matches. Now, in all fairness, you might be able to convince me that this will indicate whether "there is a 'significant' difference or not", but NOT whether there is an audible difference.
I'm going to resort to a visual analogy. Let's assume we start with three colored tiles, and we use the "standard ABX test methodology" - meaning that we show the tiles one after the other. Using this methodology, you will be able to determine some minimum threshold below which most viewers are totally unable to recognize which of two reference tiles your unknown sample matches. However, you will almost certainly find that, if you actually start placing tiles side by side, at the same time, virtually all of your test subjects will be able to identify a MUCH smaller difference in colors. In other words, it has been pretty widely shown that our ability to recognize colors from memory is far less accurate than our ability to discern differences between colors WHEN THEY ARE DISPLAYED SIDE BY SIDE AT THE SAME TIME.
A proposed "audible difference test": There is actually a simple test which COULD be used to tell whether such a tiny difference exists between two audio samples or not... but I'm not aware of its being used. Set up a switch box and two sources - we'll call them "A" and "B". The switch box will have a single button on it and, when the button is pressed, a slight "click" will be inserted into the audio (perhaps the signal will drop for 0.1 seconds). And, depending on how the particular run is configured, the button may also trigger a switch of the input or component under test - or not.
So, when our subject presses the button, they will hear the same music, but the source or component under test may change. (The subject gets to press the button, and to listen to each sample as long as they like, and even to choose the sample music they like.) When our subject presses the button (click; click; click) they may hear (A click A click A click A), or (B click B click B click B), or (A click B click A click B). (The test could be made even more sensitive by randomizing the alternating sequence - but I think this way is quite good enough - and it's easier.) Our test set will randomly switch between these three situations, tell the tester which one is used for each run, and we'll record whether our test subject BELIEVES the source is switching back and forth or not. If the subject can't tell, statistically, any better than a guess whether they're listening to one source repeatedly or it's switching back and forth, then we will have determined that the two are "indistinguishable". Like placing colored tiles next to each other and asking a subject if they see a difference or not, this tests specifically and only whether the signals are "audibly identical". Humans are very good at picking out "random jumps" - like the unmeasurable but audible "click" that you get when you splice two "equal" files containing white noise together. The clicks inserted every time the button is pressed make sure the subject hears a click every time, so a click by itself is never a cue that the sample has switched. (We've eliminated that as a cue.)
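[Editor's note: Keith's proposed protocol is easy to simulate. In this sketch (all names are mine, not from the post) each run secretly plays either a steady source or an alternating pair, and only the subject's switching/steady report is scored; the click between segments is assumed to be present in every run, so it carries no information.]

```python
import random

# The hidden conditions of a run, as sequences of source segments:
CONDITIONS = {
    'steady-A':    ['A', 'A', 'A', 'A'],
    'steady-B':    ['B', 'B', 'B', 'B'],
    'alternating': ['A', 'B', 'A', 'B'],
}

def run_session(n_runs, reports_switching, seed=3):
    """Score how often the subject's 'is it switching?' report matches
    the hidden condition over n_runs randomly chosen runs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        name = rng.choice(sorted(CONDITIONS))
        truth = (name == 'alternating')
        if reports_switching(CONDITIONS[name]) == truth:
            hits += 1
    return hits

# A subject who truly hears the difference identifies every run:
golden_ear = lambda segments: len(set(segments)) > 1
print(run_session(30, golden_ear))  # 30
```

Note what is never asked: the subject does not name A or B, and does not state a preference; they only say "same" or "switching", which is exactly the side-by-side tile comparison rather than the memory test.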
|
|
|
Post by yves on Aug 3, 2015 10:34:25 GMT -5
In my experience, A/B/X testing results are almost always statistically equivalent to flipping a coin. ...Except, of course, when the audible differences are so ridiculously big and obvious that detecting them doesn't require any blind testing in the first place. Perhaps one of the main reasons why audiophiles are constantly being flamed by certain types of ABXers has been explained in this article: www.audiostream.com/content/blind-testing-golden-ears-and-envy-oh-my
|
|
KeithL
Administrator
Posts: 9,958
|
Post by KeithL on Aug 3, 2015 10:49:11 GMT -5
The more important question IMO... why do self defined objectivists try to discredit audiophiles by touting it is easy for just about anyone equipped with a pair of normal ears and an ABX tool to obtain reliable evidence in support of the hypothesis that two sounds are audibly the same?
First, thanks for posting that reference (which I didn't include here for brevity). I think the answer to your question is simply that it's a matter of overreaction (or "polarization" if you prefer). The audiophile arena has become so well known for being one of the last true bastions of snake oil and placebos that a lot of people are somewhat overzealous at determining the truth. They prefer black or white. Either the world is simple (one way) and you should trust only the measurements, or the world is simple (the other way) and you should trust only your ears; the idea that both may be important, so you have to figure out which is which in each individual case, is much more complicated... and much less certain... and people like certainty and hate uncertainty. You find similar polarized opinions about any similar "hot button" topic, from "global warming" to "big pharma" to "the trustworthiness of the government".
|
|
|
Post by yves on Aug 3, 2015 12:26:36 GMT -5
Indeed it does. An ABX test relies on several things: 1) You must be able to hear a difference. [...] A proposed "audible difference test": Set up a switch box and two sources - we'll call them "A" and "B". [...] When our subject presses the button (click; click; click) they may hear (A click A click A click A), or (B click B click B click B), or (A click B click A click B). [...] A/B/X relies on memory, not on comparison.
In your proposed test, you forgot to mention the fourth possibility, i.e., B click A click B click A. By assuming that this fourth possibility should be evaluated simultaneously with the third one (i.e., A click B click A click B), you are falling into the common trap of Distinction Bias.
en.wikipedia.org/wiki/Distinction_bias
If listening to A click B, listening to A informs the listener about A. Because we have memory, our perception of B will be biased by our recollections of A. Whereas if listening to B click A, listening to B informs the listener about B, and, as a result, our perception of A will be biased by our recollections of B. The bias that occurs when listening to A click B is not necessarily the same bias that occurs when listening to B click A. In order not to fall into the common trap of Distinction Bias, it is imperative that the test results of A click B are analyzed independently of the test results of B click A. This is a key example of why your average ABXer at home will typically be biased towards "hearing no difference".
Another good point IMO is that it has been shown elsewhere that, if placed in a stressful situation, humans might not be able to tell differences between quite surprising things. Remember that the ITU specification states you have to empirically and statistically show that any other weak aspects of the experiment are absent before you can even begin to prove the null hypothesis. Because the results of the test might not be in line with the expectations of the test subject, the assumption that human stress has no significant bearing on the test results is an assumption that can be clearly identified as Experimenter's Bias.
Finally, the common observation that ABXers tend to place the burden of proof on audiophiles is in stark contrast to the ITU specification, and to the whole scientific method in general. These self-defined objectivists will even go as far as to claim, without providing any reliable evidence of course, that Change Deafness ( en.wikipedia.org/wiki/Change_deafness ) cannot be triggered by factors that do not normally occur outside the artificial set of circumstances typical of blind testing.
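[Editor's note: yves's point about analyzing the two orders separately is simple bookkeeping: keep A-then-B trials and B-then-A trials in separate bins before drawing any conclusion. A sketch under that assumption, with hypothetical names:]

```python
from collections import defaultdict

def split_by_order(trials):
    """Partition blind-test results by presentation order, so that any
    asymmetry between A->B runs and B->A runs stays visible instead of
    being averaged away.

    `trials` is a list of (order, heard_difference) pairs; the return
    value maps each order to the fraction of runs where a difference
    was reported."""
    bins = defaultdict(list)
    for order, heard_difference in trials:
        bins[order].append(heard_difference)
    return {order: sum(results) / len(results)
            for order, results in bins.items()}

trials = [(('A', 'B'), True), (('A', 'B'), True),
          (('B', 'A'), False), (('B', 'A'), True)]
print(split_by_order(trials))  # {('A', 'B'): 1.0, ('B', 'A'): 0.5}
```

If the two fractions diverge, the session has an order effect worth investigating before any "no audible difference" conclusion is drawn.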
|
|
|
Post by yves on Aug 3, 2015 12:54:15 GMT -5
I think the answer to your question is simply that it's a matter of overreaction (or "polarization" if you prefer). [...] You find similar polarized opinions about any similar "hot button" topic, from "global warming" to "big pharma" to "the trustworthiness of the government". It's not just about the polarization IMO. Rather, it's probably about being a shill for a company that thrives on cheap, inferior, mass-produced and mass-marketed audio products that revolve around low-resolution, lossy-codec music distribution technologies because, after all, if we can trust the bulk of ABX test results that float around the internet, even your average cheap DVD player at Best Buy sounds the same as a dCS Vivaldi.
|
|