Deleted
Deleted Member
Posts: 0
|
Post by Deleted on Oct 16, 2023 15:02:48 GMT -5
Here is an interesting video with someone who is very intimate with a room and setup: www.youtube.com/watch?v=GTwwvY8Is1o
While I don't always agree with what Danny says, or agree that speakers need to be ruler flat to sound great, it is interesting that he got 8 out of 10 with difficult tracks. I've personally tried some of the online compressed vs. uncompressed audio blind tests and didn't do too well, but I also felt the tracks were low-quality, non-commercial, open-source material, so I felt it was a fixed wash to start with.
Generally, once I have a source I'm happy with, I don't care, and I don't really critique it or overthink it too much. At some point, you stop listening to music, and you can drive yourself mad. The only album I've tried from two different sources is Diana Krall, Turn Up the Quiet. I think the SACD sounds better than the Apple Lossless version. They both sound great, but the SACD sounds a bit more revealing and seems to have a better soundstage, though I obviously can't blind test myself. I was expecting them to sound pretty much the same.
Another issue with blind testing is testers in an unknown room, listening to unfamiliar music, under pressure. While I understand why people do it, from some of the papers and links I've read over the years, I always felt the studies had a BIAS to begin with.
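For context, 8 out of 10 is right at the edge of what pure guessing can explain. A quick Python sketch (only the 8/10 score comes from the video; the rest is just the standard 50/50 guessing assumption):

```python
# How unlikely is 8 (or more) correct out of 10 if the listener
# were purely guessing (p = 0.5 per forced-choice trial)?
from scipy.stats import binomtest

result = binomtest(k=8, n=10, p=0.5, alternative="greater")
print(f"p-value for >= 8/10 by chance: {result.pvalue:.3f}")  # ~0.055
```

So even a result that good is only borderline significant on its own; you'd want more trials before concluding much from it.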
|
|
|
Post by 405x5 on Oct 16, 2023 15:28:25 GMT -5
Two Geeks sitting at a table with audiophile hoses instead of wires……Not interesting, just the old ploy over again.
|
|
KeithL
Administrator
Posts: 10,261
|
Post by KeithL on Oct 17, 2023 9:23:41 GMT -5
I figured I'm about due to throw a few things out there...
1. Bias does work BOTH WAYS. In other words, we can be biased to hear a difference if we know which product is which, which is why blind testing is useful... But we can also be biased to NOT hear a difference between different products (like cables)... (And I don't know a good way to avoid a bias to not hear a difference between two things.)
2. There is a fair argument, especially with the sorts of differences you get with modern technologies like DACs, for using familiar content. It's not at all unusual for me to notice small but perceptible and repeatable differences between DACs... But which I can ONLY hear with a certain few recordings... or only on electrostatic headphones but not on speakers... The fact that those differences are both audible and repeatable means that they are obviously real... But I almost certainly wouldn't hear or notice them with different content... or different speakers... or maybe in a different room.
3. You must be VERY careful, especially when discussing "group tests and surveys", that the actual study was both blind and DOUBLE blind, and properly conducted. (And there is a LOT more to that last bit than many people seem to realize.) The study needs to be double blind to prevent whoever is conducting the study from "telegraphing their expectations or knowledge" to the participants. This can happen either accidentally or intentionally.
For example, certain cable manufacturers are quite notorious for conducting public demonstrations where they very clearly manipulate the audience. This manipulation can take the form of "direct cheating" - like changing the level of what's playing between samples... Or it can be more subtle - like "asking for a show of hands" - then waiting longer with one sample than with the other. (It's pretty well known that, in a room full of people, if a few people raise their hands, then you wait a while, and perhaps ask again, more hands will go up.) (Likewise, you can manipulate people to be more likely to report a difference if you ask them "Which one did you prefer?" instead of "Did you hear a difference?") There are also even more subtle ways in which groups can be manipulated or controlled. (For example, in certain types of tests, people are statistically more likely to prefer "the first one" or "the last one" or "the cup on the right".) You could even suggest that the very act of "setting up an audiophile comparison of several products" produces a bias to expect a difference... (You can easily demonstrate this by setting up such a comparison... and using THE SAME PRODUCT FOR ALL THE TESTS... people will routinely "prefer one or the other".)
4. Any sort of blind testing, using a reasonably sized sample (statistically), is going to be complicated and expensive. You need a LOT of people, a LOT of samples, a LOT of test runs, and probably several different sets of test equipment. For example, to properly compare "DACs" you would need...
- a few dozen participants (preferably of different demographic groups)
- a few dozen products (of different TYPES)
- several separate test systems (at least two or three different speakers and amps, two or three different headphones, and perhaps ear buds)
- several different test samples each of several different genres (from heavy metal to acoustic jazz)
Just look at the size and cost of studies to evaluate the efficacy of new drugs... (Real ones, that need FDA approval, not unregulated "supplements" and such...) NO audio company has the budget, or the inclination, to do that sort of testing.
5. Another thing that is quite obvious, and that many people tend to overlook, is the INTENTION of the test. For example, if I'm testing "whether ANYONE can hear a difference", then a single participant who can consistently do so will "prove the point"... Whereas, if I'm testing "whether customers will notice the difference", then I'm looking for a majority result, or a "significant" result...
For example, when Sony was testing "whether the 16/44k CD format was of sufficiently good quality for consumer music distribution"... They would have been looking for a result of whether MOST people heard no difference between the copy and the master MOST of the time... So, in that test result, it would have been "insignificant" if ONE guy consistently heard a difference, with a few songs, on a certain pair of headphones. And, in fact, they would probably have considered it meaningless if 5% of the participants heard a difference 5% of the time... But, in contrast, the manufacturer of a high-end DAC, or expensive speakers, might be quite pleased if 5% of listeners "hear a significant difference". (A quick numerical sketch of this distinction follows after point 7, below.)
6. There is yet another sort of bias that finds its way into purchases of products like cables... It's called "the sunk cost fallacy"... The basic idea is that, once you have "invested" in something, you are biased to "find" that you have "made a good investment". (The old saying about this was "throwing good money after bad"...) So, if you purchase a $500 cable, you would really PREFER not to "decide" that you'd wasted your money. What most people don't realize is that, even if you have a return option on that cable, you have still "invested" an expectation that it will be an improvement. You "don't want to feel stupid for wasting your time"... and that is magnified if you've told a few friends about "the great new upgrade you have on the way". In fact, you may even want to avoid "feeling gullible for believing the marketing literature". All of these sentiments collectively add up to a strong bias to imagine subtle differences that really aren't there...
7. And yet another fallacy is "different = better". For example, it can be fairly said that "no two speakers sound exactly alike", but that tells us nothing about which is "better". And we need to avoid the obvious bias to assume that "the more expensive one is better"... And we also need to be careful to avoid being overly biased by explanations "about why we should expect one to sound better". (This is especially true with speakers and cables, where manufacturers frequently tout "fancy new high-tech materials", so we'll EXPECT them to sound better.) (The same is true when an amplifier manufacturer insists that their amp, with a THD of 0.000001%, will "obviously" sound better than one with a spec of 0.001%.) In both cases they are working to establish a bias for you to expect to hear a difference. But you must also avoid the possibility that, knowing this "ploy", you will become biased to NOT hear a difference that actually exists. (And, of course, double-blind testing is the solution to THAT problem.)
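Here is that quick numerical sketch of the point-5 distinction (a minimal example; every trial count and hit rate below is hypothetical):

```python
# Same kind of data, two different questions (all numbers hypothetical):
#   (a) can THIS listener hear it?       -> per-subject binomial test
#   (b) do listeners IN GENERAL hear it? -> pooled-proportion binomial test
from scipy.stats import binomtest

one_listener = binomtest(k=14, n=16, p=0.5, alternative="greater")
whole_group = binomtest(k=520, n=1000, p=0.5, alternative="greater")

print(f"one listener, 14/16 correct:    p = {one_listener.pvalue:.4f}")  # ~0.002
print(f"pooled group, 520/1000 correct: p = {whole_group.pvalue:.2f}")   # ~0.11
```

The pooled result looks like noise while the individual result is quite strong; which one matters depends entirely on the INTENTION of the test.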
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on Oct 17, 2023 10:10:14 GMT -5
A very informative post. Thank you.
Regarding #5: I recall the coat hanger vs. Monster speaker wire test. Apparently there were only 12 testers, and zero mention of the speakers or gear used. For something like that, I feel like the goal was to bash Monster cables. Sure, they are easy to make fun of, but they started a trend of nicer-looking cables. Now Parts Express, Monoprice, and others have good cheap cables.
I recall another test comparing an expensive amp to a cheap Behringer. I think they threw cheap RCA cables in as well. The test speakers were nice bookshelf speakers, but likely not efficient or revealing (low-distortion) enough, or able to fill the room well, which likely skewed the results. The same test with, say, modern Forte IVs could give completely different results. I felt both of those tests were not done well, but they have been constantly referenced as 100% proof.
#7 I absolutely agree with, and I wish every audiophile and HiFi nerd knew it. Always be wary of anyone making claims about large, night-and-day differences in anything. A good clean amp with enough headroom to reach the desired listening levels should just stay in the system. I'm sure a different good amp that sounds just a tad different can be found, and liked at first as an improvement, but which amp is actually reproducing the signal path more correctly? If thousands of dollars are involved, the slight difference isn't worth it.
|
|
|
Post by leonski on Dec 29, 2023 1:38:13 GMT -5
It is very difficult to do a true Double Blind Test. Each listener must be assured the same experience as every other listener.
And you don't just grab the first dozen volunteers who enter the building. You must PRE-test to see if they can hear obvious differences,
vetting them prior to the final survey.....
As for AMPS? Of course they CAN sound different. A very high damping factor amp hooked to old-school big box speakers (think BOZAK!)
will sound much different than the same speakers with the type of amp intended for them....that being a low-DF tube amp, probably.
Amps ALSO vary in their ability to drive various reactive loads. And as an amp heads off the 'cliff of failure', its sound WILL vary substantially.
And as to Keith's #7 point, above? IF you know what from what, you can be subject to various bias expectations. But if REALLY blind? You can
hopefully render an honest opinion of which is better......
And as for Keith's #5 point? The way out of that forest is STATISTICS. 1 in 10 is probably meaningless. But that goes back to my point about PRE-screening participants.
If it's a random group? 1 proves nothing.
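To put rough numbers on that (a quick sketch, assuming a forced-choice test where pure guessing is right half the time):

```python
# Minimum number of correct calls out of n before "just guessing"
# (p = 0.5) drops below the usual 5% significance level.
from scipy.stats import binom

for n in (10, 16, 25, 50):
    # smallest k such that P(X >= k) < 0.05 under pure guessing
    k = next(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) < 0.05)
    print(f"n = {n:2d} trials: need at least {k} correct")
```

So a short run of trials has to be nearly perfect to mean anything, which is exactly why 1-in-10-style anecdotes prove nothing.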
All this stuff devolves into 'The 3rd Rail Of Audio'.......touch at your own risk
|
|
|
Post by audiobill on Dec 29, 2023 11:11:12 GMT -5
Per Keith:
"And yet another fallacy is "different = better". For example, it can be fairly said that "no two speakers sound exactly alike", but that tells us nothing about which is "better". And we need to avoid the obvious bias to assume that "the more expensive one is better"... And we also need to be careful to avoid being overly biased by explanations "about why we should expect one to sound better". (This is especially true with speakers and cables, where manufacturers frequently tout "fancy new high-tech materials", so we'll EXPECT them to sound better.) (The same is true when an amplifier manufacturer insists that their amp, with a THD of 0.000001%, will "obviously" sound better than one with a spec of 0.001%.) In both cases they are working to establish a bias for you to expect to hear a difference. But you must also avoid the possibility that, knowing this "ploy", you will become biased to NOT hear a difference that actually exists. (And, of course, double-blind testing is the solution to THAT problem.)"
Amen.
|
|
KeithL
Administrator
Posts: 10,261
|
Post by KeithL on Dec 29, 2023 12:19:04 GMT -5
You bring up an interesting point... EVERYONE has a "bias" and/or an "ulterior motive".
- The guys who make those expensive cables hope to prove that they really sound better (to justify the price). Or they hope to at least convince you that there is a good enough chance of that to justify giving their product a try.
- The guys who make the cheap cable hope to prove that it sounds just as good (although they probably have less budget to spend to do so). And, to be honest, in the market they sell to, beating their nearest competitor by a few dollars is probably worth more than a cool study.
- And the guy who bought those expensive cables wants to convince himself that he made a good investment.
- While the guy who bought the cheap cables wants to believe that he was "the smarter shopper".
- And, obviously, magazines that take ads are motivated NOT to embarrass their current or potential future advertisers or their readers.
- But how about the "reader supported audiophile magazine"? THEY also have a strong motivation... to remain relevant. Having a reputation for being honest is great... but not if you don't have anything interesting to say. They would lose their readership very quickly if they said "Yup... they all came out about the same... again" too many times. (People love to be reassured that the low cost thing they bought was as good as the expensive one they didn't... but it does get boring eventually.)
You also need to consider "the use case stuff"... For example... would it make sense for me to pay $500 more for a DAC that I think clearly sounds better... If I can only hear the difference with three or four songs... and only on my best electrostatic headphones... and only when the room is quiet? I guess the answer to that sort of depends on how often I want to listen to those songs... And how often I actually listen to those headphones... I personally find electrostatic headphones far and away the most revealing... And I also find them to sound the best and most enjoyable... Unfortunately, I just don't like listening with headphones, because I find them ALL uncomfortable to some degree, so I rarely use them. And all of that clearly makes a huge difference in MY priorities.
I should also add that, in my experience, the SACD version of an album is quite often mastered differently than the CD version. Presumably they are often "mastered more to appeal to what audiophiles like". And that can even be true of the SACD and Red Book layers on the same Hybrid SACD (they are usually "the same"... but not always). And, to be quite blunt, unless you ripped it yourself, you have no idea where that Apple Lossless version came from. Many streaming services make "adjustments" like level normalization and even MQA processing... which both alter the sound... And even different copies of the CD, mastered and released at different times, or on different labels, can sound different (sometimes significantly).
|
|
|
Post by leonski on Dec 30, 2023 1:37:37 GMT -5
Keith. You wander all over the place. Simple question? Can you run a DBT or not? I'd say the difficulties overwhelm 95% of all potential testers. I would personally start with testing people to distinguish fine differences....then go on with a 'panel'....
And WHICH of the giant number of (for just ONE example) edits / mixes / formats of DARK SIDE OF THE MOON is best? Some versions are quite valuable to collectors while others get poor reviews. Some other 'remastered' albums were subject (victims, really) of the Loudness Wars and sound like crap.
I once experimented with Absolute Phase to no result. Some material did sound different (you're right about whether that's 'better' or not) while much was indistinguishable. I left the switch alone after that. NAD 1700......
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on Jan 1, 2024 9:45:26 GMT -5
I think a genuine DBT would take hundreds of people over a longer period of time and mimic the standards of double blind medical testing. I'm not aware of anyone who's done longer term testing with a lot of people. Then again, cables, amps, compressed vs non compressed probably don't warrant the expense and resources to do a really good test.
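For a rough sense of the scale involved, here's a back-of-the-envelope sketch using the standard normal-approximation sample-size formula; the 55% hit rate below is just an assumed "small real effect", not a measured one:

```python
# Standard normal-approximation sample-size formula for detecting a
# true hit rate p1 over chance p0 (one-sided test, illustrative numbers).
from math import sqrt
from scipy.stats import norm

p0, p1 = 0.50, 0.55         # chance vs. assumed "small real effect"
alpha, power = 0.05, 0.80   # significance level and desired power
z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)

n = ((z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2
print(f"roughly {n:.0f} trials needed")  # on the order of 600
```

And that's six hundred or so trials before you add multiple systems, rooms, and genres, which is exactly why the medical-trial comparison keeps coming up.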
Everyone has different gear, music collections, and personal beliefs as to what speakers, gear, and DACs should sound like. Some people like louder, compressed music. I prefer more dynamic, properly mixed sources, but those recordings don't work very well in a car: the soft parts can't be heard. On a good home system, it's a completely different experience.
I listen to Apple Music on the go most of the time. Garage tunes are a pair of 2nd-gen Andrew Jones Pioneer speakers and a Bluetooth Fosi amp. I don't care that it's only Bluetooth when I'm in the garage, and the sound is oddly still good. When I have the time to turn on and listen to my main reference system, I'll consider an SACD, or Classical or Jazz, completely different music than I would even consider playing as background jamming music in the garage or car. I almost have split personalities when it comes to music and audio gear. If I'm listening to my main reference stereo, or my favorite cans, I don't play classic rock or 80s pop; that sounds better on my HT setup.
I'm sure most people on this forum have more experience with gear and music than average enthusiasts. I can tell some are older than myself, probably have up to 20 years more experience, and likely had different experiences than I. We have all had different paths and likely have different levels of hearing capability.
Even people with long-term experience can have (IMO) completely wrong setups or goals. I have a friend who is well past his teens but still likes his treble and bass boosted the same way I did when I was a teen. I can't stand listening to his system, but I can never say that. I've also met people with a lifetime of HiFi enjoyment who like music WAY louder than I do. I honestly can't tell the difference between speakers being overdriven and the pain threshold of my ears; a system could be playing back perfectly, but it starts to sound edgy and harsh to me regardless.
With so many different goals, and so much variation in what people prefer, think is right, or believe something should be, can an audio DBT really be accurate?
|
|
|
Post by 405x5 on Jan 1, 2024 11:00:25 GMT -5
Today’s field of listening options……DBT, IMHO, is long dead and gone, and never meant much anyway.
|
|
|
Post by leonski on Jan 1, 2024 21:31:11 GMT -5
Tend to agree. It is difficult to get it 'right' and, rightly or wrongly, objections mount as the conclusions get more controversial..... Besides, trying to apply such results to Me and My system is somewhere between difficult and impossible, anyway......
|
|
|
Post by 405x5 on Jan 1, 2024 21:36:48 GMT -5
Well, that’s it exactly. No surprise, I guess, that this is a revived thread from 13 years ago.
|
|
|
Post by leonski on Jan 7, 2024 15:26:53 GMT -5
|
|
|
Post by 405x5 on Jan 8, 2024 10:07:10 GMT -5
For all the knowledge and sophistication, old ears can’t hear worth a damn compared to youngsters…..always the contradiction when it comes to this stuff.
|
|
KeithL
Administrator
Posts: 10,261
|
Post by KeithL on Jan 8, 2024 14:04:00 GMT -5
First I'm going to answer the question that you asked... Then a few of the ones that you didn't ask... that go with it...
And, in case you were wondering, I got very good grades in my science classes... And I spent several years both documenting, and often designing or specifying, test procedures for mil-spec electronics... And I ran the department of an analyst firm that did competitive analysis and testing of computer networking and security products... So I DO know what I'm talking about. But, yes, my only goal here was to hit a few high points - since actually covering all of the details would take a rather thick book. (And I don't claim to know everything about it either.)
Yes... you absolutely can perform a valid and meaningful double-blind test. However ALL tests are designed to test specific things... under specific conditions... so your results will always be limited. For example, if you want to test amplifiers, you're going to have to choose a sample of content, speakers, and other associated components to test them with. Therefore you are never going to be able to produce results that will necessarily be true for ALL speakers or ALL content. And there are always going to be limitations involved with your test population. And there will also be implicit limitations in your test procedure. (Your test subjects may be more sensitive to what you're testing for after their morning coffee than they are late in the day.)
You probably won't test those amplifiers using electrostatic speakers, acoustically recorded lute music, and test subjects who were raised at high altitudes in the Andes mountains... So you'll probably settle for using "popular consumer brand speakers", and "popular music", and "typical test subjects"... But it's an interesting question whether your "typical subjects" will be young college students, middle-aged office workers, or middle-aged jackhammer operators... Or whether they should be trained musicians (who we might hope are more able to detect certain types of flaws)... Or whether it's reasonable to use music that your test subjects are not familiar with...
There are even issues with whether people tend to REPORT what they consider to be minor differences. I read a very interesting report many years ago where a large group of people was "sent to wait" in two supposedly identical interconnected waiting rooms... There was an audio system, playing identical content, at identical volume, in each room. At the end of several hours, when the participants were interviewed, none expressed a preference for the audio quality in either room. Yet, at the end of that time, the majority of the participants had ended up in one of the rooms and not the other. The group who conducted the test concluded that "perhaps the participants found the sound in the less-preferred room to be more fatiguing or annoying". (I express no opinion... but that is an interesting result... perhaps worthy of further study.)
If you look closely at some of the double-blind tests conducted by those big drug companies... the ones who DO have millions of dollars to spend... You'll find that they are usually still quite limited... For example, when they choose test subjects, they generally exclude people taking many OTHER drugs, or who have OTHER serious but rare medical conditions... (Read the label on almost any drug and you'll find a list of "people who we're not sure can safely take this drug".)
So, to answer your question: it is quite possible, and even practical, to run a proper double-blind test of "a short list of specific things". But you really need to set your goals, and spell out all the details, quite carefully. And then you need to be very careful to explain your conclusions and the reasons behind them. Are you testing "whether most people can tell the difference between CD quality and hi-res files, when listening to typical content, on typical home stereo equipment"? Or are you testing "whether at least a few audiophiles can tell the difference, on their own system, with their own favorite recordings"? (I'm sure that the first one was what Sony was looking for in their test.)
The real short answers to your question are:
- It IS quite possible to perform a proper double-blind test.
- It is MUCH more difficult than you might think to design a double-blind test without obvious serious flaws.
- I could design a "good" double-blind test... but it would be VERY large and VERY expensive to conduct... and it would still have limitations.
- It is EXTREMELY RARE to read about a double-blind test that DOESN'T have serious flaws or limitations. And that includes ALL of the tests I've read reports about that were conducted by unbiased groups or at AES meetings... (although they tend to be better).
The situation is exacerbated by the facts that the details are often not stated... and that, when they are stated, many people don't know how to interpret them. And, to be quite blunt, the people who would have the ability and the budget to run high quality double-blind tests rarely have any incentive to do so. (For example, audio magazines absolutely do not make enough money to conduct large-scale double-blind tests.) (The only reason drug companies are willing to do so is that they are required to in order to get permission to sell their expensive products.)
For example, back when Sony was testing "whether most people could tell the difference between CD quality and hi-res files"... They concluded that, under specific conditions, including specific content and test gear, members of a specific test group were unable to do so with "statistical significance". These are extremely useful results when determining "whether CD quality will be satisfactory for the vast majority of consumers who purchase music"... But they say virtually nothing about whether a few audiophiles, or musicians, or people with atypical systems or content preferences, will be able to tell the difference. (And, as I recall, they used a relatively small test group, a very limited sample of test gear, and a relatively short list of content material.)
And, yes, IF you were testing to try to detect small differences... A good starting point would be to select a panel of test subjects who, upon testing, are found to be above average in their ability to detect those differences. (But, if you're testing a "product", which you plan to sell, you might prefer to select from among people who are typical of the potential customers for your product.)
Since you asked... as for Dark Side of the Moon: I always liked the MFSL versions of DSOTM back when they were current (both the vinyl and CD ones). However I prefer the EMI CD remaster from 2003... And I think the two-channel 24/96k version from the Immersion Box Set is probably the best... There was also another non-MFSL 24k gold version that I thought was rather interesting...
EMI Japan CP43-5771. (It tends to be quite expensive, and does sound quite interesting, but I think the two I listed above it are better.)
|
|
|
Post by leonski on Jan 9, 2024 2:07:02 GMT -5
First, Keith.....I did my share of experimental design. Completely measurement / statistics based. One thing I learned was that as the number of variables rises.....the number of experiments which must be run rises MUCH faster. In a few cases, we truncated a design and 'fixed' a few variables in order to limit the number of runs needed. In the case of CVD (Chemical Vapor Deposition), each run was hours long. And the equipment needed minor maintenance every 3 or 4 runs....It was a FILTHY process....With all variables in play, the design would have taken months to do.....
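That blow-up is easy to illustrate; a toy sketch with made-up factor counts for a listening test:

```python
# Every extra variable multiplies the number of runs in a
# full-factorial design (the factor levels here are made up).
from math import prod

factors = {
    "amplifier": 4,
    "speaker": 3,
    "room": 2,
    "track": 6,
    "listener": 24,
}
runs = prod(factors.values())
print(f"{runs} runs for one repetition of everything")  # 3456

# "Fixing" a variable, as described above, cuts the count proportionally:
print(f"{runs // factors['room']} runs with the room held constant")  # 1728
```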
If you want to check for ONE thing? You then are up against other walls. Like WHO to empanel as listeners. What degree of significance will you accept as proof? Even the number of listeners per session matters, since ALL seats are not created equal.....
I'd love to see any DBT in audio with statistically valid results. I already stipulated a PRE-test / qualification for the panel of listeners. Now? We all know of differences among speakers. So finding such would be a waste of time. But what about more 'Third Railish' issues, like various cabling?
My original USA vinyl pressing of DSOTM was, I think, on the 'good' list. And if in as-issued condition and unopened, it was worth a lot of $$$......I can't remember the number printed on the album cover.
|
|
KeithL
Administrator
Posts: 10,261
|
Post by KeithL on Jan 9, 2024 11:08:57 GMT -5
Exactly... and, of course, when it comes to HUMAN experience, there are a near infinite number of variables... and "even the variables have variables". And, at least when you're calibrating hardware, it usually doesn't "get cranky at the end of the day" or "perform better after that second cup of coffee".
For example, my hearing is quite good but, even excluding which I prefer to listen to, there is a clear hierarchy of what playback device enables me to hear the most detail. Old speakers with cone tweeters would be the lowest, followed by dome tweeters, then dynamic headphones, then speakers with folded ribbon tweeters, and at the top electrostatic headphones. And I've seen claims that we humans actually do perform measurably better after a cup or two of coffee... and measurably worse when we're tired.
So the time of day in which each test run was done will make a difference. And it's been shown that humans may tend to be biased towards "the first sample" or "the last sample" or "the sample on the left"...
And at least some people perform far better when they are allowed to use familiar content... And I wonder if at least a few people find that music sounds "more pleasing" on sunny days than on cloudy days...
I actually recall one quite serious test that was done to determine "whether the difference between CD quality and high resolution audio was audible". (I really wish I could remember the details because they were quite interesting.) They used specially made live recordings of some rather obscure instrument that was known for having extremely complex high frequency harmonics... They did a total of several hundred short trials with a few dozen test subjects...
And, out of all those trials, the OVERALL results were that around 55% of the "guesses" were correct... This would seem to suggest that "the number of correct guesses wasn't statistically significant"... BUT, out of several dozen test subjects, there was this one guy who was correct over 80% of the time.
(They even ran extra runs with just that test subject and confirmed that he alone was accurate FAR beyond random chance.)
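Those confirmation runs are the statistically crucial part: with a few dozen guessers, someone will often look "special" by luck alone. A sketch with hypothetical per-subject numbers (roughly 80% maps to 16 of 20 trials):

```python
# One subject scoring ~80% looks impressive, but with a few dozen
# pure guessers SOMEONE will often get there by luck; confirmation
# runs on that one subject resolve the ambiguity. (Numbers hypothetical.)
from scipy.stats import binomtest

subjects, trials, hits = 36, 20, 16
p_one = binomtest(hits, trials, 0.5, alternative="greater").pvalue
p_any = 1 - (1 - p_one) ** subjects   # chance that at least one of
                                      # 36 pure guessers scores >= 16/20

print(f"one pre-chosen subject at 16/20:  p = {p_one:.4f}")  # ~0.006
print(f"best-of-36 reaching 16/20 by luck:   {p_any:.2f}")   # ~0.19
```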
But what do we DO with those results...?
Apparently, even with this especially "critical" sample content, most people couldn't tell the difference... But we cannot claim that "the difference is inaudible"... because, well, it obviously was audible to at least one of those test subjects.
(And we'll never know how the results would have turned out if they'd used different speakers... or headphones... or done the tests in a different room.)
So, considering the number of different brands and models of speakers in the world... How many do we need to test before we conclude that "something won't be audible on ANY speaker"? Clearly the answer to that would be "a lot more than anyone is actually going to pay to test"... And, likewise, we're not going to convince our subjects to go through the entire test several times... after randomly changing seats. And, if we actually did convince them to do so, the fact that they were getting tired and bored towards the end would also affect the results. (In this situation our decision would almost certainly be to choose "a reasonable number of speakers representative of the most popular types".)
It's also worth pointing out that a lot of the details are not at all intuitive. For example, many non-musicians would expect that musicians would be "better able to judge the accuracy of an audio component"... However, according to at least some tests, the opposite is sometimes true... And apparently not all great musicians necessarily gravitate towards expensive or accurate audio gear...
The argument put forth is usually that "many musicians listen more to the music and less to the gear it's playing on"...
Also note that we haven't even mentioned personal preference here... I may prefer a speaker because it provides the best chance for me to hear minute details... But someone else may prefer to hear less emphasis on those details.
I'll close with a bit of trivia...
"in the old days" Stax electrostatic headphones were often used for military SONAR headsets...
Because, whether you find them pleasant to listen to or not, they were very good at revealing detail and "sonic signatures" compared to dynamic headphones.
So, at that time, according to the US Navy, based on that single criterion, they tested out the best (I have no idea about the details of the testing).
|
|
KeithL
Administrator
Posts: 10,261
|
Post by KeithL on Jan 9, 2024 11:25:13 GMT -5
Back when I worked for that analyst firm one of the things we did was to develop tests that were intended to be "market biased". In other words, we would figure out what most of the customers wanted, and test various products based on those criteria...
And part of what we produced was something called a Reviewer's Guide... This was targeted, not at "magazine reviewers", but at the guys who would review products, to decide which ones a company should purchase and use. The Reviewer's Guide would walk them through "what was important, how to test it, and what you should expect to see"... (Sort of like a technical infomercial.)
And, yes, this was obviously MARKETING LITERATURE.
For example:
"With a product of this type speed is important."
"Here's the best way to test speed with a product of this sort under real-world operating conditions." "Here are the results you should expect... which will show you that ours is faster than Brand X under real-world conditions".
So, for example, in order to test interconnects...
If I WANT to find that "all interconnects sound the same", then I'll test them using a good quality solid state preamp and power amp. (And I'll present literature to show that most modern gear of that sort operates at relatively low impedance.) But, if I WANT to show that "interconnects DO sound different", then I'll find gear that operates at a high impedance, or that is sensitive to cable capacitance. (You will definitely hear differences between interconnects if you put them between a Moving Magnet cartridge and the inputs of a phono preamp.) And, as long as I'm honest, and tell you all the details, then it's up to you to know which of those results are relevant in your system.
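The moving-magnet case is easy to put numbers on: the cartridge's inductance and the cable-plus-input capacitance form a resonant low-pass filter, so adding cable capacitance shifts the response peak down toward the audible band. A sketch with typical assumed values (not any specific cartridge):

```python
# A moving-magnet cartridge's inductance plus the cable + phono-input
# capacitance form an LC low-pass; resonance f = 1 / (2*pi*sqrt(L*C)).
# Values below are typical/assumed, not measured from any product.
from math import pi, sqrt

L = 0.5  # cartridge inductance in henries (MM carts are often ~0.3-0.7 H)
for c_pf in (100, 250, 400):
    c = c_pf * 1e-12
    f_res = 1 / (2 * pi * sqrt(L * c))
    print(f"{c_pf:3d} pF of cable -> resonance near {f_res / 1000:.1f} kHz")
```

With those assumed values the resonance moves from around 22 kHz down to about 11 kHz as capacitance rises, which is why interconnects genuinely can sound different in THAT position, even if they don't between low-impedance solid state gear.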
You see this a lot with audio gear... like that one company that is famous for insisting that "shielding against GHz noise is important for audio cables"... and then proceeds to show you how their cable really excels in that one area of performance.
|
|
|
Post by leonski on Jan 12, 2024 2:21:11 GMT -5
Keith? Your last post just reinforces my thoughts that DBT is nearly (not completely, but close) worthless.
If I can 'tilt' the playing field any way I want? You'd need a TRIPLE BLIND test, where the people running it
and setting it up had NO IDEA either......straighten it all out after data gathering.
You'd STILL need to gather test subjects / participants and PRE-vet them for some hearing acuity.
And MY house is a virtual Faraday cage, so with large isolation transformers and all the rest, I have
no worries about GHz noise. Before I fixed my house? I could FEEL it in my fillings....
|
|