Well, technically, during twilight (right after the sun has dipped below the horizon, or just before it appears above it, when there is no direct line of sight to the sun) the sky is strictly bluer: the sun and the neighboring angles of sky appear "yellow/orange" because green and especially red light scatter less in the atmosphere, while a good portion of blue light scatters much more readily, allowing non-line-of-sight blue illumination of land where the sun has not yet risen or has already set.
All of humanity has been a witness to these observations and yet we blindly assume blue light filters must have such and such an effect.
But even if it did: for a modern concrete-cave-dweller whose day/night pattern is phase-shifted with respect to the solar rhythm, having blue light as the last form of light actually seems more natural!
Is there a reason GPUs don't use insane "blocks" of SD card slots (for massively parallel I/O) so the model weights don't need to pass through a limited PCIe bus?
Yes. Let's do the math. The fastest SD cards can read at around 300 MB/s (https://havecamerawilltravel.com/fastest-sd-cards/). Modern GPUs use 16 lanes of PCIe gen 5, which is 16 x 32 Gb/s = 512 Gb/s = 64 GB/s, meaning you'd need over 200 of the fastest SD cards. So what you're asking is: is there a reason GPUs don't use 200 SD cards? And I can't think of any way that would work.
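For reference, here is that arithmetic as a quick script (a sketch using only the figures quoted above; real-world controller and link overheads would only increase the card count):

```python
# Back-of-the-envelope check using the figures quoted above:
# ~300 MB/s per fast UHS-II SD card, PCIe gen 5 at 32 Gb/s per lane, 16 lanes.

lane_gbit_s = 32                      # PCIe gen 5, per lane, in gigabits/s
lanes = 16
pcie_gbit_s = lane_gbit_s * lanes     # 512 Gb/s
pcie_gbyte_s = pcie_gbit_s / 8        # 64 GB/s
sd_gbyte_s = 0.3                      # ~300 MB/s per card

cards_to_match_bus = pcie_gbyte_s / sd_gbyte_s
print(pcie_gbyte_s, round(cards_to_match_bus))   # 64.0 213
```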
SD is obviously the wrong interface for this but "High Bandwidth Flash" (stacked flash akin to HBM) is in development for exactly this kind of problem. AMD actually made a GPU with onboard flash maybe a decade ago but I think it was a bit early. Today I would love to have a pool of 50GB/s storage attached to the GPU.
Oh definitely. That past AMD product just stuck 4 M.2 slots onto the board. Today that approach would give 50-60 GB/s of read speed, which would be useful enough that any of the vendors could build it with existing components.
One thing to note, those aren't the fastest SD cards, those are the fastest UHS-II SD cards. The future is SD Express and you can already get microSDs at 900 MB/s.
Some years ago I realized that if I had oodles of money to spend I would totally get someone to make a PCIe card with several hundred microSD cards on it.
You can buy vertical microSD connectors, so you can stack quite a lot of them on a PCIe card, then use a beefy FPGA to present it as an NVMe device to the host.
The goal is total capacity, as you can put 1 TB cards in there. And for teh lulz, of course.
The next gen inference chips will use High Bandwidth Flash (HBF) to store model weights.
These are made similarly to HBM but are lower power and much higher capacity. They can also be used for caching to reduce costs when processing long chat sessions.
None of this matters; the probability of new physics here is essentially nil.
If anyone is irresponsible it's Papp, who knowingly handed the plug to a productive skeptic, knowing full well the dangers of the device.
If the motor had any merit, he could refuse the out-of-court settlement and demonstrate the working principles of his motor, but he didn't! He took the money; Feynman inadvertently saved him from certain humiliation had this not occurred.
Hiding rocket propellant in a perpetuum mobile must be one of the most dangerously foolish things to do: people will come closer, inspect, and try to figure out the true power source, and they will home in on your panic reactions to get closer (warmer-colder style).
>the probability of new physics here is essentially nil.
I don't function like that. I have true, false, and unknown.
If I prematurely promote an unknown to true or false, I would feel the need to defend my uninformed conclusion. Since I don't have what it takes to do that, I would try to spoof the data. I would also blind myself to everything that contradicts the opinion.
You, Feynman, the skeptic community: you have sunk to a level that wouldn't even be legal if the man were alive. Claims of fraud and first-degree murder?!
Unlike the inventor, such claims require evidence, and until you have it the accused is innocent.
But let's compare the two contradicting stories. In one, the man who unplugged the device claims he simply wanted to hold the plug in his hand and asked politely. In the other, he is said to have unplugged it himself and refused to give it back.
Which one of the two would require settling out of court?
The debunking doesn't really talk about Papp. You could copy-paste it under any exotic claim.
>if the motor had any merit, he could refuse the out-of-court settlement and demonstrate the working principles of his motor.
No reason to think any demonstration would change information-free debunking.
Well over a thousand GEET engines have been built, and there are hundreds of YouTube demonstrations, but it is still not considered real.
There are now respectable publications about the Pons and Fleischmann device with 100% successful replication, but it is still not considered real.
If you personally built a working Papp engine, it would simply not be considered real. People would say you are a fraud and a murderer, and that would be the final word on the topic.
For the simple reason that other energy-consuming industries have physical products in and physical products out, which are costly to transport from space, while uploading extra corpus tokens and downloading new weights is essentially free from space (compared to the hardware shipping costs).
People have been correctly indoctrinated about global warming and the dominant heating terms coming from excess CO2 concentration, but because of this over-emphasis they neglect the prompt heating that comes with nearly all energy-generation mechanisms (from fossil fuels to solar panels to nuclear energy).
When the whole world starts raising its living conditions, and when a computational race erupts, there is no taming of total human energy consumption.
but what we can do is offload the bulk of computational energy consumption, like training common goods such as LLM weights...
Maybe because your tone is needlessly aggressive and lots of people don't want HN to turn into Reddit/Twitter. There are enough people raging about everything on all the other social networks; maybe we are here hoping to have more civilised discussions.
> Oh no the burden of actually explaining why you want to de-emphasize a comment.
One single square meter of land in direct sunlight receives a constant 6kW (21MJ) of energy. The heat rejected by industrial and other processes is absolutely minuscule in comparison, a rounding error.
Comments that are incorrect but posted in an authoritative voice get downvoted, for good reason.
>One single square meter of land in direct sunlight receives a constant 6kW (21MJ) of energy. The heat rejected by industrial and other processes is absolutely minuscule in comparison, a rounding error.
This is incorrect: at ground level it's about 1 kW of sunlight per square meter, and only if that square meter is orthogonal to the line of sight to the sun; otherwise it is diminished by cos(theta), where theta is the angle between the line of sight to the sun and the normal of the square meter of land. It cannot receive 6 kW in any orientation. Also, 6 kW is a power, while 21 MJ is an energy.
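As a quick sketch of that scaling (assuming the usual clear-sky ballpark of 1000 W/m^2 for direct, perpendicular sunlight at ground level):

```python
import math

# cos(theta) scaling of ground-level solar irradiance, as described above.
# 1000 W/m^2 is the commonly quoted clear-sky ballpark, assumed here.

def irradiance_w_per_m2(theta_deg, normal_irradiance=1000.0):
    """Power per square meter on a surface whose normal makes an angle
    theta (in degrees) with the direction to the sun."""
    theta = math.radians(theta_deg)
    return max(0.0, normal_irradiance * math.cos(theta))

print(irradiance_w_per_m2(0))    # facing the sun directly: ~1000 W/m^2
print(irradiance_w_per_m2(60))   # tilted 60 degrees: ~500 W/m^2
print(irradiance_w_per_m2(90))   # edge-on: 0 W/m^2 -- never 6 kW
```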
> Comments that are incorrect but posted in an authoritative voice get downvoted, for good reason.
Indeed your incorrect comment in an authoritative voice might get downvoted, for good reason, but I won't be the one doing it...
Welp, turns out I should verify information better. I thought 6 kW seemed high, given that a 1 square meter solar panel at ~25% efficiency generates about 250 W of electricity. My apologies.
For me personally, I prefer to regard as more fundamental those constructions that straightforwardly generate the most from the least novelty.
For example, one may be introduced to the real numbers, later to the complex numbers, and later still perhaps to the quaternions, the octonions, etc., in a haphazard, disconnected way.
Given just the real numbers and "geometric algebra" (i.e. Clifford algebras), one generates basically all of these structures for different "settings".
I hence view geometric algebra as lower-level and more fundamental than the complex numbers specifically.
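To make that concrete, here is a minimal sketch (Python for illustration; the class name MV2 and the component layout are my own) of the 2D geometric algebra with basis 1, e1, e2, e12 and e1^2 = e2^2 = 1, showing that the unit bivector squares to -1, so the even-grade elements multiply exactly like complex numbers:

```python
class MV2:
    """Multivector in the 2D geometric algebra; components (s, x, y, b)
    for the basis elements 1, e1, e2, e12."""
    def __init__(self, s=0.0, x=0.0, y=0.0, b=0.0):
        self.s, self.x, self.y, self.b = s, x, y, b

    def __mul__(self, o):  # the geometric product
        a0, a1, a2, a3 = self.s, self.x, self.y, self.b
        b0, b1, b2, b3 = o.s, o.x, o.y, o.b
        return MV2(
            a0*b0 + a1*b1 + a2*b2 - a3*b3,   # scalar part
            a0*b1 + a1*b0 - a2*b3 + a3*b2,   # e1 part
            a0*b2 + a2*b0 + a1*b3 - a3*b1,   # e2 part
            a0*b3 + a3*b0 + a1*b2 - a2*b1,   # e12 part
        )

    def __repr__(self):
        return f"{self.s} + {self.x} e1 + {self.y} e2 + {self.b} e12"

e1, e2 = MV2(x=1), MV2(y=1)
I = e1 * e2                  # the unit bivector e12
print(I * I)                 # -1 + 0 e1 + 0 e2 + 0 e12: it squares to -1

# Even-grade elements a + b*e12 therefore behave exactly like a + bi:
z, w = MV2(s=1, b=2), MV2(s=3, b=-1)
print(z * w)                 # 5 + 0 e1 + 0 e2 + 5 e12, i.e. (1+2i)(3-i) = 5+5i
```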
A more interesting question (if we had to limit the discussion to complex numbers) would be the following: which representations of complex numbers are known, what are their advantages and disadvantages, and can novel representations be devised that display fewer of the disadvantages?
For example, we have (just to list a few) the following representations:
1. Cartesian representation of complex numbers, a + bi: this permits straightforward single-valued addition and single-valued multiplication, but only multivalued N-th roots. Addition and multiplication change smoothly with smooth changes in the input values; taking N-th roots does not, unless you use multivalued roots, but then you no longer have single-valued roots.
2. Polar representation of complex numbers, r(cos(theta) + i sin(theta)): this representation admits smooth and single-valued multiplication and N-th roots, but no longer permits smooth and single-valued addition! (A small numerical sketch follows this list.)
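Here is that small numerical sketch (Python; "smooth" here just means no jump under a tiny change of the input):

```python
import cmath

# 1. Cartesian + principal root: crossing the negative real axis flips the
#    principal square root from roughly +i to roughly -i.
eps = 1e-9
print(cmath.sqrt(-1 + eps * 1j))   # ~ +1j
print(cmath.sqrt(-1 - eps * 1j))   # ~ -1j  (a jump of size ~2)

# 2. Polar pair (r, theta), theta unrestricted: multiplication and N-th
#    roots are simple single-valued smooth formulas...
def polar_mul(p, q):
    (r1, t1), (r2, t2) = p, q
    return (r1 * r2, t1 + t2)

def polar_nth_root(p, n):
    r, t = p
    return (r ** (1.0 / n), t / n)

# ...but addition requires converting back, and the resulting angle is only
# defined modulo 2*pi, so it jumps at the branch cut.
def polar_add(p, q):
    z = cmath.rect(*p) + cmath.rect(*q)
    return (abs(z), cmath.phase(z))

print(polar_add((1.0, cmath.pi - 1e-9), (1.0, cmath.pi - 1e-9)))  # theta ~ +pi
print(polar_add((1.0, cmath.pi + 1e-9), (1.0, cmath.pi + 1e-9)))  # theta ~ -pi
```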
Please follow up with your favourite representations of complex numbers, and expound the pros and cons of each.
For example, can you produce a representation where addition and N-th roots are smooth and single-valued, but multiplication is not?
Can you prove that any representation of the complex numbers must suffer this dilemma, or can you devise a representation where all three of addition, multiplication, and roots are smooth and single-valued?
Since their example scales the water consumption with their electricity consumption, one may conclude it is the fresh water consumed (evaporated) during production of the electricity. Gaseous H2O is an even more potent GHG than CO2.
At the frontier of science we have speculations which, until proper measurements become possible, are unknown to be true or false (or even unknown to be equivalent to other speculations, etc., regardless of whether they are true or false, or truer or falser). Once settled, we may call the earlier but wrong speculations "reasonable wrong guesses". In science it is important that these guesses or suspicions are communicated, as they drive the design of future experiments.
I argue that more important than "eliminating hallucinations" is tracing the reason something is or was believed by some.
With source-aware training we could ask an LLM to give answers to a question (answers which may contradict each other), but to provide the training source(s) justifying the emission of each answer. Instead of bluffing, it could emit multiple interpretations and go like:
> answer A: according to school of thought A the answer is that ... examples of authors and places in my training set are: author+title a1, a2, a3, ...
> answer B: according to author B: the answer to this question is ... which can be seen in articles b1, b2
> answer ...: ...
> answer F: although I can't find a single document explaining this, when I collate the observation x in x1, x2, x3; observation y in y1,y2, ... , observation z in z1, z2, ... then I conclude the following: ...
so it is clear which statements are sourced where, and which deductions are the LLM's own.
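As an illustration only (the tag format, document IDs, and function names here are hypothetical, not a description of how any provider actually trains), data for such a model might be prepared roughly like this:

```python
# Hypothetical sketch of "source-aware" data preparation: every training
# document is wrapped with an explicit source tag, and supervised examples
# require the model to end its answer with the tags of the supporting sources.

corpus = [
    {"source_id": "a1", "text": "School of thought A argues that ..."},
    {"source_id": "b1", "text": "Author B instead claims that ..."},
]

def tag_document(doc):
    # Pretraining view: the source id is part of the token stream, so the
    # model can associate content with its origin.
    return f"<src:{doc['source_id']}>\n{doc['text']}\n</src:{doc['source_id']}>"

def make_sft_example(question, answer, source_ids):
    # Fine-tuning view: answers must cite the ids of the supporting sources.
    return {
        "prompt": question,
        "completion": f"{answer}\n[sources: {', '.join(source_ids)}]",
    }

pretraining_stream = "\n\n".join(tag_document(d) for d in corpus)
example = make_sft_example(
    "What is the answer to this question?",
    "According to school of thought A, the answer is ...",
    ["a1"],
)
print(pretraining_stream)
print(example)
```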
Obviously few to none of the high-profile LLM providers will do this any time soon, because once jurisdictions learn this is possible they will demand that all models be trained source-aware, so that they can remunerate the authors in their jurisdiction (and levy taxes on that income). What fraction of the income will then go to the authors and what fraction to the LLM providers? If any jurisdiction is to enforce this first, it will probably be the EU, but they don't do it yet. If models are trained in a different jurisdiction than the one levying taxes, the academic in-group citation game will be extended to LLMs: a US LLM will have an incentive to cite only US sources when multiple are available, an EU-trained LLM will prefer to selectively cite European sources, etc.
In addition to providing training sources, it's important to identify overlaps among the fragments used in the answer. For me, overlap doesn't mean simply identical expression, but conceptual identity.
We are much more likely to find conceptual overlap in code than in language and prose, because many of the problems we solve, as mathematicians say, reduce to previously solved problems, which IMO means substantially identical code.
A related question is how much change is necessary to a work of art, image, prose, or code for it to escape copyright? If we can characterize it and the LLM generates something that escapes copyright, I suggest the output should be excluded from future copyright or patent claims.
I wasn't aware of source-aware training, so thank you for the reference! It does seem a bit too good to be true; I believe in a system of tradeoffs so I feel like this must have an issue with reducing creativity. That's at first glance though, so I could be wrong.
It should also generalize to when (for example, a specific corner during dusk or dawn), and for insurers another important factor would be which other cars are nearby at the hard-braking event; it's not exactly productive to flag the chicken in chicken-or-dare scenarios.
I have the impression the implied conclusion is that, in the situation described, it would be better to consult different LLM models than a single specific one, but that is not what they demonstrate:
to demonstrate this you measure the compute / cost of running and human-verifying the output.
the statistics provided don't at all exclude the possibility that, instead of giving the top 5 models each a single opportunity to propose a solution, it may be more efficient to give the 5 opportunities to the best-scoring model:
at a 24% win rate the null hypothesis (what a typical researcher ought to predict based on common sense) would be that the probability of a loss is 76%, the probability that it loses N times in a row is 0.76^N, and so the probability of it winning at least once in N attempts is 1 - 0.76^N.
So from consulting the best-scoring model twice (2x top-1) I would expect 42.24%, better than giving the 2 top-scoring models each a single try (1x top-2), which resulted in 35%.
Same for 3x top-1 vs 1x top-3: 56.10% vs 51%
Same for 4x top-1 vs 1x top-4: 66.64% vs 66%
Same for 5x top-1 vs 1x top-5: 74.64% vs 73%
Same for 6x top-1 vs 1x top-6: 80.73% vs 83%
Same for 7x top-1 vs 1x top-7: 85.35% vs 90%
Same for 8x top-1 vs 1x top-8: 88.87% vs 95%
I can't read the numerical error bars on the top-1 model win rate; we could compute a likelihood to see whether the deviation is statistically significant.
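For what it's worth, the percentages above come from this little script (assuming independent attempts, which is exactly the null hypothesis in question):

```python
# If one attempt by the top model wins with probability p = 0.24 and
# attempts are independent, then P(at least one win in N tries) = 1 - (1-p)^N.

p = 0.24
for n in range(2, 9):
    print(n, round((1 - (1 - p) ** n) * 100, 2))
# -> 42.24, 56.1, 66.64, 74.64, 80.73, 85.35, 88.87 (percent)
```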
This post measures `1x top-N` (one attempt each from N models), not `Nx top-1` (N attempts from the best-scoring model). We should make that more clear.
Part of why we chose `1x top-N` is that we expect lower error correlation compared to `Nx top-1`, which is also why the iid baseline is likely optimistic.
That said, a direct comparison (`Nx top-1` vs `1x top-N`, with the same review/compute budget) would be useful!