Era COMPUTATRUM: MAR.(2nd Half) 2026

MARCH 16, 2026

By the way, there are reports that Intel may announce something at NVIDIA GTC 2026. It appears that the announcement could involve Intel CPUs being used to control NVIDIA GPUs. Intel itself has already received a $5 billion investment from NVIDIA, and it had also announced adoption of NVLink Fusion technology, so it would not be surprising if direct connectivity becomes possible. According to reports, Intel’s earnings indicated strong performance for its custom Xeon business, and there had been speculation that NVLink support would be implemented not in general-purpose Xeon products, but rather in custom Xeon variants. NVIDIA has so far developed Grace and Vera based on the Arm architecture, but perhaps this suggests that while maintaining its Arm strategy, NVIDIA also recognizes x86 as one of the available options. As for NVLink Fusion, AWS’s Graviton and Fujitsu’s MONAKA series had reportedly been mentioned as Arm CPU participants, while SiFive had emerged as a RISC-V CPU participant.

Perhaps because of the buildup toward GTC, there seems to be relatively little other news today.

A few AMD-related topics, grouped together. AMD and Celestica reportedly plan to collaborate on production of Helios racks. In addition, AMD CEO Lisa Su recently visited South Korea, and there are reports suggesting that Samsung may supply HBM4 memory for the MI450X.

Next, semiconductor-related topics. There are reports that TSMC has begun environmental assessment procedures for a factory site intended for its 1.2 nm process generation. The company is apparently conducting studies on ecological impact. The 1.2 nm generation is expected to use nanosheet-type transistors similar to those of the 1.4 nm generation. Semiconductor processes and their surrounding technologies often involve large quantities of chemicals, heavy metals, and sometimes even lanthanide-related radioactive materials. Evaluating safety not only for humans but also for the environment (ESH: Environment, Safety, and Health) is included as part of semiconductor process development roadmaps.
There was also an interesting article regarding EUV lithography. EUV exposure systems used in semiconductor lithography are supplied almost exclusively by ASML in the Netherlands, and annual production volume is reportedly around 70 units. Compared with the past, production capacity has certainly increased, and it will likely continue to rise gradually. However, according to the report, reaching a pace of 100 systems per year by 2030 may be difficult. In current leading-edge semiconductor manufacturing, it seems clear that lithography tools are one of the major throughput bottlenecks. Many fabs and semiconductor manufacturers likely want to increase the number of EUV tools in order to expand production volume, but things do not appear to progress as easily as hoped. The article included an analysis suggesting that installing NVIDIA Rubin GPUs into a 1-GW AI data center would require approximately two million EUV exposures, corresponding to the equivalent workload of about 3.5 EUV systems.

The assumptions in the article were not entirely clear, but after thinking about it a bit, the result may make sense. Current mainstream EUV systems have a numerical aperture (NA) of 0.33, and it is said that roughly four exposures are required to imprint circuit patterns onto a wafer. This likely causes wafers to stall at the EUV stage. In order to match throughput with the surrounding process steps before and after EUV, four EUV systems would need to operate in parallel, which roughly aligns with the conclusion that about 3.5 systems are required. However, this would also imply the need for four separate mask sets. For chips with sufficiently large production volume, preparing four mask sets may be feasible, but that would likely be impractical for low-volume manufacturing. EUV equipment manufacturers are therefore working on technologies such as High-NA systems with NA = 0.55 and improvements to 1000 W-class light sources in order to reduce the number of required exposures. Gigawatt-class AI data centers are expected to emerge across the world, and one can only hope that EUV systems will continue to be supplied smoothly enough to keep semiconductor manufacturing throughput increasing.
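
As a rough sanity check on the article's figures, the two numbers can be turned into a per-tool rate. This is only a sketch: the one-year period and round-the-clock operation are my own assumptions, and it is not clear from the report exactly what counts as one "exposure".

```python
# Back-of-envelope check of the reported EUV figures.
# The two headline numbers come from the article; the time span is an assumption.
TOTAL_EXPOSURES = 2_000_000   # EUV exposures for a 1-GW Rubin deployment (reported)
EUV_SYSTEMS = 3.5             # equivalent number of EUV tools (reported)
HOURS_PER_YEAR = 365 * 24     # assumption: tools run continuously for one year


def exposures_per_system(total: float, systems: float) -> float:
    """Exposures each tool would have to deliver over the whole period."""
    return total / systems


def exposures_per_hour(total: float, systems: float, hours: float) -> float:
    """Implied sustained rate for a single tool."""
    return exposures_per_system(total, systems) / hours


per_system = exposures_per_system(TOTAL_EXPOSURES, EUV_SYSTEMS)
per_hour = exposures_per_hour(TOTAL_EXPOSURES, EUV_SYSTEMS, HOURS_PER_YEAR)
print(f"{per_system:,.0f} exposures per system")
print(f"{per_hour:.1f} exposures per system-hour")
```

Under these assumptions each tool handles a little over half a million exposures, or somewhere around 65 per hour. Changing the assumed period changes the rate proportionally.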

Finally, a cloud-related topic. AWS’s storage service, Simple Storage Service (S3), has apparently reached its 20th anniversary. The service reportedly began with just 1 PB of storage and has now expanded to hundreds of exabytes, involving tens of millions of hard disk drives (HDDs). According to one report, if all of those HDDs were stacked vertically, the stack would be tall enough to travel to the International Space Station, orbiting roughly 400 km above Earth, and back again. Dividing the 800 km round trip by the thickness of a 1-inch-thick HDD (25.4 mm) gives roughly 31.5 million drives. That scale is difficult to imagine. Then again, perhaps the point is that even without launching a data center into space, simply stacking enough HDDs would already reach orbit.
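
The stacking arithmetic is easy to reproduce. A minimal sketch, assuming every drive is exactly 1 inch (25.4 mm) thick and using the approximate 400 km altitude quoted above:

```python
ISS_ALTITUDE_KM = 400       # approximate altitude quoted in the text
HDD_THICKNESS_MM = 25.4     # assumption: each drive is exactly 1 inch thick


def drives_for_round_trip(altitude_km: float, thickness_mm: float) -> float:
    """Number of stacked drives needed to cover the distance up and back."""
    round_trip_mm = 2 * altitude_km * 1_000_000  # km -> mm
    return round_trip_mm / thickness_mm


drives = drives_for_round_trip(ISS_ALTITUDE_KM, HDD_THICKNESS_MM)
print(f"{drives / 1e6:.1f} million drives")
```

That figure is consistent with the "tens of millions of HDDs" mentioned in the report.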

MARCH 17, 2026

So, NVIDIA GTC 2026 has begun. The event runs for four days starting on March 16 (local time). As usual, the first day featured the keynote presentation by Jensen Huang. The keynote took place from 11:00 AM to 1:00 PM in San Jose, which corresponded to 3:00–5:00 AM on March 17 in Japan. I went to bed early the night before, and before connecting to work in the morning, I searched YouTube and found that the keynote had already been uploaded. I think it was shortly after 6:00 AM.
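
For anyone who wants to double-check the time conversion, here is a small sketch using fixed UTC offsets. It assumes US daylight saving time (PDT, UTC-7) was already in effect on March 16, and it uses hand-written offsets rather than a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Pacific Daylight Time (UTC-7) applies on March 16, 2026.
PDT = timezone(timedelta(hours=-7), "PDT")
JST = timezone(timedelta(hours=9), "JST")

keynote_start = datetime(2026, 3, 16, 11, 0, tzinfo=PDT)
keynote_end = datetime(2026, 3, 16, 13, 0, tzinfo=PDT)

# astimezone() converts the same instant into another offset.
start_jst = keynote_start.astimezone(JST)
end_jst = keynote_end.astimezone(JST)

print(start_jst.strftime("%Y-%m-%d %H:%M %Z"), "to", end_jst.strftime("%H:%M %Z"))
```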

On the semiconductor product side, new announcements included the Feynman GPU and the Groq 3 LPU. As for the new CPU, only the name “Rosa” was shown. Groq 3 appeared in the form of a package image resembling a chip photograph, though I suspect it was probably CG. Feynman may also have been CG rather than an actual die photo. NVIDIA generally does not release real die photographs. I recall that even at academic conferences, CG images resembling GPU die photos were once presented and caused a bit of controversy. Feynman GPU and Rosa CPU appear to form the next-generation pairing. Groq 3 will reportedly join the system lineup starting from the Rubin generation.

A roadmap advancing in two-year intervals was presented, continuing from 2024. The structure appears to be as follows. At CES 2026 in January, the 2026 lineup consisted of six combinations, but with the addition of the Groq LPU, the total has now increased to seven.

◆2024
GPU: Blackwell / Blackwell Ultra (2025)
CPU: Grace
DPU: BlueField3
Network: NVLINK5
Ethernet Switch: Spectrum-5
InfiniBand: CX8
Rack System: Oberon NVL72

◆2026
GPU: Rubin / Rubin Ultra (2027)
CPU: Vera
LPU: LP30/35
DPU: BlueField4
Network: NVLINK6 / NVLINK7
Ethernet Switch: Spectrum-6
InfiniBand: CX9
Rack System: Oberon NVL72 / Oberon NVL576 / Kyber NVL144

◆2028
GPU: Feynman (TSMC 1.6 nm / Die Stacking / Custom HBM)
CPU: Rosa
LPU: LP40
DPU: BlueField5
Network: NVLINK8
Ethernet Switch: Spectrum-7
InfiniBand: CX10
Rack System: Oberon NVL72 / Kyber NVL144 / Kyber NVL1152

During the product demonstrations, NVIDIA introduced the individual trays installed into the racks, along with the large NVLINK module forming the backbone of the Kyber rack. Jensen Huang tried lifting one as a demonstration, but it seemed well beyond the weight that a person could realistically handle manually. Some type of auxiliary equipment, perhaps similar to a forklift, would likely be required.

Regarding the introduction of silicon photonics, NVIDIA emphasized that while it is promoting optical technologies, they will coexist alongside copper interconnects rather than completely replacing them. The message seemed to be that the industry will neither remain entirely with copper nor transition fully to optics, but instead use each where most appropriate.

The overall keynote began by noting that CUDA has now reached its 20th anniversary. NVIDIA celebrated the evolution of GPUs, though the presentation only went as far as the Blackwell-based RTX 5090; no footage of what would presumably become a Rubin-based RTX 6090 was shown.

As part of the discussion around expansion of the AI ecosystem, IBM’s watsonx was included. That was somewhat surprising, but even more surprising was that IBM used a photograph of the System/360 in its presentation. Slightly off topic, System/360 was the original mainframe family announced in 1964 and is often regarded as the first computer system to fully introduce the concept of a “family” of compatible products. It was considered remarkable at the time that everything from small office machines to large computer-room systems could share a common architecture and run the same software. The “360” represented support in all directions, that is, 360 degrees. I never expected to see a System/360 photo appear at GTC.

For AI system evaluation, NVIDIA showed improvements in token cost moving from Hopper to Grace Blackwell, and this time also including Vera Rubin and Groq. Jensen Huang proudly highlighted NVIDIA’s leadership position using Inference-X, the web-based AI performance comparison tool provided by SemiAnalysis. I would like to rewatch that portion of the presentation more carefully to better understand it. NVIDIA also expressed strong ambition to achieve $1 trillion in annual revenue next fiscal year, compared with roughly $500 billion this year.

Toward the end of the keynote, the focus shifted toward visions of space and physical AI, including robots. This time, the small robot character introduced was Olaf from Frozen. As a side note, whenever I see Olaf, I always think he resembles the Japanese entertainer Shingo Yanagisawa. I even watched the movie in theaters, and after seeing the robot’s form during the keynote, I still felt the same way.

The keynote concluded with a CG animated sequence in which robots gathered around a campfire in a forest, playing guitars and other instruments while performing a blues-style musical recap of the keynote presentation. In one brief scene during the song, the familiar “five-layer cake” appeared momentarily. The lyrics, composition, and singing voice were probably AI-generated as well. If humans had created the entire thing manually, that would honestly be more surprising.

A few additional GTC-related articles also appeared. Yesterday I wrote that Intel Xeon CPUs might be adopted, and it now appears that the host CPU for the DGX Rubin NVL8 system is indeed the Xeon 6776P. Since it uses P-cores, this would correspond to Granite Rapids (Intel 3 process, 64 cores / 128 threads). Incidentally, purchasing the Xeon 6776P separately reportedly costs $9,875 per unit, assuming quantities of 1,000 units.
There are also reports that the Groq 3 LPU will be manufactured not by TSMC, but by Samsung. Groq 2 was reportedly produced using Samsung’s 4 nm process, and it appears that Groq 3 will continue using Samsung manufacturing as well. The process technology is likewise believed to remain at 4 nm.

MARCH 18, 2026

Incidentally, reports suggest that exports of NVIDIA H200 GPUs to China may be resuming. Since the beginning of this year, there have been several signs that shipments might restart, but in reality exports had still not resumed, and earlier this month articles appeared stating that production of the H200 had been halted due to growing inventory levels. Even amid GTC, reports now say that Jensen Huang commented that NVIDIA has received purchase orders and will therefore resume both shipments and production.

Continuing with GTC-related topics. Numerous eye-catching articles have appeared following the keynote and announcements from individual companies.

There are also reports concerning NVIDIA’s ambitions in space, which were mentioned during the keynote. It is not entirely clear whether the concept is a data center or a space station, but NVIDIA reportedly plans to develop a chipset called Space-1 Vera Rubin, intended to enable autonomous orbital operation for what might be described as a “data-center-class space station” (although such a category probably does not officially exist). Several companies partnering with NVIDIA are apparently already involved in areas such as data center construction, communications, image transmission, and application development.
Two GTC-related stories also emerged in the pharmaceutical and beauty sectors. Swiss pharmaceutical giant Roche is reportedly planning to purchase 2,176 NVIDIA GPUs. The exact GPU model has not been disclosed, but together with previously purchased units the company’s total deployment would reportedly reach 3,500 GPUs, said to be the largest installation in the pharmaceutical industry. Assuming 72 GPUs per rack, this would correspond to just under 50 racks.

Another example comes from the cosmetics industry: L’Oréal is reportedly adopting the NVIDIA Alchemi framework. The company plans to strengthen predictive AI science while performing atomic-level analysis of how skincare ingredients function. The expectation is that this will accelerate the process from research and development to commercial products.
Another GTC-related topic concerns DLSS 5. During the keynote, NVIDIA demonstrated the effects of DLSS 5 by showing game scenes switching from DLSS 5 OFF to ON. I should note in advance that I do not personally play games on PC, but when I watched the demonstration, my impression was simply that the ON setting increased realism. However, criticism has apparently erupted on X (formerly Twitter), with many arguing that the ON version was actually undesirable. It seems that some game creators viewed it as an imposition of visual expression — or perhaps as something touching the uncanny valley.

In response to the backlash, Jensen Huang reportedly commented that creators are free to adjust DLSS 5 controls however they wish, allowing for any style of expression. NVIDIA’s intention appears to have been to emphasize that rendering is moving toward neural rendering — shifting away from the traditional model in which hardware alone improves rendering performance, toward one in which AI increasingly enhances rendering quality.

AMD CEO Lisa Su has reportedly visited South Korea and signed a memorandum of understanding with Samsung regarding memory supply. AMD appears to have secured supplies of HBM4 memory for its next-generation MI455X GPUs, as well as DDR5 memory for Zen 6 generation EPYC processors (Venice). Furthermore, the discussions may extend beyond memory into wafer manufacturing for AMD’s future products.

As demand for AI semiconductors continues rising, TSMC is accelerating fab construction, but existing 3 nm and 2 nm production lines reportedly remain fully booked through 2027. This raises the question of where newly designed chips can actually be manufactured. Outside TSMC, only Intel and Samsung currently appear capable of producing leading-edge EUV-based processes. However, it is somewhat difficult to imagine AMD outsourcing production to Intel.

Intel’s factory in Malaysia is reportedly scheduled to begin operations during 2026. Under the codename Project Pelican, preparations are underway for an advanced packaging facility. The site is expected to handle die sorting as well as LSI packaging compatible with EMIB and Foveros technologies.

In addition, Amkor’s Songdo K5 facility in South Korea is reportedly providing EMIB-compatible OSAT (Outsourced Assembly and Test) services for Intel. This suggests that Intel is steadily building a system in which it can provide packaging services even for chips manufactured by foundries other than Intel itself.

Finally, a quantum computing topic. Quantum Machines has reportedly announced an Open Acceleration Stack for CPU-GPU-QPU integration in collaboration with NVIDIA, AMD, and Riverlane. By extending the company’s orchestration platform and using NVIDIA’s NVQLink, the system reportedly enables low-latency connections between pulse processing units (PPUs) and accelerators such as CPUs, GPUs, FPGAs, and ASICs.

This is expected to allow quantum computers to make native use of both quantum error correction (QEC) and AI. Furthermore, by enabling open connectivity with quantum computing systems, the environment for evaluating the effectiveness of new CPU, GPU, and accelerator designs may have advanced another step forward.

MARCH 19, 2026

So, GTC-related topics continue in abundance. In addition to the keynote itself, there was an article reporting further comments from Jensen Huang. According to the report, NVIDIA expects its workforce to grow to 75,000 employees over the next decade, while collaborating with 7.5 million AI agents. In other words, each employee would effectively work alongside around 100 AI agents, with AI agents becoming the primary operational workforce of the company. Even setting aside the specific numbers, it feels possible that the transition toward AI-agent-centered operations may happen in less than ten years.
He also reportedly commented that the pace of executing business operations will continue accelerating, and that people will constantly find themselves working on the critical path. That certainly feels plausible. There is an old saying that urgent work should be entrusted to busy people, and perhaps that kind of working environment will become the norm. Health management and well-being are likely to become increasingly important.

Continuing with the AI topic, NVIDIA announced NemoClaw as its initiative related to OpenClaw. I believe this was also one of the major themes of the keynote. NemoClaw appears to integrate NVIDIA’s Nemotron into OpenShell. Nemotron is NVIDIA’s LLM family, while NVIDIA NeMo provides the framework for handling Nemotron models. OpenShell serves as the runtime environment for inference agents.

By integrating these components, AI agents operating in on-premises local environments can securely access the cloud through a privacy router. Since the system also performs reinforcement learning, it is described as a self-evolving autonomous agent platform. There also appears to be a guardrail layer that isolates and protects files, networks, processes, and inference tasks so they do not become intermixed with those of other AI agents. It seems intended more as an enterprise package than as a consumer-oriented product.

Now moving to hardware topics. First, the subject of racks. NVIDIA’s Vera Rubin rack system, Kyber, appears to be moving toward support for 800VDC power delivery, that is, direct-current distribution at 800 V.

Conventional power systems typically supply 415VAC, convert it along the way into 54VDC, then convert it again into 12V on GPU cards and similar boards, before finally stepping down to voltages such as 0.8V near the point-of-load (POL) chips themselves. Besides conversion losses, the architecture is beginning to encounter more fundamental limitations.

When supplying the same amount of power (W), lower voltage (V) means higher current (I). As current increases, IR drop across the resistance (R) of wiring and connectors becomes more severe, increasing losses in the form of heat generation. Problems such as burning 12V connectors have gradually become more serious in practice.
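
The relationship can be made concrete with I = P / V and P_loss = I²R. The sketch below is illustrative only: the 100 kW rack power and the 1 mΩ path resistance are hypothetical values I picked, not figures from NVIDIA.

```python
def current_amps(power_w: float, voltage_v: float) -> float:
    """Current needed to deliver a given power at a given voltage (I = P / V)."""
    return power_w / voltage_v


def conduction_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path (P_loss = I^2 * R)."""
    amps = current_amps(power_w, voltage_v)
    return amps * amps * resistance_ohm


POWER_W = 100_000        # hypothetical 100 kW rack
RESISTANCE_OHM = 0.001   # hypothetical 1 milliohm of busbar and connector resistance

loss_800 = conduction_loss_w(POWER_W, 800, RESISTANCE_OHM)
loss_54 = conduction_loss_w(POWER_W, 54, RESISTANCE_OHM)

for volts, loss in ((800, loss_800), (54, loss_54)):
    print(f"{volts:>4} V: {current_amps(POWER_W, volts):8.1f} A, {loss:8.1f} W lost")
print(f"loss ratio 54 V vs 800 V: {loss_54 / loss_800:.0f}x")
```

Because loss scales with the square of the current, the ratio depends only on the two voltages, (800 / 54)², regardless of which power and resistance values are assumed.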

To address this issue, current must be reduced, which implies increasing voltage. However, chips themselves operate below 1V, requiring voltage converters capable of handling the large voltage gap. In general, larger input voltages lead to physically larger converters, making placement and implementation difficult.

GaN-based power semiconductors have high voltage tolerance and can remain compact even at high input voltages, enabling board-mounted converters capable of accepting 800VDC input and producing 12V output. This also allows 800VDC distribution throughout the entire power system all the way to the computing trays, reducing losses across both racks and power delivery infrastructure. Such technology is increasingly viewed as essential for rack systems supporting gigawatt-class AI factories. We are entering an era in which electrical power effectively determines the scale — and thus the AI capability — of AI factories.

Now onto the Vera CPU. Reports indicate that NVIDIA has disclosed details of the Vera CPU architecture. It uses a chiplet design in which four memory-controller dies — two on each side — flank the CPU die. Each memory-controller die contains two LPDDR5 controllers, resulting in a total of eight LPDDR5 controllers per chip.

On the system board, four plus four SOCAMM2 modules are arranged around the Vera package. Assuming each SOCAMM provides 192 GB, eight modules would total roughly 1.5 TB per CPU. A dual-socket node would therefore provide around 3 TB of memory.
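
The controller and capacity counts above can be tallied as follows. This is a sketch using the numbers in the text; the 192 GB per module figure is the same assumption as above, and I am treating 1 TB as 1,024 GB.

```python
MEMORY_DIES = 4            # memory-controller dies per Vera package (reported)
CONTROLLERS_PER_DIE = 2    # LPDDR5 controllers per die (reported)
SOCAMM_MODULES = 8         # four plus four modules around the package
GB_PER_MODULE = 192        # assumption, as in the text

controllers = MEMORY_DIES * CONTROLLERS_PER_DIE
capacity_gb = SOCAMM_MODULES * GB_PER_MODULE
capacity_tb = capacity_gb / 1024           # treating 1 TB as 1,024 GB
dual_socket_tb = 2 * capacity_tb

print(f"{controllers} controllers, {capacity_gb} GB = {capacity_tb} TB per CPU")
print(f"{dual_socket_tb} TB per dual-socket node")
```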

Below the CPU die sits a PCIe Gen6 x16 die. Above the CPU die there are no additional chiplets; instead, the NVLINK C2C (Chip-to-Chip) macro is integrated into the CPU die itself. This connects not only to GPUs, but also links Vera CPUs together in dual-socket configurations. The NVLINK portion occupies significantly less area than the PCIe die. Differences in semiconductor process technology may partly explain this (Vera is reportedly built on TSMC 3 nm), but the PCIe die still appears roughly five times larger.

Inside the CPU die itself, cores are arranged in a 7-column by 13-row configuration. That works out to 91 cores, although the publicly announced number is 88 cores. The difference of three cores appears to represent redundant spare cores intended to compensate for defects. A “defective” core can mean several things: a core that fails post-manufacturing testing entirely, or a core that functions but fails frequency validation due to insufficient operating speed.

If 88 out of 91 cores pass validation, that corresponds to a yield requirement of roughly 96.7%, implying process variation management at approximately the one-sided 2σ level (97.7%). Of course, the actual situation is unknown, but perhaps the design target itself remains at 3σ while manufacturing retains margin by relaxing operational management to around 2σ. If both design and manufacturing were managed entirely at 3σ, one might expect 90 cores rather than 88. Even so, shipping chips with three intentionally disabled cores out of 91 feels somewhat luxurious. Incidentally, Grace reportedly shipped with 72 active cores out of 76 physical cores, so perhaps Vera is slightly more aggressive in this regard.
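
The sigma reasoning above can be reproduced with the standard normal CDF. This is a simplified sketch of my own back-of-envelope model, not NVIDIA's actual binning method: it treats the one-sided kσ coverage as the fraction of cores that pass and rounds down to whole cores.

```python
import math


def normal_cdf(z: float) -> float:
    """One-sided coverage of a standard normal distribution up to z sigma."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_good_cores(physical_cores: int, sigma: float) -> int:
    """Whole cores expected to pass if the pass fraction equals the sigma coverage."""
    return math.floor(physical_cores * normal_cdf(sigma))


PHYSICAL_CORES = 91   # 7 x 13 layout on the Vera die
ACTIVE_CORES = 88     # publicly announced core count

print(f"required pass fraction: {ACTIVE_CORES / PHYSICAL_CORES:.1%}")
for sigma in (2, 3):
    print(f"{sigma} sigma: coverage {normal_cdf(sigma):.2%}, "
          f"{expected_good_cores(PHYSICAL_CORES, sigma)} good cores")
print(f"Grace for comparison: {72 / 76:.1%} of physical cores active")
```

Under this model, 2σ coverage gives 88 usable cores out of 91 and 3σ gives 90, which matches the numbers discussed above.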

As for the CPU core architecture itself, the previous-generation Grace used Arm Neoverse V2 cores, whereas Vera introduces NVIDIA’s custom-designed Olympus core. The inter-core mesh network has evolved into the second generation of the SCF (Scalable Coherency Fabric), although it still appears to be based on Arm’s CMN (Coherent Mesh Network), much like Grace.

Generally speaking, Arm CPUs in data center environments tend to favor single-thread designs where the number of logical cores equals the number of physical cores. Vera, however, supports two threads per core, yielding 176 logical cores in total. Multi-threading within a single physical core typically involves time-sharing the same execution resources, which can introduce noisy-neighbor issues and unintended security vulnerabilities. For this reason, single-thread architectures have tended to dominate in data center applications. Vera’s dual-thread structure, however, appears designed specifically to address such concerns.

NVIDIA describes Vera’s approach as Spatial Multi-threading. Although the threads exist within the same physical core, resources are spatially partitioned and separately allocated. I do not fully understand the implementation, but perhaps it effectively duplicates execution resources.

Grace’s mesh configuration appeared to use a 7×6 layout, while Vera seems closer to 7×7, suggesting that the number of physically connectable cores has not changed dramatically. However, by advancing from the N4 process used in Grace to the N3 process used in Vera, NVIDIA may have redirected the increased transistor budget toward higher logical-core counts. N4’s logic density is considered roughly equivalent to N5, while N3 is often described as offering around 1.7× the logic density of N5. Therefore, moving from N4 to N3 could plausibly allow about 1.7× more logic circuitry. Perhaps L2 cache remains shared within each physical core while execution resources themselves are duplicated.

Returning briefly to the topic of redundancy: if NVIDIA eventually activates up to 90 cores, the system would become a 180-logical-core CPU, meaning the dual-thread architecture could effectively add four more logical cores. Perhaps this is something planned for later once yields stabilize.

One small curiosity concerns royalty structures. Depending on the licensing arrangement, royalties may be charged per chip or per core. I found myself wondering whether NVIDIA pays licensing fees based on 88 physical cores or 176 logical cores. However, since Vera currently appears to have no core-count variations, perhaps licensing is simply handled at the chip level. That concludes the Vera discussion.

Returning now to broader GTC-related topics. Cisco has announced a Secure AI Factory concept at GTC. AI factories contain enormous numbers of GPUs and CPUs, increasingly combined with optical interconnects and a variety of ultra-high-speed communication technologies. Security therefore also needs to evolve accordingly.

One component appears to involve extending Cisco Hybrid Mesh Firewall capabilities into NVIDIA BlueField DPUs. The idea is that threats embedded within external requests can be intercepted and blocked at the DPU layer before reaching CPUs or GPUs. Since workloads inside AI factories involve dramatically larger numbers of processes requiring inspection than before, strengthening security around BlueField DPUs is expected to create a more robust security architecture.
Meanwhile, Alice & Bob reported at GTC on results integrating NVIDIA CUDA-Q into quantum computing workflows. Last year the company integrated CUDA-Q into its QPU simulation library Dynamiqs, and this time they reportedly used GH200 systems for quantum error correction (QEC) decoding simulations.

Compared with an AMD Ryzen 9 9950X (16 cores / 32 threads), execution time reportedly fell from 18 hours and 2 minutes to 1 hour and 57 minutes — approximately a ninefold improvement. It appears that after quantum calculations are disrupted by decoherence, the decoded results produced through QEC simulations can be fed back into ongoing quantum computations.
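
The speedup follows directly from the two reported run times. A minimal check:

```python
from datetime import timedelta

cpu_time = timedelta(hours=18, minutes=2)   # Ryzen 9 9950X run (reported)
gpu_time = timedelta(hours=1, minutes=57)   # GH200 run (reported)

# Dividing one timedelta by another yields a plain float ratio.
speedup = cpu_time / gpu_time
saved = cpu_time - gpu_time

print(f"speedup: {speedup:.2f}x, time saved: {saved}")
```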

I honestly have little intuition for how such reintegration works internally. However, if decoding takes 18 hours, the workflow would effectively stretch across an entire day. Reducing that to around two hours would likely make a tremendous practical difference.

Finally, a topic unrelated to GTC. Reports indicate that AMD CEO Lisa Su visited South Korea’s Naver. AMD and Naver apparently signed a memorandum regarding the use of AMD chips in Naver’s AI data centers. AMD is expected to provide GPUs optimized for HyperCLOVA X, Naver’s LLM.

Although NVIDIA Blackwell systems are currently deployed there, Naver reportedly wants to avoid becoming locked into NVIDIA and is therefore considering a multi-vendor strategy. In the near term, the company plans approximately 90 MW of additional capacity using AMD or Intel GPUs, bringing total capacity to around 135 MW. There are apparently also plans to eventually double that further to roughly 270 MW.

MARCH 20, 2026

Come to think of it, there are reports that Intel is reconsidering its socket strategy. Intel CPUs have long been known for changing socket shapes every two generations, but apparently the company now plans to keep socket compatibility so users can continue upgrading CPUs without replacing their motherboards. AMD, for example, kept the Ryzen platform on AM4 through Zen 3, and moved to AM5 starting with Zen 4. Socket compatibility allows users to continue using the same motherboard and memory setup, which is clearly more convenient. Memory prices have surged since last year, and as a side effect, it appears unlikely that many new consumer CPUs or GPUs will launch this year. Overall, it is a difficult environment for DIY PC builders, so promising socket compatibility seems like a positive move. It is probably beneficial for motherboard vendors as well under current conditions.

This goes back quite a long time, but Intel once had an unwritten rule: when the CPU architecture changed, the socket changed as well, while transitions to a new semiconductor process using the same CPU architecture would keep the same socket. This development policy was known as the Tick-Tock strategy. From the user’s perspective, CPUs were upgraded every year, while internally the architecture team and semiconductor process team alternated major updates every two years.

The two-year semiconductor process cycle also aligned neatly with Moore’s Law. Every 24 months transistor density would double, allowing more cores and cache memory at the same power consumption while increasing clock frequencies. The Tick-Tock strategy lasted from 2006, when Intel moved the NetBurst architecture to the 65nm process, through 2015, when Skylake arrived on the 14nm process. After that, Intel struggled to transition to 10nm, and I believe the company officially declared the end of Tick-Tock.
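
As a rough illustration of how well the two-year cadence matched Moore’s Law over that period, the sketch below compares a doubling every 24 months with an idealized area shrink from 65nm to 14nm. Treating node names as literal linear dimensions is a simplifying assumption on my part; real node names are closer to marketing labels.

```python
def density_gain_moore(years: float, doubling_period_years: float = 2.0) -> float:
    """Density multiplier if density doubles every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)


def density_gain_ideal_shrink(old_nm: float, new_nm: float) -> float:
    """Density multiplier if area scaled with the square of the node name (idealized)."""
    return (old_nm / new_nm) ** 2


moore = density_gain_moore(2015 - 2006)
shrink = density_gain_ideal_shrink(65, 14)

print(f"Moore's Law, 2006 to 2015: {moore:.1f}x")
print(f"Ideal 65nm to 14nm shrink: {shrink:.1f}x")
```

The two estimates land in the same range, a little over 20x, which fits the impression that the cadence and Moore’s Law were well aligned.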

Whenever CPU architectures changed, the surrounding system also tended to evolve: memory and chipsets were upgraded, CPU pin counts increased, package dimensions changed, and sockets had to be replaced. At the time, DIMMs moved from DDR3 to DDR4, PCIe and USB generations advanced, ATA gave way to SATA, and one new technology after another appeared, giving users clear reasons to buy new motherboards. Back then Intel’s IDF (Intel Developer Forum) was extremely lively, and it felt like Intel was also driving motherboard form-factor development.

Another Intel-related topic: there are reports that Intel may raise CPU prices by around 10%. Growing demand for AI inference systems appears to be increasing workloads that depend heavily on CPUs, pushing CPU demand upward.

Continuing with CPUs, there is news related to Arm. Arm itself is apparently preparing to announce an Arm CPU. According to reports, the announcement will take place at next week’s Arm Everywhere event. Arm is traditionally an architecture licensing company, but I recall the company announcing last year that it would develop its own Arm CPU design. Perhaps this is finally becoming reality. Details remain unclear, however. Since Arm architectures are used across smartphones, PCs, data centers, HPC systems, and embedded devices, it is difficult to predict the target market. Moreover, major players already exist in every segment, so I wonder whether Arm can really gain meaningful market share. I am also curious about which semiconductor process node will be used. Foundry capacity already seems fully booked, so perhaps they reserved production long ago. (Though perhaps that is none of my business.)
There are reports that SK hynix will use TSMC’s 3nm process for the logic die in its HBM4E memory. SK hynix’s HBM4 is already used in NVIDIA GPUs, but it is apparently considered inferior to Samsung’s implementation in performance. The company seems determined to reverse that perception with HBM4E. Current HBM4 reportedly uses a 1b DRAM process (fifth-generation 10nm-class) for the memory core chips and TSMC’s 12nm process for the logic die handling PHY and I/O functions. For HBM4E, however, SK hynix plans to move to a 1c DRAM process (sixth-generation 10nm-class) alongside TSMC’s 3nm logic die process. Samsung, meanwhile, is believed to be pursuing a combination of 1c DRAM with its own 4nm process.
Two more items involving TSMC and Samsung. First, pricing. There are reports that TSMC may raise wafer prices for processes at 5nm and below due to tight manufacturing capacity. That would include the 5/4nm, 3nm, and 2nm nodes. I seem to recall reports from last year into this year suggesting that wafer prices for 3nm and beyond would rise by roughly 10% annually through 2029. Perhaps now 5/4nm has been added to that trend. The old business model, in which older-generation wafer prices declined once next-generation processes became mainstream, may no longer apply. Samsung, meanwhile, has reportedly also raised prices for its 5/4nm processes beginning in Q4 2025 due to growing demand.
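
To get a feel for what roughly 10% per year would add up to, here is a compounding sketch. The 2025 baseline year and the exact 10% rate are assumptions for illustration; the reports do not pin these down precisely.

```python
def price_index(annual_rate: float, years: int) -> float:
    """Relative price after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


ANNUAL_RATE = 0.10   # assumption: a flat 10% increase every year
BASE_YEAR = 2025     # assumption: prices indexed to 1.00 in 2025

for year in range(BASE_YEAR, 2030):
    print(year, f"{price_index(ANNUAL_RATE, year - BASE_YEAR):.3f}")
```

Under these assumptions, a wafer would cost roughly 46% more in 2029 than in 2025.
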
Next, labor disputes. Taiwan’s labor inspection authorities investigated violations of labor standards laws among Taiwanese companies, and TSMC reportedly ranked seventh in the top ten offenders. The details are unclear, but common violations apparently include unpaid overtime and concealment of excessive working hours. Problems like these never seem to disappear regardless of whether the economy is strong or weak. In strong economies there is too much work; in weak economies there is too little money. Taiwan is obviously dealing with the former case. Also, according to a report from two days ago, Samsung employees are reportedly planning a general strike in May. If it truly becomes a general strike, it would presumably involve the entire Samsung Group, including semiconductor manufacturing. At the moment it seems the labor union has merely secured the legal right to strike, so negotiations will likely continue until May. This is not a technical topic, but technology is ultimately supported by people, so I think these issues are important.

March 21, 2026

Come to think of it, NVIDIA appears to have formed a partnership with AWS for GPU supply. According to reports, NVIDIA will deliver more than one million GPU and LPU AI chips by 2027. AWS already operates its in-house Graviton5 CPUs and Trainium3 AI chips for training and inference, but it seems the company decided to purchase NVIDIA chips separately as an immediately deployable solution. Trainium4 is also under development, but considering that the current procurement agreement extends through 2027, perhaps AWS expects it will still take time before its own chips achieve stable competitive performance, making NVIDIA’s products a kind of “bridge solution” in the meantime. This may also tie into the topic below, as AWS could be aiming to expand its AI ecosystem through its investment in OpenAI.
Another AWS-related topic: Microsoft is reportedly preparing to sue AWS and OpenAI. This likely relates to reports from late last month claiming that AWS planned to invest $50 billion in OpenAI. The AWS–OpenAI partnership was described as providing stateful AI services through Amazon Bedrock, while maintaining Microsoft’s existing relationship with OpenAI for traditional stateless AI services. Since the details of the lawsuit are unclear, one can only speculate, but Microsoft has invested in OpenAI since 2019, when OpenAI was still a non-profit organization, and together they built an AI ecosystem centered on Azure and OpenAI for enterprise customers. That ecosystem likely includes substantial technical expertise and business know-how. Microsoft may therefore be concerned that such confidential knowledge could indirectly flow to AWS through OpenAI. Meanwhile, AWS has already secured commitments from NVIDIA for the supply of more than one million AI chips, and together with OpenAI appears intent on expanding its AI ecosystem. Since direct litigation tends to take a long time, Microsoft—being in the position of trying to delay or restrain the partnership—may hold the strategic advantage.
There are reports that TSMC Chairman C.C. Wei has received an honorary doctorate from Asia University in Taichung. He reportedly noted that when he joined TSMC in 1998, the mainstream semiconductor process was 0.25 micrometers, whereas the company is now producing at the 2nm node, a reduction by a factor of roughly one hundred (250nm to 2nm is nominally a factor of 125). Of course, “2nm” today is more of a marketing designation than a literal 2nm feature width, but in terms of equivalent scaling on the semiconductor roadmap, it is reasonable to say that transistor performance has improved by roughly 100 times. In his speech, Wei also discussed concepts for caregiving robots. Beyond image recognition, he emphasized that many challenges remain, including pressure sensors that can safely interact with the human body and systems for controlling motion mechanisms. Still, I believe an era in which humans and robots coexist is unquestionably coming. It will be exciting to see physical AI develop into an industry on the scale of today’s automotive sector.
Finally, an IBM-related topic. This is not especially technical, but there are reports that Ogilvy, which handled IBM’s advertising strategy for 32 years, will not participate in the selection process for IBM’s next creative agency. When I think of IBM commercials, I remember the slightly dark blue bands at the top and bottom of the screen, framing explanatory videos, short drama-style segments, and sometimes humorous exchanges. Several commercials from the Big Data and Smarter Planet era remain particularly memorable to me. Perhaps IBM’s image and style will begin to change somewhat going forward. It may be something worth watching closely.

March 22, 2026

Speaking of which, there was an article reporting that Apple had blocked an update for an AI vibe-coding app available on the App Store. Vibe coding is a system in which AI translates rough instructions into specifications and writes code, even for people without programming knowledge. The app in question apparently could open a browser during preview mode and run applications specialized for device-specific functions. I do not fully understand the details of the issue, but it does sound like the kind of functionality that could be abused by malicious actors. These points seem to violate the App Store review guidelines. However, the report says that the app may be reinstated if the developer makes the necessary corrections. Still, it is remarkable that simply conveying a rough idea or atmosphere can now result in a functioning program. It really is an incredible era.
Next, a topic related to exports. There are reports that two employees of a U.S. server vendor and one contractor illegally exported NVIDIA GPU servers to China. They allegedly ordered large quantities of H200 and B200 GPU servers through shell companies in Southeast Asia, then repackaged the delivered servers into different boxes and shipped them to China. Two of the three individuals were arrested, while one remains at large. The arrested individuals were reportedly a co-founder of Supermicro and a contractor, while the fugitive is said to be the head of the company’s Taiwan sales division. Supermicro released a statement saying that the two employees had been terminated and that the company complies with export regulations.

Supermicro previously failed to make timely disclosures for its FY2024 annual report (fiscal year ending June 2024) and for its FY2025 first-quarter report (July–September 2024). Back in 2018, the company was temporarily delisted from Nasdaq for failing to file financial statements on time. Afterward, it replaced its CFO and attempted to rebuild the organization, but once the restructuring was complete, it changed CFOs again and rehired former executives, effectively returning to the old structure. Hindenburg Research, the activist short-selling firm that shut down in January 2025, had also published a report detailing episodes that reflected the company’s corporate culture. Overall, corporate governance remains a persistent concern, and the situation still feels somewhat unsettling.

Finally, a topic related to AWS. TechCrunch appears to have published a report about the laboratory developing AWS’s AI chip, Trainium. Trainium4 is currently under development, while Trainium3 was already released last December. In addition, the team is now developing the Neuron switch, which will allow Trainium3 chips arranged in a mesh topology to communicate with one another at low latency. The cumulative number of deployed Trainium chips up through Trainium3 is expected to reach 1.4 million units. Apparently, Anthropic alone is operating one million Trainium2 chips. If so, that would imply AWS itself is using around 400,000 chips. The Trainium development group originally came from Annapurna Labs, which AWS acquired in 2015, and the report notes that the Annapurna Labs logo still remains in the lab today. The article also introduced the facilities and work environment with photographs and interviews. Personally, it reminded me somewhat nostalgically of the atmosphere of engineering sites I had seen in the past.

March 23, 2026

Speaking of which, NVIDIA’s Feynman was only just unveiled at GTC, but there are already reports suggesting that manufacturing it may prove difficult from the standpoint of TSMC’s production capacity. Feynman is expected to be manufactured using TSMC’s A16 process. In terms of process nodes, A16 is positioned as an enhanced version of the 2nm generation, incorporating backside power delivery (BSPD), which TSMC refers to as “Super Power Rail.” Mass production of 2nm chips reportedly began at the end of 2025, but demand is already overwhelming supply, with manufacturing reservations fully booked through 2027. TSMC plans to build additional 2nm fabs, but even so, capacity from 2028 onward is expected to fill quickly. Under such circumstances, questions are already being raised about whether sufficient A16 capacity can be secured. Feynman is expected to adopt a chiplet-based architecture, and it is believed that portions other than the A16 components may use the N3P process.

CoWoS packaging is also expected to become a bottleneck, and there is speculation that Feynman may rely not only on TSMC but also on Intel Foundry. At GTC, NVIDIA stated that Feynman would use 3D stacking and custom HBM, but supply and procurement coordination will likely be essential. Generally speaking, when chiplet architectures span different semiconductor processes and different companies, delays in the supply or assembly of any single component can prevent the completion of the final product. During the era when semiconductor scaling advanced rapidly, the trend was toward increasingly multifunctional monolithic chips. Today, however, as process scaling has slowed, building high-performance XPUs inevitably requires increasingly complex chiplet-based structures with growing numbers of components. When semiconductor shortages or insufficient fab and OSAT (Outsourced Semiconductor Assembly and Test) capacity are foreseeable, simpler architectures with fewer parts and less assembly complexity would seem preferable. This issue is likely to persist for some time.

Another announcement from GTC also caught attention. NVIDIA and Emerald AI appear to have partnered to build an AI factory designed to control power supply networks. Using NVIDIA’s Vera Rubin DSX AI factory reference design together with Emerald AI’s Conductor platform, the initiative aims to manage electric utility power grids. Participating utility companies reportedly include AES (Virginia), Invenergy (Illinois), NextEra Energy (Florida), Vistra (Texas), nuclear-related company Constellation, and Nscale Energy & Power, the data center power infrastructure division of European hyperscaler Nscale. I believe Nscale had previously received investment from NVIDIA.

Power supply management fundamentally depends on continuously balancing electricity generation and consumption throughout the day. In recent years, however, energy sources have become increasingly diversified with the expansion of renewable energy and the spread of battery storage systems. At the same time, the enormous power consumption of AI data centers has itself become a growing issue. This initiative appears to represent an attempt to integrate AI data centers directly into the power grid infrastructure rather than simply treating them as large consumers of electricity.

March 24, 2026

Speaking of which, Broadcom appears to have pointed out that TSMC’s manufacturing capacity has become the primary bottleneck for the semiconductor industry. Equipment installation for TSMC’s sub-2nm fabs is scheduled to begin in Q3 2026 at FAB20 Phase 3 in Hsinchu and FAB22 Phase 3 in Kaohsiung, which should increase capacity to some extent in 2027. However, Broadcom reportedly believes that 2026 may still see significant supply chain impacts. In particular, there are observations that Apple alone may account for roughly 50% of the initial 2nm process capacity. Broadcom also noted that supply delays are especially severe in the laser sector, while PCB lead times in Taiwan and mainland China are reportedly increasing. Although there is no definitive confirmation, this may suggest that production related to silicon photonics is struggling to keep up with demand. Meanwhile, suppliers appear increasingly inclined to sign long-term agreements with customers, and the article states that Samsung’s partnerships with major clients are now structured as contracts spanning three to five years.
Another topic related to TSMC: there are reports that TSMC’s 2nm pricing is expected to rise for four consecutive years. Reservations are reportedly already filled through 2028. Even Arizona’s FAB21 appears to have fully booked production capacity, including future 3nm and 2nm facilities that have not yet been completed. The 3nm production line at FAB21 had originally been scheduled for completion in 2028, but construction is reportedly being accelerated toward the second half of 2027. Overseas TSMC fab construction projects were previously said to require around six quarters (roughly 1.5 years), but recent streamlining efforts appear to have reduced that timeframe to approximately four to five quarters. Apple, NVIDIA, AMD, and Qualcomm are all believed to be seeking domestic U.S. manufacturing capacity.
It also seems that the Middle East situation has increasingly begun affecting the semiconductor industry. Roughly 30% of the world’s helium supply is exported from Qatar, and helium is indispensable in semiconductor processes for applications such as cooling. This affects not only TSMC but Samsung as well. For Samsung in particular, around 68% of its helium supply reportedly comes from Qatar, meaning any supply disruption could become a critical issue. Intel, on the other hand, is likely using helium sourced from the United States, which is the world’s largest helium producer.
Another concern involves Taiwan’s natural gas supply. Reports suggest that Taiwan may only have around 11 days’ worth of natural gas reserves, compared with roughly 90 days in Japan and South Korea. The Strait of Hormuz was apparently closed around March 12, and articles warn that if power supply disruptions affect TSMC’s fabs, large quantities of work-in-progress wafers could be lost. For cutting-edge semiconductor technologies, there are effectively no alternatives to TSMC, making multisourcing impossible. This is not unique to Taiwan, but modern manufacturing increasingly relies on just-in-time (JIT) production with minimal inventory buffers. As a result, once chip manufacturing stalls, downstream production processes can quickly become paralyzed. If the components listed in a bill of materials are not available, products cannot be completed, and if modules cannot be finished, the next stage of production cannot proceed. Semiconductor manufacturing—and the products built upon semiconductors—exist within a framework of international cooperation.

March 25, 2026

Speaking of which, at Arm Everywhere, the event hosted by Arm, articles report that Arm announced its independently developed Arm AGI CPU. Regarding Arm entering the CPU business itself, I believe development plans were first announced in May 2024, and reports in February 2025 indicated that development was progressing. At the time, it was said the product might appear sometime in 2025 or within fiscal 2025, and it seems they managed to make it before the end of the fiscal year. It may be somewhat unusual for a company to both license architectures and IP while simultaneously selling chips of its own. x86 processors are made by Intel and AMD, but the architecture itself is not licensed out. RISC-V and SPARC are managed under open-license organizations, and I do not believe those organizations manufacture chips themselves. IBM’s Power architecture is still developed by IBM, but I believe it transitioned to an open-license model and was transferred to the Linux Foundation.

The announced chip uses a two-chiplet configuration and is described as having 136 Arm Neoverse V3 cores. The chip image is clearly CG rather than a real die photograph, but counting the visible cores gives about 120. There appear to be patterns resembling an additional 20 cores, 10 each along the top and bottom edges of the V3 core array, and if those are also V3 cores, the total would be 140 cores, suggesting four redundant spare cores (140 − 136). Furthermore, the visible 120 cores appear to be paired in groups of two. This reminded me of NVIDIA’s Vera chip image (which also appears to be CG), where the cores likewise seem arranged in pairs. The Arm AGI CPU may be using the same type of mesh network as Vera. However, if that is the case, the 20 core-like structures at the edges also appear somewhat independent from the mesh network. Some articles speculate that these may not be V3 cores (Poseidon) at all, but rather N3 cores (Hermes). In that case, it would no longer truly be “136 V3 cores,” but since the image is only a conceptual rendering, I do not want to overthink it too much.

It was unclear from the published information whether the two chiplets are directly interconnected, but judging from the substrate coloring, it is probably using CoWoS-L, similar to NVIDIA Blackwell. CoWoS-L uses an organic redistribution-layer (RDL) interposer, so its wiring pitch is coarser than that of the silicon-interposer-based CoWoS-S, but it embeds Local Silicon Interconnect (LSI) bridges beneath the chip-to-chip connections, which should allow interconnect density there comparable to CoWoS-S. The facing edges of the two chiplets appear to contain L3 cache memory, totaling 128MB. In the introduction video, dimensions of 78 × 73 mm are listed; assuming that refers to the socket size, the chip itself would likely be around 2 cm × 3 cm, or roughly 600 mm² or more. A 3 cm die dimension is already approaching manufacturing limits. The chip is reportedly fabricated on TSMC’s N3P process, and the configuration appears to consist of two identical chips mirrored side by side, including DDR5 and PCIe functionality. Having fewer individual components is likely advantageous in reducing supply-chain risk.
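To put the size estimate in context, here is a minimal sketch comparing it with the lithography exposure field. The 26 mm × 33 mm reticle field is the commonly cited single-exposure limit and is an assumption added here, not a figure from Arm's announcement; the 20 mm × 30 mm die size is only the rough estimate given above.

```python
# Assumption: commonly cited maximum single-exposure field of current scanners.
RETICLE_SHORT_MM = 26.0
RETICLE_LONG_MM = 33.0


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


def fits_in_reticle(width_mm: float, height_mm: float) -> bool:
    """True if the die fits inside one exposure field in either orientation."""
    short, long = sorted((width_mm, height_mm))
    return short <= RETICLE_SHORT_MM and long <= RETICLE_LONG_MM


est_w, est_h = 20.0, 30.0  # rough estimate from the text, not a measured value
area = die_area_mm2(est_w, est_h)
limit = die_area_mm2(RETICLE_SHORT_MM, RETICLE_LONG_MM)
print(f"estimated die: {area:.0f} mm^2, reticle field: {limit:.0f} mm^2")
print(f"fits in one exposure: {fits_in_reticle(est_w, est_h)}")
print(f"share of reticle field used: {area / limit:.0%}")
```

If the estimate is in the right range, the die would still fit in a single exposure, but its long side would be within a few millimetres of the limit.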

As for server availability, ASRock is reportedly developing the motherboard, while Supermicro will handle sales. Rack-level solutions are also planned and will conform to the OCP rack standard. Each CPU operates at 300W and can be air-cooled. There will reportedly be an air-cooled rack with thirty 1U blades, each containing two CPUs, as well as a liquid-cooled rack with forty-two 1U blades containing eight CPUs each. The air-cooled rack consumes 36kW, while the liquid-cooled rack reaches 200kW. The latter is comparable to NVIDIA’s Vera Rubin NVL72 rack and AMD’s Helios rack in terms of power consumption.
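The rack figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above (300W per CPU, blade counts, CPUs per blade, rack power); the function and variable names are mine, for illustration.

```python
CPU_POWER_W = 300  # per-CPU power quoted in the reports


def rack_cpu_power_kw(blades: int, cpus_per_blade: int,
                      cpu_power_w: int = CPU_POWER_W) -> float:
    """Total power drawn by the CPUs alone in one rack, in kilowatts."""
    return blades * cpus_per_blade * cpu_power_w / 1000.0


# (blades, CPUs per blade, reported rack power in kW)
racks = {
    "air-cooled": (30, 2, 36.0),
    "liquid-cooled": (42, 8, 200.0),
}

for name, (blades, cpus_per_blade, rack_kw) in racks.items():
    cpu_kw = rack_cpu_power_kw(blades, cpus_per_blade)
    print(f"{name}: {blades * cpus_per_blade} CPUs, "
          f"{cpu_kw:.1f} kW for CPUs, {cpu_kw / rack_kw:.0%} of rack power")
```

On these numbers the CPUs account for about half of each rack's stated power. The rest presumably goes to memory, networking, storage, cooling, and power-conversion losses.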

Meta is reportedly the first customer, and development has apparently been carried out jointly since 2024. Meta recently announced its own MTIA400-series AI chips, and Arm AGI CPU will reportedly serve as their CPU platform. Other early customers are said to include OpenAI and Cerebras. One particularly striking aspect of Arm’s announcement was the presence of endorsements from 52 companies. These included foundries, ASIC design firms, CAD vendors, operating systems and platform providers, cloud service providers such as AWS—which itself develops competing Arm CPUs—and even NVIDIA, whose Vera product competes in the same space. Notably absent were rival x86 companies Intel and AMD, and likely also Apple and Qualcomm, whose market positioning differs somewhat. (Qualcomm was, of course, once involved in litigation with Arm.)

As the event name “Arm Everywhere” suggests, Arm-based CPUs and MPUs now span everything from embedded devices to supercomputers, and Arm itself has now entered the high-end CPU market directly. This is slightly different from the foundry business, but TSMC has long stated that it does not produce its own chips because doing so would compete with customers. Meanwhile, Intel Foundry—despite being a manufacturing service—has struggled to attract customers partly because Intel itself competes with them. Viewed in that context, the 52 endorsements may have been intended to demonstrate that Arm’s own chip business will not directly threaten its licensing customers. Still, manufacturing competing products while simultaneously selling architecture licenses is certainly something worth watching closely.

Changing topics slightly, there is also news regarding GPU smuggling. The Supermicro employee case involving illegal exports to China now appears to be affecting NVIDIA as well. When the U.S. White House granted export permissions for NVIDIA GPUs to China, Jensen Huang reportedly argued that GPU systems were far too large to be secretly smuggled out, and that explanation apparently became part of the basis for the approval decision. It seems the assumption was based on GB rack systems weighing around two tons. In reality, however, the illegally exported systems were GPU servers, which are much smaller than GB rack systems. From the CEO’s perspective, perhaps GPU servers simply were not considered part of the equation.
Another China-related topic: Alibaba reportedly announced a next-generation AI chip. The chip is called Xuantie C950 and is believed to use the RISC-V architecture. It is reportedly fabricated using TSMC’s 5nm process. I am not entirely sure how a Chinese company can still have AI chips manufactured at TSMC. Around 2022, I recall that TSMC was manufacturing 7nm GPUs for the Chinese fabless company Biren, but production was halted due to U.S. pressure. Then in 2024, after a Chinese company was discovered rerouting chips to Huawei, I believe TSMC notified all Chinese companies that it would no longer supply AI chips at 7nm and below.
Finally, some Apple-related news. Performance evaluation results have reportedly emerged for Apple’s C1X 5G modem, which was installed in the iPhone Air released last September. The iPhone 17 Pro Max, released around the same time, uses Qualcomm’s X80 5G modem, and comparisons between the two reportedly show nearly equivalent performance in latency and download speeds. Upload performance still appears to favor the X80 due to its support for carrier aggregation. Apple is expected to address this with the upcoming C2 modem, which many anticipate will appear in this autumn’s models. However, articles published around the middle of this month reported that Apple had extended its agreement regarding Qualcomm’s 5G RF system, leading to concerns that Apple’s in-house modem development may be falling behind schedule.

March 26, 2026

Speaking of which, articles have appeared reporting that one of Apple’s co-founders, Steve Wozniak, expressed disappointment with today’s generative AI, saying it is still insufficient. The comments were likely made during interviews connected to Apple’s 50th anniversary, in response to questions from anchors on CNN and FOX News. His criticism appears to center on the idea that generative AI can provide large amounts of information, but often in the wrong direction, lacking warmth and humanity. In the end, he reportedly concluded that for AI to truly be complete, it would need to “live as a human being.” In that sense, his comments may have been less about generative AI itself and more about expectations for the completion of “artificial intelligence” in the broader sense.

On the other hand, two or three days ago, there was also an article reporting that Jensen Huang stated during a long-form online interview that “AGI has already been achieved.” Considering that AI agents are beginning to enter society and participate in real work, one could argue that AI operating alongside humans in practical roles qualifies as “general-purpose.” In that sense, perhaps it is fair to say AGI has already been attained.

Personally, I think AI is fundamentally “a computer that pretends.” If you ask it to teach, it pretends to be a teacher. If you seek advice, it pretends to be a friend. It can pretend to be a painter or novelist. If you make an administrative request, it pretends to be a staff member. Within companies, many different job functions exist, and positions are filled according to roles. In that context, whether the applicant is human or AI may eventually cease to matter. In that sense, AI is already capable of holding jobs. However, when entirely new cases appear and new workflows must be created, humans will still make decisions involving compliance, accounting, and governance. Those management responsibilities involve accountability, so they will likely remain human roles. (Which is why, no matter how many AGIs gather together, they still would not simply become NVIDIA.) From a straightforward perspective, if AI can collaborate with humans across a wide range of tasks, then calling it AGI may not be much of an exaggeration. I imagine this is roughly what Jensen Huang meant.

But what about pretending to be a friend? Humans also pretend to be friends sometimes, so perhaps that itself is not unusual. (Though the meaning may differ somewhat in this context.) In the future, there may even be AI systems that pretend to be parents. That raises more difficult questions. Attempting to replace human relationships with AI seems likely to create ethical concerns.

To shift toward a slightly lighter topic, if we think of AI simply as a conversational companion, then I believe AI would need emotional dynamics similar to those of humans. That would mean AI would also require individuality. Humans develop personalities through the environments in which they grow up, while AI might instead develop personalities tailored to the people they interact with. However, if that leads to echo chambers or filter bubbles, it could become harmful to humans, which means AI may need some kind of standard philosophical framework. Interestingly, generative AI itself appears to be increasing public interest in philosophy. (Humans are creatures that engage in self-questioning.) If memories accumulate between a person and a conversational AI, then when the AI is reset and its personality disappears, people may eventually describe it by saying “the AI died.” At that point, one could argue that the AI truly had “lived as a human.” I suspect this may be closer to what Wozniak was trying to express.

The AI factories realized by NVIDIA may have created AI that collaborates with humans. Meanwhile, the kind of AI Apple aspires toward may ultimately be AI that can “die.”

Next, some discussion about the Terafab concept. Over the past few days, numerous articles have appeared regarding Elon Musk’s Terafab vision. At first I was unsure whether it was realistic, but considering the broader context, it may actually have a certain plausibility. The proposed Terafab complex in Austin reportedly consists of two semiconductor fabs: one for AI semiconductors used in autonomous driving and humanoid robots, and another dedicated to semiconductors for space-based AI data centers. The space-oriented semiconductors would apparently be specialized devices developed specifically for that purpose. For reference, low-earth-orbit satellites (LEOSATs) operate at altitudes of roughly 200 km to 1,500 km, mostly below the inner Van Allen radiation belt, while geostationary broadcast satellites orbit at around 36,000 km, within the outer belt, and are exposed to far greater radiation. Presumably, the proposed space AI data centers would operate in low-earth orbit.

Outer space is an extremely harsh environment for semiconductors due to dramatic temperature swings between direct sunlight and Earth’s shadow, along with radiation levels far beyond those on Earth. Since Elon Musk already operates Starlink satellites around 550 km altitude, he may possess extensive radiation-effect data. When radiation strikes a transistor, temporary malfunctions can occur. Since these errors disappear after rebooting, they are known as soft errors. However, rebooting is not always possible during operation, so aerospace systems often require redundancy through dual or triple modular configurations.
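The idea behind triple modular redundancy can be sketched in a few lines: run the same computation on three replicas and take the majority result, so that a soft error in any single replica is outvoted. This is only a conceptual sketch in software; the names `majority_vote` and `flip_bit` are hypothetical, and real aerospace systems implement voting in hardware.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas.

    Raises ValueError when no value has a strict majority, in which case
    the error is detected but cannot be corrected.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replicas")
    return value


def flip_bit(word: int, bit: int) -> int:
    """Simulate a single-event upset by flipping one bit of a word."""
    return word ^ (1 << bit)


correct = 0b1011_0010
# Three replicas compute the same word; one is hit by a simulated soft error.
replicas = [correct, flip_bit(correct, 3), correct]
print(f"voted result matches: {majority_vote(replicas) == correct}")
```

With only two replicas, a disagreement leaves no strict majority, so `majority_vote` raises. A dual configuration can detect an error but not correct it, whereas a triple configuration can mask a single faulty replica.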

Musk’s argument for needing Terafab seems rooted in the limits of global semiconductor supply capacity. He reportedly believes that output from fabs such as TSMC, Samsung, and Micron is approaching its limits, making it necessary to build fabs independently. He estimates that the world currently produces about 20GW worth of semiconductor compute annually, whereas Terafab would initially aim for 100–200GW per year and eventually target 1TW of compute production. About 80% of that (800GW) would reportedly be dedicated to space systems, with the remaining 20% used for terrestrial autonomous driving and humanoid robotics. It appears to assume that all data centers themselves would ultimately reside in space.

According to the reports, analysts estimate that producing 1TW of compute capacity would require manufacturing between 7 million and 18 million wafers per month. Assuming a fab capacity of 50,000 wafers per month, that would imply constructing somewhere between 140 and 360 new semiconductor fabs. Naturally, packaging facilities would also be required, along with associated memory and supporting semiconductor components. My impression is that Musk does not simply believe current fabs have reached their physical limits, but rather that projected semiconductor demand will continue to outpace the expansion plans already announced by chipmakers, resulting in what could become an “eternal waiting line.” Faced with that possibility, Musk may have concluded that building fabs himself is the only solution. In such a situation, the earlier one starts, the smaller the eventual losses.
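The fab-count estimate is simple division, sketched below with the figures quoted above. The 50,000 wafers-per-month capacity is the assumption stated in the text, and the 20GW and 1TW figures are Musk's reported estimates, not independently verified numbers.

```python
import math

FAB_CAPACITY_WPM = 50_000  # assumed wafers per month per fab, as in the text


def fabs_needed(wafers_per_month: int,
                fab_capacity: int = FAB_CAPACITY_WPM) -> int:
    """Number of fabs needed to reach a monthly wafer output (rounded up)."""
    return math.ceil(wafers_per_month / fab_capacity)


low = fabs_needed(7_000_000)    # analysts' lower estimate for 1TW
high = fabs_needed(18_000_000)  # analysts' upper estimate for 1TW
print(f"fabs needed: {low} to {high}")

# Scale of the target relative to the current output Musk reportedly cites.
CURRENT_GW_PER_YEAR = 20
TARGET_GW_PER_YEAR = 1000
print(f"target is {TARGET_GW_PER_YEAR / CURRENT_GW_PER_YEAR:.0f}x current output")
```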

If computational capacity continues to remain insufficient, semiconductor prices—and the products that rely on them—will inevitably become more expensive. The term “digital divide” emerged around 2003, and since then society has experienced the rise of the internet, IoT, smartphones, autonomous driving, massive data centers, and now AI. Underlying all of it has always been semiconductors, supported by many semiconductor companies around the world. Today, AI and data-center sovereignty have already become national strategic concerns. If semiconductor prices rise dramatically at this point, I worry that a second digital divide may emerge at the national level. The gap between countries that possess AI, HPC, and large-scale data centers and those that do not may widen even further.

Regarding the Terafab vision, Musk reportedly described it as the dawn of a galactic civilization. Listening to his remarks at the World Economic Forum Annual Meeting in late January, however, I had the impression that he views the solar system itself as a single integrated energy system. Faced with humanity’s growing demand for enormous energy and computational resources, perhaps he sees our current methods—digging up oil and burning it, or boiling water with nuclear power to generate electricity—as fundamentally unnatural. Nature and the universe tend toward lower-cost, lower-potential equilibrium states. If computation is to exist within that flow, perhaps outer space is the optimal location. Aside from natural disasters, placing data centers in space would also reduce the risk of them being targeted in war. Some people argue that the Terafab concept is merely about boosting stock prices or attracting subsidies, but personally I find it exciting and hope it succeeds.

Finally, some White House-related news. Leading figures from U.S. AI companies have reportedly been invited to join PCAST, the President’s Council of Advisors on Science and Technology. PCAST is an advisory body that provides the President of the United States with recommendations on science and technology policy issues and solutions. The organization itself has existed for quite a long time, though its modern structure apparently dates back to the administration of George W. Bush in 2001. I first became aware of it during the administration of Barack Obama, when meetings were held roughly every other month and related documents and videos were publicly released. The discussions cover science and technology broadly—not only IT, but also education, healthcare, urban life, and space development. For pure natural science, the NSF plays the larger role, but engineering tends to intertwine with social issues, which makes PCAST especially interesting.

During the first administration of Donald Trump, PCAST itself seemed to fade into the background, almost as if it had been pushed into a corner of the Department of Energy. It was later revived under the Biden administration through an executive order, and under the second Trump administration it apparently continues under Executive Order 14177, though I am not fully aware of its current activities. The recent appointments reportedly include Jensen Huang of NVIDIA, Larry Ellison of Oracle, Sergey Brin of Google, Mark Zuckerberg of Meta, and Lisa Su of AMD. I believe Lisa Su also served on PCAST during the Biden administration.

I have not followed PCAST very closely in recent years, but several new documents seem to have appeared, so I would like to start paying attention to it again.

March 27, 2026

Come to think of it, remarks from former Intel CEO Pat Gelsinger have surfaced. He apparently said that investment and returns in the semiconductor industry are inherently long-term, so short-term profits cannot realistically be expected, and that executives nevertheless end up failing because Wall Street investors pressure them for short-term gains.

When Gelsinger first became CEO, Intel announced its “5N4Y” strategy, which honestly stunned me at the time. At the same time, I remember wondering whether it could really be achieved. The “5N” referred to five semiconductor process nodes, while “4Y” referred to the four years beginning in 2021. In other words, the plan was to develop five process nodes within the four-year period from 2021 to 2025.

Specifically, the original five nodes were Intel 7, Intel 4, Intel 3, Intel 20A, and Intel 18A. Normally, multiplying Intel 7 by 0.7 would lead to something like “5,” but Intel instead advanced an extra half-node step and named the node Intel 4. By the same logic, Intel 4 led to Intel 3 and then to Intel 20A (effectively “2”). The next full step would ordinarily have been Intel 14A, but Intel appears to have pulled back by a half-node step and named the node 18A instead. Altogether, this amounted to roughly 1.5 + 1 + 1 + 0.5 generations, or four generations in total.
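The naming arithmetic above can be checked with a quick sketch. It assumes the conventional rule of thumb that one full node step shrinks the nominal dimension by about 0.7x, so a half step is 0.7^0.5 and a step and a half is 0.7^1.5. The function name `scale_node` is only illustrative, and node names are marketing labels rather than measured dimensions.

```python
def scale_node(label_nm: float, generations: float, factor: float = 0.7) -> float:
    """Nominal node label after `generations` steps of `factor` linear scaling."""
    return label_nm * factor ** generations

# (starting label in nm, generations advanced, name Intel actually used)
steps = [
    (7.0, 1.5, "Intel 4"),
    (4.0, 1.0, "Intel 3"),
    (3.0, 1.0, "Intel 20A"),
    (2.0, 0.5, "Intel 18A"),
]

for start, gens, name in steps:
    result = scale_node(start, gens)
    print(f"{start:g} nm x 0.7^{gens:g} = {result:.2f} nm -> {name}")

# Sum of the individual steps described in the text.
total_generations = sum(gens for _, gens, _ in steps)
print(f"total: {total_generations:g} generations")
```

Under this rule of thumb the four results come out near 4.1, 2.8, 2.1, and 1.7, which is roughly in line with the names Intel chose.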

Intel 7 was effectively the old 10nm SuperFin process, meaning it was roughly equivalent to a 10nm-class node. Since TSMC had already achieved 10nm in 2017, Intel had fallen significantly behind and needed to catch up to the roadmap pace of 10nm in 2017, 7nm in 2019, 5nm in 2021, 3nm in 2023, and 2nm in 2025. The idea, therefore, seems to have been to start from “10nm” in 2021, push all the way to 18A within four years, and ultimately overtake TSMC.

Now, with 2025 over and 2026 underway, Gelsinger has stepped down as CEO. Intel canceled Intel 20A, but still produced Xeon 5 on Intel 7, Meteor Lake on Intel 4, Xeon 6 on Intel 3, and Panther Lake on Intel 18A. Intel 18A was officially announced on January 6, 2026, while TSMC began mass production of its 2nm process on December 30, 2025 — only a week apart.

Back during Gelsinger’s tenure, Intel repeatedly used the slogan “Moore’s Law Still Alive” at nearly every event, almost like a catchphrase. Personally, I liked it very much. Gordon Moore passed away in 2023, and when I think about whether those words ever reached his ears, it makes me feel unexpectedly emotional.

On the other hand, yesterday an interview with Intel’s current CEO, Lip-Bu Tan, appeared. He stated that Intel would continue investing in startups and had no intention of spinning off Intel’s investment division. He described corporate venture activities as being like the company’s “eyes and ears.” Perhaps Intel sees investment in startups utilizing new technologies not so much as a source of financial return, but rather as an extension of R&D without needing to hire staff or build facilities internally. Since taking office, Tan seems to have made many decisions aimed at keeping Intel from falling apart. I feel he is carrying on Gelsinger’s intentions.

Next, an NVIDIA topic. An article reports that Jensen Huang said there is “no contract” between NVIDIA and TSMC. I seriously doubt there are literally no contracts, but what he appears to mean is that they have never needed to sign formal development agreements because TSMC always delivers what NVIDIA asks for. In other words, the relationship is built on such a high level of trust that detailed contracts feel unnecessary.

Generally speaking, even without written paperwork, once a request is made and accepted, a contractual relationship still exists verbally. So naturally, some kind of information exchange must be taking place. Although nobody outside the companies can know the specifics, I imagine NVIDIA communicates its chip development plans — specifications, desired delivery dates, target pricing, launch timing, and so forth — while TSMC sets technical challenges within the range it believes achievable. Huang may simply mean that they do not prepare formal “development outsourcing plans” and similar documents.

Looking back at NVIDIA’s past products, its choice of semiconductor processes has often differed slightly from the leading edge. For example, when 10nm was the cutting-edge node, NVIDIA used TSMC’s 12nm (12FF) process instead, which was effectively about a half-pitch behind. The upcoming Feynman architecture is also expected to use A16 rather than A14. Perhaps TSMC intentionally avoids taking maximum-risk challenges at the absolute frontier and instead responds with a slightly more conservative but reliable process choice. Somehow, this made NVIDIA’s process selections make more sense to me.

Continuing with NVIDIA-related news, the AI startup Reflection, which NVIDIA has invested in, is reportedly attracting attention. NVIDIA has apparently already invested $800 million into the company. Reflection is being called the “Western DeepSeek” and is currently developing open-weight models. A year ago, the company was reportedly valued at around $545 million, but is now said to be worth between $20 billion and $25 billion. It is also expected to seek an additional $2.5 billion in fundraising.

However, the company has not yet translated this into substantial revenue-generating results. Reflection is reportedly partnering with South Korea’s Shinsegae Group to customize LLMs for the Korean language, with NVIDIA providing GPU resources.

Moving slightly away from purely technical matters, NVIDIA is also facing a shareholder class-action lawsuit. Around 2018, cryptocurrency mining boomed, and GPUs were flowing into mining operations instead of the gaming market. NVIDIA’s disclosed sales figures at the time were reportedly categorized as gaming revenue rather than cryptocurrency-related revenue.

The SEC had already raised concerns about this issue in 2022, and NVIDIA apparently restated around $600 million in revenue while paying a $5.5 million fine. However, some estimates place the true cryptocurrency-related revenue closer to $1.7 billion, meaning there may have been a discrepancy of over $1 billion between the disclosed figures and the actual situation. That discrepancy now appears to have escalated into a class-action lawsuit.

One more NVIDIA-related item: Microsoft has reportedly begun evaluating ERP workloads on Azure using NVIDIA’s Vera Rubin NVL72 rack AI system. ERP stands for Enterprise Resource Planning, meaning core business systems such as HR/payroll and accounting. If AI workloads are beginning to enter these areas, it likely means AI agents are starting to handle clerical and administrative work. It really feels like the times are changing.

Finally, a Google topic. Articles have appeared discussing Google’s TPU v7 Ironwood and its quantum chip Willow. Google is expected to ship between 3.1 million and 3.2 million Ironwood chips during 2026. The chips are integrated into TPU racks with 64 TPUs per rack, and optical circuit switches can connect up to 144 racks. Altogether, this forms a 9,216-TPU system. Meta is reportedly planning to rent TPU systems beginning in 2027.
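As a quick check of the system size, the sketch below multiplies the per-rack and rack-count figures quoted above. The second part divides the reported shipment range by the system size; treating every shipped chip as part of a full 9,216-TPU system is purely an assumption for scale, not something the reports state.

```python
TPUS_PER_RACK = 64   # from the article
MAX_RACKS = 144      # racks linked by optical circuit switches, from the article

def system_size(tpus_per_rack: int, racks: int) -> int:
    """Total TPUs in one fully connected system."""
    return tpus_per_rack * racks

full_system = system_size(TPUS_PER_RACK, MAX_RACKS)
print(f"TPUs per full system: {full_system}")

# Assumption for scale only: all shipped chips go into full-size systems.
for shipped in (3_100_000, 3_200_000):
    print(f"{shipped:,} chips ~ {shipped // full_system} full systems")
```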

Meanwhile, the Willow quantum chip is said to perform certain tasks 13,000 times faster than conventional supercomputers, making it promising for applications such as drug discovery and materials science. Willow uses superconducting qubits for high-speed quantum computation, but Google also intends to expand into neutral-atom quantum computing, which appears to offer a more scalable architecture.

March 28, 2026

Come to think of it, there are reports that Intel and AMD CPUs will see price increases starting in April. The increases are said to be around 10% for Intel and 15% for AMD. There has been constant talk of price hikes since the beginning of this year, but the lot prices listed on the web (unit prices when purchasing 1,000 units) do not appear to have changed so far. Publicly listed prices are only reference values, and I assume that actual transaction prices are determined through individual negotiations at the time of purchase. Of course, there is no way to know the specific contract prices, but the general atmosphere within the industry probably gives people a sense of whether prices are currently considered high or low.

Next, a topic related to Qualcomm. There are reports that Qualcomm will use TSMC’s 2nm process for manufacturing the next-generation Snapdragon 8 Elite Gen6. Since the current Gen5 is manufactured using TSMC’s 3nm process, there had been speculation that Gen6 might instead use Samsung’s 2nm process, but apparently that was not the case. It also seems there will be two variants of the chip. I recall an article from late February stating that Qualcomm’s India team had completed tape-out on a 2nm design, and it now appears that the destination was TSMC.

Finally, a topic about AWS. There was an interview article featuring AWS CEO Matt Garman. It seems he spoke with CNN to commemorate the 20th anniversary of AWS, which launched its services in 2006. Personally, my image of Amazon has mostly been that of an online retailer I occasionally use, without much direct connection beyond that. Jeff Bezos founded the company, and what initially seemed like a business focused on cataloging and tagging products for e-commerce, while also renting out servers and storage, eventually evolved into selling computing power on an instance-by-instance basis. Before long, it had transformed into a cloud service provider operating massive data centers. Of course, during that same period, hardware and middleware also advanced significantly: wired and wireless communication speeds improved, smartphones emerged, and virtualized computing environments became widespread.

In the interview, Garman mentioned that AWS’s AI-related investment for fiscal year 2026 would amount to $200 billion, and explained that the spending would go primarily toward data centers and servers. At an exchange rate of 160 yen to the dollar (with the yen hovering near that level lately, perhaps influenced by the situation in the Middle East), that amounts to roughly 32 trillion yen. A single corporation is investing an amount equal to roughly one quarter of Japan’s annual national budget, despite Japan being the world’s fourth-largest economy by GDP. Personally, I have long considered semiconductors and computers to be strategic resources, but now we are reaching a point where the scale of investment in semiconductors and computing rivals national budgets themselves.

Since the Industrial Revolution, technologies related to iron production advanced dramatically, particularly in Europe and the United States, and buildings once made of brick and stone gradually became reinforced with steel. Otto von Bismarck’s famous phrase, “Iron is the nation,” is well known. In today’s world, where data sovereignty and AI sovereignty are increasingly emphasized, it may no longer be an exaggeration to say that semiconductors are the nation.

March 29, 2026

Come to think of it, information has started to emerge regarding the actual hardware of AMD’s next-generation Zen6 EPYC Venice processors. It appears that evaluation of engineering samples has begun. Details about the evaluation systems have also surfaced, with the systems reportedly named Congo, Kenya, and Nigeria. Congo and Kenya are said to be single-socket systems, while Nigeria is a dual-socket system.

Some articles describe the samples simply as Zen6, but judging from the details, they appear to be Zen6c. Zen6c is regarded as the high-density core version of Zen6, succeeding Zen4c and Zen5c. The CCD designs differ between the standard and “c” variants, and the historical core counts per CCD have been as follows: Zen4/4c had 8 cores / 16 cores, Zen5/5c had 8 cores / 16 cores, and Zen6/6c is expected to have 12 cores / 32 cores. Zen6 will reportedly be manufactured using TSMC’s 2nm process. (Zen4 used 5nm, Zen5 used 4nm, and Zen5c used 3nm.)

The Venice engineering samples this time consist of three CPU variants: 192 cores (384 threads), 128 cores (256 threads), and 64 cores (128 threads). The number of CCDs appears to be eight, four, and two respectively. The 128-core and 64-core models divide evenly into 32-core CCDs, but the 192-core model works out to 24 cores per CCD. Since eight 32-core CCDs should theoretically result in 256 cores (512 threads), it is possible that the system side — perhaps the evaluation environment — limited the configuration to 192 cores, or that there were constraints related to cooling, power consumption, or something similar. Venice is also expected to feature two IO dies, increasing DDR memory channels from 12 to 16. The socket will change as well, from SP5 to SP7. Full-configuration testing is likely still ahead.
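The per-CCD arithmetic in this paragraph can be reproduced with a short sketch. The 32-core Zen6c CCD and the sample configurations come from the reports above. Reading the 192-core sample as having some cores inactive on each CCD is only one of the possibilities mentioned, and the variable names are illustrative.

```python
ZEN6C_CORES_PER_CCD = 32  # expected full Zen6c CCD, per the reports
THREADS_PER_CORE = 2      # implied by the quoted core/thread counts

# Reported engineering samples: total cores -> number of CCDs
samples = {192: 8, 128: 4, 64: 2}

for cores, ccds in samples.items():
    per_ccd = cores // ccds
    inactive = ZEN6C_CORES_PER_CCD - per_ccd
    print(f"{cores} cores / {ccds} CCDs = {per_ccd} per CCD "
          f"({inactive} of {ZEN6C_CORES_PER_CCD} not active), "
          f"{cores * THREADS_PER_CORE} threads")

# What eight fully enabled 32-core CCDs would give.
full_config = 8 * ZEN6C_CORES_PER_CCD
print(f"full 8-CCD part: {full_config} cores, "
      f"{full_config * THREADS_PER_CORE} threads")
```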

Turning to a somewhat different topic (although lately many topics seem to circle back to the Middle East), this one concerns water shortages. Apparently, western Taiwan is experiencing a particularly severe shortage of water. Many of TSMC’s fabs are concentrated on the western side of the island. Semiconductor manufacturing requires enormous quantities of water, and I have heard that one reason TSMC chose Kumamoto for its Japan fab was the abundant water supply from the Aso region.

Of course, TSMC’s fabs make extensive use of resource recycling technologies, and there is no doubt that they are highly environmentally conscious semiconductor facilities. Nevertheless, there appears to be growing concern that if the water shortage becomes severe enough, it could begin affecting production. That said, even before considering semiconductor manufacturing, water scarcity itself is clearly a serious issue for the entire region.

March 30, 2026

Come to think of it, NVIDIA appears to have announced the RTX60 series, including the GeForce RTX6090. At the recent GTC event, I recall that the cartridge itself was shown, but no specific product models were officially introduced. This time, however, the GeForce RTX6070, 6080, and 6090 have been announced. Since these are Rubin-based products, their GPU chip model numbers are GR205, GR203, and GR202 respectively. All of them are manufactured using TSMC’s 3nm process.

The flagship RTX6090 retains the same 32GB of GDDR7 memory and 512-bit memory bus as the previous-generation RTX5090. However, the 6070 and 6080 models increase memory capacity by 4GB compared to the previous 5070 and 5080, while also widening the memory bus by 64 bits. They are expected to launch in 2027.

Next, a topic related to AI. There are reports that the South Korean AI startup Rebellions has raised $400 million. At ISSCC 2026 in February, the company introduced an inference chip called Rebel100, which now appears to already be in commercial deployment.

The design connects four inference chips using UCIe. Two chips are arranged symmetrically side by side, and then two of those sets are stacked vertically. HBM3e memory is positioned on both sides, but because the inference chips themselves are slightly longer than the HBM3e stacks, there are additional silicon chips placed beside the HBM. However, these are apparently not ordinary dummy silicon pieces. They reportedly contain integrated decoupling capacitors — essentially built-in bypass capacitors — and are referred to as ISC (Integrated Silicon Capacitors).

The entire assembly is mounted on a silicon interposer, and Samsung is said to be the semiconductor manufacturing partner. I believe the name Rebellions also appeared during last week’s announcement of Arm’s AGI CPU, where it was mentioned as one of the initial users. Since Samsung and SK hynix are also investors in the company, it is believed that Rebellions may be less vulnerable to ongoing memory shortages.

Another AI-related topic: the European AI model developer Mistral has reportedly raised $800 million. The company plans to construct a data center outside Paris using 13,800 NVIDIA GB300 GPUs. Assuming NVL72 racks with 72 GPUs each, that works out to roughly 190 racks. Mistral, a French company, is widely viewed as an important player in Europe’s push for AI sovereignty. According to reports, the company also plans to build an AI data center in Sweden with an investment of €1.2 billion, while aiming to develop 200MW of data center capacity across Europe by 2027.

Finally, a topic about Apple. Apple CEO Tim Cook reportedly commented on smartphone usage, encouraging people to reduce their screen time. The remarks appear to have been made during a television program appearance. He reportedly said that people should not spend excessive amounts of time scrolling at the expense of direct human communication.

I do not think this issue applies only to younger generations, but recently there has certainly been growing concern over the effects of social media use on children. Some local governments have moved to restrict social media or gaming access based on age, while more schools are banning smartphone use during class hours in order to encourage face-to-face communication among students.

I think such examples can be positive. On the other hand, restricting smartphones too heavily may also have significant drawbacks, considering how deeply they are embedded as social infrastructure. Debates over the mutual influence between society and technology are nothing new, of course, but they are likely to continue indefinitely. And perhaps, after smartphones, the next major subject of that debate will be AI.

March 31, 2026

Come to think of it, there are reports that NVIDIA has invested $2 billion in Marvell. The two companies appear to have partnered in the field of silicon photonics. NVIDIA will provide NVLink Fusion, which Marvell will also adopt in order to develop custom XPUs that can connect to NVIDIA CPUs, GPUs, DPUs, NICs, and Ethernet switches. The two companies are also believed to have shared a vision for expanding AI-RAN, 5G, and future 6G networks.

Next, an Intel-related topic. Die photographs of Intel’s Panther Lake-H have reportedly surfaced. These are different from the images released on March 8 and show the high-end variant. The compute tile, manufactured on the Intel 18A process, contains 4 P-cores, 8 E-cores, and 4 LP-E cores. Since it uses backside power delivery (BSPD), there are no power lines on the signal layer, so one point of interest is whether the circuits are packed more densely. However, even with the high-resolution photographs, it is difficult to tell when compared with other chiplets. The visible gaps between circuit blocks under an optical microscope are often required when different voltage domains are used or when analog circuits need isolation, so these gaps are not necessarily related to the process generation itself. Perhaps a meaningful comparison would only be possible if the same functional blocks could be compared directly in terms of area.

The photographs also show the GPU tile and IO tile. In Panther Lake-H, the GPU tile includes 12 Xe3 cores and is manufactured on TSMC’s N3E process, while the IO tile is produced on TSMC’s N6 process. Combined, the GPU and IO tiles appear slightly larger in area than the compute tile itself.

Continuing with CPU-related topics, there were articles discussing Intel’s views on the Arm AGI CPU and AMD’s views on Intel CPUs. The Arm AGI CPU appears to be positioned for AI datacenters, and it feels as though discussions about what kind of CPU architecture is best suited for AI workloads are only beginning. As industry attention shifts from AI training toward AI inference, debate over CPU architectures seems to be becoming active again after a long quiet period.

CPU design has evolved alongside the development of computing itself, with different considerations emerging in each era. By the 1990s, most of the technologies used in modern CPU cores had already appeared: pipelines, superscalar execution, multithreading, out-of-order execution, speculative execution, VLIW, and cache memory. Incidentally, floating-point units were still treated as coprocessors back then. Toward the end of the 1990s, the relationship between logic design and power consumption became better understood, and in the 2000s many power-saving technologies emerged, including clock gating, power gating, DVFS, and back-gate biasing.

As transistor scaling progressed in the 2000s and supply voltages fell, on-chip multiprocessors became common and servers increasingly grew through parallelization. Performance metrics at the time centered on integer and floating-point benchmarks such as SPEC, with power consumption later added as another important metric. Supercomputer performance was measured using LINPACK, and systems such as Japan’s K computer became closer to buildings than traditional computers. (Even the Earth Simulator in 2000 was as large as a gymnasium.) Today, hyperscalers are effectively filling entire datacenter buildings with CPUs. There are even SPEC benchmarks focused on how many virtual machines a CPU can run.

In future AI datacenters, CPUs together with GPUs and NPUs will fill these facilities, and discussions about the ideal CPU architecture for AI workloads are likely to intensify. It is still unclear what configuration is truly optimal for AI workloads, especially since AI workloads themselves may change rapidly and unexpectedly. At the same time, because AI-specific tasks are offloaded to NPUs, CPUs will remain responsible for general-purpose preparation and orchestration tasks. The scope of those general-purpose tasks will likely evolve over time.

In a sense, image processing and AI training have already been offloaded to GPUs. Modern CPUs also contain vector units capable of HPC-class performance, but opinions may diverge over whether such functionality will continue to be necessary going forward.

From the perspective of preprocessing AI workloads, CPU vendors currently appear to have fairly different design philosophies. It may be useful to organize some of the design elements at the present stage.

Starting with the basics, semiconductor processes range from Intel 18A and Intel 3 to TSMC 2nm and 3nm. Core operating voltages follow the process technology, while clock frequencies seem to be settling around 3–4 GHz. Voltage and frequency are closely tied to heat generation and cooling, and today frequency may be determined largely by how much cooling cost a system can tolerate. In the past, maximum frequency was constrained mainly by logic depth and critical paths, though it is unclear whether that is still the dominant limitation now that transistors have become so small and fast.

Heat generation depends on workload, core count, and frequency, and how efficiently a core operates ultimately depends on design choices including power-saving technologies.
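To make the link between voltage, frequency, and heat more concrete, here is a minimal sketch using the standard first-order CMOS dynamic power relation, P ≈ α·C·V²·f. This model is a textbook approximation rather than something taken from the articles: it ignores leakage and assumes the activity factor and switched capacitance stay fixed. The example ratios are hypothetical.

```python
def relative_dynamic_power(v_ratio: float, f_ratio: float) -> float:
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    Activity factor and capacitance are assumed unchanged and leakage
    is ignored, so this is only a first-order estimate.
    """
    return v_ratio ** 2 * f_ratio

# Hypothetical case: a 10% higher clock that needs 5% more voltage.
boost = relative_dynamic_power(v_ratio=1.05, f_ratio=1.10)
print(f"boosted: {boost:.3f}x baseline power")

# Hypothetical case: clock and voltage both lowered by 10%.
eco = relative_dynamic_power(v_ratio=0.90, f_ratio=0.90)
print(f"lowered: {eco:.3f}x baseline power")
```

Because voltage enters as a square, small voltage changes move power more than the same relative change in frequency, which is one way to read the remark that frequency is now largely set by tolerable cooling cost.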

In terms of physical core count, NVIDIA’s Vera has relatively few at 88 cores, while Intel’s Clearwater Forest reaches as many as 288 cores. Intel Granite Rapids has 128 cores, Arm AGI has 136 cores, and AMD Venice (Zen6c) is estimated at around 256 cores. Higher core counts are generally favorable for VM performance in datacenters, though how they will affect AI datacenters remains uncertain. More cores could mean more AI agents running simultaneously, but throughput will still depend on many other system bottlenecks.

Threading configurations differ as well. Some processors use single-threaded cores while others support two threads per core. In datacenters, there are concerns about noisy-neighbor effects and security, leading some to prefer single-threaded designs. Intel Clearwater Forest (E-core) and Arm AGI belong to this category. AMD Zen6/6c, NVIDIA Vera, and Intel Granite Rapids (P-core) support two threads per core. NVIDIA claims that Vera’s “spatially separated threading” avoids noisy-neighbor issues. AMD and Intel also reportedly allow SMT to be disabled in BIOS, suggesting that all vendors are aware of the importance of single-threaded operation. Rumors even suggest Intel’s future Diamond Rapids P-core may become single-threaded. Which approach is better for AI workloads remains unclear.

L3 cache capacity per logical core appears to fall into roughly two categories: around 1 MB and around 2 MB. NVIDIA Vera provides about 0.92 MB per logical core, Arm AGI about 0.94 MB, AMD Venice approximately 2 MB, and Intel Granite Rapids roughly 1.97 MB. AWS Graviton5 also sits near 1 MB per logical core. More cache is generally beneficial, though balance with L2 cache and memory bandwidth also matters.

Memory channel counts are increasing as well. Granite Rapids still uses DDR5 with 8 channels, while Clearwater Forest and Arm AGI move to 12 channels. NVIDIA Vera uses LPDDR5 with 8 channels, but its SOCAMM modules allow up to 192 GB per module, yielding about 1.5 TB per socket overall. AMD Venice expands to 16 DDR5 channels through dual IO dies. AI workloads require large memory capacity, and newer CPUs appear increasingly optimized for larger memory configurations.

Vector units remain another important point. Granite Rapids supports AVX-512, AMD Zen6/6c includes dual 256-bit vector units per core, Clearwater Forest has dual 128-bit AVX2 units, Arm AGI includes dual 128-bit SVE2 units, and NVIDIA Vera contains six 128-bit SVE2 units. Large 512-bit vector units are primarily intended for FP64 scientific computing but consume significant power, making them less attractive unless a CPU is explicitly targeting supercomputing workloads. However, vector engines are still useful for handling FP8 and FP4 AI workloads.

Matrix engines are increasingly common as well. Intel Granite Rapids includes AMX, which debuted with Sapphire Rapids (launched in early 2023). Clearwater Forest, however, omits it. The Arm architecture offers SME and SME2 matrix engines, with SME supporting FP32 matrix operations and SME2 extending support to FP16. Qualcomm Snapdragon chips already include SME, and Samsung’s next-generation Exynos was expected to adopt SME2.

One notable point is that Arm AGI CPU does not appear to include SME2. Perhaps it assumes matrix operations will be entirely offloaded to NPUs. NVIDIA Vera likewise omits SME and SME2. After all, GPU Tensor Cores already serve as matrix engines, beginning with NVIDIA’s Volta GPUs in 2017, and later influencing AMX and SME designs in CPUs. AMD similarly relies on CDNA GPUs for matrix workloads and does not include dedicated matrix engines in Zen6. Ryzen AI incorporates XDNA NPUs, which may appear in Zen7 or later server processors.

At present, integrated CPU matrix engines can probably be viewed as temporary or auxiliary NPUs. Intel’s Panther Lake already includes an NPU capable of around 50 TOPS. At AI datacenter scale, the current trend is clearly toward offloading matrix computation to GPUs or dedicated NPUs rather than CPUs.

Traditionally, CPU characteristics were discussed mainly in terms of core count, cache capacity, clock frequency, and heat output. Today, however, memory bandwidth and specialized accelerators also deserve close attention. Current AI-oriented metrics focus on AI FLOPS (FP8 or FP4), NPU TOPS, and tokens per second, while benchmarks such as MLPerf are likely to become increasingly standardized. Once society becomes more aware of the relationship between benchmark scores and actual software performance or user experience, architectures designed specifically to maximize benchmark performance will likely emerge and influence future hardware designs. For now, it may be best simply to watch how things evolve.
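To make the figures quoted in this comparison easier to scan, the sketch below collects them in one small table and derives two numbers from them: the L3 total implied by the per-logical-core values, and Vera’s per-socket memory capacity. All inputs are the approximate or estimated values from the text above. The `CpuSpec` class and its field names are illustrative, and the Vera capacity line assumes one 192 GB SOCAMM module on each of the 8 channels.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CpuSpec:
    name: str
    cores: int
    threads_per_core: int
    l3_mb_per_logical_core: Optional[float]  # None where no figure was quoted
    memory_channels: int

    @property
    def logical_cores(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def implied_l3_mb(self) -> Optional[float]:
        """Total L3 implied by the per-logical-core figure, if one was given."""
        if self.l3_mb_per_logical_core is None:
            return None
        return self.logical_cores * self.l3_mb_per_logical_core

specs = [
    CpuSpec("NVIDIA Vera", 88, 2, 0.92, 8),
    CpuSpec("Arm AGI", 136, 1, 0.94, 12),
    CpuSpec("AMD Venice (Zen6c, est.)", 256, 2, 2.0, 16),
    CpuSpec("Intel Granite Rapids", 128, 2, 1.97, 8),
    CpuSpec("Intel Clearwater Forest", 288, 1, None, 12),
]

for s in specs:
    l3 = "n/a" if s.implied_l3_mb is None else f"{s.implied_l3_mb:.0f} MB"
    print(f"{s.name:26} {s.cores:4} cores {s.logical_cores:4} threads "
          f"L3 ~{l3:>8}  {s.memory_channels:2} mem channels")

# Assumption: one 192 GB SOCAMM module per channel on Vera.
vera_capacity_gb = 8 * 192
print(f"Vera: {vera_capacity_gb} GB (~{vera_capacity_gb / 1024:.1f} TB) per socket")
```

Laying the numbers out this way shows the split described above: Vera and Arm AGI cluster near 1 MB of L3 per logical core, while Venice and Granite Rapids sit near 2 MB.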

Another CPU-related topic concerns Intel’s BOT (Binary Optimization Tool), which has reportedly been receiving widespread attention over the past few days. It appears capable of improving gaming performance by roughly 8% overall, with gains reaching around 18% in some cases. Articles suggest the tool vectorizes scalar instructions, thereby improving IPC through heavier use of vector execution units. Specialized accelerators become “dark silicon” if no software actually uses them, leading people to question their necessity. On the other hand, once optimizations demonstrate measurable gains, those hardware features suddenly become essential again. This is how hardware evolves over time.

Finally, a TSMC-related topic. Taiwan’s Ministry of Economic Affairs Investment Commission has reportedly approved TSMC’s plan to manufacture 3nm chips at its Kumamoto fab (FAB23) in Japan. The goal is to achieve monthly production of 15,000 12-inch wafers by 2028. By that time, TSMC Taiwan is expected to have moved to the A14 process generation, meaning the so-called “N-2 rule” (under which overseas fabs are limited to processes two generations behind Taiwan’s most advanced nodes) will still be maintained.

Another TSMC-related article discussed ESMC, the fab under construction in Dresden, Germany. ESMC is jointly funded by TSMC, Bosch, Infineon, and NXP, with TSMC holding a 70% stake and the other three companies each holding 10%. ESMC recently held a symposium for the local region, during which TSMC reportedly explained its 1.6nm process technology. Existing public plans had referred mainly to 28/22nm and 16/12nm processes, so it feels somewhat odd to discuss a process that is not currently scheduled for introduction. Perhaps, similar to JASM in Kumamoto, the plan is to begin with more mature processes before eventually introducing leading-edge technologies later on.

This blog text was translated by AI from the original Japanese blog.

Tuesday, March 31, 2026
