Migrating from Cortex-M3 to Cortex-M4

Roy Luo
Global Technology Centre
element14 (Formerly Premier Farnell)
March 2011

1 Introduction

The ARM Cortex-M4 processor is the latest embedded processor from ARM, developed specifically for digital signal control markets that demand an efficient, easy-to-use blend of control and signal processing capabilities in microcontroller applications. It combines high-efficiency signal processing with the low power, low cost, and ease of use of the Cortex-M family, and it targets flexible solutions for motor control, automotive, power management, embedded audio, and industrial automation.

The Cortex-M4 processor extends the use of Cortex-M cores to applications that require more computational performance than is currently available with the Cortex-M3. The Cortex-M4 features a single-cycle multiply-accumulate (MAC) unit, optimized single instruction multiple data (SIMD) instructions, saturating arithmetic instructions, and an optional single-precision Floating-Point Unit (FPU).

So, the Cortex-M4 is a Cortex-M3 with DSP instruction add-ons, and migrating from Cortex-M3 to Cortex-M4 is very easy!

1.1 Why change to Cortex-M4?

• Higher Performance
Just like the Cortex-M3, the Cortex-M4 provides an integer performance level of 1.25 Dhrystone 2.1 MIPS per MHz, but the Cortex-M4 provides higher performance on digital signal processing. Please refer to Section 2, Cortex-M4 Features, for more information on the Cortex-M4.

• Digital Signal Processing Capabilities
The Cortex-M4 integrates a single-cycle multiply-accumulate (MAC) unit supporting a variety of 16- and 32-bit multiplies with 32- and 64-bit accumulations, and a set of single-cycle SIMD (Single Instruction Multiple Data) instructions featuring dual 16-bit and quad 8-bit operations.
The Cortex-M4 FPU is an implementation of the single-precision variant of the ARMv7-M Floating-Point Extension (FPv4-SP). It provides floating-point computation functionality that is compliant with ANSI/IEEE Std 754-2008, IEEE Standard for Binary Floating-Point Arithmetic, referred to as the IEEE 754 standard. The FPU supports all single-precision data-processing instructions and data types described in the ARM Architecture Reference Manual.

• Satisfying the Requirements of Next-Generation Products
The ARM Cortex-M family is aimed at areas such as commercial electronics and low-cost industrial control, including motor control, power management, automotive electronics, and audio processing. The increasing computational loads in these areas consume an unacceptable portion of CPU resources if all digital signal processing tasks are handled in software. The ARM Cortex-M4 addresses this by integrating a single-cycle multiply-accumulate (MAC) unit and a set of single-cycle SIMD instructions, as well as an optional FPU, to satisfy the digital signal processing requirements of next-generation products.

A Cortex-M4 can be regarded as a Cortex-M3 with integrated DSP extensions. Software written for the Cortex-M3 therefore also runs on the Cortex-M4, and migrating from M3 to M4 takes little effort. The figure below illustrates the relationship between the two processors.

[Figure: Cortex-M3 + DSP & Optional FPU = Cortex-M4]

1.2 Reference Materials

Cortex-M3 Technical Reference Manual, ARM DDI 0337G, ARM Ltd.
Cortex-M4 Technical Reference Manual, ARM DDI 0439, ARM Ltd.
ARMv7-M Architecture Reference Manual, ARM DDI 0403D, ARM Ltd.
Cortex Microcontroller Software Interface Standard (see).
Application Note 179 - Cortex-M3 Embedded Software Development, ARM DAI 0179B, ARM Ltd.

2 Cortex-M4 Features

2.1 32-bit Multiply-Accumulate (MAC) Unit

The 32-bit hardware multiply-accumulate (MAC) unit added in the Cortex-M4 can perform one operation of up to 32 × 32 + 64 → 64, or two 16 × 16 operations, in a single cycle. This high-performance unit makes digital signal processing more efficient and greatly reduces the consumption of CPU resources.

The 32-bit multiply-accumulate (MAC) unit has three main features:
• Wide range of multiply-accumulate instructions
• Choice of 16- or 32-bit multiply and 32- or 64-bit accumulate
• All instructions execute in a single cycle

2.2 Single Instruction Multiple Data (SIMD) Instructions

The Cortex-M4 integrates a set of single-cycle SIMD instructions. The SIMD instruction set includes a series of DSP instructions such as add, subtract, multiply, and multiply-accumulate, which are used to implement common DSP operations including FIR, IIR, complex FFT, PID, matrix addition, matrix subtraction, and matrix multiplication. With these instructions, a Cortex-M4 offers higher computational efficiency than a Cortex-M3 when running DSP programs.

The SIMD instruction set has three main features:
• Quad (4 parallel) 8-bit adds or subtracts
• Dual (2 parallel) 16-bit adds or subtracts
• All instructions execute in a single cycle

2.3 Floating Point Unit (FPU)

The FPU is an optional unit of the Cortex-M4. Manufacturers decide whether to include it according to their own requirements. The FPU fully supports single-precision add, subtract, multiply, divide, multiply-accumulate, and square root operations. It also provides conversions between fixed-point and floating-point data formats, and floating-point constant instructions.

The FPU has four main features:
• FP extension registers that software can view as either 32 single-precision or 16 doubleword registers
• Single-precision floating-point arithmetic
• Conversions among integer, single-precision floating-point, and half-precisio